Question

Align ChIP-seq data to specific sequences not in the reference genome

3.8 years ago

j.matt.franklin • 0

Hi all,

I'm very new to bioinformatics, but have many years of coding experience. I am studying unreferenced parts of the genome, i.e. satellite repeats that aren't included in reference genomes. I want to align raw ChIP-seq data to a specific set of sequences. Basically, I want to make my own reference genome based on a small set of sequences and use that for the ChIP-seq analysis.

I want to create a small custom genome, rather than add to an existing genome, so that I can save computational time.

Can anyone give me some pointers on where to get started? Am I thinking about this the right way?

Any tips/info/thoughts would be greatly appreciated!

ChIP-Seq alignment • 768 views

updated 3.6 years ago by Biostar 20 • written 3.8 years ago by j.matt.franklin • 0

score 1 · Answer 1 · 2020-07-23

Am I thinking about this the right way?

That is debatable. We understand you want to do this because you are interested in unreferenced parts of the genome and want to save computational time. However, if your sample comes from the entire genome and you align that data to a reduced representation of the genome (like the one you want), there is always a possibility that the aligner, which tries its best to place every read, will align reads to locations where they did not originate in the first place.
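To make that risk concrete, here is a toy sketch in Python. It is not a real aligner: `best_hit` is a hypothetical brute-force matcher that accepts up to two mismatches, and the sequences `chr_toy` and `satellite_toy` are made up for illustration. A read that truly comes from `chr_toy` is placed correctly when both sequences are in the reference, but is still placed (on the wrong sequence) when only `satellite_toy` is available.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_hit(read, reference, max_mismatches=2):
    """Return (name, offset, mismatches) for the best placement, or None.

    Toy stand-in for an aligner: tries every offset in every sequence and
    keeps the first placement with the fewest mismatches.
    """
    best = None
    for name, seq in reference.items():
        for offset in range(len(seq) - len(read) + 1):
            mm = hamming(read, seq[offset:offset + len(read)])
            if mm <= max_mismatches and (best is None or mm < best[2]):
                best = (name, offset, mm)
    return best


# Made-up sequences: the "satellite" differs from the "chromosome" at two bases.
full_genome = {
    "chr_toy": "ACGTACGTTAGCCGATAGGCTTAACG",
    "satellite_toy": "ACGTACGATAGCCGATAGGATTAACG",
}
# Reduced reference containing only the sequence of interest.
reduced = {"satellite_toy": full_genome["satellite_toy"]}

# A 16 bp read that truly originates from chr_toy.
read = full_genome["chr_toy"][4:20]

hit_full = best_hit(read, full_genome)   # placed on its true origin, chr_toy
hit_reduced = best_hit(read, reduced)    # still placed, but on satellite_toy

print("full reference:   ", hit_full)
print("reduced reference:", hit_reduced)
```

With the reduced reference the read is not discarded; it lands on the closest available sequence with a couple of mismatches, which is exactly the kind of false signal to watch for.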

If you still want to do this, create a multi-FASTA file with the sequences you are interested in, build a suitable index with the aligner you want to use, and align away. Remember the point about reduced representation, and keep the chance of multi-mapping reads (if your sequences contain repeats and you have short reads) in mind when you look at the results.
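Since you have a coding background, here is a rough Python sketch of those steps. The answer does not name an aligner, so Bowtie2 is only an assumption here (any short-read aligner with its own index builder would work), and the sequence names, file names, and helper functions are all hypothetical. The script writes the multi-FASTA file and only assembles the two command lines, so you can inspect them before running anything.

```python
def write_multi_fasta(sequences, path, width=60):
    """Write a {name: sequence} dict as a multi-FASTA file, wrapped at `width`."""
    with open(path, "w") as handle:
        for name, seq in sequences.items():
            handle.write(f">{name}\n")
            for start in range(0, len(seq), width):
                handle.write(seq[start:start + width] + "\n")


def bowtie2_commands(fasta_path, index_prefix, reads_fastq, out_sam):
    """Return (build_cmd, align_cmd) as argument lists for subprocess.run.

    Assumes Bowtie2 and single-end reads; swap in your aligner of choice.
    """
    build_cmd = ["bowtie2-build", fasta_path, index_prefix]
    align_cmd = ["bowtie2", "-x", index_prefix, "-U", reads_fastq, "-S", out_sam]
    return build_cmd, align_cmd
```

A possible way to use it, assuming Bowtie2 is installed and `chip_reads.fastq` (a placeholder name) holds your reads: call `write_multi_fasta({"sat_a": "...", "sat_b": "..."}, "custom_repeats.fa")`, then pass each list returned by `bowtie2_commands("custom_repeats.fa", "custom_repeats", "chip_reads.fastq", "chip_vs_custom.sam")` to `subprocess.run(cmd, check=True)`.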

Creating a custom genome with added bits that are missing from the reference may be the best option.
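A minimal sketch of that option in Python, assuming you already have the reference as one FASTA file and your extra sequences as another (the function names and file paths are hypothetical). It appends the custom records to a copy of the reference and refuses to continue if a sequence name would appear twice, since duplicate names tend to cause trouble downstream.

```python
def fasta_names(path):
    """Return the record names (first word after '>') in a FASTA file."""
    names = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                names.append(fields[0] if fields else "")
    return names


def combine_fasta(reference_path, custom_path, out_path):
    """Write the reference records followed by the custom records to out_path."""
    clashes = set(fasta_names(reference_path)) & set(fasta_names(custom_path))
    if clashes:
        raise ValueError(f"duplicate sequence names: {sorted(clashes)}")
    with open(out_path, "w") as out:
        for path in (reference_path, custom_path):
            last = "\n"
            with open(path) as handle:
                for line in handle:
                    out.write(line)
                    last = line
            # Make sure the next record starts on its own line.
            if not last.endswith("\n"):
                out.write("\n")
```

The combined file is then indexed once with your aligner, and reads that truly come from the rest of the genome have somewhere to go other than your sequences of interest.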