Question

Mapping reads to repetitive elements in the genome?

4

7.8 years ago

a.rex ▴ 350

I have ChIP-seq data for H3K9me3, a histone mark found at repetitive elements in my genome. My data shows enrichment for H3K9me3 at repetitive elements. I removed multiple reads mapping to the same locus (i.e. PCR duplicates) before mapping to the genome. However, I did not account for the fact that a read from a repetitive element can map to multiple sites in the genome.

How can I find reads that map to more than one locus? Is there a way to distribute such reads equally between these loci?

ChIP-seq alignment sequencing • 4.7k views

updated 7.8 years ago by Michael 55k • written 7.8 years ago by a.rex ▴ 350

0

It depends on your aligner; BWA and Bowtie treat this differently.

7.8 years ago by Michael 55k

0

Thanks for the reply. I used BWA.

7.8 years ago by a.rex ▴ 350

0

As we had to find out in the thread "mirDeep2 using bowtie vs. bwa - why do more aligned reads yield less miRNA", one only gets a single optimal hit per query from BWA. You can find the total number of hits, but you can never get the other, equally good hits besides the randomly selected one (AFAIK!). If it is good enough for you that each query ends up at one randomly chosen location, that should be fine. Otherwise you could use a different mapper, e.g. Bowtie.
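To make the "find the total number of hits" part concrete, here is a minimal sketch in Python of how multi-mapped reads might be flagged in BWA's SAM output. It assumes plain-text SAM records, that `bwa aln`/`samse` output carries an `X0` tag (number of best hits), and that repetitive hits get a mapping quality of 0. Some BWA versions and settings also write an `XA` tag listing alternative hits; whether your file has it is an assumption to check, not something confirmed in this thread. The function names and the two example records are made up for illustration.

```python
def parse_tags(fields):
    """Return the optional SAM tags (columns 12+) as a {name: value} dict."""
    tags = {}
    for tag in fields[11:]:
        name, _type, value = tag.split(":", 2)
        tags[name] = value
    return tags


def is_multimapped(sam_line):
    """Heuristically flag a SAM record whose read has several best hits."""
    fields = sam_line.rstrip("\n").split("\t")
    if int(fields[1]) & 0x4:          # FLAG bit 0x4: read is unmapped
        return False
    tags = parse_tags(fields)
    if "X0" in tags:                  # assumed: X0 = number of best hits
        return int(tags["X0"]) > 1
    return int(fields[4]) == 0        # fallback: MAPQ 0 suggests ambiguity


def alternative_hits(sam_line):
    """Parse an XA tag (if present) into (chrom, strand, position) tuples."""
    fields = sam_line.rstrip("\n").split("\t")
    xa = parse_tags(fields).get("XA", "")
    hits = []
    for entry in xa.split(";"):
        if not entry:
            continue
        chrom, pos, _cigar, _nm = entry.split(",")
        hits.append((chrom, pos[0], int(pos[1:])))
    return hits


# Hypothetical records: one unique read, one read with two best hits.
unique = "r1\t0\tchr1\t100\t37\t4M\t*\t0\t0\tACGT\tIIII\tXT:A:U\tX0:i:1"
multi = ("r2\t0\tchr1\t500\t0\t4M\t*\t0\t0\tACGT\tIIII"
         "\tXT:A:R\tX0:i:2\tXA:Z:chr2,+900,4M,0;")

for record in (unique, multi):
    print(record.split("\t")[0], is_multimapped(record), alternative_hits(record))
```

For real data a SAM/BAM library would be more robust than splitting lines by hand; the sketch only shows which fields carry the information.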

7.8 years ago by Michael 55k

0

So if I understand correctly, say there are 10 reads and 2 locations that these reads map to exactly. Each query read will be mapped to 1 of the 2 locations at random. So theoretically the reads should be divided equally between the two locations? It seems a little bit bizarre that neither of these two mappers has any functionality for dealing with these cases explicitly...
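A quick sketch of the arithmetic behind this, assuming each read is assigned independently and with equal probability to one of its candidate locations (an assumption about the mapper, not something verified here). The split is equal only on average: with 10 reads and 2 locations, an exact 5/5 split is just one of several likely outcomes. The function names are illustrative.

```python
import math
import random


def prob_split(n_reads, k_at_first):
    """Probability that exactly k of n reads land on the first of 2 loci."""
    return math.comb(n_reads, k_at_first) / 2 ** n_reads


def simulate_assignment(n_reads, n_loci, seed=0):
    """Assign each read to one locus uniformly at random; return the counts."""
    rng = random.Random(seed)
    counts = [0] * n_loci
    for _ in range(n_reads):
        counts[rng.randrange(n_loci)] += 1
    return counts


# Chance of an exact 5/5 split for 10 reads over 2 loci: 252 / 1024.
print(prob_split(10, 5))

# A single random draw for 10 reads, and one for many reads, where the
# proportions settle close to one half each.
print(simulate_assignment(10, 2))
print(simulate_assignment(100_000, 2))
```

So for deeply covered repeats the random assignment approximates an equal distribution, while for a handful of reads the counts per location can differ noticeably.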

7.8 years ago by a.rex ▴ 350

2

bizarre

There is nothing bizarre about it; short reads are just not suited to resolving repeats. You have to use other sequencing strategies, like cloning and primer walking, if you want to resolve repeats reliably.

7.8 years ago by piet ★ 1.9k

0

That is true... I have done a bit more reading, and it seems that running Bowtie with the options --best -M 1 --strata can randomly distribute reads across repeats.

7.8 years ago by a.rex ▴ 350

2

That is about the same behavior as BWA.

7.8 years ago by Michael 55k

score 1 · Answer 1 · 2017-01-14

The histone modification H3K9me3 (tri-methylation) is a silencing signal, so it makes sense that it is enriched at repetitive elements. The default behavior of BWA seems adequate for this situation, because it avoids overrating the peak height of repetitive regions. Note that repetitive regions could also be much larger than they appear based on the nominal genome sequence, e.g. a simple tandem repeat of 20 bp in an assembly could stretch out for kilobases in reality if it has not been properly mapped. The result would be peaks appearing much higher and narrower than they really are. Such peaks would be detected more easily by the peak caller and could therefore be over-called.
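To illustrate the point about overrated peak heights, here is a small sketch comparing two hypothetical counting schemes for multi-mapped reads: counting a read in full at every locus it matches, versus giving each of its n candidate loci a weight of 1/n. Neither is what BWA does by default (it places each read at one location); the function, the mode names and the locus labels are made up for illustration.

```python
from collections import defaultdict


def coverage_by_locus(read_hits, mode):
    """Sum per-locus read counts.

    read_hits: one list of candidate loci per read.
    mode: "all" counts a read fully at every locus (inflates repeats);
          "fractional" gives each of a read's n loci a weight of 1/n.
    """
    counts = defaultdict(float)
    for hits in read_hits:
        if mode == "all":
            weight = 1.0
        elif mode == "fractional":
            weight = 1.0 / len(hits)
        else:
            raise ValueError(f"unknown mode: {mode}")
        for locus in hits:
            counts[locus] += weight
    return dict(counts)


# Ten reads that each match two copies of a repeat equally well.
reads = [["repeat_A", "repeat_B"] for _ in range(10)]

print(coverage_by_locus(reads, "all"))         # 10 at each copy, 20 in total
print(coverage_by_locus(reads, "fractional"))  # 5 at each copy, 10 in total
```

Counting every hit in full doubles the apparent signal for a two-copy repeat, whereas fractional weights keep the total equal to the number of reads, which is also what random assignment does on average.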