I have ChIP-seq reads for H3K9me3, a histone mark associated with repetitive elements, and my data shows enrichment for H3K9me3 at repetitive elements. I removed multiple reads mapping to the same locus (i.e. PCR duplicates) before mapping to the genome. However, I did not account for the fact that a read from a repetitive element can map to multiple sites in the genome.
How can I find reads that map to more than one locus? Is there a way to distribute the reads equally between these loci?
It depends on your aligner; BWA and Bowtie treat this differently.
Thanks for the reply. I used BWA.
As we had to find out in the thread "mirDeep2 using bowtie vs. bwa - why do more aligned reads yield less miRNA", you only get a single optimal hit per query from BWA. You can find the total number of hits, but you can never get the other equally good hits besides the randomly selected one (AFAIK!). If it is good enough for you that each query ends up at one random location among its equally good hits, that should be fine. Otherwise you could use a different mapper, e.g. Bowtie.
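As a rough sketch of reading that total hit count back out of the alignments: output from bwa aln/samse carries an X0:i tag giving the number of best hits, so any mapped read with X0 greater than 1 has more than one equally good location. This assumes the SAM came from bwa aln/samse (bwa mem does not write X0). The function names and the toy records below are made up for illustration.

```python
def best_hit_count(sam_line):
    """Return the X0 value (number of best hits) of a SAM record, or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # Optional tags start after the 11 mandatory SAM columns.
    for tag in fields[11:]:
        if tag.startswith("X0:i:"):
            return int(tag[len("X0:i:"):])
    return None


def multi_mapped_reads(sam_lines):
    """Names of mapped reads whose X0 tag reports more than one best hit."""
    names = []
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x4:  # read is unmapped
            continue
        hits = best_hit_count(line)
        if hits is not None and hits > 1:
            names.append(fields[0])
    return names


# Toy records (hypothetical, not real data).
example = [
    "@SQ\tSN:chr1\tLN:1000",
    "read1\t0\tchr1\t100\t37\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\tXT:A:U\tNM:i:0\tX0:i:1\tX1:i:0",
    "read2\t0\tchr1\t500\t0\t8M\t*\t0\t0\tTTTTAAAA\tIIIIIIII\tXT:A:R\tNM:i:0\tX0:i:2",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tGGGGCCCC\tIIIIIIII",
]

print(multi_mapped_reads(example))
```

On a real file you would feed it the lines of the SAM output (for example from samtools view -h) instead of the toy list.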
So if I understand correctly, say there are 10 reads and 2 locations that these reads map to exactly. Each query read will be mapped to 1 of the 2 locations at random. So theoretically the reads should be divided equally between the two locations? It seems a little bit bizarre that neither of these two mappers has any functionality for dealing with these cases explicitly...
There is nothing bizarre about it; short reads are just not suited to resolving repeats. You have to use other sequencing strategies, like cloning and primer walking, if you want to resolve repeats reliably.
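To put a number on the 10-reads, 2-locations case, here is a small sketch. It assumes the aligner picks among equally good hits uniformly at random and independently for each read. That is an assumption for illustration, not something taken from the BWA or Bowtie documentation. Under it the split is even only on average, and any single run can be lopsided. The fractional alternative (each read contributes an equal share to every locus) is a different technique you would apply yourself after mapping. All names are hypothetical.

```python
import random
from collections import Counter
from math import comb


def random_assignment(n_reads, loci, seed=None):
    """Assign each read to one locus chosen uniformly at random."""
    rng = random.Random(seed)
    return Counter(rng.choice(loci) for _ in range(n_reads))


def fractional_assignment(n_reads, loci):
    """Give every locus an equal share of the reads (deterministic)."""
    share = n_reads / len(loci)
    return {locus: share for locus in loci}


def prob_exact_even_split(n_reads):
    """Probability that random assignment to 2 loci gives exactly half each."""
    if n_reads % 2:
        return 0.0
    return comb(n_reads, n_reads // 2) / 2 ** n_reads


loci = ["locus_A", "locus_B"]
print(random_assignment(10, loci, seed=1))   # one random outcome; counts sum to 10
print(fractional_assignment(10, loci))       # 5.0 reads' worth per locus
print(prob_exact_even_split(10))             # 252 / 1024
```

So with 10 reads an exact 5/5 split happens in only about a quarter of runs, although the expected count per location is 5.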
That is true. I have done a bit more reading, and it seems that for Bowtie, using the options --best -M 1 --strata can randomly distribute reads across repeats.
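For reference, a minimal sketch of how those options might be put together on a Bowtie 1 command line, built as an argument list in Python without running anything. The index prefix and file names are placeholders, and the -S flag (SAM output) is my addition rather than something mentioned above. Check the options against the manual for your Bowtie version before relying on them.

```python
import shlex


def bowtie_random_multihit_cmd(index_prefix, reads_fastq, out_sam):
    """Build a Bowtie 1 command that reports one random hit for multi-mapping reads."""
    return [
        "bowtie",
        "--best", "--strata",  # only consider alignments in the best stratum
        "-M", "1",             # reads with more than 1 such hit get one reported at random
        "-S",                  # write SAM output (added here, not part of the quoted options)
        index_prefix,          # placeholder index prefix
        reads_fastq,           # placeholder input reads
        out_sam,               # placeholder output file
    ]


cmd = bowtie_random_multihit_cmd("genome_index", "reads.fastq", "aligned.sam")
print(shlex.join(cmd))
```

The list can be passed to subprocess.run once the placeholders point at real files.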
That is about the same behavior as BWA.