Question

Retaining duplicate mapping



6.7 years ago

connor.driscoll88 • 0

I'm trying to perform a ChIP-Exo analysis with single-end Illumina reads (~50 bp). ChIP-Exo is similar in concept to ChIP-Seq, but produces more identical reads. So I want to make sure my alignments allow different reads to map to the exact same genomic position. The closest option I can find in the bowtie2 manual concerns a single read mapping to multiple locations, not the other way around. When I look at my current BAM files in IGV, I see what looks like different reads mapped to the same position, although usually at <10x coverage.

When I run samtools flagstat on my BAM files, the output looks like this:

26780887 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
26082701 + 0 mapped (97.39%:-nan%)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (-nan%:-nan%)
0 + 0 with itself and mate mapped
0 + 0 singletons (-nan%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

No duplicates are identified, and my understanding is that "duplicates" here means "PCR or optical duplicates." Am I correct in interpreting these as different reads mapped to identical locations?

I'm ultimately trying to find a way to align reads with duplicated mappings (different reads, same genomic position). Perhaps my current alignments are already like this and I'm getting confused by some of the terminology, but I want to ensure what I'm doing is correct.



score 1 · Answer 1 · 2017-08-31

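samtools flagstat does not search for duplicate reads itself; it only counts reads that are already marked as duplicates (FLAG bit 0x400), e.g. by Picard MarkDuplicates. bowtie2 neither marks nor removes duplicates, so reads stacked at the same position are all kept in your BAM, and flagstat reports 0 duplicates. You will have to run Picard beforehand for flagstat to see the duplicates.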
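The point that flagstat only reads an existing flag can be illustrated with a small Python sketch over plain SAM text lines. This is a toy for illustration, not samtools' own code; the helper name `count_flagged_duplicates` is hypothetical, and real BAM files would be read with samtools or pysam rather than as strings.

```python
DUP_FLAG = 0x400  # SAM FLAG bit 0x400: "PCR or optical duplicate"


def count_flagged_duplicates(sam_lines):
    """Count SAM records whose FLAG already has the duplicate bit set.

    Like the "duplicates" line of samtools flagstat, this only inspects
    the existing flag; it never compares alignment positions.
    """
    count = 0
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        flag = int(line.split("\t")[1])  # column 2 is FLAG
        if flag & DUP_FLAG:
            count += 1
    return count


# Two different reads at the same position, not yet marked (as straight from an aligner)
unmarked = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t42\t50M\t*\t0\t0\t*\t*",
    "r2\t0\tchr1\t100\t42\t50M\t*\t0\t0\t*\t*",
]
print(count_flagged_duplicates(unmarked))  # 0

# The same reads after a duplicate marker has set 0x400 on r2
marked = unmarked[:2] + ["r2\t1024\tchr1\t100\t42\t50M\t*\t0\t0\t*\t*"]
print(count_flagged_duplicates(marked))  # 1
```

Both reads are present in either case; only the flag differs, which is all flagstat looks at.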

[...] are "PCR or optical duplicates." Am I correct in interpreting these as different reads mapped to identical locations?

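Yes, you are. For single-end data, Picard MarkDuplicates treats reads as duplicates when their alignments share the same reference, 5' start position (counting clipped bases) and strand. By default it only flags these reads; it does not remove them.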
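A simplified sketch of what "marking duplicates" means for single-end reads is shown below. It is a hypothetical helper (`mark_duplicates_se`), not Picard's algorithm: it keys each primary mapped read on reference, 5' alignment position and strand, ignores clipped bases (Picard uses unclipped positions), and keeps the first read in file order rather than choosing the highest-quality one.

```python
import re

DUP_FLAG = 0x400                   # "PCR or optical duplicate"
REVERSE_FLAG = 0x10                # read aligned to the reverse strand
SKIP_FLAGS = 0x4 | 0x100 | 0x800   # unmapped, secondary, supplementary
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def five_prime(flag, pos, cigar):
    """Return (1-based 5' alignment position, strand); clips ignored here."""
    if flag & REVERSE_FLAG:
        # On the reverse strand the 5' end is the rightmost aligned base,
        # so add the number of reference bases the CIGAR consumes.
        ref_len = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in "MDN=X")
        return pos + ref_len - 1, "-"
    return pos, "+"


def mark_duplicates_se(sam_lines):
    """Set the duplicate bit on all but the first read at each
    (reference, 5' position, strand); every record is kept."""
    seen = set()
    out = []
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            out.append(line)  # pass header/blank lines through
            continue
        fields = line.split("\t")
        flag = int(fields[1])
        if not (flag & SKIP_FLAGS):
            key = (fields[2],) + five_prime(flag, int(fields[3]), fields[5])
            if key in seen:
                fields[1] = str(flag | DUP_FLAG)
            else:
                seen.add(key)
        out.append("\t".join(fields))
    return out


reads = [
    "r1\t0\tchr1\t100\t42\t50M\t*\t0\t0\t*\t*",
    "r2\t0\tchr1\t100\t42\t50M\t*\t0\t0\t*\t*",   # same start/strand as r1
    "r3\t16\tchr1\t100\t42\t50M\t*\t0\t0\t*\t*",  # reverse strand, 5' end at 149
]
marked = mark_duplicates_se(reads)
print([int(l.split("\t")[1]) for l in marked])  # [0, 1024, 16]
```

All three records are retained and only r2 gets the 0x400 bit, so stacked ChIP-Exo reads stay in the file unless a later step filters on that flag or removes duplicates explicitly.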