Question

Why does Ion Torrent PGM generate a high percentage of duplicates?

9.0 years ago

murali ▴ 110

Possible duplicate: Very High Percentage Of Reads Are Pcr Duplicates - Iontorrent

I am working with Ion Torrent data from the cancer hotspot panel. I generated the alignment (a sorted BAM file), then ran the MarkDuplicates module (MarkDuplicates.jar) of Picard tools. Astonishingly, about 96 percent of the mapped reads (52,176 of 54,227) were marked as duplicates.

samtools flagstat dedup_reads.bam
#################################################################
55194 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
52176 + 0 duplicates
54227 + 0 mapped (98.25%:-nan%)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (-nan%:-nan%)
0 + 0 with itself and mate mapped
0 + 0 singletons (-nan%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
######################################################
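As a sanity check on that percentage, here is a minimal Python sketch that recomputes it from the three relevant flagstat lines above. The helper name `flagstat_counts` is hypothetical and not part of samtools; the counts are copied from the output as shown.

```python
import re

# Counts copied from the flagstat output above (only the lines needed).
FLAGSTAT = """\
55194 + 0 in total (QC-passed reads + QC-failed reads)
52176 + 0 duplicates
54227 + 0 mapped (98.25%:-nan%)
"""

def flagstat_counts(text):
    """Return {label: QC-passed count} for each flagstat line."""
    counts = {}
    for line in text.splitlines():
        m = re.match(r"(\d+) \+ (\d+) (.+)", line)
        if m:
            # The label is the text before any parenthesised note.
            label = m.group(3).split("(")[0].strip()
            counts[label] = int(m.group(1))
    return counts

counts = flagstat_counts(FLAGSTAT)
dup_of_total = 100 * counts["duplicates"] / counts["in total"]
dup_of_mapped = 100 * counts["duplicates"] / counts["mapped"]
print(f"duplicates / total reads:  {dup_of_total:.1f}%")
print(f"duplicates / mapped reads: {dup_of_mapped:.1f}%")
```

So the duplicate rate is roughly 94.5% of all reads, or roughly 96.2% of mapped reads.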

What are the possible reasons for getting such a high number of duplicates with Ion Torrent data?

Mark duplicates picard tools Amplicon seq • 2.3k views

updated 14 months ago by Ram 43k • written 9.0 years ago by murali ▴ 110

Ram · Answer 1 · 2015-05-04

Because you've used an amplicon assay. It has nothing to do with your sequencing platform.

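By the very nature of what you've done (using PCR for target enrichment), you're going to end up with basically everything marked as a duplicate: every read from a given amplicon starts at the same primer-defined position, and MarkDuplicates identifies duplicates by alignment position and orientation. Don't deduplicate amplicon data. Duplicate removal is only appropriate for hybridisation-based enrichment strategies and/or whole genome sequencing, i.e. anything where you've randomly fragmented your DNA before library preparation.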
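To illustrate the point, here is a rough Python sketch (a toy simulation, not Picard's actual implementation) of position-based duplicate marking on single-end reads. It assumes a read is called a duplicate when another read shares its chromosome, strand and 5' position. The panel size (200 amplicons), the 3 Mb region and the read count are made-up numbers chosen only for illustration.

```python
import random
from collections import Counter

def duplicate_fraction(reads):
    """Fraction of reads flagged when all but one read per
    (chrom, strand, 5' position) key is called a duplicate."""
    groups = Counter(reads)
    duplicates = sum(n - 1 for n in groups.values())
    return duplicates / len(reads)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
N_READS = 50_000        # hypothetical read count

# Hypothetical amplicon panel: 200 amplicons, each with one fixed,
# primer-defined start position per strand.
amplicon_starts = [("chr1", strand, 1_000 * i)
                   for i in range(200) for strand in "+-"]
amplicon_reads = [rng.choice(amplicon_starts) for _ in range(N_READS)]

# Hypothetical randomly fragmented library: reads can start anywhere
# in a 3 Mb region, on either strand.
shotgun_reads = [("chr1", rng.choice("+-"), rng.randrange(3_000_000))
                 for _ in range(N_READS)]

amplicon_frac = duplicate_fraction(amplicon_reads)
shotgun_frac = duplicate_fraction(shotgun_reads)
print(f"amplicon library: {amplicon_frac:.1%} flagged as duplicates")
print(f"shotgun library:  {shotgun_frac:.1%} flagged as duplicates")
```

In the amplicon case nearly every read is flagged simply because there are only a few hundred possible start positions, which says nothing about whether the reads come from independent template molecules. With random fragmentation, identical start positions are rare by chance, so they are a useful signal of PCR duplication.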