Why does Ion Torrent PGM generate a high percentage of duplicates?
9.0 years ago
murali ▴ 110

Possible duplicate: Very High Percentage Of Reads Are Pcr Duplicates - Iontorrent

I am working with Ion Torrent data from a cancer hotspot panel. I generated the alignment (sorted BAM file) and then ran the MarkDuplicates module (MarkDuplicates.jar) from Picard tools. Astonishingly, about 96 percent of the mapped reads were marked as duplicates.

samtools flagstat dedup_reads.bam
#################################################################
55194 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
52176 + 0 duplicates
54227 + 0 mapped (98.25%:-nan%)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (-nan%:-nan%)
0 + 0 with itself and mate mapped
0 + 0 singletons (-nan%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
######################################################
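The 96 percent figure corresponds to duplicates as a share of mapped reads; as a share of all reads in the file it is closer to 94.5 percent. A minimal sketch that recovers both numbers from the flagstat text is below. It assumes the `<passed> + <failed> <category>` line format shown above; other samtools versions print extra categories, so treat the parsing as illustrative.

```python
import re
from typing import Dict

# Lines copied from the flagstat output above (paired-end lines omitted).
FLAGSTAT_OUTPUT = """\
55194 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
52176 + 0 duplicates
54227 + 0 mapped (98.25%:-nan%)
"""

# Assumed line shape: "<passed> + <failed> <category>[ (<percentages>)]"
LINE_RE = re.compile(r"^(\d+) \+ (\d+) (.+?)(?: \(.*)?$")


def parse_flagstat(text: str) -> Dict[str, int]:
    """Return the QC-passed count for each flagstat category."""
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            # Keep the first occurrence if a category label repeats.
            counts.setdefault(match.group(3), int(match.group(1)))
    return counts


counts = parse_flagstat(FLAGSTAT_OUTPUT)
dup_of_mapped = 100 * counts["duplicates"] / counts["mapped"]
dup_of_total = 100 * counts["duplicates"] / counts["in total"]
print(f"duplicates / mapped: {dup_of_mapped:.1f}%")  # duplicates / mapped: 96.2%
print(f"duplicates / total:  {dup_of_total:.1f}%")   # duplicates / total:  94.5%
```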

What could cause such a high number of duplicates in Ion Torrent data?

Tags: Mark duplicates, picard tools, Amplicon seq
9.0 years ago
User 59 13k

It's because you used an amplicon assay; it has nothing to do with your sequencing platform.

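By the very nature of what you've done (using PCR for target enrichment), you're going to end up with essentially everything duplicated. Every read from a given amplicon starts at the same primer position, and MarkDuplicates treats single-end reads that share a 5' alignment position and strand as duplicates of one another. Don't deduplicate amplicon data. Deduplication only makes sense for hybridisation-based enrichment strategies and/or whole-genome sequencing, i.e. anything where you've randomly fragmented your DNA before library preparation.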
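To see why the amplicon design alone drives the rate up, here is a toy model. It is a simplification, not Picard's implementation: each single-end read is reduced to a hypothetical key of (reference, 5' position, strand), loosely modelled on how MarkDuplicates groups fragments, and all but one read per key is counted as a duplicate. Library, soft-clipping and base-quality tie-breaking are ignored, and the amplicon coordinates are made up for illustration.

```python
import random
from collections import Counter
from typing import Iterable, List, Tuple

# Simplified single-end duplicate key: (reference, 5' alignment position, strand).
# Assumption: loosely modelled on MarkDuplicates; ignores library, clipping, qualities.
ReadKey = Tuple[str, int, str]


def duplicate_fraction(keys: Iterable[ReadKey]) -> float:
    """Fraction of reads flagged as duplicates: all but one read per key."""
    counts = Counter(keys)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n - 1 for n in counts.values()) / total


def amplicon_reads(amplicons: List[Tuple[str, int, int]],
                   reads_per_amplicon: int) -> List[ReadKey]:
    """Amplicon reads always start at a primer, so their 5' ends are fixed."""
    keys: List[ReadKey] = []
    for chrom, start, end in amplicons:
        for i in range(reads_per_amplicon):
            # Forward reads begin at the left primer, reverse reads at the right one.
            keys.append((chrom, start, "+") if i % 2 == 0 else (chrom, end, "-"))
    return keys


def random_fragment_reads(chrom: str, length: int, n_reads: int,
                          seed: int = 0) -> List[ReadKey]:
    """Reads from randomly sheared DNA: 5' ends are spread along the target."""
    rng = random.Random(seed)
    return [(chrom, rng.randrange(length), rng.choice("+-")) for _ in range(n_reads)]


# Hypothetical amplicon coordinates, for illustration only.
amplicons = [("chr1", 1_000, 1_150), ("chr1", 5_000, 5_140), ("chr2", 20_000, 20_160)]

amp = duplicate_fraction(amplicon_reads(amplicons, reads_per_amplicon=1_000))
rnd = duplicate_fraction(random_fragment_reads("chr1", length=100_000, n_reads=3_000))
print(f"amplicon: {amp:.3f}")  # amplicon: 0.998
print(f"random fragmentation: {rnd:.3f}")
```

With 3,000 amplicon reads but only six distinct start keys, 2,994 reads are flagged no matter how good the library is, while the same number of randomly placed reads rarely collide. In this model the "duplicate" flag on amplicon data reflects the primer design rather than over-amplification.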
