Question: Why does the Ion Torrent PGM generate a high percentage of duplicates?
murali90 (Germany) wrote, 4.0 years ago:

Possible duplicate: Very High Percentage Of Reads Are Pcr Duplicates - Iontorrent

I am working with Ion Torrent data from a cancer hotspot panel. I generated the alignment (a sorted BAM file) and then ran Picard's MarkDuplicates module (MarkDuplicates.jar). Astonishingly, 96 percent of the mapped reads were marked as duplicates (52176 of 54227).

samtools flagstat dedup_reads.bam
#################################################################
55194 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
52176 + 0 duplicates
54227 + 0 mapped (98.25%:-nan%)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (-nan%:-nan%)
0 + 0 with itself and mate mapped
0 + 0 singletons (-nan%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
######################################################

What are the possible reasons for getting such a high number of duplicates with Ion Torrent data?

Daniel Swan (Aberdeen, UK) wrote, 4.0 years ago:

It's because you've used an amplicon assay; it has nothing to do with your sequencing platform.

By the very nature of what you've done (used PCR for target enrichment), every read from a given amplicon starts at the same primer-defined position, so you're going to end up with basically everything marked as a duplicate. Don't deduplicate amplicon data. Deduplication only makes sense for hybridisation-based enrichment strategies and/or whole genome sequencing: anything where you've randomly fragmented your DNA before library preparation.
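To make the point concrete, here is a rough sketch in Python. It is a toy model, not Picard's actual implementation: it assumes a read counts as a duplicate whenever an earlier read shares its chromosome, 5' start and strand, and the amplicon count, region size and function names are made up for illustration.

```python
import random
from collections import Counter


def duplicate_fraction(reads):
    """Fraction of reads that share a (chrom, start, strand) key with an
    earlier read. Simplified stand-in for position-based duplicate marking."""
    if not reads:
        return 0.0
    groups = Counter(reads)
    # In each group, one read is kept and the rest count as duplicates.
    duplicates = sum(n - 1 for n in groups.values())
    return duplicates / len(reads)


def simulate_amplicon(n_reads, amplicon_starts, rng):
    # Every read begins at a primer site, so only a few start positions exist.
    return [("chr1", rng.choice(amplicon_starts), "+") for _ in range(n_reads)]


def simulate_shotgun(n_reads, region_length, rng):
    # Random fragmentation: starts are spread over the whole region.
    return [
        ("chr1", rng.randrange(region_length), rng.choice("+-"))
        for _ in range(n_reads)
    ]


rng = random.Random(0)
# Hypothetical panel of 200 amplicons, one start position each.
amplicon = simulate_amplicon(50_000, list(range(1000, 201_000, 1000)), rng)
# Hypothetical 50 Mb region sequenced after random fragmentation.
shotgun = simulate_shotgun(50_000, 50_000_000, rng)

print(f"amplicon duplicate fraction: {duplicate_fraction(amplicon):.1%}")
print(f"shotgun duplicate fraction:  {duplicate_fraction(shotgun):.1%}")
```

With only a couple of hundred possible start positions, nearly every amplicon read collides with another one, whereas randomly fragmented reads almost never do. That is why a high duplicate count tells you nothing about library quality in an amplicon experiment.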
