When to remove duplicates in Exome-Seq
5.3 years ago

Hello group,

To avoid bias when calculating variant frequencies, we usually remove PCR and optical duplicates in Illumina Exome-Seq protocols. However, in targeted sequencing, we do not remove PCR duplicates.

I am not sure about the Illumina TruSeq Rapid Exome library approach. Should the BAM file be deduplicated before variant calling in the GATK pipeline?

Thanks.

TruSeq Exome-Seq DNA-Seq TrueSeq GATK
5.3 years ago

I would typically expect duplicates to be removed for exomes, but it is hard to say what is absolutely right for all variants (and for all target designs).

For example, you may notice weird behavior in GATK with very high coverage (say, >1000x or >10000x) regions, and/or shifts in variant frequencies as coverage gets saturated.

I would guess this is more of an issue with more targeted panels (or even a single amplicon). While it isn't precisely shown in this paper, Figure 9 roughly suggests some potential value in having a customized variant calling procedure (at least in terms of some samples having a noticeably higher rate of novel SNPs with the targeted panel, as a percent rather than absolute counts; I would be a bit worried due to trends like Figure 7), although there is also a decrease in sensitivity in that specific example. And, perhaps more relevant to your question, the exome example wasn't as bad for GATK :)

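Also, duplicate removal is generally for hybridization-based enrichment. For amplicon enrichment, you can't remove duplicates: every read from a given amplicon starts and ends at the same primer-defined coordinates, so reads from independent molecules look identical to PCR duplicates.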
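To make that last point concrete, here is a toy Python sketch of coordinate-based deduplication. It is only an illustration under simplifying assumptions: the `Read` record, the `dedup_by_position` helper and the read data are all made up for this example, and real tools such as Picard/GATK MarkDuplicates use more information (mate position, library, base qualities) than the simple key used here.

```python
from collections import namedtuple

# Hypothetical minimal read record, invented for this illustration.
Read = namedtuple("Read", ["name", "chrom", "start", "strand", "base"])


def dedup_by_position(reads):
    """Keep the first read seen for each (chrom, start, strand) key."""
    seen = set()
    kept = []
    for read in reads:
        key = (read.chrom, read.start, read.strand)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


def alt_fraction(reads, alt="T"):
    """Fraction of reads carrying the alternate base at the site of interest."""
    if not reads:
        return 0.0
    return sum(r.base == alt for r in reads) / len(reads)


# Hybridization capture: random fragmentation gives varied start positions,
# so reads sharing a start and strand are plausibly PCR duplicates.
capture_reads = [
    Read("c1", "chr1", 100, "+", "A"),
    Read("c2", "chr1", 104, "+", "T"),
    Read("c3", "chr1", 104, "+", "T"),  # same key as c2
    Read("c4", "chr1", 104, "+", "T"),  # same key as c2
    Read("c5", "chr1", 111, "+", "A"),
    Read("c6", "chr1", 117, "-", "T"),
]

# Amplicon: every read starts at the same primer-defined position,
# whether or not it came from an independent template molecule.
amplicon_reads = [
    Read(f"a{i}", "chr1", 100, "+", base) for i, base in enumerate("ATATTA")
]

capture_kept = dedup_by_position(capture_reads)
amplicon_kept = dedup_by_position(amplicon_reads)

print(len(capture_reads), "->", len(capture_kept), "capture reads")
print(len(amplicon_reads), "->", len(amplicon_kept), "amplicon reads")
print("capture alt fraction:", alt_fraction(capture_reads), "->", alt_fraction(capture_kept))
```

In this toy data the capture reads keep most of their depth and the over-represented duplicate allele is corrected, while the amplicon reads collapse to a single read, which is why position-based deduplication isn't useful there.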
