Removing duplicates for variant calling when using genes as a reference?
6.0 years ago
Hello everybody!

I am planning to perform variant calling against a gene catalog that I obtained from metagenomic samples, and I wonder how removing or marking duplicates would affect this. My reads are 250 bp long, so I suspect that a high proportion of secondary alignments may be genuine alignments to neighboring genes rather than PCR duplicates introduced during library preparation. I do not know to what extent removing them would decrease detection sensitivity. Do you have any suggestions?

Many thanks!

5.9 years ago


The removal of PCR/optical duplicates is a topic that always makes for a good debate. The exact wet-lab protocol that was used is critical.

I suggest that you first detect PCR duplicates with Picard MarkDuplicates and then decide whether you should remove them. In some circumstances, again depending on the wet-lab process, it is simply not feasible or correct to remove duplicates.
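
One way to do that gauging, and to keep apart the two things the question mixes up, is to read the SAM FLAG bits after marking. MarkDuplicates sets bit 0x400 on duplicate reads (by default it marks them rather than deleting them), whereas secondary alignments carry bit 0x100 and are set by the aligner, not by duplicate marking. The sketch below is a minimal Python example, assuming the marked BAM has been converted to SAM text (for example with `samtools view`); the function names are hypothetical. It tallies, per reference gene, primary alignments, duplicate-flagged primaries and secondary alignments, so you can see whether duplicates concentrate on particular genes.

```python
from collections import Counter, defaultdict

# FLAG bits as defined in the SAM specification.
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_DUPLICATE = 0x400
FLAG_SUPPLEMENTARY = 0x800


def count_flags_per_gene(sam_lines):
    """Tally alignment categories per reference gene (hypothetical helper).

    sam_lines: iterable of SAM text lines; '@' header lines are skipped.
    Returns {gene: Counter(primary=..., duplicate=..., secondary=..., supplementary=...)}.
    """
    counts = defaultdict(Counter)
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        gene = fields[2]  # RNAME: the gene the read aligned to
        if flag & FLAG_UNMAPPED:
            continue
        if flag & FLAG_SECONDARY:
            counts[gene]["secondary"] += 1
        elif flag & FLAG_SUPPLEMENTARY:
            counts[gene]["supplementary"] += 1
        else:
            counts[gene]["primary"] += 1
            # Duplicates are counted among primary alignments only.
            if flag & FLAG_DUPLICATE:
                counts[gene]["duplicate"] += 1
    return counts


def duplicate_fraction(gene_counts):
    """Fraction of a gene's primary alignments flagged as duplicates."""
    primary = gene_counts["primary"]
    return gene_counts["duplicate"] / primary if primary else 0.0


if __name__ == "__main__":
    example = [
        "@SQ\tSN:geneA\tLN:1500",
        "r1\t0\tgeneA\t100\t60\t250M\t*\t0\t0\t*\t*",
        "r2\t1024\tgeneA\t100\t60\t250M\t*\t0\t0\t*\t*",  # 0x400: duplicate
        "r3\t256\tgeneB\t40\t0\t250M\t*\t0\t0\t*\t*",  # 0x100: secondary
    ]
    per_gene = count_flags_per_gene(example)
    for gene, c in sorted(per_gene.items()):
        print(gene, dict(c), round(duplicate_fraction(c), 2))
```

If most of what worries you is flagged 0x100 rather than 0x400, duplicate removal is not what is discarding it.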

At the end of the day, if you test extensively with and without duplicates, I think you'll find that your results stay mostly the same.
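
One way to run that comparison is to call variants twice, once on a BAM with duplicates removed and once on one with them kept, and then measure how much the two call sets overlap. The following Python sketch assumes both VCFs come from the same caller against the same gene catalog, so that identical variants are written identically; it does no normalization (on real data you would typically left-align with `bcftools norm` first, or use `bcftools isec` directly). The function names are hypothetical.

```python
def load_variants(vcf_lines):
    """Return a set of (CHROM, POS, REF, ALT) keys from VCF text lines."""
    variants = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip meta-information and header lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        # Split multi-allelic records so each ALT allele is compared separately.
        for allele in alt.split(","):
            variants.add((chrom, int(pos), ref, allele))
    return variants


def compare_calls(with_dups, without_dups):
    """Summarize overlap between call sets made with and without duplicates."""
    shared = with_dups & without_dups
    union = with_dups | without_dups
    return {
        "shared": len(shared),
        "only_with_duplicates": len(with_dups - without_dups),
        "only_without_duplicates": len(without_dups - with_dups),
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


if __name__ == "__main__":
    kept = load_variants([
        "#CHROM\tPOS\tID\tREF\tALT",
        "geneA\t120\t.\tA\tG",
        "geneA\t300\t.\tC\tT,G",
        "geneB\t55\t.\tT\tC",
    ])
    removed = load_variants([
        "#CHROM\tPOS\tID\tREF\tALT",
        "geneA\t120\t.\tA\tG",
        "geneA\t300\t.\tC\tT",
    ])
    print(compare_calls(kept, removed))
```

The variants reported only with duplicates kept are the ones worth inspecting by hand, since they are exactly where duplicate handling changes the sensitivity the question asks about.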



