Question: SNPs & PCR Duplicates
jackuser1979 (US) wrote, 6.9 years ago:

I have identified SNPs from Illumina paired-end reads mapped to a reference genome, using the variant caller VarScan. Without removing PCR duplicates, around 25K SNPs are identified. But when I ran it with all the same parameters after removing PCR duplicates with Picard, I ended up with 27K SNPs. Why does the number of identified SNPs increase? I thought that removing PCR duplicates would decrease the number of SNPs. Please let me know your thoughts.

Rm (Danville, PA) wrote, 6.9 years ago:

It is possible. Let's take a hypothetical region with 100X coverage in which, say, 40 reads are PCR duplicates carrying the reference base, and 10 of the other reads carry a variant.

If you don't remove PCR duplicates, the variant makes up 10% of the reads (10/100). If you do remove them, and the 40 duplicates are all copies of one fragment, they collapse to a single read, so the variant percentage rises (to 10/61, about 16%) and can cross the threshold for being called as a variant.

If cases like this are common in your data, you can encounter more SNVs (not SNPs) after PCR duplicate removal.
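To make the arithmetic concrete, here is a minimal Python sketch of the hypothetical above. It is not how Picard or VarScan actually work: the `Read` tuple, the `mark_duplicates` helper (which keeps one read per start position and strand, a crude stand-in for Picard's MarkDuplicates, which for paired-end data also uses the mate's position) and the 15% `MIN_VAR_FREQ` threshold are all assumptions for illustration.

```python
from collections import namedtuple

# Hypothetical minimal read representation: alignment start, strand,
# and the base observed at the site of interest.
Read = namedtuple("Read", ["start", "strand", "base"])


def allele_fraction(reads, alt):
    """Fraction of reads supporting the alternate base."""
    if not reads:
        return 0.0
    return sum(r.base == alt for r in reads) / len(reads)


def mark_duplicates(reads):
    """Keep the first read seen per (start, strand).

    A crude, hypothetical stand-in for position-based duplicate removal.
    """
    seen = {}
    for r in reads:
        seen.setdefault((r.start, r.strand), r)
    return list(seen.values())


# 40 PCR copies of one reference-base fragment, 50 other distinct
# reference reads and 10 distinct variant reads: 100X in total.
reads = [Read(1000, "+", "A")] * 40
reads += [Read(900 + i, "+", "A") for i in range(50)]
reads += [Read(1100 + i, "-", "G") for i in range(10)]

MIN_VAR_FREQ = 0.15  # hypothetical caller threshold, not a VarScan default

before = allele_fraction(reads, "G")
after = allele_fraction(mark_duplicates(reads), "G")
print(f"before dedup: {before:.3f} called={before >= MIN_VAR_FREQ}")
print(f"after dedup:  {after:.3f} called={after >= MIN_VAR_FREQ}")
```

Only the denominator changes: the 10 variant reads all survive, while the 40 reference copies shrink to one, so the same site flips from "not called" to "called".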

Ketil (Germany) wrote, 6.9 years ago:

I'm not familiar with how Picard or VarScan work, but my guess would be that removing duplicates skews the distribution of reads. If you have high coverage, Picard might remove many reads that merely happen to start at the same position, treating them as PCR duplicates. (You're not talking about RNA sequences, are you? That might explain it.) Reads carrying different alleles would not be identified as duplicates, and this would even out the distribution of alleles. VarScan would then report rare variants that are now, relatively speaking, less rare. Variant calling is usually statistics-heavy stuff, and I'd not expect this kind of weakness, so this explanation is likely wrong :-)
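The high-coverage effect can be quantified with a standard occupancy calculation: if N reads start uniformly at random among L positions, the expected number of distinct starts is L(1 - (1 - 1/L)^N). The sketch below assumes reads overlapping a site start within a window of about 100 positions on one strand (the `WINDOW` value is hypothetical) and that deduplication compares start positions only, as for single-end reads. For paired-end data, Picard's MarkDuplicates compares the positions of both mates, which makes such chance collisions far rarer, so this mainly illustrates the single-end case.

```python
def expected_distinct_starts(n_reads, n_positions):
    """Expected number of distinct start positions when n_reads start
    uniformly at random among n_positions (occupancy formula)."""
    return n_positions * (1 - (1 - 1 / n_positions) ** n_reads)


def apparent_duplicate_rate(n_reads, n_positions):
    """Fraction of independent reads that a purely position-based
    dedup would drop, because they share a start by chance."""
    kept = expected_distinct_starts(n_reads, n_positions)
    return 1 - kept / n_reads


# Hypothetical setup: ~100 bp single-end reads on one strand, so reads
# overlapping a given site can start at any of ~100 positions.
WINDOW = 100

for depth in (15, 50, 500):
    rate = apparent_duplicate_rate(depth, WINDOW)
    print(f"{depth:>4} reads/strand -> {rate:.1%} flagged as duplicates")
```

At low depth almost nothing is lost, but at several hundred reads per strand most independent reads share a start with another read and would be discarded.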

Personally, I wouldn't remove duplicates unless I had reason to believe there were many of them; and if you get many duplicates from Illumina, you are Doing It Wrong.

shreygandhi1990 (United States) wrote, 4.6 years ago:

http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0058815#s1
