Question: Why Do We Need MarkDuplicates For Variant Detection In The GATK Processing Pipeline?
Lds wrote, 7.1 years ago:

Hi fellows,

It's said that MarkDuplicates in Picard matches all read pairs that have identical 5' coordinates and orientations, and marks all but the 'best' pair as duplicates. Suppose I have three such pairs, one of which is the 'best' pair, and all three truly come from the target genome rather than from sequencing artifacts. If I set REMOVE_DUPLICATES=True, the two non-best pairs will be deleted, which decreases the coverage for that region. This doesn't make sense to me, so maybe I have misunderstood the purpose of MarkDuplicates. So my question is: what is the purpose of MarkDuplicates, and why does it delete the duplicates?
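To make the behaviour described above concrete, here is a minimal Python sketch of the grouping idea: read pairs that share 5' coordinates and orientations form one group, and all but the 'best' pair in each group are marked. This is a toy illustration, not Picard's actual implementation. The `ReadPair` type, the `mark_duplicates` function, and the use of a summed base quality as the 'best' criterion are assumptions made for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class ReadPair(NamedTuple):
    """Hypothetical, simplified read pair used only for this sketch."""
    name: str
    chrom: str
    pos1: int      # 5' coordinate of the first read
    strand1: str   # orientation of the first read
    pos2: int      # 5' coordinate of the mate
    strand2: str   # orientation of the mate
    qual_sum: int  # summed base qualities (assumed 'best' criterion)


def mark_duplicates(pairs):
    """Return the names of pairs that would be marked as duplicates."""
    groups = defaultdict(list)
    for pair in pairs:
        # Pairs with identical 5' coordinates and orientations share a key.
        key = (pair.chrom, pair.pos1, pair.strand1, pair.pos2, pair.strand2)
        groups[key].append(pair)

    duplicates = set()
    for group in groups.values():
        # Keep the highest-scoring pair; mark every other pair in the group.
        best = max(group, key=lambda p: p.qual_sum)
        duplicates.update(p.name for p in group if p is not best)
    return duplicates


pairs = [
    ReadPair("a", "chr1", 100, "+", 350, "-", 2400),
    ReadPair("b", "chr1", 100, "+", 350, "-", 2550),
    ReadPair("c", "chr1", 100, "+", 350, "-", 2300),
    ReadPair("d", "chr1", 101, "+", 350, "-", 2500),  # different start, own group
]
print(sorted(mark_duplicates(pairs)))  # ['a', 'c']
```

In this example the three pairs starting at position 100 collapse to pair `b`, while pair `d` is untouched because its 5' coordinate differs.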

Thanks in advance


Lots of previous information in these threads: http://biostar.stackexchange.com/search?q=duplicates

written 7.1 years ago by Chris Miller
Sean Davis (National Institutes of Health, Bethesda, MD) wrote, 7.1 years ago:
Almost all statistical models for variant calling assume some sort of independence between measurements. The duplicates (if one assumes that they arise from PCR artifact) are not independent. This lack of independence will usually lead to a breakdown of the statistical model and measures of statistical significance that are incorrect.
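A small Python sketch can illustrate why non-independent reads mislead such a model. It compares two genotype hypotheses at one site with a simple binomial likelihood that treats every read as an independent observation. The model, the function names, the 1% error rate, and the read counts are all assumptions chosen for illustration; real variant callers are considerably more elaborate.

```python
import math


def log_likelihood(n_ref, n_alt, alt_fraction):
    """Binomial log-likelihood (constant term dropped) of the observed counts,
    assuming each read is an independent draw with P(alt) = alt_fraction."""
    return n_ref * math.log(1 - alt_fraction) + n_alt * math.log(alt_fraction)


def call_genotype(n_ref, n_alt, error_rate=0.01):
    """Pick the more likely of two hypotheses: homozygous reference
    (alt reads are errors) or heterozygous (alt reads expected half the time)."""
    ll_hom_ref = log_likelihood(n_ref, n_alt, error_rate)
    ll_het = log_likelihood(n_ref, n_alt, 0.5)
    return "het" if ll_het > ll_hom_ref else "hom_ref"


# Ten independent reference reads, plus ONE fragment carrying an error
# that PCR copied five times. Counting the copies as independent evidence:
print(call_genotype(n_ref=10, n_alt=5))  # het

# After collapsing the five copies to the single original fragment:
print(call_genotype(n_ref=10, n_alt=1))  # hom_ref
```

The five copies carry the information of one observation, but the model counts them as five, so a single error is reported as a heterozygous variant until the duplicates are collapsed.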

There are experiments where one should not make the assumption that reads that have the same start positions are PCR duplicates. In that case, using MarkDuplicates is not justified.


Thanks so much. This is the discussion in seqanswers: http://seqanswers.com/forums/showthread.php?t=6854

I think that we should use MarkDuplicates in SNP calling.

written 7.1 years ago by Lds

Yes, you should.

written 7.1 years ago by Sean Davis

Alex Paciorkowski (Rochester, NY, USA) wrote, 7.1 years ago:

MarkDuplicates is important for removing PCR duplicates, which can introduce bias into your variant calling. If you did not mark duplicates, regions that were preferentially amplified during PCR would be over-represented in your reads. One way to think about it is that marking duplicates and removing them does not really have a detrimental effect on your overall depth of coverage, but it does increase the quality and reliability of the coverage you have.

There is a good discussion covered here.

And also further discussion on the Picard Main Page.
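On the difference between marking and removing: in the SAM format, a duplicate is recorded by setting the 0x400 bit in a read's FLAG field, so a marked read stays in the file and downstream tools can choose to skip it. The Python sketch below shows that bookkeeping on bare flag integers. The helper names and the example flag values are assumptions for illustration and are not part of Picard or GATK.

```python
DUPLICATE_FLAG = 0x400  # SAM FLAG bit: read is a PCR or optical duplicate


def mark_as_duplicate(flag):
    """Set the duplicate bit without touching the other flag bits."""
    return flag | DUPLICATE_FLAG


def is_duplicate(flag):
    return bool(flag & DUPLICATE_FLAG)


def usable_depth(flags):
    """Count reads a duplicate-aware tool would use at a position."""
    return sum(1 for flag in flags if not is_duplicate(flag))


# Four reads covering one position (hypothetical FLAG values).
flags = [99, 99, 99, 163]

# Mark two of them as duplicates; the records themselves are kept.
flags[1] = mark_as_duplicate(flags[1])
flags[2] = mark_as_duplicate(flags[2])

print(len(flags))           # 4 records still present
print(usable_depth(flags))  # 2 reads counted as independent evidence
```

Marking therefore loses no data: the raw depth is still recoverable, while the depth used for calling reflects only the reads treated as independent.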
