Question: Variant calling step order question: base recalibration & mark duplicates, which is first?
3.5 years ago, alons wrote:

Hi all,

We're going through & revising our variant calling pipeline on NGS data from cancer patients and a question came up:

Which step should be done first (and why), base recalibration or mark duplicates?

Currently we recalibrate bases first and then mark duplicates.

The reason I'm asking this is that we originally based part of our pipeline on the following article, which said that you recalibrate bases and then mark duplicates:

However, the following Broad Institute best practices page says the opposite: you mark duplicates first and then recalibrate bases. I saw the same order in another paper as well:

Thanks in advance!



As per the GATK best practices workflow, mark duplicates first, followed by base recalibration.

Comment written 3.5 years ago by cpad0112

3.5 years ago, mforde84 wrote:

I'd probably remove duplicates first, since BQSR builds a covariate model from all of the supplied reads. I'm assuming that having a bunch of clonal artifacts in your dataset might throw this off a little. But honestly, you should ask the GATK people, as they have a better understanding of the underlying model.
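
To make the intuition above a bit more concrete, here is a toy calculation. It is only a sketch and not how GATK actually fits its model: it assumes the empirical quality of a covariate bin is simply the Phred-scaled mismatch rate, and the counts (1000 observations, 10 mismatches, one error-carrying fragment amplified into 10 copies) are made up for illustration. The function name `empirical_quality` is hypothetical.

```python
import math


def empirical_quality(mismatches: int, observations: int) -> float:
    """Phred-scaled mismatch rate for one covariate bin (simplified assumption)."""
    return -10.0 * math.log10(mismatches / observations)


def compare_with_duplicates() -> tuple:
    # Hypothetical bin without duplicates: 10 mismatches in 1000 observed bases.
    clean = empirical_quality(10, 1000)
    # Same bin when one error-carrying fragment is present as 10 PCR copies:
    # the 9 extra copies add 9 observations and 9 mismatches.
    with_dups = empirical_quality(10 + 9, 1000 + 9)
    return clean, with_dups


if __name__ == "__main__":
    clean, with_dups = compare_with_duplicates()
    print(f"without duplicates: Q{clean:.1f}")
    print(f"with duplicates:    Q{with_dups:.1f}")
```

In this made-up case the duplicated error is counted several times, so the bin looks worse than it really is. Marking duplicates first means only one copy contributes to the counts.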


3.5 years ago, Brian Bushnell (Walnut Creek, USA) wrote:

Recalibrating bases should not really improve (or affect) duplicate detection. But duplicate removal can improve recalibration, so I'd do that first. And the earlier you remove duplicates, the faster everything else becomes.
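
For reference, the order suggested in this thread (mark duplicates, then recalibrate) could be sketched roughly as below. This assumes GATK4 is installed and callable as `gatk`; the file names, the output naming scheme, and the `build_commands` helper are hypothetical placeholders, and a real pipeline will likely need extra options. The script only builds and prints the commands and does not run them.

```python
from typing import List


def build_commands(bam: str, reference: str, known_sites: str, prefix: str) -> List[List[str]]:
    """Return GATK4 command lines in the order: MarkDuplicates, then BQSR."""
    dedup_bam = f"{prefix}.dedup.bam"          # hypothetical output names
    metrics = f"{prefix}.dup_metrics.txt"
    recal_table = f"{prefix}.recal.table"
    recal_bam = f"{prefix}.recal.bam"
    return [
        # 1. Mark duplicates on the aligned, coordinate-sorted BAM.
        ["gatk", "MarkDuplicates", "-I", bam, "-O", dedup_bam, "-M", metrics],
        # 2. Build the recalibration table from the duplicate-marked BAM.
        ["gatk", "BaseRecalibrator", "-I", dedup_bam, "-R", reference,
         "--known-sites", known_sites, "-O", recal_table],
        # 3. Apply the recalibration to the same duplicate-marked BAM.
        ["gatk", "ApplyBQSR", "-I", dedup_bam, "-R", reference,
         "--bqsr-recal-file", recal_table, "-O", recal_bam],
    ]


if __name__ == "__main__":
    for cmd in build_commands("sample.bam", "ref.fasta", "known_sites.vcf.gz", "sample"):
        print(" ".join(cmd))
```

To actually execute the steps, each list could be passed to `subprocess.run(cmd, check=True)` in turn, so that a failure in duplicate marking stops the pipeline before recalibration starts.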
