Question: Pindel and GATK AD Concordance
3 months ago
I was hoping to get some advice on working with the AD field for variants in VCFs produced by Pindel and Pindel2VCF.

My lab has a variant calling pipeline that uses GATK's UnifiedGenotyper to call SNVs and small indels, and Pindel to call deletions and short insertions. As part of our VCF filtering, we remove variants with a variant allele fraction (VAF) below a certain threshold. However, other members of the lab have pointed out that when Pindel and UnifiedGenotyper call the same variant, they report different AD values and therefore different VAFs. (I'm not sure if it's relevant, but the AD from UnifiedGenotyper matches what IGV shows.)
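For reference, here is a minimal sketch of how a VAF is typically derived from the AD field. It assumes the VCF convention that AD lists the reference depth first, followed by one depth per ALT allele, and it uses the sum of the AD values as the denominator rather than the DP field (callers and pipelines differ on this). The function name and the AD values below are hypothetical, chosen only to show how two slightly different AD strings for the same variant give different VAFs.

```python
def vaf_from_ad(ad_field: str, alt_index: int = 1) -> float:
    """Compute a VAF from a VCF AD string such as "30,10".

    Assumption: AD is ordered as REF depth, then one depth per ALT allele.
    The denominator is the sum of all AD values, not the DP field.
    """
    depths = [int(x) for x in ad_field.split(",")]
    total = sum(depths)
    if total == 0:
        # No informative reads: report 0 rather than dividing by zero.
        return 0.0
    return depths[alt_index] / total


# Hypothetical AD values for the same deletion as reported by two callers.
print(vaf_from_ad("30,10"))  # 0.25
print(vaf_from_ad("24,11"))  # 11 / 35, roughly 0.314
```

Because each caller counts supporting and total reads with its own rules, the same variant can easily yield two different AD strings and therefore two different VAFs.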

Normally the difference is small enough that it doesn't affect our filtering, but in some cases a variant called by both branches of the pipeline has a VAF above our cutoff in one VCF and below it in the other. To avoid this issue, which AD should I use to calculate VAF: the one from UnifiedGenotyper or the one from Pindel?
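The situation described above can be checked mechanically. The sketch below is hypothetical: the function names, the VAF values, and the use of `>=` for "passes the cutoff" are assumptions for illustration, not part of the original pipeline. It flags a shared variant whose filter outcome depends on which caller's VAF is used.

```python
from typing import Dict


def filter_outcomes(vafs: Dict[str, float], cutoff: float) -> Dict[str, bool]:
    """Per caller, whether the variant's VAF passes the cutoff (assumed >=)."""
    return {caller: vaf >= cutoff for caller, vaf in vafs.items()}


def is_discordant(vafs: Dict[str, float], cutoff: float) -> bool:
    """True when at least one caller passes the cutoff and another fails it."""
    outcomes = filter_outcomes(vafs, cutoff)
    return len(set(outcomes.values())) > 1


# Hypothetical VAFs for one indel reported in both VCFs.
vafs = {"UnifiedGenotyper": 0.31, "Pindel": 0.27}
print(filter_outcomes(vafs, cutoff=0.30))  # {'UnifiedGenotyper': True, 'Pindel': False}
print(is_discordant(vafs, cutoff=0.30))    # True
```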

Thanks in advance for any suggestions.

3 months ago
Dan Gaston
I have a somatic variant calling pipeline that merges calls from six different variant callers, and it is normal for them to report different VAFs. Each caller has its own (sometimes tunable) criteria for including or excluding reads or bases, for instance, and in somatic variant calling this can have a big impact. These criteria tend to affect both the AD and the total depth the caller uses. What I do is take the maximum VAF across my callers and use that for the filtering stage.
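A minimal sketch of that max-VAF rule, assuming each caller's VAF has already been computed for a merged variant and that `None` marks a caller that did not call it. The function names, the third caller name, the values, and the `>=` comparison are hypothetical.

```python
from typing import Dict, Optional


def max_vaf(vafs: Dict[str, Optional[float]]) -> Optional[float]:
    """Largest VAF reported by any caller that called the variant."""
    reported = [v for v in vafs.values() if v is not None]
    return max(reported) if reported else None


def passes_max_vaf(vafs: Dict[str, Optional[float]], cutoff: float) -> bool:
    """Keep the variant if its best-supported VAF reaches the cutoff."""
    best = max_vaf(vafs)
    return best is not None and best >= cutoff


# Hypothetical merged record: two callers made the call, one did not.
vafs = {"UnifiedGenotyper": 0.31, "Pindel": 0.27, "OtherCaller": None}
print(max_vaf(vafs))               # 0.31
print(passes_max_vaf(vafs, 0.30))  # True
```

Taking the maximum is the permissive choice: a variant survives filtering if any single caller supports it above the cutoff, so the outcome no longer depends on which VCF the variant happened to come from.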

Thanks for the advice!

