Question: Variant calling: GT field disagrees with DP4 field
gravatar for al.bodrug
3.3 years ago by
al.bodrug50 wrote:

I am calling variants from ~100 samples jointly with the consensus caller in bcftools. I request the DP4 field (deprecated) which reports the number of reads which support [reference_forward, reference_reverse, variant_forward, variant_reverse]. The genotype provided in the GT fields often disagrees with what is contained in the DP4 field. Example sample 3 and sample 5 have a heterozygous GT despite having respectively only one reference supporting read or no read at all:

CHROM: chrom15
POS: 11169
sample1 0/0:0,9,89:3:0:1,2,0,0:1,0:2,0:3,0
sample2 0/0:0,9,80:3:0:2,1,0,0:2,0:1,0:3,0
sample3 0/1:0,3,25:1:0:1,0,0,0:1,0:0,0:1,0
sample4 0/0:0,6,58:2:0:1,1,0,0:1,0:1,0:2,0
sample5 0/1:0,0,0:0:0:0,0,0,0:0,0:0,0:0,0
sample6 0/1:0,3,26:1:0:0,1,0,0:0,0:1,0:1,0
sample7 1/1:67,6,0:2:0:0,0,0,2:0,0:0,2:0,2
sample8 1/1:54,6,0:2:0:0,0,1,1:0,1:0,1:0,2
sample9 0/1:0,0,0:0:0:0,0,0,0:0,0:0,0:0,0

Below is the genuine line in the the bcf:

chrom15 11169   .   G   A   999 .   DP=268;VDB=0.049104;SGB=10.9066;RPB=0.295564;MQB=0.913243;MQSB=0.000116646;BQB=0.145125;MQ0F=0;AF1=0.524266;G3=0.370806,0.212478,0.416716;HWE=0.000166307;AC1=182;DP4=57,69,59,79;MQ=37;FQ=999;PV4=0.710785,0.0623765,1,1   GT:PL:DP:SP:DP4:ADF:ADR:AD  0/0:0,9,89:3:0:1,2,0,0:1,0:2,0:3,0  0/0:0,9,80:3:0:2,1,0,0:2,0:1,0:3,0  0/1:0,3,25:1:0:1,0,0,0:1,0:0,0:1,0  0/0:0,6,58:2:0:1,1,0,0:1,0:1,0:2,0  0/1:0,0,0:0:0:0,0,0,0:0,0:0,0:0,0   0/1:0,3,26:1:0:0,1,0,0:0,0:1,0:1,0  1/1:67,6,0:2:0:0,0,0,2:0,0:0,2:0,2  1/1:54,6,0:2:0:0,0,1,1:0,1:0,1:0,2  0/1:0,0,0:0:0:0,0,0,0:0,0:0,0:0,0   0/1:0,3,37:1:0:0,1,0,0:0,0:1,0:1,0 etc

I checked in a viewer if the DP4 fields are indeed correct and they are (a single read supporting reference in sample 3, and nothing in sample 5, as well as a correct report for all the other samples). Some samples do support a G->A SNP and those have a GT=1/1 field.

My question is, why bcftools call sets the samples 3 and 5 as heterozygous calls? Is there anything beyond read information that is taken into account that could influence the call this much?

The whole region is not heterozygous and there is no other SNP in the 200 positions before and after this one that could motivate a default call to heterozygous.


The solution is to use the bcftools multi-caller option -m . This seems to solve a large portion of the discrepancy between dp4 tags and GT field. However some problems remain like a heterozygous call with a 0,0,1,0 dp4 field. Maybe the environment of the position has an influence.

dp4 genotype gt snp • 1.2k views
ADD COMMENTlink modified 3.3 years ago by Biostar ♦♦ 20 • written 3.3 years ago by al.bodrug50

most callers will assign GT based on "high quality" reads listed in AD

ADD REPLYlink modified 3.3 years ago • written 3.3 years ago by Jeremy Leipzig19k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1381 users visited in the last hour