Variant calling: GT field disagrees with DP4 field
0
2
Entering edit mode
6.5 years ago
al.bodrug ▴ 50

I am calling variants from ~100 samples jointly with the consensus caller in bcftools. I request the DP4 field (deprecated) which reports the number of reads which support [reference_forward, reference_reverse, variant_forward, variant_reverse]. The genotype provided in the GT fields often disagrees with what is contained in the DP4 field. Example sample 3 and sample 5 have a heterozygous GT despite having respectively only one reference supporting read or no read at all:

CHROM: chrom15
POS: 11169
REF G
ALT A
FORMAT : GT:PL:DP:SP:DP4:ADF:ADR:AD
sample1 0/0:0,9,89:3:0:1,2,0,0:1,0:2,0:3,0
sample2 0/0:0,9,80:3:0:2,1,0,0:2,0:1,0:3,0
sample3 0/1:0,3,25:1:0:1,0,0,0:1,0:0,0:1,0
sample4 0/0:0,6,58:2:0:1,1,0,0:1,0:1,0:2,0
sample5 0/1:0,0,0:0:0:0,0,0,0:0,0:0,0:0,0
sample6 0/1:0,3,26:1:0:0,1,0,0:0,0:1,0:1,0
sample7 1/1:67,6,0:2:0:0,0,0,2:0,0:0,2:0,2
sample8 1/1:54,6,0:2:0:0,0,1,1:0,1:0,1:0,2
sample9 0/1:0,0,0:0:0:0,0,0,0:0,0:0,0:0,0

Below is the genuine line in the the bcf:

chrom15 11169   .   G   A   999 .   DP=268;VDB=0.049104;SGB=10.9066;RPB=0.295564;MQB=0.913243;MQSB=0.000116646;BQB=0.145125;MQ0F=0;AF1=0.524266;G3=0.370806,0.212478,0.416716;HWE=0.000166307;AC1=182;DP4=57,69,59,79;MQ=37;FQ=999;PV4=0.710785,0.0623765,1,1   GT:PL:DP:SP:DP4:ADF:ADR:AD  0/0:0,9,89:3:0:1,2,0,0:1,0:2,0:3,0  0/0:0,9,80:3:0:2,1,0,0:2,0:1,0:3,0  0/1:0,3,25:1:0:1,0,0,0:1,0:0,0:1,0  0/0:0,6,58:2:0:1,1,0,0:1,0:1,0:2,0  0/1:0,0,0:0:0:0,0,0,0:0,0:0,0:0,0   0/1:0,3,26:1:0:0,1,0,0:0,0:1,0:1,0  1/1:67,6,0:2:0:0,0,0,2:0,0:0,2:0,2  1/1:54,6,0:2:0:0,0,1,1:0,1:0,1:0,2  0/1:0,0,0:0:0:0,0,0,0:0,0:0,0:0,0   0/1:0,3,37:1:0:0,1,0,0:0,0:1,0:1,0 etc

I checked in a viewer if the DP4 fields are indeed correct and they are (a single read supporting reference in sample 3, and nothing in sample 5, as well as a correct report for all the other samples). Some samples do support a G->A SNP and those have a GT=1/1 field.

My question is, why bcftools call sets the samples 3 and 5 as heterozygous calls? Is there anything beyond read information that is taken into account that could influence the call this much?

The whole region is not heterozygous and there is no other SNP in the 200 positions before and after this one that could motivate a default call to heterozygous.

EDIT:

The solution is to use the bcftools multi-caller option -m . This seems to solve a large portion of the discrepancy between dp4 tags and GT field. However some problems remain like a heterozygous call with a 0,0,1,0 dp4 field. Maybe the environment of the position has an influence.

SNP dp4 GT genotype • 1.9k views
ADD COMMENT
1
Entering edit mode

most callers will assign GT based on "high quality" reads listed in AD

ADD REPLY

Login before adding your answer.

Traffic: 2052 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6