When I run the following:
bcftools mpileup -BI -Ov -d 400000 -a AD,DP -o sample.A.vcf -f ref.fasta -R regions.bed sample.A.sorted.bam
I get the following output at a specific position:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sampleA
contig00012 439 . C A,<*> 0 . DP=1365;I16=324,244,246,260,27639,1.51696e+06,24253,1.30823e+06,69,1683,106,4162,12757,303189,10513,247245;QS=0.523724,0.476276,0;VDB=0.473402;SGB=-0.693147;RPB=0.000285419;MQB=0.999823;MQSB=0.99981;BQB=0.024168;MQ0F=0.782418 PL:DP:AD 0,12,0,255,255,1:1074:568,506,0
Which from the AD field indicates that there are 568 alleles supporting the reference C
allele and 506 supporting the alternative A
allele.
However when I pass this to bcftools to call genotypes using:
bcftools call -mO v -o sample_A_called.vcf sample_A.vcf
I get the following:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT DB31-92
contig00012 439 . C . 25.9297 . DP=1365;VDB=0.473402;SGB=-0.693147;RPB=0.000285419;MQB=0.999823;MQSB=0.99981;BQB=0.024168;MQ0F=0.782418;AN=2;DP4=324,244,246,260;MQ=0 GT:DP:AD 0/0:1074:568
where it is now called homozygous reference (0/0
) and the allelic depth for the alternative A
allele seems to be missing.
From a quick visual inspection of the bam file this appears to be a true variant position and the base quality appears to be good.
Any suggestions on the reason for this behavior is greatly appreciated. Many thanks.