I call gvcf file using GATK
HaplotypeCaller as following:
gatk HaplotypeCaller -R my.fasta \ -I s-95.sort.noDup.bam \ -L 3R:23000000-27905053 \ -ERC GVCF \ -bamout test_s95.bamout.bam \ --native-pair-hmm-threads 28 \ -O test_s95.sort.noDup.g.vcf
The above ouput gvcf reports a variant at
3R 25063300 . C T,<NON_REF> 804.64 . BaseQRankSum=-2.060;DP=59;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=0.000;RAW_MQandDP=212400,59;ReadPosRankSum=-1.269 GT:AD:DP:GQ:PGT:PID:PL:PS:SB 0|1:37,22,0:59:99:0|1:25063284_G_T:812,0,1487,924,1554,2477:25063284:17,20,12,10
DP=59 for this sites based on the vcf, which if I understand correctly means there are 59 reads covering this poosition.
But when I check the position in the input bam, I found the number of reads at this position is only 36.
samtools view s-95.sort.noDup.bam "3R:25063300-25063300" | wc -l # 36
[I also count the reads manually using
samtools tview and confirmed 36 reads here]
I couldn't understand how could depth in vcf could outnumber the reads in the bam file!
I further checked the number of read at this position in
test_s95.bamout.bam and found 54 reads covering this position! Although it's still less than the depth(59) in VCF, they are much closer than that in the raw bam file(
s-95.sort.noDup.bam). It seems that the
HaplotypeCaller grabs some other reads and call variants, rather than just using the reads mapped at this position.
I am totally confused, and appreciate it if someone could offer some help!