HaplotypeCaller VCF depth is greater than the number of reads in bam
Entering edit mode
27 days ago


I call gvcf file using GATK HaplotypeCaller as following:

gatk HaplotypeCaller -R my.fasta \
    -I s-95.sort.noDup.bam \
    -L 3R:23000000-27905053 \
    -ERC GVCF \
    -bamout test_s95.bamout.bam \
    --native-pair-hmm-threads 28 \
    -O test_s95.sort.noDup.g.vcf

The above ouput gvcf reports a variant at 3R:25063300

3R      25063300        .       C       T,<NON_REF>     804.64  .       BaseQRankSum=-2.060;DP=59;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=0.000;RAW_MQandDP=212400,59;ReadPosRankSum=-1.269       GT:AD:DP:GQ:PGT:PID:PL:PS:SB    0|1:37,22,0:59:99:0|1:25063284_G_T:812,0,1487,924,1554,2477:25063284:17,20,12,10

The DP=59 for this sites based on the vcf, which if I understand correctly means there are 59 reads covering this poosition.

But when I check the position in the input bam, I found the number of reads at this position is only 36.

samtools view s-95.sort.noDup.bam "3R:25063300-25063300" | wc -l
# 36

[I also count the reads manually using samtools tview and confirmed 36 reads here]

I couldn't understand how could depth in vcf could outnumber the reads in the bam file!


I further checked the number of read at this position in test_s95.bamout.bam and found 54 reads covering this position! Although it's still less than the depth(59) in VCF, they are much closer than that in the raw bam file(s-95.sort.noDup.bam). It seems that the HaplotypeCaller grabs some other reads and call variants, rather than just using the reads mapped at this position.

I am totally confused, and appreciate it if someone could offer some help!


HaplotypeCaller vcf gatk • 233 views
Entering edit mode
27 days ago

well isn't it meaningless in GVCF mode ?What is the output without -ERC GVCF What is the output using --dont-use-soft-clipped-bases ? did you look the reads at IGV ?

Entering edit mode

Thanks, --dont-use-soft-clipped-bases solved the problem. It seems samtools view/tview doesn't show the soft clip base.

If I don't use the soft clip base to call variants, the position 3R:25063300 is not a variant - all bases are reference. The soft clip base caused the position to be a polymorphic sites.

Anyway, thanks for your solution!


Login before adding your answer.

Traffic: 2313 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6