HaplotypeCaller VCF depth is greater than the number of reads in bam
2
0
Entering edit mode
3.4 years ago
wang.yiguan ▴ 10

Hi,

I call gvcf file using GATK HaplotypeCaller as following:

gatk HaplotypeCaller -R my.fasta \
    -I s-95.sort.noDup.bam \
    -L 3R:23000000-27905053 \
    -ERC GVCF \
    -bamout test_s95.bamout.bam \
    --native-pair-hmm-threads 28 \
    -O test_s95.sort.noDup.g.vcf

The above ouput gvcf reports a variant at 3R:25063300

3R      25063300        .       C       T,<NON_REF>     804.64  .       BaseQRankSum=-2.060;DP=59;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=0.000;RAW_MQandDP=212400,59;ReadPosRankSum=-1.269       GT:AD:DP:GQ:PGT:PID:PL:PS:SB    0|1:37,22,0:59:99:0|1:25063284_G_T:812,0,1487,924,1554,2477:25063284:17,20,12,10

The DP=59 for this sites based on the vcf, which if I understand correctly means there are 59 reads covering this poosition.

But when I check the position in the input bam, I found the number of reads at this position is only 36.

samtools view s-95.sort.noDup.bam "3R:25063300-25063300" | wc -l
# 36

[I also count the reads manually using samtools tview and confirmed 36 reads here]

I couldn't understand how could depth in vcf could outnumber the reads in the bam file!

===

I further checked the number of read at this position in test_s95.bamout.bam and found 54 reads covering this position! Although it's still less than the depth(59) in VCF, they are much closer than that in the raw bam file(s-95.sort.noDup.bam). It seems that the HaplotypeCaller grabs some other reads and call variants, rather than just using the reads mapped at this position.

I am totally confused, and appreciate it if someone could offer some help!

Thanks!

HaplotypeCaller vcf gatk • 1.5k views
ADD COMMENT
1
Entering edit mode
3.4 years ago

well isn't it meaningless in GVCF mode ?What is the output without -ERC GVCF What is the output using --dont-use-soft-clipped-bases ? did you look the reads at IGV ?

ADD COMMENT
0
Entering edit mode

Thanks, --dont-use-soft-clipped-bases solved the problem. It seems samtools view/tview doesn't show the soft clip base.

If I don't use the soft clip base to call variants, the position 3R:25063300 is not a variant - all bases are reference. The soft clip base caused the position to be a polymorphic sites.

Anyway, thanks for your solution!

ADD REPLY
0
Entering edit mode
16 months ago
pragnapcu ▴ 10

What if the bam generated after GATK BaseRecalibrator has more reads than the depth in vcf?

ADD COMMENT

Login before adding your answer.

Traffic: 1351 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6