I have a sample (Sample1) that is analyzed using BWA v0.7.6a-r433 and GATK 126.96.36.199. I called the variants using HaplotypeCaller using the ERC mode (GVCF) and then joint genotyped the samples (8 samples) using GenotypeGVCFs. In one region in this sample, there is a variant that is TRUE (based on a third-party quality control, chr1:169510380) that we did NOT call.
As you can see below, the variant exists in Sample1 when we look at the BAM file which is generated from BWA. But when I take the BAM file that is generated by HaplotypeCaller (containing the active regions by using –bamOut option), one sees clearly that HaplotypeCaller didn’t consider this region in Sample1 as an active region and thus there is no way of calling the variant. So the problem is one step before the joint genotyping.
This variant was called in another sample from the same run (Sample2).There is no big difference in the quality of the reads between the two samples (The coverage of both is around 500 and there is no bias in one strand or the other in both samples).
My questions are:
- Where should I start searching for the reason? What can I do to check why is this not called in Sample1?
- What are the parameters in the original BAM files (generated from BWA) that lead to skip a region in HaplotypeCallerERC in Sample1?. I read the paper here (Scaling accurate genetic variant discovery to tens of thousands of samples) but was not able to figure out an exact number for a position to look after, it was too technical and I was lost in-between equations. I also read this simple document (ActiveRegion determination) without getting much wiser.
- How can I force HaplotypeCaller to call this variant in Sample1?
I hope I was able to explain the case clearly. Please let me know if I missed any required details and/or you need any more info.
Thanks for your help.