Hi, I'm working with GATK/184.108.40.206 on human whole-genome data.
I'm currently following the procedure to go from a gVCF to a VCF (the gVCF was obtained with HaplotypeCaller using -ERC GVCF).
I'm running the tools in this order: GenotypeGVCFs -> VariantFiltration -> MakeSitesOnlyVcf -> VariantRecalibrator -> ApplyVQSR
Since I also need to include all the loci found to be non-variant after genotyping, I'm using the "-all-sites true" option in GenotypeGVCFs.
In the VCF I obtain from GenotypeGVCFs, the majority of the 0/0 sites have only DP in the INFO field and lack all the other annotations that VariantRecalibrator will need in a later step (e.g., QD, FS, SOR, MQ, MQRankSum, ReadPosRankSum, and InbreedingCoeff).
Is there any way to get those annotations for all the sites?
And if not, will DP alone be enough for VariantRecalibrator to work on them?
For example, if I have these two sites in the VCF after GenotypeGVCFs:
chr1 10436 . C . 87.81 . DP=55 GT:AD:DP:RGQ 0/0:55,0:55:51
chr1 13868 . A G 122.60 . AC=1;AF=0.500;AN=2;BaseQRankSum=-2.950e-01;ClippingRankSum=-7.660e-01;DP=15;ExcessHet=3.0103;FS=15.564;MLEAC=1;MLEAF=0.500;MQ=32.73;MQRankSum=-2.534e+00;QD=8.17;RAW_MQ=16069.00;ReadPosRankSum=0.412;SOR=3.898 GT:AD:DP:GQ:PL 0/1:9,6:15:99:130,0,248
Will VariantRecalibrator need both sites to carry the same INFO annotations, or will it work properly in any case, even if the first site has only DP and the second one has many other annotations?
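To make the difference concrete, here is a small Python sketch that lists which of the annotations mentioned above are absent from each record's INFO field. It is only an illustration and not part of GATK: the helper names `parse_info` and `missing_annotations` are made up for this post, and it assumes whitespace-separated columns as pasted here (real VCF files are tab-separated).

```python
# Annotations named in this question as needed by VariantRecalibrator.
VQSR_ANNOTATIONS = [
    "QD", "FS", "SOR", "MQ", "MQRankSum", "ReadPosRankSum", "InbreedingCoeff",
]


def parse_info(info_field):
    """Turn a VCF INFO string such as 'DP=55;QD=8.17' into a dict."""
    info = {}
    if info_field == ".":
        return info
    for entry in info_field.split(";"):
        key, _, value = entry.partition("=")
        info[key] = value  # flag-type entries end up with an empty string
    return info


def missing_annotations(vcf_line, wanted=VQSR_ANNOTATIONS):
    """Return the wanted annotation names that are absent from INFO."""
    fields = vcf_line.split()
    info = parse_info(fields[7])  # INFO is the 8th VCF column
    return [name for name in wanted if name not in info]


# The two example records from this post.
records = [
    "chr1 10436 . C . 87.81 . DP=55 GT:AD:DP:RGQ 0/0:55,0:55:51",
    "chr1 13868 . A G 122.60 . AC=1;AF=0.500;AN=2;BaseQRankSum=-2.950e-01;"
    "ClippingRankSum=-7.660e-01;DP=15;ExcessHet=3.0103;FS=15.564;MLEAC=1;"
    "MLEAF=0.500;MQ=32.73;MQRankSum=-2.534e+00;QD=8.17;RAW_MQ=16069.00;"
    "ReadPosRankSum=0.412;SOR=3.898 GT:AD:DP:GQ:PL 0/1:9,6:15:99:130,0,248",
]

for line in records:
    chrom, pos = line.split()[:2]
    print(chrom, pos, "missing:", missing_annotations(line))
```

For the two records above, the 0/0 site is missing every annotation in the list, while the 0/1 site is missing only InbreedingCoeff.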
I need the final VCF to include all the sites (0/0, 0/1, and 1/1). So far, everything I've tried has ended up removing all the 0/0 sites.
Could someone please help me with this?