I was trying to get genotype counts (#HOM-VAR, #HET, etc.) from my VCF file using VariantsToTable, as below:
gatk VariantsToTable -R /hg38/Homo_sapiens_assembly38.fasta \
    -V all_jointcalls_annRegion.vcf \
    -F CHROM -F ID -F POS -F REF -F ALT -F QUAL -F DP \
    -GF GT -GF GQ -GF HET -GF HOM-REF \
    -O all_jointcalls_trial.table
But the HET and HOM-REF columns come back as NA across the board, for example:

| CHROM | ID | POS | REF | ALT | QUAL | DP | sample1.GT | sample1.GQ | sample1.HET | sample1.HOM-REF |
|---|---|---|---|---|---|---|---|---|---|---|
| chr2 | rs2303425 | 47403074 | T | C | 11431.45 | 4041 | T/T | 99 | NA | NA |
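One thing I am unsure about: my reading of the VariantsToTable documentation is that HET, HOM-REF, HOM-VAR and NO-CALL are listed among the special `-F` fields, which count genotypes across all samples at each site, rather than as per-sample FORMAT fields for `-GF`. If so, that might explain the NA columns (not yet checked against my GATK version). To make sure I understand what such a site-level count means, here is a small awk sketch of the same kind of tally on a made-up VCF. `site_gt_counts` is a hypothetical helper name, and it treats any genotype containing a `.` as a no-call, which may not match how GATK handles partial calls.

```shell
# Hypothetical helper (my own name, not a GATK tool): for each VCF record,
# count HOM-REF, HET, HOM-VAR and NO-CALL genotypes across all samples,
# i.e. one row per site rather than one column per sample.
# Simplification: any GT containing "." is counted as NO-CALL.
site_gt_counts() {
  awk -F'\t' -v OFS='\t' '
    /^##/ { next }                                   # skip meta-information lines
    /^#CHROM/ { print "CHROM", "POS", "HOM-REF", "HET", "HOM-VAR", "NO-CALL"; next }
    {
      hr = het = hv = nc = 0
      for (i = 10; i <= NF; i++) {                   # sample columns start at 10
        split($i, f, ":")                            # GT is the first FORMAT subfield
        n = split(f[1], a, "[/|]")                   # alleles, phased or unphased
        same = 1; missing = 0
        for (j = 1; j <= n; j++) {
          if (a[j] == ".") missing = 1
          if (a[j] != a[1]) same = 0
        }
        if (missing) nc++
        else if (!same) het++                        # e.g. 0/1 or 1/2
        else if (a[1] == "0") hr++                   # 0/0
        else hv++                                    # 1/1, 2/2, ...
      }
      print $1, $2, hr, het, hv, nc
    }'
}

# Usage: site_gt_counts < all_jointcalls_annRegion.vcf
```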
However, when I used Dave Tang's approach (https://github.com/davetang/learning_vcf_file#extracting-info-fields):
bcftools stats -s - all_jointcalls_annRegion.vcf | grep -A 169 "Per-sample counts"
I get counts:
`# PSC, Per-sample counts. Note that the ref/het/hom counts include only SNPs, for indels see PSI. The rest include both SNPs and indels.`

| # PSC | id | sample | **nRefHom** | **nNonRefHom** | **nHets** | nTransitions | nTransversions | nIndels | average depth | nSingletons | nHapRef | nHapAlt | nMissing |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PSC | 0 | sample1 | **658** | **38** | **65** | 63 | 40 | 0 | 3.6 | 8 | 0 | 0 | 303 |
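Rather than relying on a fixed `grep -A` line count, the PSC rows can also be picked out by their leading tag. A minimal sketch, assuming the tab-delimited layout of `bcftools stats` output and the column order in the PSC header above; `psc_genotype_counts` is a hypothetical helper name.

```shell
# Hypothetical helper: keep only the PSC (per-sample counts) data rows of
# bcftools stats output and label the three genotype columns.
# Column positions follow the PSC header:
#   $1=PSC $2=id $3=sample $4=nRefHom $5=nNonRefHom $6=nHets
# The "# PSC" header line has "# PSC" as its first field, so it is not printed.
psc_genotype_counts() {
  awk -F'\t' -v OFS='\t' '$1 == "PSC" { print $3, "nRefHom=" $4, "nNonRefHom=" $5, "nHets=" $6 }'
}

# Usage ("-s -" asks bcftools for per-sample stats on all samples):
#   bcftools stats -s - all_jointcalls_annRegion.vcf | psc_genotype_counts
```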
The same pattern holds across all variants and all samples in my multi-sample VCF file: bcftools gives me numbers, whereas GATK VariantsToTable gives NA.
What is wrong with my VariantsToTable command? I read in some posts on the GATK website that they do not recommend other tools for parsing GATK-generated VCFs, so I am a little hesitant to use bcftools. Is there anything wrong with parsing it with bcftools?
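To sanity-check the bcftools numbers independently of either tool, a rough per-sample tally can be computed straight from the GT column. The awk sketch below is hypothetical (`per_sample_gt_counts` is my own name). It mirrors only the nRefHom / nNonRefHom / nHets columns, keeps only SNP records as the PSC note describes, and skips genotypes with a missing allele. bcftools may handle multi-allelic sites or partial calls differently, so small discrepancies would not by themselves mean either tool is wrong.

```shell
# Hypothetical cross-check (my own helper, not bcftools code): tally per-sample
# genotype classes straight from the GT column, SNP records only, to compare
# against the nRefHom / nNonRefHom / nHets columns of bcftools stats -s -.
# Simplifications: genotypes with any missing allele are skipped, and a SNP is
# taken to mean REF and every ALT allele are a single base.
per_sample_gt_counts() {
  awk -F'\t' -v OFS='\t' '
    /^##/ { next }
    /^#CHROM/ { for (i = 10; i <= NF; i++) name[i] = $i; last = NF; next }
    {
      if ($4 !~ /^[ACGTacgt]$/) next                 # REF must be a single base
      nalt = split($5, alts, ",")
      for (k = 1; k <= nalt; k++)
        if (alts[k] !~ /^[ACGTacgt]$/) next          # every ALT must be a single base
      for (i = 10; i <= NF; i++) {
        split($i, f, ":")                            # GT is the first FORMAT subfield
        n = split(f[1], a, "[/|]")
        same = 1; missing = 0
        for (j = 1; j <= n; j++) {
          if (a[j] == ".") missing = 1
          if (a[j] != a[1]) same = 0
        }
        if (missing) continue                        # skip no-calls
        if (!same) het[i]++
        else if (a[1] == "0") hr[i]++
        else hv[i]++
      }
    }
    END {
      print "sample", "nRefHom", "nNonRefHom", "nHets"
      for (i = 10; i <= last; i++) print name[i], hr[i] + 0, hv[i] + 0, het[i] + 0
    }'
}

# Usage: per_sample_gt_counts < all_jointcalls_annRegion.vcf
```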