Is there a way to Scan and Correct bad vcf INFO values?
2.4 years ago
bwubb • 0


I am playing around with two VCF files I generated with GATK v3.7. Both were generated with the parameters -G Standard -G AS_Standard, but one was generated in GGA (GENOTYPE_GIVEN_ALLELES) mode using the sites of the other. That GGA VCF is missing some AS_Standard annotation values.

##INFO=<ID=AS_MQ,Number=A,Type=Float,Description="Allele-specific RMS Mapping Quality">

However, for an unspecified number of variants the annotation appears as a bare AS_MQ; with no float value. I noticed this when I attempted to merge their shared variants with bcftools.

The exact error was: Not ready for type [0]: AS_MQ at 12133603
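A minimal sketch of the "scan" half in Python, assuming an uncompressed VCF read line by line. It collects the Type of every ##INFO declaration and reports each record where a key declared with a non-Flag Type (such as AS_MQ, Type=Float) appears with no value. Only Flag keys may legitimately appear without one. The function name scan_bare_info is hypothetical, and the header is matched with a simple regex rather than a full VCF parser.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Captures ID and Type from a header line such as
# ##INFO=<ID=AS_MQ,Number=A,Type=Float,Description="...">
INFO_HEADER = re.compile(r'^##INFO=<ID=([^,>]+),.*?Type=([A-Za-z]+)')


def scan_bare_info(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (CHROM, POS, key) for each INFO key that has no '=value'
    although its ##INFO declaration is not Type=Flag."""
    types: Dict[str, str] = {}
    problems: List[Tuple[str, str, str]] = []
    for line in lines:
        line = line.rstrip('\n')
        if line.startswith('#'):
            m = INFO_HEADER.match(line)
            if m:
                types[m.group(1)] = m.group(2)
            continue
        fields = line.split('\t')
        if len(fields) < 8 or fields[7] == '.':
            continue  # no INFO column, or INFO is entirely missing
        for key in fields[7].split(';'):
            # Undeclared keys default to 'Flag' here, so they are not reported.
            if key and '=' not in key and types.get(key, 'Flag') != 'Flag':
                problems.append((fields[0], fields[1], key))
    return problems
```

Called on an open file, for example scan_bare_info(open("in.vcf")), it returns a list of (CHROM, POS, key) tuples. An empty list means no valueless non-Flag keys were found.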

I was able to merge after using sed to replace every bare AS_MQ; with AS_MQ=.; but I was wondering whether there is a tool or command that could have done this for me, for any potentially malformed or missing value, in a VCF-format-aware manner. I can build in a manual check for the AS_MQ fix, but I am trying to make it foolproof in case the same thing ever happens with another annotation.
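A sketch of a header-aware version of the sed fix, again assuming an uncompressed VCF. Instead of hard-coding AS_MQ, it rewrites any valueless key whose ##INFO declaration is not Type=Flag as KEY=., using "." as the VCF missing-value marker. The function name fix_bare_info and the script name fix_info.py are hypothetical.

```python
import re
import sys
from typing import Dict, Iterable, Iterator

# Captures ID and Type from a ##INFO header line.
INFO_HEADER = re.compile(r'^##INFO=<ID=([^,>]+),.*?Type=([A-Za-z]+)')


def fix_bare_info(lines: Iterable[str]) -> Iterator[str]:
    """Yield VCF lines (without trailing newlines) in which every valueless
    INFO key declared with a non-Flag Type is rewritten as KEY=."""
    types: Dict[str, str] = {}
    for line in lines:
        line = line.rstrip('\n')
        if line.startswith('#'):
            # Header lines pass through unchanged but are used to learn types.
            m = INFO_HEADER.match(line)
            if m:
                types[m.group(1)] = m.group(2)
            yield line
            continue
        fields = line.split('\t')
        if len(fields) > 7 and fields[7] != '.':
            entries = []
            for entry in fields[7].split(';'):
                # Keys missing from the header default to 'Flag' and are left
                # alone: there is no way to tell their intended type.
                if entry and '=' not in entry and types.get(entry, 'Flag') != 'Flag':
                    entry += '=.'
                entries.append(entry)
            fields[7] = ';'.join(entries)
        yield '\t'.join(fields)


if __name__ == '__main__' and len(sys.argv) == 3:
    # Usage: python fix_info.py in.vcf fixed.vcf
    with open(sys.argv[1]) as src, open(sys.argv[2], 'w') as dst:
        for out in fix_bare_info(src):
            dst.write(out + '\n')
```

Unlike a plain substitution on the literal string AS_MQ;, this also catches a valueless key in the last position of the INFO column, where it is followed by a tab rather than a semicolon. Leaving undeclared keys untouched is a deliberate choice, since rewriting them could turn a real Flag into a broken key=value pair.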

Thank you!

GATK has a history of producing VCFs that are incompatible with other tools. Could you paste the VCF header and a few offending and non-offending lines so that we can try it on our end? When you paste it, highlight it and format it as code with the 101 010 button.


