How to ignore "GT" from vcf file
19 months ago
SUDOsundu ▴ 60

I used medaka_haploid_variant to call variants in my viral reads. After running medaka tools annotate, I got an annotated VCF file. When I then filter that file with bcftools, it complains that GT is not defined in the annotated VCF.

bcftools filter -Ob -e 'DP<1000' -o filtered.vcf annotated.vcf

I got the following error

FORMAT 'GT' at NC_xxxxx:33 is not defined in the header, assuming Type=String

From the GitHub issue I opened, https://github.com/nanoporetech/medaka/issues/257, I learned that GT does not appear in the VCF header but does occur in the records, which is a bug in medaka.
So how can I run bcftools while ignoring GT? Is that possible?

nanopore medaka bcftools
The easiest fix would be to add one line to the header, which you can do with any text editor. The other option is to remove the GT field from the FORMAT column (along with the matching value in each sample column), which can be a bit more complicated, depending on your prior experience.
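For the second option, a rough awk sketch of stripping GT from the FORMAT column and from every sample column could look like the following. The file names `demo.vcf` and `demo.noGT.vcf` and the toy record are made up for illustration, and the sketch assumes FORMAT always carries at least one other key besides GT (for example GQ), so that the column is not left empty.

```shell
# Toy input standing in for the annotated VCF (made-up record; GT is not in the header).
{
  printf '##fileformat=VCFv4.1\n'
  printf '##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype quality">\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE\n'
  printf 'NC_xxxxx\t33\t.\tA\tG\t50\tPASS\tDP=1200\tGT:GQ\t1:50\n'
} > demo.vcf

awk 'BEGIN { FS = OFS = "\t" }
/^#/ { print; next }                      # header lines pass through unchanged
NF >= 10 {
    # Find the position of GT among the colon-separated FORMAT keys.
    n = split($9, keys, ":")
    drop = 0
    for (i = 1; i <= n; i++) if (keys[i] == "GT") drop = i
    if (drop) {
        # Remove that position from FORMAT (column 9) and from each sample column.
        for (c = 9; c <= NF; c++) {
            m = split($c, vals, ":")
            out = ""; sep = ""
            for (i = 1; i <= m; i++) {
                if (i == drop) continue
                out = out sep vals[i]
                sep = ":"
            }
            $c = out
        }
    }
}
{ print }' demo.vcf > demo.noGT.vcf
```

In the toy record, FORMAT goes from GT:GQ to GQ and the sample value from 1:50 to 50. Whether losing the genotype is acceptable depends on what the downstream tools expect.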

19 months ago

awk '/^#CHROM/ {printf("##FORMAT=<ID=GT,Number=1,Type=String,Description=\"Genotype\">\n");} {print}' input.vcf
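A slightly more defensive take on the same idea, as a sketch: it inserts the GT definition only when the header does not already have one, and writes the result to a new file instead of the terminal. The helper name `add_gt_header`, the file names, and the toy record are made up for illustration.

```shell
# Toy input standing in for the medaka output (made-up record, no GT header line).
{
  printf '##fileformat=VCFv4.1\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE\n'
  printf 'NC_xxxxx\t33\t.\tA\tG\t50\tPASS\t.\tGT:GQ\t1:50\n'
} > demo.vcf

# Hypothetical helper: print the VCF given as $1, adding a GT definition
# just before the #CHROM line unless one was already seen in the header.
add_gt_header() {
  awk '/^##FORMAT=<ID=GT,/ { seen = 1 }
       /^#CHROM/ && !seen { print "##FORMAT=<ID=GT,Number=1,Type=String,Description=\"Genotype\">" }
       { print }' "$1"
}

add_gt_header demo.vcf > demo.fixed.vcf

# Running it again on the fixed file should not add a second GT line.
add_gt_header demo.fixed.vcf > demo.fixed2.vcf
cmp demo.fixed.vcf demo.fixed2.vcf && echo "second pass changed nothing"
```

The fixed file can then be given to the bcftools filter command from the question in place of annotated.vcf.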