I am trying to normalize, filter, and annotate variants in .vcf format. Right now, my workflow looks like this:
left-normalize & filter .vcf (bcftools / GATK)
convert .vcf to .tsv (GATK)
recalculate values in .tsv (e.g.
annotate .vcf (ANNOVAR)
merge annotation & .tsv
However, I am having issues with variants that are formatted like this:
chrX 66766356 . TGGCGGCGGCGGC T
When I try to 'normalize' them using bcftools norm and GATK LeftAlignAndTrimVariants, these variants are not changed.
However, when I pass these variants through ANNOVAR, the output looks like this:
chrX 66766357 . GGCGGCGGCGGC -
This is the preferred format for annotations, but it causes problems: the position and alleles no longer match the original record, so I am unable to merge values from the original .vcf back into the ANNOVAR output.
As per the comment on the bcftools issue posted here, the ANNOVAR output format is "not a valid VCF record". So it seems that maybe variant normalization tools would not be appropriate for producing this output?
Any ideas on how to fix this workflow and get both the custom selected & recalculated fields from the original .vcf combined with the ANNOVAR output in these cases?
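
One possible direction is to compute an ANNOVAR-style key for each record of the original .vcf and merge on that key instead of on the raw VCF fields. The following is only a minimal sketch: the function name `vcf_to_annovar_key` is hypothetical, and the trimming rule (drop the shared leading bases, write the empty allele as `-`) is an assumption inferred from the chrX example above, so it should be checked against real ANNOVAR output before being relied on.

```python
def vcf_to_annovar_key(chrom, pos, ref, alt):
    """Return (chrom, start, end, ref, alt) in ANNOVAR-style notation.

    Assumption (not taken from ANNOVAR documentation): when one allele
    is a prefix of the other, the shared leading bases are dropped and
    the allele that becomes empty is written as '-'.
    """
    if len(ref) > len(alt) and ref.startswith(alt):
        # Deletion: the first deleted base sits right after the shared prefix.
        start = pos + len(alt)
        end = pos + len(ref) - 1
        return (chrom, start, end, ref[len(alt):], "-")
    if len(alt) > len(ref) and alt.startswith(ref):
        # Insertion: anchored at the last shared base (assumed convention).
        anchor = pos + len(ref) - 1
        return (chrom, anchor, anchor, "-", alt[len(ref):])
    # SNVs and block substitutions are passed through unchanged.
    return (chrom, pos, pos + len(ref) - 1, ref, alt)


key = vcf_to_annovar_key("chrX", 66766356, "TGGCGGCGGCGGC", "T")
print(key)  # ('chrX', 66766357, 66766368, 'GGCGGCGGCGGC', '-')
```

The resulting tuple could be written as extra columns in the .tsv and used as the join key against the ANNOVAR table, which leaves the original .vcf untouched and still valid.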