I'm currently updating my variant calling pipeline by switching the VCF annotation software from Annovar to VEP for a variety of reasons, not least how easy it is to annotate with HGVS notation and keep datasets up to date in VEP.
For the most part everything is running smoothly, except that some of the data in the VCFs is lost during annotation (and conversion to TSV). The VCFs are created with GATK's UnifiedGenotyper and include a FORMAT column where each value is 'GT:AD:DP:GQ:PL', plus a column named after the individual, which contains colon-separated data corresponding to the FORMAT column (i.e. Genotype:Allelic Depths:Read Depth:Genotype Quality:Phred-scaled Likelihoods). When I annotate with VEP, none of this data is carried over to the output file as it would be with Annovar, leaving me with an annotated file that has no information on read depth, genotype or any of the other data in those two columns.
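For reference, a VCF sample column holds values in the same order as the keys in FORMAT, separated by colons, so pairing them up is a one-liner. The sketch below assumes a single-sample, tab-separated VCF data line; the function name `parse_sample` and the record values are made up for illustration.

```python
from typing import Dict


def parse_sample(format_str: str, sample_str: str) -> Dict[str, str]:
    """Pair each FORMAT key (e.g. GT, AD) with the matching value in one sample column."""
    keys = format_str.split(":")
    values = sample_str.split(":")
    # The VCF spec allows trailing sample fields to be dropped;
    # fill any missing ones with the VCF missing-value marker ".".
    values += ["."] * (len(keys) - len(values))
    return dict(zip(keys, values))


# Hypothetical record in the layout described above (values are invented).
record = "1\t12345\t.\tA\tG\t50.0\t.\tDP=20\tGT:AD:DP:GQ:PL\t0/1:12,8:20:99:150,0,200"
fields = record.split("\t")
sample = parse_sample(fields[8], fields[9])  # column 9 = FORMAT, column 10 = the individual
print(sample["GT"], sample["AD"], sample["DP"])  # 0/1 12,8 20
```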
I've included the command I'm currently using for annotation:
./vep -i RM0108.vcf --cache --force_overwrite --tab --merged --variant_class --sift b --polyphen b --hgvs --symbol --canonical --check_existing --af_1kg --af_gnomad --humdiv --pick -o RM0108.tsv
I can't find information on this in the VEP documentation or elsewhere online. I could write something to take the relevant information from the VCF and add it to the tsv after VEP has finished running, but it seems like there may be an easier solution that I'm missing, so any help would be appreciated.
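If you do go the post-processing route, a minimal sketch might look like the following. It assumes (1) every VCF record has a unique, non-"." value in the ID column, (2) VEP's `--tab` output reports that ID in its first column, `#Uploaded_variation`, and (3) the individual's data is in column 10 (index 9). The helper names `sample_fields_by_id` and `append_sample_fields` are hypothetical, and the toy input lines are invented and heavily abbreviated.

```python
from typing import Dict, Iterable, List


def sample_fields_by_id(vcf_lines: Iterable[str], sample_col: int = 9) -> Dict[str, Dict[str, str]]:
    """Map each VCF record's ID column to its FORMAT key -> sample value pairs."""
    by_id: Dict[str, Dict[str, str]] = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta lines, the column header and blank lines
        cols = line.rstrip("\n").split("\t")
        keys = cols[8].split(":")
        values = cols[sample_col].split(":")
        values += ["."] * (len(keys) - len(values))  # dropped trailing fields -> "."
        by_id[cols[2]] = dict(zip(keys, values))
    return by_id


def append_sample_fields(vep_lines: Iterable[str], by_id: Dict[str, Dict[str, str]],
                         wanted: List[str]) -> List[str]:
    """Append the wanted FORMAT fields to every row of VEP --tab output."""
    out: List[str] = []
    for line in vep_lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            out.append(line)  # VEP's leading comment lines, kept unchanged
        elif line.startswith("#"):
            out.append("\t".join([line] + wanted))  # column header row
        else:
            # First column is #Uploaded_variation, assumed to equal the VCF ID.
            var_id = line.split("\t", 1)[0]
            sample = by_id.get(var_id, {})
            out.append("\t".join([line] + [sample.get(k, ".") for k in wanted]))
    return out


# Toy input with invented values.
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tRM0108",
    "1\t12345\tvar1\tA\tG\t50.0\t.\t.\tGT:AD:DP:GQ:PL\t0/1:12,8:20:99:150,0,200",
]
vep = [
    "## ENSEMBL VARIANT EFFECT PREDICTOR",
    "#Uploaded_variation\tLocation\tAllele",
    "var1\t1:12345\tG",
]
rows = append_sample_fields(vep, sample_fields_by_id(vcf), ["GT", "AD", "DP", "GQ"])
for row in rows:
    print(row)
```

Keying on the ID rather than on position is a deliberate choice: VEP's `Location` column can differ from the VCF `POS` for indels, so a positional join is fragile. Because every VEP row for a variant gets the same sample values, the join also works without `--pick`. For real files, pass an open file handle (e.g. `open("RM0108.vcf")`) in place of the lists.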
I've also posted this question on the Bioinformatics StackOverflow.
It would definitely be easier to extract and add the required information using bcftools. If you'd like to preserve all VCF information, your output format should be VCF: VCF holds 3D information (one mXn matrix per sample per variant), whereas TSV is 2D (one data point per sample per variant).
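One way to see the dimensionality problem: each variant carries a samples-by-fields block, and a single TSV row can only hold it by flattening it into one column per (sample, field) pair. The sketch below does exactly that for one record; `flatten_record`, the `sample.FIELD` column naming and the sample values are all hypothetical.

```python
from typing import List, Tuple


def flatten_record(format_str: str, sample_names: List[str], sample_cols: List[str],
                   wanted: List[str]) -> Tuple[List[str], List[str]]:
    """Flatten one record's samples-by-fields block into a single header/value row."""
    keys = format_str.split(":")
    header: List[str] = []
    row: List[str] = []
    for name, col in zip(sample_names, sample_cols):
        values = col.split(":")
        values += ["."] * (len(keys) - len(values))  # dropped trailing fields -> "."
        per_sample = dict(zip(keys, values))
        for field in wanted:
            header.append(f"{name}.{field}")  # hypothetical naming scheme
            row.append(per_sample.get(field, "."))
    return header, row


header, row = flatten_record(
    "GT:AD:DP:GQ:PL",
    ["sampleA", "sampleB"],
    ["0/1:12,8:20:99:150,0,200", "1/1:0,15:15:45:400,45,0"],
    ["GT", "AD", "DP"],
)
print(header)  # ['sampleA.GT', 'sampleA.AD', 'sampleA.DP', 'sampleB.GT', 'sampleB.AD', 'sampleB.DP']
print(row)     # ['0/1', '12,8', '20', '1/1', '0,15', '15']
```

Note that fields such as AD and PL still end up as comma-separated lists inside a single cell, which is the part a flat table cannot represent natively.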
You might also want to look at GATK's
You will need to use the -GF flag for each genotype field you want to output.
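"For each genotype field" means the flag is repeated, once per FORMAT key, rather than given a single combined list. The sketch below only builds that repeated-flag fragment for the fields in the question. The helper `genotype_field_args` is hypothetical, and the rest of the GATK command line (tool name, input and output arguments) is deliberately left out.

```python
from typing import List


def genotype_field_args(fields: List[str]) -> List[str]:
    """Repeat the -GF flag once per genotype (FORMAT) field."""
    args: List[str] = []
    for field in fields:
        args += ["-GF", field]
    return args


print(genotype_field_args(["GT", "AD", "DP", "GQ", "PL"]))
# ['-GF', 'GT', '-GF', 'AD', '-GF', 'DP', '-GF', 'GQ', '-GF', 'PL']
```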
What does mXn represent?
Read it as "m-by-n", which refers to a 2D matrix with m rows and n columns.