Keep Format and Individual fields when annotating VCF with VEP
4.9 years ago

I'm currently updating my variant calling pipeline by switching the VCF annotation software from Annovar to VEP for a variety of reasons, not least how easy it is to annotate with HGVS notation and keep datasets up to date in VEP.

For the most part everything is running smoothly, except that some of the data in the VCFs is lost during annotation (and conversion to tsv). The VCFs are created with GATK's UnifiedGenotyper and include a 'Format' column where each value is 'GT:AD:DP:GQ:PL', and a column named after the Individual, which contains colon-separated data that corresponds to the Format column (i.e. Genotype:Allele Depth:Depth:Genotype Quality:Phred-scaled likelihoods). When I annotate with VEP, none of this data is carried over to the output file as it would be in Annovar, leaving me with an annotated file that has no information on read depth, genotype or any of the other data in the two lost columns.
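For readers less familiar with these two columns, here is a minimal Python sketch of how one Format string pairs with one Individual column. In a VCF both are colon-separated. The function name `parse_sample` and the example values are made up for illustration and are not taken from the poster's file.

```python
def parse_sample(format_field: str, sample_field: str) -> dict:
    """Pair FORMAT keys with one sample's values (both colon-separated)."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    # Trailing fields may be dropped in a VCF, so pad with the missing value "."
    values += ["."] * (len(keys) - len(values))
    return dict(zip(keys, values))


# Hypothetical values for illustration only
record = parse_sample("GT:AD:DP:GQ:PL", "0/1:12,9:21:99:255,0,301")
print(record["GT"], record["DP"])
```

Each variant carries one such dictionary per individual, which is why this information has no natural home in a one-row-per-variant table.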

I've included the command I'm currently using for annotation:

./vep -i RM0108.vcf --cache --force_overwrite --tab --merged --variant_class --sift b --polyphen b --hgvs --symbol --canonical --check_existing --af_1kg --af_gnomad --humdiv --pick -o RM0108.tsv


I can't find information on this in the VEP documentation or elsewhere online. I could write something to take the relevant information from the VCF and add it to the tsv after VEP has finished running, but it seems like there may be an easier solution that I'm missing, so any help would be appreciated.

I've also posted this question on the Bioinformatics Stack Exchange.

vcf annotation VEP variant-calling • 3.7k views

It would definitely be easier to extract and add the required information using bcftools. If you'd like to preserve all VCF information, your output format should be VCF, not tab. VCF holds 3D information (one mXn matrix per sample per variant), whereas TSV is 2D (one data point per sample per variant).


You might also want to look at GATK's VariantsToTable tool: https://software.broadinstitute.org/gatk/documentation/tooldocs/3.8-0/org_broadinstitute_gatk_tools_walkers_variantutils_VariantsToTable.php

You will need to use the -GF flag for each genotype field you want in the output.


What does mXn represent?


Read it as "m -by- n", which refers to a 2D matrix with m rows and n columns.

4.9 years ago
Emily 23k

The VEP TSV format can only hold its own predefined columns. If you want to keep the data from your original input, write your output as VCF. VEP will add its annotation to the INFO column and keep everything you already have there.
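As a rough illustration of what that looks like downstream: with VEP's `--vcf` option in place of `--tab`, the annotation lands in a `CSQ` entry in INFO, and the header records the pipe-separated sub-field names after "Format: ". The Python sketch below flattens such a file into rows that keep both the annotation and the genotype data. The function name `vep_vcf_to_rows`, the chosen columns, and the tiny example record are assumptions for illustration, not part of VEP. Tools such as the bcftools `+split-vep` plugin are built for the same job and are probably the better choice in production.

```python
import io


def vep_vcf_to_rows(handle, csq_wanted=("SYMBOL", "Consequence"), fmt_wanted=("GT", "DP")):
    """Yield one dict per (variant, CSQ entry, sample) from a VEP-annotated VCF."""
    csq_keys, samples = [], []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("##INFO=<ID=CSQ"):
            # VEP lists the pipe-separated sub-field names after "Format: "
            csq_keys = line.split("Format: ")[1].rstrip('">').split("|")
        elif line.startswith("#CHROM"):
            samples = line.split("\t")[9:]  # sample names follow FORMAT
        elif line and not line.startswith("#"):
            cols = line.split("\t")
            info = dict(kv.split("=", 1) for kv in cols[7].split(";") if "=" in kv)
            fmt_keys = cols[8].split(":")
            for entry in info.get("CSQ", "").split(","):
                csq = dict(zip(csq_keys, entry.split("|")))
                for name, raw in zip(samples, cols[9:]):
                    geno = dict(zip(fmt_keys, raw.split(":")))
                    row = {"CHROM": cols[0], "POS": cols[1], "REF": cols[3],
                           "ALT": cols[4], "SAMPLE": name}
                    row.update({k: csq.get(k, "") for k in csq_wanted})
                    row.update({k: geno.get(k, ".") for k in fmt_wanted})
                    yield row


# Made-up single-sample record; the sample name and gene are placeholders
example = "\n".join([
    "##fileformat=VCFv4.1",
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations '
    'from Ensembl VEP. Format: Allele|Consequence|SYMBOL">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tRM0108",
    "1\t12345\t.\tA\tG\t50\tPASS\tDP=21;CSQ=G|missense_variant|GENE1"
    "\tGT:AD:DP:GQ:PL\t0/1:12,9:21:99:255,0,301",
]) + "\n"

rows = list(vep_vcf_to_rows(io.StringIO(example)))
for row in rows:
    print("\t".join(row.values()))
```

The sketch emits one row per sample per consequence, so a multi-sample or multi-transcript file grows accordingly; running VEP with `--pick`, as in the original command, keeps it to one consequence per variant.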


Is there a way to convert the VEP-annotated VCF to tab?