Question: Keep Format and Individual fields when annotating VCF with VEP
7 months ago, jsneaththompson wrote:

I'm currently updating my variant calling pipeline by switching the VCF annotation software from Annovar to VEP for a variety of reasons, not least how easy it is to annotate with HGVS notation and keep datasets up to date in VEP.

For the most part everything is running smoothly, with the exception that some of the data in the VCFs is lost during annotation (and conversion to TSV). The VCFs are created with GATK's UnifiedGenotyper and include a 'FORMAT' column where each value is 'GT:AD:DP:GQ:PL', and a column named after the individual, which contains colon-separated data that corresponds to the FORMAT column (i.e. Genotype:Allele Depth:Depth:Genotype Quality:Phred-scaled Likelihoods). When I annotate with VEP, none of this data is carried over to the output file as it would be in Annovar, leaving me with an annotated file that has no information on read depth, genotype or any of the other data in the two lost columns.
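
For reference, the pairing between the FORMAT column and the sample column can be sketched with awk. In VCF the subfields of both columns are separated by ':' and matched by position. The record below is made up for illustration and is not taken from RM0108.vcf.

```shell
# A made-up single-sample VCF record (illustrative values, not real data).
record=$(printf 'chr1\t12345\t.\tA\tG\t50\tPASS\tDP=30\tGT:AD:DP:GQ:PL\t0/1:14,16:30:99:255,0,255')

# Column 9 is FORMAT and column 10 is the sample column.
# Both are split on ':' and paired by position.
pairs=$(printf '%s\n' "$record" | awk -F'\t' '{
  n = split($9, keys, ":")
  split($10, vals, ":")
  for (i = 1; i <= n; i++) printf "%s=%s\n", keys[i], vals[i]
}')
printf '%s\n' "$pairs"
```

This prints one KEY=value line per FORMAT subfield, which is the information that goes missing in the tab output.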

I've included the command I'm currently using for annotation:

./vep -i RM0108.vcf --cache --force_overwrite --tab --merged --variant_class --sift b --polyphen b --hgvs --symbol --canonical --check_existing --af_1kg --af_gnomad --humdiv --pick -o RM0108.tsv

I can't find information on this in the VEP documentation or elsewhere online. I could write something to take the relevant information from the VCF and add it to the tsv after VEP has finished running, but it seems like there may be an easier solution that I'm missing, so any help would be appreciated.

I've also posted this question on the Bioinformatics Stack Exchange.

modified 3 months ago by MasMarius • written 7 months ago by jsneaththompson

It would definitely be easier to extract and add the required information using bcftools. If you'd like to preserve all VCF information, your output format should be VCF, not tab. VCF holds 3D information (one m×n matrix per sample per variant), whereas TSV is 2D (one data point per sample per variant).

modified 7 months ago • written 7 months ago by RamRS
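
One possible way to do the bcftools extraction suggested above is `bcftools query`, which prints chosen site and per-sample fields as tab-separated text. This is a minimal sketch, assuming bcftools is installed and the VCF holds a single sample; the output file name is hypothetical.

```shell
# Input from the question; the output name is an assumption.
vcf=RM0108.vcf
out=RM0108.genotypes.tsv

# Site-level columns first; the bracketed part is repeated for each sample.
fmt='%CHROM\t%POS\t%REF\t%ALT[\t%GT\t%AD\t%DP\t%GQ\t%PL]\n'

# Only run when bcftools and the input file are actually available.
if command -v bcftools >/dev/null 2>&1 && [ -f "$vcf" ]; then
  bcftools query -f "$fmt" "$vcf" > "$out"
fi
```

The resulting table could then be joined to the VEP tab output on chromosome, position and alleles, though that join is left to whatever tooling the pipeline already uses.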

You might also want to look at GATK's VariantsToTable tool: https://software.broadinstitute.org/gatk/documentation/tooldocs/3.8-0/org_broadinstitute_gatk_tools_walkers_variantutils_VariantsToTable.php

You will need to pass the -GF flag once for each genotype field you want in the output.

written 7 months ago by steve
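
A sketch of what the VariantsToTable call might look like, using the GATK 3.x command style that matches the 3.8 documentation linked above. The jar location, reference FASTA and output name are assumptions, and the choice of -F site columns is only an example.

```shell
# Hypothetical paths; adjust to the local GATK 3.x install and reference.
gatk_jar=GenomeAnalysisTK.jar
ref=reference.fasta
vcf=RM0108.vcf

# One -GF flag per genotype (FORMAT) field to export.
gf_args=""
for f in GT AD DP GQ PL; do
  gf_args="$gf_args -GF $f"
done

# Only run when Java, the jar and the input files are present.
if command -v java >/dev/null 2>&1 && [ -f "$gatk_jar" ] && [ -f "$ref" ] && [ -f "$vcf" ]; then
  # $gf_args is left unquoted on purpose so it expands into separate arguments.
  java -jar "$gatk_jar" -T VariantsToTable -R "$ref" -V "$vcf" \
    -F CHROM -F POS -F REF -F ALT \
    $gf_args \
    -o RM0108.genotypes.table
fi
```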
7 months ago, Emily_Ensembl (EMBL-EBI) wrote:

The VEP TSV format can only hold its own specified columns. If you want to keep the data from your original input, write your output as VCF. VEP will add its annotation to the INFO column and keep everything you already have there.
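
As a sketch of that suggestion, the command from the question can be rerun with `--tab` swapped for `--vcf`, so that VEP writes its annotation under the CSQ key in INFO and leaves the FORMAT and sample columns in place. All other options are copied unchanged from the question; the output file name is an assumption.

```shell
# Input from the question; the output name is hypothetical.
in_vcf=RM0108.vcf
out_vcf=RM0108.vep.vcf

# Only run when the vep script and the input file are present.
if [ -x ./vep ] && [ -f "$in_vcf" ]; then
  ./vep -i "$in_vcf" --cache --force_overwrite --vcf --merged --variant_class \
    --sift b --polyphen b --hgvs --symbol --canonical --check_existing \
    --af_1kg --af_gnomad --humdiv --pick -o "$out_vcf"
fi
```

The annotated VCF can then be flattened to a table with the tools mentioned elsewhere in this thread if a TSV is still needed downstream.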
