VEP: how to neatly retrieve information from the INFO column of a VCF file
4 days ago
K.patel5

Dear Biostars,

I am trying to add annotations to a .vcf file with VEP. Using the commands included in this post, I have created both a .tab file and a .vcf file. However, the .vcf file stores the requested extra information in the INFO column, while the .tab file is organised more neatly, with one column per field. I would rather stick with the .vcf format because it seems to be the more popular annotation format. Any help clearing up this confusion would be appreciated.

Code used to create the .tab file, and the first line of its output:

# Run VEP offline from the local cache and write tab-delimited output (--tab)
"${vep_path}/vep" --cache --dir "${dir}" \
  --dir_cache "${dir_cache}" --offline --species homo_sapiens --assembly GRCh38 --fasta "${fasta}" \
  --input_file "${input_vcf}" --output_file output.tab --warning_file warn.txt --stats_file stat.html \
  --hgvs --symbol --force_overwrite --format vcf --tab --no_check_variants_order \
  --check_existing --polyphen p --sift p --af_gnomad --total_length --max_af --variant_class \
  --keep_csq --plugin CADD --plugin dbNSFP --plugin ExACpLI --plugin LoFtool \
  --plugin DisGeNET --plugin REVEL --plugin Mastermind \
  --fields "Uploaded_variation,Location,Allele,Gene,Feature,SYMBOL,EXON,Existing_variation,VARIANT_CLASS,Consequence,cDNA_position,CDS_position,Protein_position,Amino_acids,HGVSc,HGVSp,BIOTYPE,IMPACT,CLIN_SIG,PolyPhen,SIFT,CADD_PHRED,CADD_RAW,MutationTaster_pred,REVEL,gnomAD_AF,MAX_AF,ExACpLI,LoFtool,DisGeNET_PMID,DisGeNET_SCORE,DisGeNET_disease,Mastermind_URL" \
  --pick --pick_order rank,canonical,tsl --fork 4 --buffer_size 20000
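VEP's --tab output normally begins with "##" comment lines, followed by a single column-header line that starts with "#". Assuming that layout, a small sketch for printing only the leading columns of such a file (as in the excerpt in this post) could look like this; the helper name show_first_columns and the file name output.tab are hypothetical.

```shell
# Hypothetical helper: print the first N columns of a VEP --tab file,
# dropping the "##" comment lines and the leading "#" of the column header.
show_first_columns() {
  local file=$1 ncols=${2:-8}   # default: first 8 columns
  grep -v '^##' "$file" | sed '1s/^#//' | cut -f "1-${ncols}"
}
```

For example, show_first_columns output.tab 8 | head -n 2 prints the header and the first record. cut splits on tabs by default, which matches VEP's tab-delimited layout.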

For readability, only the first several columns are shown:

Uploaded_variation|Location|Allele|Gene|Feature|SYMBOL|EXON|Existing_variation
chr1_14653_C/T|chr1:14653|T|ENSG00000227232|ENST00000488147|WASH7P|-|rs62635297

However, when I create a .vcf file with almost the same command, all of the requested information is crammed into the CSQ entry of the INFO column.

The only two flags changed are:

  1. --vcf instead of --tab
  2. --fields "Allele,Gene,Feature,SYMBOL,EXON,Existing_variation,VARIANT_CLASS,Consequence,cDNA_position,CDS_position,Protein_position,Amino_acids,HGVSc,HGVSp,BIOTYPE,IMPACT,CLIN_SIG,PolyPhen,SIFT,CADD_PHRED,CADD_RAW,MutationTaster_pred,REVEL,gnomAD_AF,MAX_AF,ExACpLI,LoFtool,DisGeNET_PMID,DisGeNET_SCORE,DisGeNET_disease,Mastermind_URL"

The output file's columns and first record are as follows:

CHROM|POS|ID|REF|ALT|QUAL|FILTER|INFO|FORMAT|subject1
chr1|14653|.|C|T|359.64|MQ40;SOR3;VQSRTrancheSNP99.90to100.00|AC=1;AF=0.500;AN=2;AS_FilterStatus=VQSRTrancheSNP99.90to100.00;AS_VQSLOD=-10.7082;AS_culprit=MQ;BaseQRankSum=-8.870e-01;DP=24;ExcessHet=3.0103;FS=8.016;MLEAC=1;MLEAF=0.500;MQ=23.02;MQRankSum=0.883;NEGATIVE_TRAIN_SITE;QD=18.93;ReadPosRankSum=2.26;SOR=4.863;CSQ=chr1_14653_C/T|chr1:14653|T|ENSG00000227232|ENST00000488147|WASH7P||rs62635297|SNV|intron_variant&non_coding_transcript_variant|||||ENST00000488147.1:n.1254-152G>A||unprocessed_pseudogene|MODIFIER||||0.148|-0.373269|||||||||||GT:AD:DP:GQ:PL|0/1:3,16:19:25:367,0,25

Is there a way in VEP to extract this information from the INFO column and neatly organise this as a .vcf file?
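For reference, the CSQ value is a "|"-separated list whose field order is declared in the VCF header, in the "Format:" part of the ##INFO=<ID=CSQ,...> line. As a rough illustration of that structure (not a VEP feature), a minimal awk sketch that turns it into named, tab-separated columns might look like this. It assumes an uncompressed VEP --vcf output whose annotation key is CSQ; the function name csq_to_table is hypothetical.

```shell
# Hypothetical helper (not part of VEP): flatten the CSQ INFO entry of an
# uncompressed VEP --vcf output into named, tab-separated columns.
# Assumes the annotation key is CSQ; comma-separated consequence blocks
# (e.g. when --pick is not used) are written one per output row.
csq_to_table() {
  awk -F'\t' -v OFS='\t' '
    # Field names come from the "Format: A|B|C" text of the CSQ header line.
    /^##INFO=<ID=CSQ,/ {
      fmt = $0
      sub(/.*Format: /, "", fmt)
      sub(/">.*$/, "", fmt)
      n = split(fmt, names, "[|]")
      next
    }
    /^##/ { next }
    # Column header: a few fixed VCF columns plus one column per CSQ field.
    /^#CHROM/ {
      line = "CHROM" OFS "POS" OFS "REF" OFS "ALT"
      for (i = 1; i <= n; i++) line = line OFS names[i]
      print line
      next
    }
    {
      # Find the CSQ= entry among the ;-separated INFO entries (column 8).
      csq = ""
      m = split($8, info, ";")
      for (j = 1; j <= m; j++)
        if (info[j] ~ /^CSQ=/) { csq = substr(info[j], 5); break }
      if (csq == "") next
      # One output row per comma-separated consequence block.
      k = split(csq, blocks, ",")
      for (b = 1; b <= k; b++) {
        split(blocks[b], vals, "[|]")
        line = $1 OFS $2 OFS $4 OFS $5
        for (i = 1; i <= n; i++) line = line OFS vals[i]
        print line
      }
    }
  ' "$1"
}
```

This only reshapes the text into a table; it does not produce a VCF. A dedicated tool is more robust for compressed input or unusual headers.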

Tags: tab, vcf, annotations, VEP
4 days ago

Have a look at bcftools +split-vep followed by bcftools query.
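A sketch of that route, assuming a recent bcftools with the split-vep plugin on PATH and the default CSQ annotation key (split-vep takes -a for a different tag). The function and file names are placeholders, and the option list is worth checking against bcftools +split-vep -h for the installed version. csq_to_info_tags keeps the .vcf format but copies selected CSQ subfields into their own INFO tags, which addresses the original question; query_table then prints them as a table. The field names (SYMBOL, Consequence, IMPACT, CADD_PHRED) are taken from the --fields list in the question.

```shell
# Sketch only: assumes bcftools with the split-vep plugin is installed and that
# VEP wrote its annotations to INFO/CSQ. Function and file names are placeholders.

# List the CSQ subfields declared in the VCF header, with their indexes.
list_csq_fields() {
  bcftools +split-vep -l "$1"
}

# Keep VCF format, but copy selected CSQ subfields into separate INFO tags
# (e.g. INFO/SYMBOL) instead of one pipe-separated CSQ string.
# ":Float" declares a numeric tag; untyped fields become Strings.
csq_to_info_tags() {
  local in_vcf=$1 out_vcf=$2
  bcftools +split-vep -c SYMBOL,Consequence,IMPACT,CADD_PHRED:Float \
    -Oz -o "$out_vcf" "$in_vcf"
}

# Print a tab-delimited table (with a header line) from the split VCF.
query_table() {
  bcftools query -H \
    -f '%CHROM\t%POS\t%REF\t%ALT\t%SYMBOL\t%Consequence\t%IMPACT\t%CADD_PHRED\n' "$1"
}

# Alternative without an intermediate VCF: split-vep formats the text itself;
# -d writes one line per consequence block when a record has several.
csq_table_direct() {
  bcftools +split-vep -d \
    -f '%CHROM\t%POS\t%REF\t%ALT\t%SYMBOL\t%Consequence\t%IMPACT\t%CADD_PHRED\n' "$1"
}
```

For example, list_csq_fields output.vcf shows the available names, csq_to_info_tags output.vcf output.split.vcf.gz writes the reorganised VCF, and query_table output.split.vcf.gz > output.split.tsv produces a tab-delimited view similar to VEP's --tab output.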


Cheers, will have a look.

