Question

VEP - How to retrieve information from the INFO column of a vcf file neatly


2.8 years ago

K.patel5 ▴ 150

Dear Biostars,

I am trying to add annotations to a VCF file. I have created a .tab file and a .vcf file using the commands included in this post. However, the .vcf output stores the extra information I requested in the INFO column, whereas the .tab output is neatly organised into columns. I would rather stick to the .vcf format, because it seems to be the more popular format for annotations. Any help clearing up this misunderstanding would be appreciated.

Command used to create the .tab file, and the first line of its output:

```shell
# Most of these plugins also expect data-file arguments (e.g. --plugin CADD,<file>); those are not shown in this post.
${vep_path}/vep --cache --dir $dir \
--dir_cache $dir_cache --offline --species homo_sapiens --assembly GRCh38 --fasta ${fasta} \
--input_file ${input_vcf} --output_file output.tab --warning_file warn.txt --stats_file stat.html \
--hgvs --symbol --force_overwrite --format vcf --tab --no_check_variants_order \
--check_existing --polyphen p --sift p --af_gnomad --total_length --max_af --variant_class \
--keep_csq --plugin CADD --plugin dbNSFP --plugin ExACpL --plugin LoFtool \
--plugin DisGeNET --plugin REVEL --plugin Mastermind \
--fields "Uploaded_variation,Location,Allele,Gene,Feature,SYMBOL,EXON,Existing_variation,VARIANT_CLASS,Consequence,cDNA_position,CDS_position,Protein_position,Amino_acids,HGVSc,HGVSp,BIOTYPE,IMPACT,CLIN_SIG,PolyPhen,SIFT,CADD_PHRED,CADD_RAW,MutationTaster_pred,REVEL,gnomAD_AF,MAX_AF,ExACpLI,LoFtool,DisGeNET_PMID,DisGeNET_SCORE,DisGeNET_disease,Mastermind_URL" \
--pick --pick_order rank,canonical,tsl --fork 4 --buffer_size 20000
```

For neatness, I show only the first several columns:

```
Uploaded_variation|Location|Allele|Gene|Feature|SYMBOL|EXON|Existing_variation
chr1_14653_C/T|chr1:14653|T|ENSG00000227232|ENST00000488147|WASH7P|-|rs62635297
```

However, when I create a VCF file with almost the same command, all of the requested information is crammed into the CSQ entry of the INFO column.

The only two flags changed are:

- --vcf instead of --tab
- this --fields list:

```
--fields "Allele,Gene,Feature,SYMBOL,EXON,Existing_variation,VARIANT_CLASS,Consequence,cDNA_position,CDS_position,Protein_position,Amino_acids,HGVSc,HGVSp,BIOTYPE,IMPACT,CLIN_SIG,PolyPhen,SIFT,CADD_PHRED,CADD_RAW,MutationTaster_pred,REVEL,gnomAD_AF,MAX_AF,ExACpLI,LoFtool,DisGeNET_PMID,DisGeNET_SCORE,DisGeNET_disease,Mastermind_URL"
```

The columns of the output file, with the first data line, are as follows:

```
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	subject1
chr1	14653	.	C	T	359.64	MQ40;SOR3;VQSRTrancheSNP99.90to100.00	AC=1;AF=0.500;AN=2;AS_FilterStatus=VQSRTrancheSNP99.90to100.00;AS_VQSLOD=-10.7082;AS_culprit=MQ;BaseQRankSum=-8.870e-01;DP=24;ExcessHet=3.0103;FS=8.016;MLEAC=1;MLEAF=0.500;MQ=23.02;MQRankSum=0.883;NEGATIVE_TRAIN_SITE;QD=18.93;ReadPosRankSum=2.26;SOR=4.863;CSQ=chr1_14653_C/T|chr1:14653|T|ENSG00000227232|ENST00000488147|WASH7P||rs62635297|SNV|intron_variant&non_coding_transcript_variant|||||ENST00000488147.1:n.1254-152G>A||unprocessed_pseudogene|MODIFIER||||0.148|-0.373269||||||||||	GT:AD:DP:GQ:PL	0/1:3,16:19:25:367,0,25
```
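The CSQ value is structured even though it sits inside INFO: VEP records the sub-field names, in order, in the ##INFO=<ID=CSQ,...> header line (after "Format: "), and each comma-separated CSQ record is pipe-delimited in that same order, with an empty string where the tab output shows "-" (compare EXON above). As a minimal sketch of that layout only (not a VEP feature), the awk function below unpacks CSQ into one TSV row per record. The function name csq_to_tsv and the one-line demo VCF, built from the WASH7P record above, are hypothetical, and it assumes an uncompressed VCF on stdin.

```shell
#!/usr/bin/env bash
# Sketch (not part of VEP): unpack the CSQ field of a VEP-annotated VCF into a TSV.
# Assumes an uncompressed VCF on stdin with VEP's "##INFO=<ID=CSQ,...Format: a|b|c">" header.
csq_to_tsv() {
  awk -F'\t' '
    # Header line: recover the CSQ sub-field names, in order, from "Format: ..."
    /^##INFO=<ID=CSQ,/ {
      fmt = $0
      sub(/.*Format: /, "", fmt)
      sub(/".*/, "", fmt)
      n = split(fmt, names, "|")
      next
    }
    /^##/ { next }
    # Column header: fixed VCF columns plus one column per CSQ sub-field
    /^#CHROM/ {
      out = "CHROM\tPOS\tREF\tALT"
      for (i = 1; i <= n; i++) out = out "\t" names[i]
      print out
      next
    }
    {
      # Locate the CSQ=... key among the ;-separated INFO entries (column 8)
      csq = ""
      m = split($8, info, ";")
      for (i = 1; i <= m; i++) if (info[i] ~ /^CSQ=/) csq = substr(info[i], 5)
      if (csq == "") next
      # Comma-separated records (one per transcript/allele), pipe-separated values
      k = split(csq, recs, ",")
      for (r = 1; r <= k; r++) {
        split(recs[r], vals, "|")
        out = $1 "\t" $2 "\t" $4 "\t" $5
        for (i = 1; i <= n; i++) out = out "\t" vals[i]
        print out
      }
    }'
}

# Hypothetical one-record demo built from the WASH7P line in the question
demo=$(mktemp)
printf '%s\n' '##fileformat=VCFv4.2' \
  '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Gene|Feature|SYMBOL|Existing_variation">' > "$demo"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> "$demo"
printf 'chr1\t14653\t.\tC\tT\t359.64\t.\tAC=1;CSQ=T|ENSG00000227232|ENST00000488147|WASH7P|rs62635297\n' >> "$demo"
csq_to_tsv < "$demo"
```

On a real file this would be used as csq_to_tsv < output.vcf > output.tsv (placeholder names; decompress a .vcf.gz with zcat first). It produces a table rather than a VCF, which is why the dedicated tools suggested in the answers are usually the better route.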

Is there a way in VEP to extract this information from the INFO column and neatly organise this as a .vcf file?

Tags: tab, vcf, annotations, VEP

Updated 2.7 years ago by zx8754.

score 2 · Answer 1 · 2022-01-14


2.8 years ago

Pierre Lindenbaum 164k

Have a look at bcftools +split-vep followed by bcftools query.
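A minimal sketch of this suggestion, assuming a bcftools build that includes the split-vep plugin; the function names and file names are hypothetical. Here -l lists the CSQ sub-fields from the header, -f prints chosen sub-fields in bcftools query format (with -d writing one line per CSQ record), and -c copies sub-fields into ordinary INFO tags, so the result is still a VCF that bcftools query can read.

```shell
#!/usr/bin/env bash
# Sketch: flatten VEP's CSQ annotations with bcftools +split-vep.
# Needs a bcftools build that includes the split-vep plugin; names are placeholders.

# 1) List the sub-fields VEP packed into CSQ (read from the VCF header)
list_csq_fields() { bcftools +split-vep -l "$1"; }

# 2) Table output: chosen sub-fields, one line per CSQ record (-d)
csq_table() {
  bcftools +split-vep -d \
    -f '%CHROM\t%POS\t%REF\t%ALT\t%SYMBOL\t%Consequence\t%Existing_variation\n' "$1"
}

# 3) Stay in VCF: copy sub-fields into ordinary INFO tags (-c), then query them
csq_to_info() {
  bcftools +split-vep -c SYMBOL,Consequence,Existing_variation -Oz -o "$2" "$1"
  bcftools query -f '%CHROM\t%POS\t%INFO/SYMBOL\t%INFO/Consequence\n' "$2"
}

# Example (placeholder names):
#   list_csq_fields output.vcf
#   csq_table output.vcf > output.tsv
#   csq_to_info output.vcf output.split.vcf.gz
```

Option 3 is the closest to what the question asks for: the annotations become first-class INFO tags while the file stays a VCF. If a variant carries several CSQ records (i.e. without --pick), check bcftools +split-vep -h for the -s selection options.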

Reply from K.patel5 (2.8 years ago): Cheers, will have a look.

score 0 · Answer 2 · 2022-01-19


2.8 years ago

nihilior ▴ 50

I found the vep-annotation-reporter tool from VAtools useful: https://vatools.readthedocs.io/en/latest/vep_annotation_reporter.html
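A small wrapper showing how this could be called, assuming the usage given in the VAtools documentation (input VCF, then the CSQ sub-field names, then -o for the output TSV); check vep-annotation-reporter -h for your installed version. The function name vep_report and the file names are hypothetical. Unlike bcftools +split-vep -c, this writes a TSV, not a VCF.

```shell
#!/usr/bin/env bash
# Sketch: VAtools' vep-annotation-reporter turns chosen CSQ sub-fields into a TSV.
# Install once with: pip install vatools
# vep_report is a hypothetical wrapper; arguments follow the documented usage
#   vep-annotation-reporter input_vcf VEP_FIELD [VEP_FIELD ...] -o output_tsv
vep_report() {
  local vcf=$1 tsv=$2   # $1 = VEP-annotated VCF, $2 = output TSV
  shift 2               # remaining arguments = CSQ sub-field names
  vep-annotation-reporter "$vcf" "$@" -o "$tsv"
}

# Example (placeholder file names):
#   vep_report output.vcf output.tsv SYMBOL Consequence Existing_variation gnomAD_AF
```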
