Question: How can I make VEP recognize header lines in my VCF?
19 days ago, Chris Miller (Washington University in St. Louis, MO) wrote:

Here's a massively simplified VCF file with one line:

##fileformat=VCFv4.1
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  M_CJ-R878H_AML1-R878H_AML1
1       3329234 .       G       T       36.8    .       TIER=1  GT     1/1

If I run VEP on it like so, it returns a warning as it tries to parse the FORMAT header line:

$ perl /path/to/ensembl-tools-release-86/scripts/variant_effect_predictor/variant_effect_predictor.pl -i test.vcf -o out.vcf --offline --cache_version 67 --species mus_musculus --vcf --symbol --format vcf --dir_cache /path/to/.vep --dir_plugins /path/to/VEP_plugins-release-86
2017-05-04 12:46:31 - Read existing cache info
2017-05-04 12:46:31 - Starting...

WARNING: Invalid input formatting on line 2
2017-05-04 12:46:31 - Read 1 variants into buffer
2017-05-04 12:46:31 - Reading transcript data from cache and/or database
[========================================================================================================================]  [ 100% ]
2017-05-04 12:46:31 - Retrieved 8 transcripts (0 mem, 8 cached, 0 DB, 0 duplicates)
2017-05-04 12:46:31 - Analyzing chromosome 1
2017-05-04 12:46:31 - Analyzing variants
[========================================================================================================================]  [ 100% ]
2017-05-04 12:46:31 - Calculating consequences
2017-05-04 12:46:31 - Processed 1 total variants (1 vars/sec, 1 vars/sec total)
2017-05-04 12:46:31 - Wrote stats summary to out.vcf_summary.html
2017-05-04 12:46:31 - See out.vcf_warnings.txt for details of 1 warnings
2017-05-04 12:46:31 - Finished!

Supporting my idea that VEP isn't handling the header correctly: if I run it on this VCF without the --format vcf flag, it is unable to detect that the input is a VCF.

It does return the annotated VCF lines correctly when told that the input is a VCF, but it doesn't pass through the existing header lines, and it also doesn't add the CSQ header line that contains the key for parsing the information that VEP adds.
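
One thing that is cheap to rule out: the VCF specification requires the #CHROM line and the data lines to be tab-delimited, and tabs are easily turned into spaces when a file is pasted or edited. Whether that is what triggers the "Invalid input formatting" warning here is not established; the sketch below only checks for it. The function name count_non_tab_lines is made up for illustration.

```shell
# count_non_tab_lines FILE
# Prints how many lines that do not start with "##" have fewer than
# 8 tab-separated fields (the 8 mandatory VCF columns).
# The #CHROM line is included in the check; "##" meta lines are skipped.
count_non_tab_lines() {
  awk -F'\t' '!/^##/ && NF < 8 { bad++ } END { print bad + 0 }' "$1"
}
```

Calling it as `count_non_tab_lines test.vcf` should print 0 for a properly tab-delimited file; any other number means some lines are probably split on spaces instead.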

Has anyone encountered this before? Any suggestions on how to make VEP do the right thing here?


Edit to add output, which is sane, but lacking the expected headers:

##fileformat=VCF
1   3329234 .   G   T   36.8    .   GT;CSQ=T|intron_variant|MODIFIER||ENSMUSG00000051951|Transcript|ENSMUST00000070533|protein_coding||2/2||||||||||-1|||   1/1
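
A quick way to compare the headers of the input and output files is to count the "##" meta lines and look for the #CHROM line and the CSQ INFO declaration. This is only a rough sketch: the function name summarize_vcf_header is hypothetical, and it assumes VEP's default INFO key, CSQ.

```shell
# summarize_vcf_header FILE
# Prints one summary line: the number of "##" meta lines, whether a
# #CHROM column-header line exists, and whether an INFO header for
# CSQ (assumed to be the default VEP key) is declared.
# The body runs in a subshell so its variables do not leak.
summarize_vcf_header() (
  meta=$(grep -c '^##' "$1")
  if grep -q '^#CHROM' "$1"; then chrom=yes; else chrom=no; fi
  if grep -q '^##INFO=<ID=CSQ,' "$1"; then csq=yes; else csq=no; fi
  echo "meta_lines=$meta chrom_line=$chrom csq_header=$csq"
)
```

Running it on both test.vcf and out.vcf makes it easy to see which header lines were dropped and whether the CSQ declaration was written.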

19 days ago, Chris Miller (Washington University in St. Louis, MO) wrote:

Update - this doesn't seem to happen on my laptop's more recent install of VEP (version 87 vs version 86). I guess it's either a version issue or a somehow screwy install. I'm going to go ahead and mark this as the best answer for now, as an upgrade seems like it will solve the problem.

If anyone has additional information or has encountered this, I would still love to hear what might be wrong.

19 days ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

Use awk to insert a dummy ##FORMAT header line for each token in the FORMAT column, something like:

awk '/^#CHROM/ {printf("##FORMAT=<ID=x1,Number=1,Type=String,Description=\"\">\n##FORMAT=<ID=x2,Number=1,Type=String,Description=\"\">\n");} {print;}' in.vcf
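
The same idea can be sketched without hard-coding the IDs by reading the file twice: the first pass collects the keys that appear in the FORMAT column and the IDs that are already declared, and the second pass prints a placeholder declaration for each missing key just before the #CHROM line. This assumes a tab-delimited VCF; the function name add_missing_format_headers and the "placeholder" description are made up for illustration.

```shell
# add_missing_format_headers FILE
# Prints FILE to stdout with a placeholder ##FORMAT line added for
# every FORMAT-column key that has no ##FORMAT declaration yet.
# Assumes tab-delimited columns, with FORMAT as column 9.
add_missing_format_headers() {
  awk -F'\t' '
    # Pass 1: remember declared FORMAT IDs and collect keys used in column 9.
    NR == FNR {
      if ($0 ~ /^##FORMAT=<ID=/) {
        id = $0
        sub(/^##FORMAT=<ID=/, "", id)
        sub(/[,>].*$/, "", id)
        declared[id] = 1
      } else if ($0 !~ /^#/ && NF >= 9) {
        n = split($9, keys, ":")
        for (i = 1; i <= n; i++) used[keys[i]] = 1
      }
      next
    }
    # Pass 2: emit the missing declarations just before the #CHROM line.
    /^#CHROM/ {
      for (k in used)
        if (!(k in declared))
          printf("##FORMAT=<ID=%s,Number=.,Type=String,Description=\"placeholder\">\n", k)
    }
    { print }
  ' "$1" "$1"
}
```

For example, `add_missing_format_headers in.vcf > fixed.vcf` would write the patched file. If several keys are missing, the order in which their lines appear is not guaranteed.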

19 days ago, Chris Miller replied:

Sadly, that's not the issue. I'm editing the post above to make the VCF even simpler and make the FORMAT lines match up 100% with the fields - the same warning and header recognition issue persists.