VEP JSON output parsing
6.8 years ago
conorproud89 ▴ 20

I'm writing a Python parser for VEP's serialised JSON output, but I'm finding that the alleles recorded in the JSON differ from those in the VCF, usually by one base (i.e. they lack a reference base). For "complex" variants this is more of a problem: sometimes the allele recorded in the JSON is the same as that in the VCF, and sometimes it is one base shorter. Has anyone come across this before, and is there an explanation for why it happens?

json variant effect predictor python • 2.4k views

On the off chance: I too am working on a parser. Could you link me to yours? It would probably help me out. Thanks in advance!

6.8 years ago
EnsemblWill ▴ 570

VEP converts unbalanced substitutions (e.g. insertions, deletions) to an internal standard Ensembl representation. This is explained in basic terms here; essentially the leading base is trimmed from the REF and ALT alleles.

However, this only explains what happens in the simple case, i.e. with only one ALT allele. For complex VCF entries, VEP will only trim the leading base if all REF and ALTs share the same leading base.
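To make the rule above concrete for a parser, here is a minimal Python sketch of it. The function name `trim_leading_base` is hypothetical, and the logic is only my reading of the behaviour described in this answer (trim when the entry is unbalanced and every allele shares its first base), not VEP's actual code. It assumes plain nucleotide alleles, so symbolic alleles such as `<DEL>` or `*` are not handled.

```python
def trim_leading_base(pos, ref, alts):
    """Hypothetical helper mimicking the trimming rule described above."""
    alleles = [ref] + list(alts)
    # Only unbalanced entries (alleles of differing length) are candidates.
    is_unbalanced = len({len(a) for a in alleles}) > 1
    # Trim only when every allele, REF included, starts with the same base.
    shares_first_base = len({a[0] for a in alleles}) == 1
    if not (is_unbalanced and shares_first_base):
        return pos, ref, list(alts)
    # Drop the shared base; an allele that becomes empty is written as "-".
    trimmed = [a[1:] or "-" for a in alleles]
    return pos + 1, trimmed[0], trimmed[1:]


print(trim_leading_base(100, "CA", ["C"]))         # (101, 'A', ['-'])
print(trim_leading_base(100, "CA", ["C", "CAT"]))  # (101, 'A', ['-', 'AT'])
print(trim_leading_base(100, "CA", ["C", "TA"]))   # (100, 'CA', ['C', 'TA'])
```

The last call is the "complex" case from the question: because one ALT starts with a different base, nothing is trimmed and the alleles match the VCF.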

A useful way to keep track of which allele is which in the output is to use the --allele_number flag.
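As a rough illustration of how a parser might use this, the sketch below maps each consequence back to the original ALT allele. It assumes that each JSON record carries the original VCF line under `input`, and that each consequence carries a 1-based `allele_num` field when the flag is on; please check both against your own output. The function name `alt_for_consequence` and the record itself are made up for the example and are not real VEP output.

```python
import json


def alt_for_consequence(record, consequence):
    """Return the original VCF allele that a consequence entry refers to."""
    # Assumption: "input" holds the original whitespace-separated VCF line.
    fields = record["input"].split()
    ref, alts = fields[3], fields[4].split(",")
    # Assumption: "allele_num" is 1-based over the ALT column (0 would be REF).
    num = int(consequence["allele_num"])
    return ref if num == 0 else alts[num - 1]


# Hand-written record for illustration only.
line = json.dumps({
    "input": "1\t100\t.\tCA\tC,CAT\t.\t.\t.",
    "transcript_consequences": [
        {"variant_allele": "-", "allele_num": 1},
        {"variant_allele": "AT", "allele_num": 2},
    ],
})

record = json.loads(line)
for csq in record["transcript_consequences"]:
    print(csq["variant_allele"], "->", alt_for_consequence(record, csq))
```

Keying on the allele number rather than on the allele string means the parser does not need to know whether VEP trimmed a base.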

You may modify VEP's default behaviour to reduce all REF/ALT pairs to their minimal representation using --minimal.
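For comparison, a minimal representation can be sketched per REF/ALT pair as below. This is a generic trimming routine under my own assumptions, not VEP's implementation: the name `minimise` is hypothetical, and the choice to trim the shared prefix before the shared suffix is an assumption that can change the reported position in repetitive sequence.

```python
def minimise(pos, ref, alt):
    """Hypothetical helper: trim sequence shared by one REF/ALT pair."""
    # Trim shared leading bases, advancing the start position each time.
    while ref and alt and ref[0] == alt[0]:
        ref, alt, pos = ref[1:], alt[1:], pos + 1
    # Trim shared trailing bases; the start position is unaffected.
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # An allele that becomes empty is written as "-".
    return pos, ref or "-", alt or "-"


print(minimise(100, "CAT", "CGT"))  # (101, 'A', 'G')
print(minimise(100, "CA", "C"))     # (101, 'A', '-')
```

Because each pair is reduced independently, two ALT alleles from the same VCF line can end up with different positions, which is another reason to track them by allele number.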

Both the --allele_number and --minimal flags may also be used as parameters if you're using the VEP REST API (i.e. add "&allele_number=1&minimal=1" to your URL, or the equivalent parameters to the POST body).
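A small sketch of what that could look like when building requests in Python. Nothing is sent here; it only constructs the URL and body. The variants shown are placeholders for illustration, and the exact endpoint paths and body layout should be checked against the Ensembl REST documentation.

```python
import json
from urllib.parse import urlencode

SERVER = "https://rest.ensembl.org"
params = {"allele_number": 1, "minimal": 1}

# GET: the options go in the query string.
# The region and allele are placeholders, not taken from this thread.
get_url = f"{SERVER}/vep/human/region/9:22125503-22125503:1/C?{urlencode(params)}"

# POST: the same options sit alongside the variants in the JSON body.
post_url = f"{SERVER}/vep/human/region"
post_body = json.dumps({
    "variants": ["21 26960070 rs116645811 G A . . ."],  # placeholder variant
    **params,
})

print(get_url)
print(post_url)
print(post_body)
```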
