VEP JSON output parsing
5.7 years ago
conorproud89 ▴ 20

I'm writing a Python parser for the serialised JSON output produced by VEP, and I'm finding that the alleles recorded in the JSON differ from those in the VCF, usually by one base (i.e. they lack the leading reference base). For "complex" variants this is more of a problem: sometimes the allele recorded in the JSON is the same as in the VCF, and sometimes it is one base shorter. Has anyone come across this before, and is there an explanation for why it happens?

json variant effect predictor python • 2.1k views

On the off-chance: I too am working on a parser. Could you link me to yours? It would probably help me out. Thanks in advance!

5.7 years ago
EnsemblWill ▴ 560

VEP converts unbalanced substitutions (e.g. insertions, deletions) to an internal standard Ensembl representation. This is explained in basic terms here; essentially the leading base is trimmed from the REF and ALT alleles.

However, this only explains the simple case, i.e. an entry with a single ALT allele. For complex VCF entries with several ALT alleles, VEP trims the leading base only if REF and all of the ALTs share the same leading base.
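The trimming rule described above can be sketched in a few lines of Python. This is an illustration of the rule as stated in this answer, not VEP's actual code: the function name `trim_shared_leading_base` is made up for the example, and writing an emptied allele as "-" is an assumption about the Ensembl-style representation.

```python
def trim_shared_leading_base(ref, alts):
    """Sketch of the leading-base trimming rule (not VEP's implementation).

    ref  -- REF allele string from the VCF line
    alts -- list of ALT allele strings from the same line
    """
    alleles = [ref] + list(alts)

    # Balanced substitutions (every allele as long as REF) are left untouched.
    if all(len(a) == len(ref) for a in alleles):
        return ref, list(alts)

    # Trim only when REF and every ALT start with the same base.
    first = ref[:1]
    if first and all(a[:1] == first for a in alleles):
        # Assumption: an allele that becomes empty is written as "-".
        trimmed = [a[1:] or "-" for a in alleles]
        return trimmed[0], trimmed[1:]

    # Mixed leading bases: alleles stay as they appear in the VCF.
    return ref, list(alts)


print(trim_shared_leading_base("A", ["AT"]))       # simple insertion
print(trim_shared_leading_base("A", ["G", "AT"]))  # ALTs do not all share the leading base
```

Comparing the output of such a helper with the alleles in the JSON is one way to check whether a parser's expectations match what VEP reports for a given line.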

A useful way to keep track of which allele is which in the output is to use the --allele_number flag.
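As a rough sketch of how a parser might use that allele number, the helper below maps one consequence record back to the original ALT string from the VCF line. It assumes the record exposes the number under a key called `allele_num` and that 1 means the first ALT; both are assumptions to check against your own JSON output, and the function name is hypothetical.

```python
def vcf_alt_for_consequence(consequence, vcf_alts):
    """Return the VCF ALT string that a consequence record refers to.

    consequence -- dict for one consequence entry from the parsed JSON
    vcf_alts    -- ALT alleles from the VCF line, in their original order

    Assumption: the record has a 1-based "allele_num" key (1 = first ALT).
    Returns None if the key is missing or does not point at an ALT.
    """
    num = consequence.get("allele_num")
    if num is None:
        return None
    idx = int(num) - 1  # convert 1-based allele number to a list index
    if 0 <= idx < len(vcf_alts):
        return vcf_alts[idx]
    return None


record = {"variant_allele": "-", "allele_num": 1}  # made-up example record
print(vcf_alt_for_consequence(record, ["A", "ATT"]))
```

Matching on the allele number rather than on the allele string avoids having to re-derive the trimming in the parser.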

You may modify VEP's default behaviour to reduce all REF/ALT pairs to their minimal representation using --minimal.

Both the --allele_number and --minimal flags may also be used as parameters if you're using the VEP REST API (i.e. add "&allele_number=1&minimal=1" to your URL, or the equivalent parameters to the POST body).
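If you build the request URL in Python, a small helper can append the two parameters without breaking an existing query string. This is a sketch using only the standard library; the function name `with_vep_flags` is made up, and the URL in the example is a placeholder to be replaced with the real VEP REST endpoint you are calling.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_vep_flags(url):
    """Return url with allele_number=1 and minimal=1 added to its query string."""
    parts = urlsplit(url)
    # Keep whatever parameters are already present on the URL.
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query.update({"allele_number": "1", "minimal": "1"})
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder URL for illustration only.
print(with_vep_flags("https://example.org/vep?content-type=application/json"))
```

For a POST request, the same two keys would go into the request body alongside the variants, as the answer describes.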

