Question: VEP JSON output parsing
Asked 3.0 years ago by conorproud89:

I'm writing a Python parser for the serialised JSON output produced by VEP, but I'm finding that the alleles recorded in the JSON differ from those in the VCF, usually by one base (i.e. they lack a reference base). For "complex" variants this is more of a problem: sometimes the allele recorded in the JSON is the same as the one in the VCF, and sometimes it is one base shorter. Has anyone come across this before, and can anyone explain why it happens?

(Written 3.0 years ago by conorproud89; modified 3.0 years ago by EnsemblWill.)

On the off-chance: I'm also working on a parser. Could you share a link to yours? It might help me out. Thanks in advance!

(Reply written 20 months ago by Arko.)
Answered 3.0 years ago by EnsemblWill (United Kingdom):

VEP converts unbalanced substitutions (e.g. insertions and deletions) to an internal, standard Ensembl representation. This is explained in basic terms here; essentially, the leading base shared by the REF and ALT alleles is trimmed from both.
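As a rough illustration of the single-ALT case described above, here is a minimal Python sketch. It assumes the rule is exactly "if REF and ALT differ in length and share their first base, drop that base", and that an allele left empty is written as "-"; the function name is hypothetical and this is not VEP's own code.

```python
def to_ensembl_alleles(ref: str, alt: str) -> tuple[str, str]:
    """Hypothetical sketch of VEP-style trimming for a single ALT allele.

    Assumption: only unbalanced REF/ALT pairs (different lengths) that share
    their first base are trimmed; an empty allele is shown as "-".
    """
    if ref and alt and len(ref) != len(alt) and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
    return ref or "-", alt or "-"


print(to_ensembl_alleles("A", "AT"))  # insertion: ('-', 'T')
print(to_ensembl_alleles("AT", "A"))  # deletion:  ('T', '-')
print(to_ensembl_alleles("C", "G"))   # SNV, left unchanged: ('C', 'G')
```

Balanced substitutions (SNVs and equal-length MNVs) are deliberately left alone, matching the "unbalanced substitutions" wording above.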

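However, this only covers the simple case, i.e. an entry with a single ALT allele. For VCF entries with multiple ALT alleles, VEP trims the leading base only if the REF and all of the ALT alleles share the same first base. This is why, for such "complex" entries, the JSON allele is sometimes identical to the VCF allele and sometimes one base shorter.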
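Extending the idea to multi-allelic entries, the sketch below trims the first base only when every allele (REF plus all ALTs) starts with the same base. It assumes trimming is only considered when at least one ALT differs in length from REF; the function name is hypothetical and the logic is an approximation of the rule described above, not VEP's implementation.

```python
def to_ensembl_alleles_multi(ref: str, alts: list[str]) -> tuple[str, list[str]]:
    """Hypothetical sketch: trim the shared first base from REF and all ALTs,
    but only if every allele starts with that same base."""
    alleles = [ref, *alts]
    # Assumption: trimming only applies when the entry contains an indel
    unbalanced = any(len(a) != len(ref) for a in alts)
    # All alleles must share one first base, otherwise nothing is trimmed
    shared_first = len({a[0] for a in alleles}) == 1
    if unbalanced and shared_first:
        alleles = [a[1:] for a in alleles]
    alleles = [a or "-" for a in alleles]
    return alleles[0], alleles[1:]


print(to_ensembl_alleles_multi("G", ["GA", "GAA"]))  # ('-', ['A', 'AA'])
print(to_ensembl_alleles_multi("G", ["GA", "T"]))    # ('G', ['GA', 'T'])
```

The second call shows the behaviour from the question: because "T" does not start with "G", the insertion allele "GA" keeps its leading base in the output.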

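A useful way to keep track of which output allele corresponds to which VCF ALT allele is the --allele_number flag, which reports the number of the ALT allele each result refers to (1 = first ALT allele, 2 = second, etc.).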
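For a Python parser, one way to use this is to map each consequence back to the original VCF ALT allele rather than trying to undo the trimming. The sketch below assumes one JSON object per line of VEP output, an "input" field holding the original VCF line, and a 1-based "allele_num" field on each transcript consequence when --allele_number is set. These field names are assumptions; check them against your own output.

```python
import json


def alts_by_consequence(json_line: str) -> list[tuple[str, str]]:
    """Hypothetical helper: pair each consequence's VEP allele with the
    original VCF ALT allele, using the allele number reported by VEP."""
    record = json.loads(json_line)
    # Assumption: "input" is the original VCF line; ALT is column 5
    vcf_alts = record["input"].split("\t")[4].split(",")
    pairs = []
    for cons in record.get("transcript_consequences", []):
        # Assumption: --allele_number adds a 1-based "allele_num" field
        idx = int(cons["allele_num"]) - 1
        pairs.append((cons["variant_allele"], vcf_alts[idx]))
    return pairs


# Synthetic example record, not real VEP output
example = json.dumps({
    "input": "1\t100\t.\tG\tGA,GAA\t.\t.\t.",
    "transcript_consequences": [
        {"variant_allele": "A", "allele_num": 1},
        {"variant_allele": "AA", "allele_num": 2},
    ],
})
print(alts_by_consequence(example))  # [('A', 'GA'), ('AA', 'GAA')]
```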

You can change VEP's default behaviour with --minimal, which reduces every REF/ALT pair to its minimal representation (bases shared at either end of the pair are trimmed).
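To see what a "minimal representation" looks like for an individual REF/ALT pair, here is a small Python sketch that trims shared trailing bases and then shared leading bases, shifting the start position for each leading base removed. The trimming order and position bookkeeping are assumptions for illustration; VEP's own coordinate conventions (for example, for insertions) are not reproduced here.

```python
def minimise(pos: int, ref: str, alt: str) -> tuple[int, str, str]:
    """Hypothetical sketch of reducing one REF/ALT pair to a minimal form."""
    # Trim bases shared at the end of both alleles
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Trim bases shared at the start, moving the start position right
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref or "-", alt or "-"


print(minimise(100, "GAC", "GTC"))  # (101, 'A', 'T')
print(minimise(100, "GA", "GAA"))   # (101, '-', 'A')
print(minimise(100, "G", "T"))      # (100, 'G', 'T')
```

Because each pair is handled independently, a multi-allelic entry such as G → GA,T yields "-"/"A" for the first ALT and "G"/"T" for the second, rather than the all-or-nothing trimming used by default.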

Both --allele_number and --minimal can also be used as parameters with the VEP REST API (i.e. add "&allele_number=1&minimal=1" to your URL, or the equivalent parameters to the POST body).
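As a sketch of the REST usage, the helpers below only build a GET URL and a POST body with both options set; they do not send any request. The server address and the /vep/:species/region endpoint layout follow the public Ensembl REST service, but the example region, allele and variant string are illustrative assumptions.

```python
import json
from urllib.parse import urlencode

SERVER = "https://rest.ensembl.org"


def vep_get_url(species: str, region: str, allele: str) -> str:
    """Build a GET URL for VEP region lookup with allele_number and minimal set."""
    params = {"content-type": "application/json", "allele_number": 1, "minimal": 1}
    return f"{SERVER}/vep/{species}/region/{region}/{allele}?{urlencode(params)}"


def vep_post_body(variants: list[str]) -> str:
    """Build a JSON body for POST /vep/:species/region with the same options."""
    return json.dumps({"variants": variants, "allele_number": 1, "minimal": 1})


url = vep_get_url("human", "9:22125503-22125503:1", "C")
body = vep_post_body(["1 100 . G GA,T . . ."])
print(url)
print(body)
```

Note that urlencode percent-encodes the "/" in "application/json"; the server should decode this as normal.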
