Question: Ensembl Variant Effect Predictor (VEP) fails if a REF or ALT allele is around 0.5 million bases long
mo.imranshah wrote 3 months ago:

I have been using VEP (Variant Effect Predictor) from Ensembl to annotate VCFs produced by GATK's HaplotypeCaller and Pindel. VEP fails for some of the VCFs with the following error:

-------------------- EXCEPTION --------------------
MSG:
ERROR: Forked process(es) died: read-through of cross-process communication detected

STACK Bio::EnsEMBL::VEP::Runner::_forked_buffer_to_output vep/version95/modules/Bio/EnsEMBL/VEP/Runner.pm:554
STACK Bio::EnsEMBL::VEP::Runner::next_output_line vep/version95/modules/Bio/EnsEMBL/VEP/Runner.pm:360
STACK Bio::EnsEMBL::VEP::Runner::run vep/version95/modules/Bio/EnsEMBL/VEP/Runner.pm:202
STACK toplevel vep/version95/vep:225
Date (localtime)    = Thu May  9 13:25:54 2019
Ensembl API version = 95
---------------------------------------------------

It took me weeks to identify the actual cause of this error, as I could not find a solution on the forums. I tried adjusting the --buffer_size and --fork parameters, as suggested on several forums, without success. The problem turned out to be the size of the REF and ALT alleles of some variants. When I excluded the records with a REF or ALT allele longer than 1000 bases, I got results without any error.
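A minimal sketch of the exclusion step described above, assuming an uncompressed, tab-delimited VCF. The function names and the `short_alleles.vcf` style of output are hypothetical, and the 1000-base cutoff is simply the value that worked in this case, not a documented VEP limit.

```python
from typing import Iterable, Iterator

MAX_ALLELE_LEN = 1000  # cutoff taken from this post; not an official VEP limit


def longest_allele(vcf_line: str) -> int:
    """Return the length of the longest REF or ALT allele in a VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    ref, alt = fields[3], fields[4]
    # ALT may hold several comma-separated alleles.
    alleles = [ref] + alt.split(",")
    return max(len(a) for a in alleles)


def drop_long_alleles(lines: Iterable[str],
                      max_len: int = MAX_ALLELE_LEN) -> Iterator[str]:
    """Yield header lines and data lines whose alleles are all <= max_len."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#") or longest_allele(line) <= max_len:
            yield line


def filter_vcf(in_path: str, out_path: str,
               max_len: int = MAX_ALLELE_LEN) -> None:
    """Write a copy of in_path without the long-allele records."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(drop_long_alleles(src, max_len))
```

Calling `filter_vcf` on the input VCF would give a file that can be passed to VEP with `-i`. Symbolic alleles such as `<DEL>` are short strings, so this check keeps them regardless of the event size they describe.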

VEP offline command used is:

vep --buffer_size 1000 --offline -i dataset_22336.dat -o dataset_22337.dat --cache --dir vep/database/ --force_overwrite --merged --cache_version 95 --assembly GRCh38 --fasta Homo_sapiens.GRCh38.dna.primary_assembly.fa --fork 32 --everything --vcf

Is there a way to run VEP on records whose REF or ALT alleles are 0.5 to 2 million bases long? Any help would be much appreciated.

Thanks in advance. Tagging @Emily_Ensembl


Do you really want to annotate a variant of this length?

written 3 months ago by Pierre Lindenbaum

Pierre Lindenbaum, could you please suggest a suitable length cutoff for excluding insignificant variants?

Thanks.

written 3 months ago by mo.imranshah
Ben_Ensembl (EMBL-EBI) wrote 3 months ago:

Hi mo.imranshah,

VEP has difficulty handling variants with long allele strings (>1000 bp) because it fetches everything that overlaps the allele string, and this is probably what led to the fork failing.
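To see which records fall into this category before running VEP, something like the following sketch could be used. It assumes a plain-text, tab-delimited VCF; `LongAllele` and `find_long_alleles` are hypothetical names, and the default threshold of 1000 is only the figure mentioned in this thread.

```python
from typing import Iterable, List, NamedTuple


class LongAllele(NamedTuple):
    chrom: str
    pos: int
    ref_len: int
    max_alt_len: int


def find_long_alleles(lines: Iterable[str],
                      threshold: int = 1000) -> List[LongAllele]:
    """List records whose REF or any ALT allele is longer than threshold."""
    hits = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        # ALT may hold several comma-separated alleles.
        max_alt_len = max(len(a) for a in alt.split(","))
        if len(ref) > threshold or max_alt_len > threshold:
            hits.append(LongAllele(chrom, int(pos), len(ref), max_alt_len))
    return hits
```

Passing an open VCF file handle to `find_long_alleles` returns the chromosome, position, and allele lengths of each affected record, which makes it easier to decide whether those variants are worth annotating at all.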

We plan to look into this further to work out exactly what the 'upper limit' is and how to handle these cases better.

However, it may be more efficient to upload your data into the Ensembl browser to visualise the genomic regions of interest: http://www.ensembl.org/info/website/upload/index.html

or to use BioMart to retrieve the list of genes in the genomic regions of interest: http://www.ensembl.org/biomart/martview/8c4102e5d689e604e174715a45c6f340
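For either route, the long variants first have to be expressed as genomic regions. The sketch below is one way to do that, assuming a tab-delimited VCF with explicit REF sequences and assuming the regions are wanted in BED format; the function names are hypothetical, and chromosome naming (with or without a `chr` prefix) is left as it appears in the VCF.

```python
from typing import Iterable, Iterator, Tuple


def ref_interval(vcf_line: str) -> Tuple[str, int, int]:
    """Return (chrom, start, end) covered by REF, 1-based and inclusive."""
    chrom, pos, _id, ref = vcf_line.rstrip("\n").split("\t")[:4]
    start = int(pos)
    end = start + len(ref) - 1
    return chrom, start, end


def long_variants_to_bed(lines: Iterable[str],
                         min_len: int = 1000) -> Iterator[str]:
    """Yield one BED line per record whose REF or any ALT exceeds min_len."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        alleles = [fields[3]] + fields[4].split(",")
        if max(len(a) for a in alleles) > min_len:
            chrom, start, end = ref_interval(line)
            # BED uses 0-based, half-open coordinates.
            yield f"{chrom}\t{start - 1}\t{end}"
```

The interval covers the reference bases replaced by the variant, so a long insertion with a one-base REF yields a one-base region. The 1-based tuple from `ref_interval` can be reformatted for tools that expect chromosome, start, and end in another layout.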

Best wishes

Ben Ensembl Helpdesk


Thanks, Ben, for the quick reply. I hope to see better VEP performance in such cases. Meanwhile, I will go with your suggestions.

Best, Imran

written 3 months ago by mo.imranshah