Question: memory usage and run time for VEP whole genome variant annotation?
vlad1 (United States) wrote 4.6 years ago:

Hi,

 

What memory usage and run time should I expect for VEP whole-genome variant annotation? I tried to annotate a 5-sample Illumina whole-genome VCF (30x coverage):

perl /ensembl-tools-release-78/scripts/variant_effect_predictor/variant_effect_predictor.pl --force_overwrite -i G85829.vcf --cache --assembly GRCh37 --offline --individual all \
    --symbol \
    --numbers \
    --biotype \
    --total_length \
    -o output \
    --vcf \
    --fields Consequence,Codons,Amino_acids,Gene,SYMBOL,Feature,EXON,Protein_position,BIOTYPE

This command runs out of memory after ~11 hours. There is about 20 GB of free memory on the Ubuntu server:

 

[==============================================================================]  [ 100% ]
2015-03-05 22:14:40 - Processed 20675000 total variants (238 vars/sec, 547 vars/sec total)
2015-03-05 22:14:41 - Read 5000 variants into buffer
2015-03-05 22:14:41 - Reading transcript data from cache and/or database
[=====================================>                                        ]   [ 50% ]ERROR: Cannot allocate memory at /ensembl-tools-release-78/scripts/variant_effect_predictor/Bio/EnsEMBL/Variation/Utils/VEP.pm line 4735, <GEN0> line 4136138.

 

The number of variants processed is close to the total from GATK calling (each genome has about 3.5M SNPs and 0.5M indels; 20,917 total). So I wonder whether something goes wrong after all variants have been processed. For comparison, ANNOVAR takes a few minutes to annotate one genome.

 

Vlad
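
As a rough cross-check of the log above, the average rate and the variant count can be turned into an elapsed time. This is only a sketch: estimate_hours is a made-up helper name, and both inputs are copied from the last "Processed" line of the log.

```shell
#!/bin/sh
# Estimate wall-clock hours from a variant count and an average rate.
# estimate_hours is a hypothetical helper, not part of VEP.
estimate_hours() {
    # $1 = variants processed, $2 = average variants per second
    LC_ALL=C awk -v n="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", n / rate / 3600 }'
}

# Values taken from the log line: 20675000 variants at 547 vars/sec total.
estimate_hours 20675000 547   # prints 10.5
```

Roughly 10.5 hours is in line with the ~11 hours reported before the crash, which fits the observation that nearly all variants had been read when the memory error occurred.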

 

 

EnsemblWill (United Kingdom) wrote 4.6 years ago:

Try using --fork (see http://www.ensembl.org/info/docs/tools/vep/script/vep_options.html#forking)

Not only should this eradicate any memory leak issues, but you should see the script run much, much faster.

In addition, if you can wait a week or so, the next release of VEP (79) is faster still than 78 and comes with a handy guide for making sure your VEP analyses are running at optimal speed.
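
A minimal sketch of how --fork might be added to the command from the question. The rule of leaving two cores free is an assumption made for illustration, not Ensembl guidance, and pick_forks is a hypothetical helper. The script only prints the command (a dry run) so it can be checked before anything is launched.

```shell
#!/bin/sh
# pick_forks is a hypothetical helper: leave two cores free, never go below 1.
pick_forks() {
    cores="$1"
    if [ "$cores" -gt 2 ]; then
        echo $((cores - 2))
    else
        echo 1
    fi
}

# Ask the system for the number of online cores; fall back to 1 if unavailable.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
case "$cores" in
    ''|*[!0-9]*) cores=1 ;;
esac
forks=$(pick_forks "$cores")

# Dry run: print the command instead of executing it.
VEP=/ensembl-tools-release-78/scripts/variant_effect_predictor/variant_effect_predictor.pl
echo "perl $VEP --force_overwrite -i G85829.vcf --cache --assembly GRCh37 --offline --fork $forks -o output --vcf"
```

On an 8-core machine this rule gives --fork 6. The remaining options from the original command (--individual all, --symbol, --fields and so on) can be appended unchanged.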


Thanks! It worked. It greatly reduced the memory footprint and took about 10 hours to process 42M variants on 8 CPUs. I used --fork 6:

--force_overwrite -i G85829.vcf --cache --assembly GRCh37 --offline --individual all --fork 6 --sift b --polyphen b --symbol --numbers --biotype --total_length -o output --vcf --fields Consequence,Codons,Amino_acids,Gene,SYMBOL,Feature,EXON,PolyPhen,SIFT,Protein_position,BIOTYPE

It can probably be optimized further via the batch size option (presumably --buffer_size).

 

Vlad
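
One way to explore the buffer size is to time a few values on a small subset of the VCF before committing to a full run. The sketch below assumes the option meant is --buffer_size (the log's "Read 5000 variants into buffer" matches its default of 5000). The file name subset.vcf, the helper buffer_sweep and the candidate sizes are all hypothetical. It prints the commands rather than running them.

```shell
#!/bin/sh
# Dry run: print one timed VEP command per candidate buffer size.
# subset.vcf and the candidate sizes are hypothetical choices for illustration.
VEP=/ensembl-tools-release-78/scripts/variant_effect_predictor/variant_effect_predictor.pl

buffer_sweep() {
    # $1 = input VCF, $2 = fork count, remaining arguments = buffer sizes to try
    input="$1"
    forks="$2"
    shift 2
    for size in "$@"; do
        echo "time perl $VEP --force_overwrite -i $input --cache --assembly GRCh37 --offline --fork $forks --buffer_size $size -o output_$size --vcf"
    done
}

buffer_sweep subset.vcf 6 5000 10000 20000
```

A larger buffer holds more variants in memory at once, so it tends to trade memory for speed. Given the earlier out-of-memory error, it is worth watching memory use while timing each setting.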

 

 

Reply written 4.6 years ago by vlad1