What memory usage and run time should I expect for VEP whole-genome variant annotation? I tried to annotate a 5-sample Illumina 30x coverage whole-genome VCF:
perl /ensembl-tools-release-78/scripts/variant_effect_predictor/variant_effect_predictor.pl --force_overwrite -i G85829.vcf --cache --assembly GRCh37 --offline --individual all \
-o output
This command runs out of memory after ~11 hours. There is about 20 GB of free memory on the Ubuntu server:
[==============================================================================] [ 100% ]
2015-03-05 22:14:40 - Processed 20675000 total variants (238 vars/sec, 547 vars/sec total)
2015-03-05 22:14:41 - Read 5000 variants into buffer
2015-03-05 22:14:41 - Reading transcript data from cache and/or database
[=====================================> ] [ 50% ]ERROR: Cannot allocate memory at /ensembl-tools-release-78/scripts/variant_effect_predictor/Bio/EnsEMBL/Variation/Utils/VEP.pm line 4735, <GEN0> line 4136138.
The number of processed variants is close to the total number of variants from GATK calling (each genome has about 3.5M total SNPs and 0.5M total indels, 20,917 total). So I wonder whether something happens after all variants have been processed. For comparison, it takes ANNOVAR a few minutes to annotate one genome.
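As a rough sanity check on the log figures above, here is a small back-of-the-envelope sketch. The processed count and overall rate are copied from the last progress line of the log; the per-genome counts (about 3.5M SNPs plus 0.5M indels) and the idea of simply summing them across the 5 samples are my assumptions, not something VEP reports.

```python
# Figures copied from the last progress line of the VEP log.
processed_variants = 20_675_000   # "Processed 20675000 total variants"
overall_rate = 547                # "547 vars/sec total"

# Elapsed time implied by the overall rate.
elapsed_hours = processed_variants / overall_rate / 3600

# Assumption: ~3.5M SNPs + ~0.5M indels per genome, summed over 5 samples
# (sites shared between samples are not collapsed in this estimate).
samples = 5
expected_variants = samples * (3_500_000 + 500_000)

print(f"elapsed: {elapsed_hours:.1f} h")
print(f"processed / expected: {processed_variants / expected_variants:.2f}")
```

This gives about 10.5 hours, which matches the ~11 hours before the crash, and a processed count slightly above the estimated total, which is why I suspect the failure happens at or near the end of the input.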