I am trying to annotate WGS VCF files with VEP, and even with multi-threading (n=8) the process is painfully slow. The VCF sizes range between 1 and 1.5 GB, and each VCF takes around 2 days.
What would you recommend to speed up the process? I can break the VCF into chunks. Are there any other solutions or software you would recommend? Apart from gene and functional-impact annotation, I am using VEP plugins for CADD and gnomAD genome-based frequencies.
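For reference, the chunking idea mentioned above could look roughly like the sketch below. It is a minimal illustration, not a tested pipeline step: the function name `split_vcf`, the chunk file names, and the default chunk size are all made up for this example, and it assumes a plain or gzip-compressed VCF whose header lines all start with `#`.

```python
import gzip
from pathlib import Path


def open_text(path, mode="rt"):
    """Open plain or gzip-compressed text, chosen by the file suffix."""
    path = str(path)
    return gzip.open(path, mode) if path.endswith(".gz") else open(path, mode)


def split_vcf(vcf_path, out_dir, records_per_chunk=500_000):
    """Split a VCF into chunks, repeating the full header in each chunk.

    Returns the chunk paths in order. Chunk names and the default chunk
    size are arbitrary choices for this sketch.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    header, chunks, out, count = [], [], None, 0
    with open_text(vcf_path) as handle:
        for line in handle:
            if line.startswith("#"):
                header.append(line)  # meta lines and the #CHROM line
                continue
            # Start a new chunk on the first record or when the current one is full.
            if out is None or count >= records_per_chunk:
                if out is not None:
                    out.close()
                chunk_path = out_dir / f"chunk_{len(chunks):04d}.vcf"
                chunks.append(chunk_path)
                out = open(chunk_path, "w")
                out.writelines(header)
                count = 0
            out.write(line)
            count += 1
    if out is not None:
        out.close()
    return chunks
```

Each chunk is a valid uncompressed VCF with the original header, so the chunks can be annotated independently and the annotated records concatenated afterwards in the same order.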
There is a parameter, --buffer_size, that controls how many variants are loaded into memory at once. Double-check with the manual, as increasing it might substantially improve speed if you have the RAM available.
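As a rough sketch of how the buffer size and thread count could be set together, the snippet below builds a VEP argument list in Python. VEP spells these options `--buffer_size` and `--fork`. The helper name `build_vep_command`, the file names, and the chosen values are assumptions for illustration only; check the manual for the defaults of your VEP version and add your own cache and plugin options.

```python
import shlex


def build_vep_command(input_vcf, output_vcf, forks=8, buffer_size=10000):
    """Return a VEP argument list with forking and a larger variant buffer."""
    return [
        "vep",
        "--input_file", str(input_vcf),
        "--output_file", str(output_vcf),
        "--vcf",                            # write annotated output as VCF
        "--cache",                          # use a locally installed cache
        "--fork", str(forks),               # number of parallel forks
        "--buffer_size", str(buffer_size),  # variants held in memory per batch
    ]


# Hypothetical file names; pass the list to subprocess.run(cmd, check=True) to execute.
cmd = build_vep_command("sample.vcf.gz", "sample.vep.vcf", forks=16, buffer_size=10000)
print(shlex.join(cmd))
```

A larger buffer means more memory per fork, so it may be worth raising the two values gradually while watching RAM usage.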
Many thanks! Going to try this.
Thanks again. With twice the default buffer size and 16 threads, the process completed in ~7 hrs!
I see that you have used the CADD plugin for VEP. Could you please let me know how you did it? It does not work for me and gives me a blank column of CADD scores.
'does not work' is not a helpful error message. How did you install and run it?
I am trying to run it using "--plugin CADD, path/to/ InDels_inclAnno.tsv.gz,path/to/ whole_genome_SNVs_inclAnno.tsv.gz". But the CADD column is blank in the output.
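One thing that might be worth ruling out with a command like the one quoted above is stray whitespace inside the plugin value, or a missing tabix index next to a score file. The sketch below is only an illustration under those assumptions: `cadd_plugin_arg` is a made-up helper, and it assumes each score file has a `.tbi` index alongside it.

```python
from pathlib import Path


def cadd_plugin_arg(*score_files):
    """Build a --plugin value for CADD and check each file has a .tbi index."""
    paths = [Path(p) for p in score_files]
    for path in paths:
        index = Path(str(path) + ".tbi")
        if not path.is_file():
            raise FileNotFoundError(f"missing CADD score file: {path}")
        if not index.is_file():
            raise FileNotFoundError(f"missing tabix index: {index}")
    # Join with commas only: a space would split the value into separate
    # shell arguments, so the plugin would not receive all of its files.
    return "CADD," + ",".join(str(p) for p in paths)
```

The returned string can be placed directly after `--plugin` in the VEP argument list.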
Hi, have you been able to make it work? After installing VEP, you need to have run -
That would configure the plugins you specify under the VEP data cache Plugins/ path. You would get a message like this -
Once done, I created a directory, .vep/Plugins/dat_CADD_1.3, and downloaded InDels.tsv.gz (along with their index files). Then providing this argument to the VEP run -
does the job.