Question: faster variant annotation for large VCFs
Amitm (UK) wrote, 12 months ago:

Hi there,

I am trying to annotate WGS VCF files with VEP, and even with multi-threading (n=8) the process is painfully slow: around 2 days per VCF. The VCF sizes range from 1 to 1.5 GB.

What would you recommend to speed up the process? I can break the VCF into chunks. Are there any other solutions or software you would recommend? Apart from gene- and functional-impact-level annotation, I am using VEP plugins for CADD and gnomAD genome-based frequencies.

Thanks

variant annotation wgs vcf • 691 views
Modified 12 months ago by Emily_Ensembl (19k) • written 12 months ago by Amitm (1.6k)
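
On the chunking idea in the question: one way to split is per chromosome, so that each chunk can be annotated as a separate VEP job. The following is a minimal sketch, not a tested pipeline. It uses plain awk on an uncompressed VCF, and the demo file, its records, and the chunk_ file names are all made up for illustration.

```shell
# Work in a scratch directory so the demo leaves nothing behind in the cwd.
workdir=$(mktemp -d)

# Tiny made-up VCF (two header lines, three records) so the sketch runs as-is.
printf '##fileformat=VCFv4.2\n' > "$workdir/demo.vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> "$workdir/demo.vcf"
printf '1\t100\t.\tA\tG\t.\tPASS\t.\n' >> "$workdir/demo.vcf"
printf '1\t200\t.\tC\tT\t.\tPASS\t.\n' >> "$workdir/demo.vcf"
printf '2\t150\t.\tG\tA\t.\tPASS\t.\n' >> "$workdir/demo.vcf"

# Write one chunk per chromosome, repeating the full header in each chunk.
awk -v dir="$workdir" '
  /^#/ { header = header $0 "\n"; next }       # collect header lines
  {
    out = dir "/chunk_" $1 ".vcf"              # chunk named after column 1 (CHROM)
    if (!(out in seen)) {                      # first record for this chromosome
      printf "%s", header > out
      seen[out] = 1
    }
    print > out
  }
' "$workdir/demo.vcf"

ls "$workdir"
```

Each chunk keeps the full header, so it is a valid VCF on its own. For a bgzipped, indexed VCF a dedicated tool such as bcftools would likely be a better fit than awk, and the annotated chunks still need to be merged back in chromosome order afterwards.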

There is a parameter, --buffer_size, that controls how many variants are loaded into memory at once. Double-check with the manual, as raising it might substantially increase speed if you have the RAM available.

Written 12 months ago by ATpoint (22k)

Many thanks! Going to try this.

Written 12 months ago by Amitm (1.6k)

Thanks again. With twice the default buffer size and 16 threads, the process completed in ~7 hours!

Written 11 months ago by Amitm (1.6k)
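
For reference, the settings reported above might look roughly like this on the command line. This is a sketch under assumptions: the file names are hypothetical, --cache/--offline and --vcf are assumed as a typical offline setup rather than taken from the asker's actual run, and 5000 is the default --buffer_size given in the VEP documentation. The script only prints the command so it can be checked before running.

```shell
# Hypothetical file names; replace with your own.
input_vcf="sample.wgs.vcf.gz"
output_vcf="sample.wgs.vep.vcf"

# Assumption: VEP's documented default --buffer_size is 5000; doubled here.
buffer_size=$((5000 * 2))
# Number of forked processes, as reported in the thread.
threads=16

# Assemble the command; --cache/--offline/--vcf are assumed, not from the thread.
vep_cmd="vep --input_file $input_vcf --output_file $output_vcf --cache --offline --vcf --fork $threads --buffer_size $buffer_size"

# Print for inspection; run it once the paths and options look right.
echo "$vep_cmd"
```

A larger buffer costs more memory per fork, so it is worth watching RAM usage on the first run before going higher.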

Hi,

I see that you have used the CADD plugin for VEP. Could you please let me know how you did it? It does not work for me and gives me a blank column of CADD scores.

Thank You

Written 12 months ago by PC20

'does not work' is not a helpful error message. How did you install and run it?

Written 12 months ago by ATpoint (22k)

I am trying to run it using "--plugin CADD, path/to/ InDels_inclAnno.tsv.gz,path/to/ whole_genome_SNVs_inclAnno.tsv.gz". But the column with CADD is blank in the output.

Thanks

Written 12 months ago by PC20

Hi, have you been able to make it work? After installing VEP, you need to run:

perl \
INSTALL.pl \
--AUTO p \
--PLUGINS CADD,ExAC \
--SPECIES homo_sapiens_merged \
--ASSEMBLY GRCh37

That configures the plugins you specify under the Plugins/ directory of the VEP data cache. You should get messages like this:

 - installing "CADD"
 - This plugin requires data
 - See /Users/akmandal/.vep/Plugins/CADD.pm for details
 - OK

Once that was done, I created a directory, .vep/Plugins/dat_CADD_1.3, and downloaded whole_genome_SNVs.tsv.gz and InDels.tsv.gz (along with their index files). Then passing this argument to the VEP run:

--plugin CADD,/path/to/dat_CADD_1.3/whole_genome_SNVs.tsv.gz,/path/to/dat_CADD_1.3/InDels.tsv.gz \

does the job.

Written 12 months ago by Amitm (1.6k)
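
A quick pre-flight check along the lines of the steps above can rule out missing files before a long run. This is only a sketch: /path/to/dat_CADD_1.3 is the placeholder path from the post, and the .tbi suffix assumes the index files are tabix indexes sitting next to the data files.

```shell
# Placeholder directory from the post; point this at your real CADD data.
cadd_dir="/path/to/dat_CADD_1.3"
snv="$cadd_dir/whole_genome_SNVs.tsv.gz"
indel="$cadd_dir/InDels.tsv.gz"

# Count data/index files that are absent or empty (assumes .tbi tabix indexes).
missing=0
for f in "$snv" "$snv.tbi" "$indel" "$indel.tbi"; do
  if [ ! -s "$f" ]; then
    echo "missing or empty: $f" >&2
    missing=$((missing + 1))
  fi
done

# Build the plugin argument: one token, comma-separated, no spaces.
plugin_arg="--plugin CADD,$snv,$indel"
echo "$plugin_arg"
```

Note that the plugin name and file paths form a single comma-separated argument. A space after a comma makes the shell split it into separate arguments, so the paths never reach the plugin as intended.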
Emily_Ensembl (EMBL-EBI) wrote, 12 months ago:

There's some information about VEP speed in this blog post. Your plugins are probably slowing you down a bit as they have to communicate with databases rather than using your offline cache, which can be slower – I'm afraid there's not a lot you can do about that.

Written 12 months ago by Emily_Ensembl (19k)