Question: The VEP output for the variants of whole genome sequencing
seta (Sweden) wrote, 10 months ago:

Dear all,

I'm going to annotate a large number of variants (about 50 million) derived from whole genome sequencing of a given population. For the output, as you know, we can specify "Pick one line or block of consequence data per variant" or "Pick one line or block of consequence data per variant allele", as explained in the documentation. Could you let me know which one I should select? Also, please share any experience or tips you have for reducing the running time.

Thanks

P.S. Regarding speed, Emily from Ensembl kindly suggested a buffer size of 5000 and 4 forks, depending on the system. I'm looking for other experiences you may have gained in your own work.


For other people who might offer to help, note that I have already pointed seta to the documentation page on options for speeding up VEP. We have also discussed what to set the forks and buffer size to: I advised her that 4 is usually the best number of forks and that the buffer size is 5000 by default, but what works best for her depends on the cores, memory and system she has available, so she should do a bit of testing with a smaller file.

Seta: if you have already received help and advice on something, it is generally useful to re-state that here, so that other people do not just give you exactly the same advice.
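
For readers following along, the fork and buffer settings discussed above could be assembled into a command line roughly as follows. This is a minimal Python sketch rather than an official recipe: `build_vep_command` is a hypothetical helper, the file names are placeholders, and it assumes a local cache is installed (hence `--cache --offline`). The flags `--fork`, `--buffer_size`, `--pick` and `--pick_allele` are VEP's own command-line options.

```python
import shlex


def build_vep_command(input_vcf, output_vcf, forks=4, buffer_size=5000,
                      per_allele=False):
    """Return the argument list for one VEP run (hypothetical helper)."""
    if forks < 1 or buffer_size < 1:
        raise ValueError("forks and buffer_size must be positive")
    cmd = [
        "vep",
        "-i", input_vcf,
        "-o", output_vcf,
        "--vcf",                 # write annotated output as VCF
        "--cache", "--offline",  # assumption: a local cache is installed
        "--buffer_size", str(buffer_size),
        # one consequence block per allele, or one per variant
        "--pick_allele" if per_allele else "--pick",
    ]
    if forks > 1:
        # forking only helps when more than one process is requested
        cmd += ["--fork", str(forks)]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(build_vep_command("subset.vcf", "subset.vep.vcf")))
```

Running the resulting list with `subprocess.run(cmd, check=True)` on a small subset, for a few `forks` and `buffer_size` combinations, is one way to do the testing suggested above.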

Comment by Emily_Ensembl, written 10 months ago

Run time can be reduced by forking (see the manual) and by splitting the VCF into chunks, running VEP on the chunks in parallel, and re-joining the results afterwards. This obviously requires more computational resources.
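
The split-and-rejoin idea can be sketched in plain Python as below. This is only an illustration under stated assumptions: the input is an uncompressed VCF, `split_vcf` and `join_vcfs` are hypothetical helper names, and the chunk size is arbitrary. Each chunk gets a copy of the full header so that it is a valid VCF on its own; when re-joining, only the first chunk's header is kept.

```python
from pathlib import Path


def split_vcf(vcf_path, out_dir, records_per_chunk=1_000_000):
    """Write consecutive chunks of an uncompressed VCF, each with the header."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    header, chunk_paths = [], []
    out, n_in_chunk = None, 0
    try:
        with open(vcf_path) as vcf:
            for line in vcf:
                if line.startswith("#"):
                    header.append(line)  # header lines come before any record
                    continue
                if out is None or n_in_chunk >= records_per_chunk:
                    if out is not None:
                        out.close()
                    path = out_dir / f"chunk_{len(chunk_paths):04d}.vcf"
                    chunk_paths.append(path)
                    out = open(path, "w")
                    out.writelines(header)
                    n_in_chunk = 0
                out.write(line)
                n_in_chunk += 1
    finally:
        if out is not None:
            out.close()
    return chunk_paths


def join_vcfs(chunk_paths, joined_path):
    """Concatenate chunks in order, keeping only the first chunk's header."""
    with open(joined_path, "w") as joined:
        for i, path in enumerate(chunk_paths):
            with open(path) as chunk:
                for line in chunk:
                    if i > 0 and line.startswith("#"):
                        continue
                    joined.write(line)
```

Because chunks are written and re-joined in file order, a sorted input stays sorted. The chunks can then be annotated concurrently, for example one VEP process per chunk, as far as cores and memory allow.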

Comment by ATpoint, written 10 months ago

Emily_Ensembl (EMBL-EBI) wrote, 10 months ago:

Do you have multi-allelic variants? If so, pick per allele. If not, pick per variant.
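
If it is unclear whether a file still contains multi-allelic records, a quick check along these lines may help. It is a minimal sketch that assumes a tab-separated, uncompressed VCF in which multiple ALT alleles are comma-separated in the fifth column; `has_multiallelic` and `pick_flag` are hypothetical names, while `--pick` and `--pick_allele` are the VEP command-line counterparts of the two options in the question.

```python
def has_multiallelic(vcf_lines):
    """Return True if any record lists more than one ALT allele."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        # Column 5 (index 4) is ALT; several alleles are comma-separated.
        if len(fields) > 4 and "," in fields[4]:
            return True
    return False


def pick_flag(vcf_lines):
    """Apply the rule above: per allele if multi-allelic, else per variant."""
    return "--pick_allele" if has_multiallelic(vcf_lines) else "--pick"
```

For example, `with open("variants.vcf") as fh: print(pick_flag(fh))`, where `variants.vcf` is a placeholder file name. The scan stops at the first multi-allelic record, so it is cheap when such records occur early but reads the whole file when there are none.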


Thanks Emily, I have edited the post. The multi-allelic variants have already been split, so I should use pick per variant.

Comment by seta, written 10 months ago