Question

VEP output options for whole-genome sequencing variants

4.4 years ago

seta ★ 1.9k

Dear all,

I'm going to annotate a large number of variants (about 50 million) derived from whole-genome sequencing of a population. For the output, as you know, we can specify either "Pick one line or block of consequence data per variant" or "Pick one line or block of consequence data per variant allele", as explained here. Which one should I select? I'd also appreciate any experience or tips for reducing the running time.

Thanks

P.S. Regarding speed, Emily from Ensembl kindly suggested a buffer size of 5000 and 4 forks, depending on the system. I'm looking for any other experience you may have gained in your own work.
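
For reference, here is a minimal Python sketch of how these choices map onto a VEP command line. The flags `--pick`, `--pick_allele`, `--fork` and `--buffer_size` are VEP options; the helper name `build_vep_command`, the file names and the `--cache --offline` setup are illustrative assumptions, not a recommended configuration.

```python
from typing import List


def build_vep_command(
    input_vcf: str,
    output_vcf: str,
    per_allele: bool = False,
    forks: int = 4,
    buffer_size: int = 5000,
) -> List[str]:
    """Assemble a VEP command line as an argument list (hypothetical helper).

    per_allele=False -> --pick        (one block of consequence data per variant)
    per_allele=True  -> --pick_allele (one block per variant allele)
    """
    cmd = [
        "vep",
        "--input_file", input_vcf,
        "--output_file", output_vcf,
        "--vcf",                  # write annotated VCF output
        "--cache", "--offline",   # assumes a locally installed VEP cache
        "--fork", str(forks),
        "--buffer_size", str(buffer_size),
    ]
    cmd.append("--pick_allele" if per_allele else "--pick")
    return cmd


if __name__ == "__main__":
    print(" ".join(build_vep_command("population.vcf.gz", "population.vep.vcf")))
```

Building the command as a list lets you pass it straight to `subprocess.run(cmd, check=True)` without shell quoting issues.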

VEP ensembl whole genome sequencing

For other people who might offer to help, note that I have already pointed seta to the VEP documentation page on speeding up VEP. We have also discussed how to set the forks and buffer size: I advised her that the best number of forks is usually 4 and that the buffer size is 5000 by default, but what works best for her depends on the cores, memory and system she has available, so she should do some testing with a smaller file first.
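
The "test with a smaller file" step could look roughly like this: copy the VCF header plus the first N records into a test file, then time VEP on it for a few fork/buffer combinations. This is a sketch under assumptions: the helper names, taking the first N records rather than a random sample, and the candidate values in `candidate_settings` are all illustrative, not Ensembl recommendations.

```python
import gzip
import subprocess
import time
from itertools import product
from typing import Iterable, Iterator, List, Tuple


def head_vcf(lines: Iterable[str], n_records: int) -> Iterator[str]:
    """Yield all header lines ('#...') plus the first n_records data lines."""
    kept = 0
    for line in lines:
        if line.startswith("#"):
            yield line
        elif kept < n_records:
            yield line
            kept += 1
        else:
            break  # header lines always precede records, so we can stop here


def write_test_subset(src: str, dst: str, n_records: int = 100_000) -> None:
    """Write a small uncompressed test VCF from a (possibly gzipped) source."""
    opener = gzip.open if src.endswith(".gz") else open
    with opener(src, "rt") as fin, open(dst, "w") as fout:
        fout.writelines(head_vcf(fin, n_records))


def candidate_settings() -> List[Tuple[int, int]]:
    """Hypothetical grid of (forks, buffer_size) pairs to try on the subset."""
    return list(product((2, 4, 8), (1000, 5000, 10000)))


def time_command(cmd: List[str]) -> float:
    """Run a command and return wall-clock seconds (raises if it fails)."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start
```

Timing each pair on the same subset, and watching memory use while it runs, shows which setting your machine actually favours before committing to the full 50 million variants.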

Seta: if you have already received help and advice on something, it is generally useful to re-state that here, so that other people do not just give you exactly the same advice.

4.4 years ago by Emily 23k

Run time can also be reduced by forking (see the manual) and by splitting the VCF into chunks, running VEP on them in parallel, and re-joining the results afterwards. This obviously requires more computational resources.
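
A rough Python sketch of the split / run-in-parallel / re-join idea. Assumptions: chunks are split by record count, every chunk gets a copy of the full header, and the re-joined output takes its header from the first chunk only. The helper names are hypothetical, and `run_parallel` simply launches whatever commands you give it (for example one VEP job per chunk).

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List, Sequence


def split_vcf(lines: Iterable[str], chunk_size: int) -> List[List[str]]:
    """Split VCF lines into chunks of <= chunk_size records, each with the full header."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    header: List[str] = []
    chunks: List[List[str]] = []
    current: List[str] = []
    for line in lines:
        if line.startswith("#"):
            header.append(line)  # header lines come before any records
            continue
        current.append(line)
        if len(current) == chunk_size:
            chunks.append(header + current)
            current = []
    if current:
        chunks.append(header + current)
    return chunks


def join_vcfs(chunks: Sequence[Sequence[str]]) -> List[str]:
    """Re-join chunks in order: header from the first chunk, records from all."""
    if not chunks:
        return []
    joined = [line for line in chunks[0] if line.startswith("#")]
    for chunk in chunks:
        joined.extend(line for line in chunk if not line.startswith("#"))
    return joined


def run_parallel(commands: Sequence[List[str]], max_workers: int = 4) -> None:
    """Run one external command per chunk concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() waits for every job and re-raises any CalledProcessError
        list(pool.map(lambda c: subprocess.run(c, check=True), commands))
```

Keep in mind that each parallel VEP job may also use `--fork`, so the total core count is roughly `max_workers` times the fork setting.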

4.4 years ago by ATpoint 82k

Answer 1 · 2019-12-18

4.4 years ago

Emily 23k

Do you have multi-allelic variants? If so, pick per allele. If not, pick per variant.
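
If you are unsure whether a VCF still contains multi-allelic records, a quick check is whether any ALT column (the fifth tab-separated field) holds a comma-separated list. A minimal sketch, assuming a plain-text VCF; the helper names are hypothetical and the mapping to `--pick` / `--pick_allele` simply follows the rule above.

```python
from typing import Iterable


def has_multiallelic(lines: Iterable[str]) -> bool:
    """Return True if any VCF data record has more than one ALT allele."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        alt = fields[4]  # columns: CHROM POS ID REF ALT ...
        if "," in alt:
            return True  # stop at the first multi-allelic record
    return False


def choose_pick_flag(lines: Iterable[str]) -> str:
    """Pick per allele only when records carry several ALT alleles."""
    return "--pick_allele" if has_multiallelic(lines) else "--pick"
```

Because the scan stops at the first hit, it is cheap on files that are multi-allelic, but it reads the whole file when all records are biallelic.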

Thanks Emily, I've edited the post. The multi-allelic variants have already been split, so I should use pick per variant.

4.4 years ago by seta ★ 1.9k