Running SnpSift in parallel
2.7 years ago


I am trying to use SnpSift to calculate case vs. control groups. The file is quite large, and my first SnpSift run on it took several days to finish. I am in a bit of a time crunch, and it is unclear whether SnpSift will finish before I need the data. I looked at the SnpSift documentation and could not find a way to speed things up with multi-threading. I realize that SnpSift has to do a calculation for each line of the file, which just takes time.

However, I was wondering whether I could split the annotated VCF file that I created with SnpEff into smaller files. For example, if my starting annotated VCF is 1 terabyte, I could split it into ten 100-gigabyte files, run SnpSift on each of the ten files in parallel, and then merge the results once they have all finished. I admit this is not an ideal situation, but I am not sure what else to do.

Are there any flaws in this plan, or does anyone have another solution? I know there will be some formatting issues that I will have to deal with.

VCF WGS Annotation • 825 views

Use a workflow manager: split the VCF per chromosome or region, run each region in parallel, then merge the per-region outputs.
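For anyone who wants to try the split/run/merge idea without a workflow manager, here is a minimal Python sketch. It assumes an uncompressed, plain-text VCF and a per-record tool that takes the input file as its last argument and writes VCF to stdout. The function names (`split_by_chromosome`, `run_chunk`, `merge_chunks`, `run_parallel`) are made up for illustration and are not part of SnpSift. Splitting should be safe here only because, as the question notes, the calculation is done line by line. With a bgzipped, indexed VCF, extracting regions with tabix or bcftools would likely be faster than the single full pass done below.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def split_by_chromosome(vcf_path, out_dir):
    """Stream the VCF once; write one file per chromosome, each with the full header."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    header, seen, current, handle = [], [], None, None
    with open(vcf_path) as vcf:
        for line in vcf:
            if line.startswith("#"):
                header.append(line)
                continue
            chrom = line.split("\t", 1)[0]
            if chrom != current:
                # Only one chunk file is open at a time, so many contigs are fine.
                if handle:
                    handle.close()
                first_time = chrom not in seen
                handle = open(out_dir / f"{chrom}.vcf", "w" if first_time else "a")
                if first_time:
                    seen.append(chrom)
                    handle.writelines(header)
                current = chrom
            handle.write(line)
    if handle:
        handle.close()
    return [out_dir / f"{chrom}.vcf" for chrom in seen]


def run_chunk(command, chunk_path):
    """Run `command <chunk>` and capture its stdout next to the chunk."""
    out_path = chunk_path.with_name(chunk_path.stem + ".out.vcf")
    with open(out_path, "w") as out:
        subprocess.run([*command, str(chunk_path)], stdout=out, check=True)
    return out_path


def merge_chunks(chunk_outputs, merged_path):
    """Concatenate outputs in order, keeping the header of the first chunk only."""
    with open(merged_path, "w") as merged:
        for i, path in enumerate(chunk_outputs):
            with open(path) as chunk:
                for line in chunk:
                    if i > 0 and line.startswith("#"):
                        continue
                    merged.write(line)


def run_parallel(vcf_path, command, work_dir, merged_path, workers=4):
    chunks = split_by_chromosome(vcf_path, work_dir)
    # Threads are enough: the heavy work happens in the child processes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outputs = list(pool.map(lambda chunk: run_chunk(command, chunk), chunks))
    merge_chunks(outputs, merged_path)
    return merged_path
```

A call would look something like `run_parallel("annotated.vcf", ["java", "-jar", "SnpSift.jar", "caseControl", "-tfam", "samples.tfam"], "chunks", "merged.vcf", workers=10)`. The file names are placeholders, and the exact SnpSift arguments should be checked against the documentation for your version. The merged file keeps the chromosome order of the input, and each Java process needs its own memory, so `workers` should be set with the available RAM in mind.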

