Running SnpSift in parallel
2.6 years ago

Hi,

I am trying to use SnpSift to calculate case vs control groups. The file I am using is quite large, and the first time I ran SnpSift on it, the run took several days to finish. I am in a bit of a time crunch, and it is unclear whether SnpSift will finish calculating the case vs control groups before I need the data. I looked at the SnpSift documentation, and it doesn't look like there is a way to speed things up with multi-threading. I realize that SnpSift has to do a calculation for each line of the file, which just takes time.

However, I was wondering if I can split the annotated VCF file that I created using SnpEff into smaller files. For example, if my starting annotated VCF is 1 terabyte, I could split it into ten 100-gigabyte files. I could then run SnpSift on each of the 10 files in parallel and merge the results once they have all finished. I admit this is not an ideal situation, but I am not sure what else to do.

Are there any flaws with this plan, or does anyone have another solution? I know there will be some formatting issues that I will have to deal with.

VCF WGS Annotation • 782 views

Use a workflow manager: split the VCF per chromosome or region, run each region in parallel, then merge the per-region outputs.
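As a rough illustration of the split / run / merge idea, here is a minimal Python sketch. It is not a tested pipeline. The function names (`split_by_chrom`, `run_chunk`, `run_parallel`, `merge_chunks`) are made up for this example, and `command` is a placeholder for whatever SnpSift invocation you already use. The sketch assumes that command reads VCF on stdin and writes VCF to stdout, and it holds everything in memory, which is only sensible for a small test file. For a terabyte-scale VCF you would more likely index the compressed file and let something like `bcftools view -r` and `bcftools concat` do the splitting and merging on disk. Since the calculation is done line by line, as the question notes, the chunks should be independent of each other.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def split_by_chrom(vcf_lines):
    """Group VCF lines by chromosome; each chunk starts with a copy of the header."""
    header, chunks = [], {}
    for line in vcf_lines:
        if line.startswith("#"):
            header.append(line)          # meta lines and the #CHROM line
            continue
        chrom = line.split("\t", 1)[0]   # first column is the chromosome
        if chrom not in chunks:
            chunks[chrom] = list(header)
        chunks[chrom].append(line)
    return chunks


def run_chunk(command, chunk_lines):
    """Pipe one chunk through an external command (placeholder) and return its output lines."""
    result = subprocess.run(
        command,
        input="".join(chunk_lines),
        capture_output=True,
        text=True,
        check=True,                      # raise if the tool exits with an error
    )
    return result.stdout.splitlines(keepends=True)


def run_parallel(command, chunks, workers=4):
    """Run the command on every chunk concurrently, keeping chromosome order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outputs = pool.map(lambda lines: run_chunk(command, lines), chunks.values())
        return dict(zip(chunks.keys(), outputs))


def merge_chunks(results):
    """Concatenate chunk outputs, keeping the header from the first chunk only."""
    merged, header_done = [], False
    for lines in results.values():
        for line in lines:
            if line.startswith("#"):
                if not header_done:
                    merged.append(line)
            else:
                merged.append(line)
        header_done = True
    return merged
```

Threads are enough here because the heavy work happens in the external processes, not in Python. A workflow manager does the same thing but also gives you restarts, logging, and cluster scheduling, which matters when a single chunk can run for hours.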
