CPU/RAM resources for variant calling
8.1 years ago
Bogdan ★ 1.4k

Dear all,

we have just set up a cluster with 4 nodes (128 GB RAM and 32 CPUs per node). Could you let me know what the optimal RAM/CPU configuration would be for running GATK/MuTect on a node, or any other variant-calling software (such as Strelka, VarScan, or SomaticSniper)? Many thanks,

-- bogdan

Tags: SNP • next-gen • snp

I can't give you any hard numbers, but I think you will be memory-bound before you are CPU-bound. I have 64 GB of RAM and could run 4 instances of GATK before maxing out (different components of the pipeline use more or less memory, but 4 seemed to be a good number for essentially every step). That is with the default configuration, running every step in parallel naively; there is plenty of room for optimization by changing GATK/Picard parameters and using more sophisticated pipelining, so that high-memory jobs run alongside low-memory ones.
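
To make that concrete, here is a rough sketch (not a tuned pipeline) of what running a few callers side by side on one of your 128 GB / 32-CPU nodes might look like with GATK 3. The sample names, reference path, heap size, and thread counts are all assumptions you would adjust for your own data:

    # Run 4 HaplotypeCaller instances on one node, capping each Java heap at
    # 28 GB so roughly 16 GB is left for the OS and file cache.
    for SAMPLE in sample1 sample2 sample3 sample4; do
        java -Xmx28g -jar GenomeAnalysisTK.jar \
            -T HaplotypeCaller \
            -R ref.fasta \
            -I "${SAMPLE}.bam" \
            -o "${SAMPLE}.vcf" \
            -nct 8 &          # 4 jobs x 8 threads = 32 CPUs
    done
    wait                      # block until all four callers finish

The same pattern (per-job -Xmx plus backgrounded jobs and a final wait) works for the Picard steps as well; the heap cap is what keeps you from tripping the node's memory limit.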

Also, do not neglect the amount of disk space you'll need! Other Biostars users have commented that GATK writes a lot of temporary files, but this can be mitigated by diverging from the Best Practices and piping tools together where possible. Furthermore, recent versions of the HaplotypeCaller have some element of BQSR and indel realignment built in, so those separate steps can maybe be skipped without much difference in the final SNP calling. I haven't done either of those things myself, but it suggests that you will probably start with a pipeline that runs 8-9 jobs in parallel, and you will be able to tune that number upward and make fuller use of your resources as you learn more about your data and how these tools behave.
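
As an illustration of the disk-space point, two common tricks are to pipe tools together so intermediates never hit the disk, and to redirect Java temp files to a large scratch area. The paths, thread counts, and the /scratch location below are assumptions about your setup, not a prescription:

    # 1) Pipe the aligner straight into a coordinate sort, so no intermediate
    #    SAM/BAM file is ever written to disk.
    bwa mem -t 16 ref.fasta reads_R1.fq.gz reads_R2.fq.gz \
        | samtools sort -@ 8 -o sample.sorted.bam -

    # 2) Point Java tools (GATK, Picard) at a roomy scratch directory for
    #    their temporary files instead of the default /tmp.
    java -Xmx28g -Djava.io.tmpdir=/scratch/$USER/tmp \
        -jar picard.jar MarkDuplicates \
        I=sample.sorted.bam O=sample.dedup.bam M=sample.dup_metrics.txt \
        TMP_DIR=/scratch/$USER/tmp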


Thanks a lot, John, for sharing your experience with GATK!

