How much RAM / how many CPUs should I allocate for Mutect2?
3.9 years ago
jrleary ▴ 210

I'm running Mutect2 on some WES data. The .bam file is 4.7G, and I'm comparing it against the hg38 reference genome. I allocated 8 CPUs and 90G of memory using Slurm, but progress has been very slow. If I wanted the job to complete for a single sample within ~24 hours, what sort of CPU and memory allocation should I be using?

WES • 3.0k views

If there are parts of this pipeline that are single-threaded, there is not much you can do to speed things up.


According to this post on the GATK forums, Mutect2 does not support multithreading. With 100G of RAM, it took 4.35 hours to process the first chromosome. This is using GATK v4.1.2, which supposedly has "significant speed improvements".

3.9 years ago
steve ★ 3.5k

MuTect2 has historically not run multi-threaded; in fact, you are discouraged from enabling multithreading options with it (some GATK engine multithreading options were still accessible but did not work). Yes, it is excruciatingly slow to run. For memory, I only ever used 12GB for targeted exome sequencing samples, so you might start around there and increase as needed. Since it's running single-threaded, you should not need more than 1 CPU allocated.

The best way to speed up MuTect2 is to run multiple instances of it at once, using the --intervals option to give each instance a .bed file of genomic regions to analyze. In this way, you can break up a target list of ~10,000 regions into 100-region chunks that run in parallel (example here, script here). This will give you a massive speed increase, but you will likely want some kind of pipeline orchestration framework to manage it, since it produces a huge number of cluster jobs and then a huge number of resulting .vcf files that need to be processed and merged afterwards. That is the technique I used in my workflow here: https://github.com/NYU-Molecular-Pathology/NGS580-nf/blob/3ba2f970c3fbee56080ba60727f7bf43cb1be3b2/main.nf#L2301-L2359
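
To make the chunking idea concrete, here is a rough Python sketch (not the code from the linked workflow, which is written in Nextflow). It splits a target .bed file into chunks of 100 regions and builds one Mutect2 command line per chunk. The function names, the chunk file naming, and the 12 GB heap setting are my own assumptions for illustration; the command shown is a bare tumor-only call, so add whatever normal sample, germline resource, or panel-of-normals arguments your analysis needs.

```python
from pathlib import Path
from typing import Iterable, List


def chunk_regions(bed_lines: Iterable[str], chunk_size: int = 100) -> List[List[str]]:
    """Group BED records into consecutive chunks of at most chunk_size regions."""
    # Skip blank lines and BED header lines so only real regions are counted.
    records = [
        line.rstrip("\n")
        for line in bed_lines
        if line.strip() and not line.startswith(("#", "track", "browser"))
    ]
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def write_chunks(chunks: List[List[str]], out_dir: str) -> List[Path]:
    """Write each chunk to its own .bed file and return the file paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for idx, chunk in enumerate(chunks):
        path = out / f"chunk_{idx:04d}.bed"  # hypothetical naming scheme
        path.write_text("\n".join(chunk) + "\n")
        paths.append(path)
    return paths


def mutect2_command(reference: str, bam: str, bed: Path, mem_gb: int = 12) -> List[str]:
    """Build a single-threaded Mutect2 command restricted to one chunk."""
    vcf = bed.with_suffix(".vcf.gz")  # one output VCF per chunk, merged later
    return [
        "gatk", "--java-options", f"-Xmx{mem_gb}g",  # assumed heap size
        "Mutect2",
        "-R", reference,
        "-I", bam,
        "--intervals", str(bed),
        "-O", str(vcf),
    ]
```

Each command returned by `mutect2_command` can then be submitted as its own 1-CPU cluster job (for example, as one task of a Slurm job array), and the per-chunk VCFs merged once all jobs finish.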
