Question

CNVkit heatmap RAM and Runtime

16 months ago

Conor

Hello!

My WES pipeline, written in CWL and using CNVkit, had no issues generating heatmap plots across all samples in a cohort while we were processing hundreds of samples. When we recently ran it on ~3000 samples, however, we ran into resource issues.

Context: our CWL pipeline runs in the cloud and requests an AWS instance with 32GB of RAM for CNVkit's heatmap step. It usually completes in a few minutes; for the largest cohort we had run previously (~300 samples), it finished in just under 30 minutes. Memory profiling showed that each run peaked at only about 25% of the allocated memory. The pipeline runs the command on all .cns files with no additional options, as follows:

cnvkit.py heatmap \
sampleA.call.cns \
sampleB.call.cns \
... \
--output cnvkit_heatmap.pdf

We recently ran a cohort of 3000 samples, and CNVkit ran for 24 hours before the job was killed. Based on the 300-sample runtime, we had assumed it would finish within that limit, but now we're wondering whether runtime grows faster than linearly with the number of samples. Memory profiling showed that it still used only ~40% of the requested 32GB over the 24-hour run.
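A quick back-of-the-envelope check using only the numbers above: linear scaling would predict roughly 5 hours for 3000 samples. If we instead assume runtime follows a power law, t ≈ c·n^k (an assumption on my part, not something CNVkit documents), then the killed 24-hour run gives a lower bound on the exponent k. Using 30 min for the 300-sample run (it was actually just under) keeps that bound conservative.

```python
import math

# Approximate figures from this post.
n_small, t_small_min = 300, 30         # ~300 samples finished in just under 30 min
n_large, t_large_min = 3000, 24 * 60   # ~3000 samples: killed after 24 h, unfinished

# Under linear scaling, 10x the samples would take ~10x the time.
linear_estimate_min = t_small_min * (n_large / n_small)

# Assume runtime ~ c * n**k. The killed run only gives a lower bound on
# t_large, so this is a lower bound on the exponent k.
k_lower_bound = math.log(t_large_min / t_small_min) / math.log(n_large / n_small)

print(f"linear estimate: {linear_estimate_min / 60:.1f} h")
print(f"scaling exponent k >= {k_lower_bound:.2f}")
```

This prints a linear estimate of 5.0 h and k >= 1.68, so the observed behaviour is clearly superlinear under this model.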

Question: does anyone in the community have experience running cnvkit heatmap on thousands of samples? In your experience, how long do these runs usually take, and how much memory is reasonable to request for that many samples? Should we expect runs to exceed 24 hours at this scale, or does this point to a different issue? We're trying to optimize our runtime and resource requests, so any advice or feedback would be greatly appreciated!

Thank you,
Conor O'Donoghue

Tags: aws, cnvkit, cnv

Written 16 months ago by Conor; updated 16 months ago by Ram

This may not be very helpful since I can't answer the question directly, but could you try running the CNVkit step locally on a machine to see how long it takes? The memory requirements don't seem too bad (0.4 × 32GB ≈ 13GB), so it seems feasible on a decent machine, though I don't know what CPU resources you'd have or whether CPU could also be a factor in the runtime.

Reply 16 months ago by DGTool
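One way to act on the local-run suggestion without waiting days: time heatmap on growing subsets of the cohort and fit a power law to extrapolate to 3000 samples. This is a minimal sketch, assuming a Unix machine (the `resource` module is Unix-only), `cnvkit.py` on the PATH, and all `*.call.cns` files in one directory; the subset sizes, output file names, and helper names (`fit_power_law`, `time_heatmap`, `main`) are hypothetical and only for illustration. The extrapolation is only as reliable as the power-law fit.

```python
import math
import resource
import subprocess
import sys
import time
from pathlib import Path


def fit_power_law(ns, ts):
    """Least-squares fit of log(t) = log(c) + k*log(n); returns (c, k)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(t) for t in ts]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    k = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    c = math.exp(y_mean - k * x_mean)
    return c, k


def time_heatmap(cns_files, output):
    """Run `cnvkit.py heatmap` on the given files; return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(
        ["cnvkit.py", "heatmap", *map(str, cns_files), "--output", output],
        check=True,
    )
    return time.perf_counter() - start


def main(directory):
    # Hypothetical layout: every *.call.cns file lives in `directory`.
    cns = sorted(Path(directory).glob("*.call.cns"))
    sizes = [n for n in (50, 100, 200, 400) if n <= len(cns)]
    if len(sizes) < 2:
        sys.exit("need at least 100 .call.cns files to fit a trend")
    times = []
    for n in sizes:
        secs = time_heatmap(cns[:n], f"heatmap_{n}.pdf")
        times.append(secs)
        # For RUSAGE_CHILDREN, ru_maxrss is the peak RSS of the largest
        # child so far (KiB on Linux, bytes on macOS).
        peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
        print(f"{n} samples: {secs:.0f} s, largest child peak RSS {peak}")
    c, k = fit_power_law(sizes, times)
    print(f"t(n) ~ {c:.3g} * n^{k:.2f}; predicted for 3000: {c * 3000**k / 3600:.1f} h")


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: bench_heatmap.py CNS_DIRECTORY", file=sys.stderr)
    else:
        main(sys.argv[1])
```

A log-log fit is used because a power law becomes a straight line there, so the fitted slope is directly the scaling exponent k.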

Accepted Answer · 2024-07-03

I wanted to follow up on this in case anyone else has a similar use case. We let heatmap run with the same settings, and it ran for 140+ hours before failing with a memory error when it tried to use more than 32GB of RAM. Over the course of the multi-day run, its memory usage appears to climb slowly until it hits the limit. As it stands, running cnvkit's heatmap.py on this many samples is likely not worth it.
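To put a rough number on that: assuming runtime grows as a power law, t ≈ c·n^k (an assumption, not a documented property of CNVkit), and taking the ~30-minute 300-sample run together with 140 h (8400 min) as a lower bound for 3000 samples, since that run never finished:

```latex
k \ge \frac{\ln(8400/30)}{\ln(3000/300)} = \frac{\ln 280}{\ln 10} \approx 2.45
```

Under that assumption, runtime grows worse than quadratically in the number of samples, which is consistent with the conclusion above.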