Snakemake - pipeline shut down without error
6 months ago
bhumm ▴ 200

I have been running a relatively simple Snakemake pipeline that processes BAM files and aggregates a variety of metrics. While running, it progresses as expected and then randomly shuts down. Here is what stdout and the log report:

...
16 of 39 steps (41%) done
[Wed Oct  9 13:48:45 2024]
Finished job 7.
17 of 39 steps (44%) done
[Wed Oct  9 13:48:45 2024]
Finished job 34.
18 of 39 steps (46%) done
[Wed Oct  9 13:48:56 2024]
Finished job 55.
19 of 39 steps (49%) done
[Wed Oct  9 13:50:06 2024]
Finished job 25.
20 of 39 steps (51%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-10-09T134300.644136.snakemake.log

Searching through the log turns up no additional information, errors, or tracebacks. If I rerun the command (with the --rerun-incomplete flag), it picks up at the exact job where it quit and completes successfully. I have adequate CPUs (70+) and RAM (500+ GB), so I can't imagine it's a resource issue. I have not been able to find any other information about this online.
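
For context, the rerun invocation is essentially the following (simplified; the actual Snakefile and target paths are omitted):

    snakemake --cores all --rerun-incomplete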

Any advice or ideas are appreciated!

snakemake
Shred

Did the dry run end correctly? Could you show us the rule that fails?
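
For reference, a dry run only builds and checks the DAG without executing any jobs:

    snakemake -n  # short for --dry-run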

bhumm ▴ 200

I will give a dry run a go and report back. That is the strange part - no specific rule fails, the pipeline just quits (usually around 45-55% complete). If I rerun the command, it picks back up and finishes as expected.

3 days ago
bhumm ▴ 200

Circling back to answer my own question. After testing with --dry-run (as @Shred suggested), the pipeline completed as expected. The rule that consistently failed was an aggregation step that opened and processed a substantial number of large parquet files. Despite the compute resources I had available and designating the use of all cores (--cores all), I was not actually allocating appropriate resources to the more computationally burdensome rules: without a threads directive, Snakemake assumes each job needs only a single core, so with --cores all it schedules as many jobs concurrently as possible, memory-hungry aggregation step included.

After updating the threads and mem_mb directives within the rule, my pipeline no longer crashes.
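
The updated rule looks roughly like this (a simplified sketch, not my actual pipeline; the rule name, file paths, SAMPLES list, script, and numbers are all placeholders):

    rule aggregate_metrics:
        input:
            expand("metrics/{sample}.parquet", sample=SAMPLES)
        output:
            "aggregated/metrics.parquet"
        threads: 16  # counted against the --cores budget, so fewer jobs run alongside this one
        resources:
            mem_mb=64000  # enforced when a memory budget is given on the command line
        shell:
            "python aggregate_metrics.py --threads {threads} --out {output} {input}"  # placeholder command

With threads declared, the scheduler reserves those cores out of --cores all, which limits how many instances of the heavy rule (and other jobs) run concurrently. As far as I can tell, mem_mb is only enforced for local execution if you also pass a global budget, e.g. snakemake --cores all --resources mem_mb=500000.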

It is still rather unsatisfying that I could not capture any stdout confirming the crash was caused by a lack of resources, but I suspect that was the issue.

TL;DR: when running a computationally intensive or memory-heavy step, explicitly allocate memory and/or threads to the relevant rule.
