Snakemake and bbsplit.sh for filtering human reads: memory management

Hi there!

I have a list of paired FASTQ files. I want to filter human reads out of each of these pairs using the BBMap tool (bbsplit.sh). Here is a functional Snakemake rule I wrote:

# Remove human reads from each pair of FASTQ files with bbsplit.sh
rule clean_fastq:
    input:
        unzip_fastq_R1 = rules.fastq_unzip.output.unzip_fastq_R1,
        unzip_fastq_R2 = rules.fastq_unzip.output.unzip_fastq_R2,
        BBMAP = rules.get_BBmap.output.BBMAP,
        index = rules.create_index.output.index
    output:
        R1_cleaned = result_repository + "FASTQ_CLEANED/{sample}_R1_cleaned.fastq",
        R2_cleaned = result_repository + "FASTQ_CLEANED/{sample}_R2_cleaned.fastq"
    params:
        path_human = result_repository + "FASTQ_CLEANED/"
    shell:
        """
        {input.BBMAP} in1={input.unzip_fastq_R1} in2={input.unzip_fastq_R2} \
        basename={params.path_human}{wildcards.sample}_%.fastq \
        outu1={output.R1_cleaned} outu2={output.R2_cleaned} \
        path=temp/
        """


Each job costs about 20 GB of RAM, and the problem is that I only have 32 GB available. I don't know whether Snakemake can execute all jobs from the same rule in a queue, to avoid this memory problem.

If not, I should probably look for another tool to process these FASTQ files. Any ideas? (except bmtagger, I had too many problems with it haha) What would you suggest?

Thx,


Are you running them locally or on a cluster? At least in the latter case, you just specify the memory required in the cluster command and let the cluster manager handle it. If you're running locally, I expect you have to use -j 1 to run just one job at a time, since I don't think Snakemake can be made aware of local limitations (this would be a good feature request!).
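For example (a minimal sketch, assuming SLURM as the scheduler; swap sbatch for your own submission command):

    snakemake -j 10 --cluster "sbatch --mem=20G" ...

Here --mem=20G matches the roughly 20 GB each clean_fastq job needs, and the cluster manager queues jobs until enough memory is free.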


Also, explicitly set the amount of RAM you want BBTools to use by adding -XmxNNg to your BBTools command lines.
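For example (a sketch; the file names are placeholders, and 20g is an assumption about what your human index actually needs):

    bbsplit.sh -Xmx20g in1=sample_R1.fastq in2=sample_R2.fastq ...

The BBTools wrapper scripts pass -Xmx straight to the JVM, so this caps the heap at 20 GB instead of letting the wrapper auto-detect how much memory to grab.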


I'm a bit late, but thanks to everyone who posted here!

Managing resources in Snakemake seems to be a good way to limit concurrent jobs. It perfectly solved my problem.


Check the green mark on the left to validate dariober's answer.


I think you can use the resources directive in combination with the --resources command line option. E.g., your rule clean_fastq could be:

rule clean_fastq:
    resources:
        mem_gb = 20
    input:
        ...


then:

snakemake -j 10 --resources mem_gb=32 ...


This will run at most 10 jobs at a time using at most 32 GB of memory (or rather, 32 units of "mem_gb", since resources are arbitrary counters to Snakemake), which means clean_fastq jobs will run one at a time.
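Putting this together with the rule from the question and the -Xmx suggestion above (a sketch reusing the names from the original Snakefile; mem_gb = 20 is an assumption about what bbsplit.sh actually needs):

rule clean_fastq:
    resources:
        mem_gb = 20  # claimed against the --resources mem_gb=32 pool
    input:
        unzip_fastq_R1 = rules.fastq_unzip.output.unzip_fastq_R1,
        unzip_fastq_R2 = rules.fastq_unzip.output.unzip_fastq_R2,
        BBMAP = rules.get_BBmap.output.BBMAP,
        index = rules.create_index.output.index
    output:
        R1_cleaned = result_repository + "FASTQ_CLEANED/{sample}_R1_cleaned.fastq",
        R2_cleaned = result_repository + "FASTQ_CLEANED/{sample}_R2_cleaned.fastq"
    params:
        path_human = result_repository + "FASTQ_CLEANED/"
    shell:
        """
        {input.BBMAP} -Xmx{resources.mem_gb}g \
        in1={input.unzip_fastq_R1} in2={input.unzip_fastq_R2} \
        basename={params.path_human}{wildcards.sample}_%.fastq \
        outu1={output.R1_cleaned} outu2={output.R2_cleaned} \
        path=temp/
        """

Passing -Xmx{resources.mem_gb}g keeps the JVM heap in line with what the job declared to the Snakemake scheduler, so the two limits cannot drift apart.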