Snakemake and BBsplit.sh for filtering human reads: memory management
14 months ago

Hi there!

I have a list of paired FASTQ files. I want to filter human reads from each of these pairs using the BBMap tool. Here is a functional Snakemake rule I wrote:

#Removing human reads from fastq
rule clean_fastq:
    message:
        "Removing human reads from fastq."
    input:
        unzip_fastq_R1 = rules.fastq_unzip.output.unzip_fastq_R1,
        unzip_fastq_R2 = rules.fastq_unzip.output.unzip_fastq_R2,
        BBMAP = rules.get_BBmap.output.BBMAP,
        index = rules.create_index.output.index
    output:
        R1_cleaned = result_repository + "FASTQ_CLEANED/{sample}_R1_cleaned.fastq",
        R2_cleaned = result_repository + "FASTQ_CLEANED/{sample}_R2_cleaned.fastq"
    params:
        path_human = result_repository + "FASTQ_CLEANED/"
    shell:
        """
        {input.BBMAP} in1={input.unzip_fastq_R1} in2={input.unzip_fastq_R2} \
        basename={params.path_human}{wildcards.sample}_%.fastq \
        outu1={output.R1_cleaned} outu2={output.R2_cleaned} \
        path=temp/
        """

Each job needs about 20 GB of RAM, and the problem is that I only have 32 GB available. I don't know whether it is possible for Snakemake to execute all jobs from the same rule in a queue, to avoid this memory problem.

If not, I should probably look for another tool to process these FASTQ files. Any ideas? (Except bmtagger, I had too many problems with it, haha.) What would you suggest?

Thx,

Hadrien

snakemake BBmap
Are you running them locally or on a cluster? In the latter case you can just specify the memory required in the cluster command and let the cluster manager handle it. If you're running locally, I expect you have to use -j 1 to run only one job at a time, since I don't think Snakemake can be made aware of local resource limitations (this would be a good feature request!).
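For example (the scheduler and its flags here are assumptions, adjust for your site), with a SLURM cluster you could forward each rule's memory request to the scheduler:

    # hypothetical SLURM setup; --mem is filled from each rule's resources
    snakemake -j 10 --cluster "sbatch --mem={resources.mem_gb}G"

The cluster manager then queues jobs so that running jobs never exceed the memory the nodes actually have.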


Also, explicitly set the amount of RAM you want BBTools to use by adding -XmxNNg to your BBTools command lines.
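As a sketch (file names are placeholders), capping the Java heap of bbsplit.sh at 20 GB would look like:

    # -Xmx20g limits the JVM heap so the job stays under the 32 GB ceiling
    bbsplit.sh -Xmx20g in1=sample_R1.fastq in2=sample_R2.fastq \
        basename=sample_%.fastq \
        outu1=sample_R1_cleaned.fastq outu2=sample_R2_cleaned.fastq \
        path=temp/

Without an explicit -Xmx, the JVM picks a heap size based on total system RAM, which can overshoot what you intended to reserve.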


I'm a bit late, but thanks to everyone who posted here!

Managing resources for snakemake seems to be a good way to limit multiple jobs. This perfectly solved my problem.

Check the green mark on the left to validate dariober's answer.


I think you can use the resources directive in combination with the --resources command line option. E.g., your rule clean_fastq could be:

rule clean_fastq:
    resources:
        mem_gb = 20
    input:
    ...
then:

snakemake -j 10 --resources mem_gb=32 ...

This will run at most 10 jobs at a time using at most 32 GB of memory (of "mem_gb", in fact), which means clean_fastq will run one at a time.
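To keep the JVM heap consistent with what Snakemake reserves, the same resources value can be reused inside the shell directive (a sketch based on the rule in the question; the -Xmx flag is passed through to the underlying Java process):

    rule clean_fastq:
        resources:
            mem_gb = 20
        ...
        shell:
            """
            {input.BBMAP} -Xmx{resources.mem_gb}g \
            in1={input.unzip_fastq_R1} in2={input.unzip_fastq_R2} ...
            """

This way, changing the reservation in one place also changes the heap limit the tool actually enforces.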
