Allocating memory for samtools sort in nextflow?
17 months ago
Eliveri ▴ 350

I have a nextflow pipeline which keeps failing due to memory issues at the FILTER step.

In some cases the input bam is extremely large (30G).

Question 1: Would it help to split the FILTER process into two processes, FILTER1 and FILTER2, each running just one samtools sort? I tried this with a test... and it didn't seem to make a difference for some reason.

Question 2: Is there something I can do to limit samtools sort (parameters -m or -@) so that it does not go over the memory limit but also doesn't run too slowly?

 process FILTER {

        ...

        time '6h'
        cpus 8
        penv 'smp' 
        memory '32 GB'

        script:
        """
        #!/usr/bin/env bash
        samtools sort -n -m 5G -@ 12 "file1.bam" -o "$file1_sorted.bam"
        samtools sort -n -m 5G -@ 12 "file2.bam" -o "$file2_sorted.bam"
        """
 }
samtools nextflow • 1.1k views
17 months ago
ATpoint 82k

The -m value is per thread, so with -@ 12 samtools can use up to 12 × 5 GB = 60 GB of RAM, which is more than the 32 GB memory declaration of the process. It is (imo) also an unnecessarily large number of threads and amount of RAM. Sort is reasonably fast and efficient with, say, 4-8 cores and 1-2 GB each.
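
The arithmetic above can be sketched in a few lines of bash. This is only an illustration: the helper names `sort_mem_total_gb` and `fits_in_budget` are made up for this example and are not part of samtools or Nextflow, and the estimate covers the sort buffers only, not any other overhead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: worst-case sort buffer memory in GB.
# samtools sort -m is a per-thread limit, so total = threads * per-thread GB.
sort_mem_total_gb() {
    local threads=$1 per_thread_gb=$2
    echo $(( threads * per_thread_gb ))
}

# Hypothetical helper: succeeds if the request fits in the declared memory (GB).
fits_in_budget() {
    local threads=$1 per_thread_gb=$2 declared_gb=$3
    [ "$(sort_mem_total_gb "$threads" "$per_thread_gb")" -le "$declared_gb" ]
}

# Settings from the question: -@ 12 -m 5G against memory '32 GB'
echo "requested: $(sort_mem_total_gb 12 5) GB"
fits_in_budget 12 5 32 && echo "fits" || echo "exceeds 32 GB"

# Settings in the suggested range: 8 threads at 2 GB each
echo "requested: $(sort_mem_total_gb 8 2) GB"
fits_in_budget 8 2 32 && echo "fits" || echo "exceeds 32 GB"
```

The first pair of lines reports 60 GB and "exceeds 32 GB"; the second reports 16 GB and "fits".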


Thank you for breaking it down for me. I really learned a lot. I was indeed making the mistake of using more memory than declared.

17 months ago

Would it help to split the FILTER

It helps, because the two sorts can then run in parallel. And if the second one fails, the output of the first one is still there.

Is there something I can do to limit samtools sort (parameters -m or -@)

You should use what was declared in the process (cpus/memory).

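    cpus 8
    memory '32 GB'
    afterScript "rm -rf TMP"
    script:
    // -m is a per-thread limit, so divide the declared memory by the thread count
    def mem_per_thread = task.memory.toMega().intdiv(task.cpus)
    """
    mkdir TMP
    samtools sort -n -m "${mem_per_thread}M" -@ "${task.cpus}" -T TMP/tmp "${input}" -o "TMP/out.bam"
    mv TMP/out.bam sorted.bam
    """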
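
Since -m applies per thread, one way to turn a process-level memory declaration into a safe -m value is to divide it by the thread count and leave some headroom. The sketch below is plain bash; the helper name `per_thread_mem_mb` and the 80% default are assumptions chosen for illustration, not values recommended by samtools or Nextflow. Inside a Nextflow `script:` block, bash `$` variables would need to be escaped as `\$`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: per-thread value for samtools sort -m, in MB.
# Assumption: only a percentage (default 80) of the declared memory
# is handed to the sort buffers, leaving the rest as headroom.
per_thread_mem_mb() {
    local total_gb=$1 threads=$2 percent=${3:-80}
    echo $(( total_gb * 1024 * percent / 100 / threads ))
}

# Example: 32 GB declared, 8 threads. File names are placeholders.
threads=8
mem_mb=$(per_thread_mem_mb 32 "$threads")
echo "samtools sort -n -m ${mem_mb}M -@ ${threads} -T TMP/tmp input.bam -o sorted.bam"
```

The script only prints the command it would run, so the numbers can be checked before anything is launched.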

Thank you. Making sure I do not use more memory than declared resolved the issue.
