How to convert a for loop to a job array on an LSF cluster
18 months ago
LDT ▴ 340

I have 100 files, and I want to parallelise my submission to save time instead of running the jobs one by one. How can I change this script into a job array in LSF using the bsub submission system?

#BSUB -J ExampleJob1         #Set the job name to "ExampleJob1"
#BSUB -L /bin/bash           #Uses the bash login shell to initialize the job's execution environment.
#BSUB -W 2:00                #Set the wall clock limit to 2hr
#BSUB -n 1                   #Request 1 core
#BSUB -R "span[ptile=1]"     #Request 1 core per node.
#BSUB -R "rusage[mem=5000]"  #Request 5000MB per process (CPU) for the job
#BSUB -M 5000                #Set the per process enforceable memory limit to 5000MB.
#BSUB -o Example1Out.%J      #Send stdout and stderr to "Example1Out.[jobID]"

path=./home/
cd "${path}"

for each in *.bam
do
    samtools coverage "${each}" -o "${each}_coverage.txt"
done

Thank you for your time; any help is appreciated. I am a beginner with LSF and quite confused.

lsf cluster-computing Job-array hpc

How about just submitting the jobs in a for loop, i.e. remove the loop from the script and replace ${each} with $1?
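That suggestion might look like this (an untested sketch; `coverage.sh` is a hypothetical wrapper name, and the resource flags simply mirror the headers in the question):

```shell
# coverage.sh -- the per-file job script from the question, with the
# loop removed; it takes one BAM path as $1:
#
#   samtools coverage "$1" -o "$1_coverage.txt"

# submit one independent job per BAM file
for bam in *.bam
do
    bsub -J "cov_${bam}" -W 2:00 -n 1 \
         -R "rusage[mem=5000]" -M 5000 \
         -o "cov_${bam}.%J" \
         bash coverage.sh "${bam}"
done
```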


I'll try this one, 5heikki; it seems very promising. I am not sure though how I can define the range so that only 10 jobs run at a time.
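For the record, LSF job arrays let you throttle concurrency directly: `-J "name[1-100]%10"` creates 100 array elements but runs at most 10 at once, and each element reads its index from `$LSB_JOBINDEX`. An untested sketch based on the question's script (selecting the Nth file with `sed` is one common idiom):

```shell
#!/bin/bash
#BSUB -J "coverage[1-100]%10"   # 100 array elements, at most 10 running at a time
#BSUB -W 2:00
#BSUB -n 1
#BSUB -R "rusage[mem=5000]"
#BSUB -M 5000
#BSUB -o coverage.%J.%I         # %I expands to the array index

# pick the Nth BAM file for this array element
bam=$(ls -1 *.bam | sed -n "${LSB_JOBINDEX}p")
samtools coverage "${bam}" -o "${bam}_coverage.txt"
```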


It shouldn't be your responsibility. Whoever set up the queue manager should be responsible for that. For example, you might submit 10k jobs but only 10 run in parallel (when slots are available), because that is how the queue manager was configured.

18 months ago

OK, a second answer, this time with a Makefile using the -j option:

SHELL=/bin/bash
BAMS=$(shell ls -1 /path/to/*.bam)

# the blank line at the start of the define is deliberate: it keeps the
# rules generated by $(foreach ...) separated when they are eval'ed
define run

$$(addsuffix .out,$(1)): $(1)
	samtools coverage -o $$@ $$<
endef

all: $(addsuffix .out,${BAMS})

$(eval $(foreach B,${BAMS},$(call run,${B})))

and run with:

#BSUB -J ExampleJob1         #Set the job name to "ExampleJob1"
#BSUB -L /bin/bash           #Uses the bash login shell to initialize the job's execution environment.
#BSUB -W 2:00                #Set the wall clock limit to 2hr
#BSUB -n 10                  #Request 10 cores <====================
#BSUB -R "span[hosts=1]"     #Keep all 10 cores on one node, since make -j forks locally
#BSUB -R "rusage[mem=5000]"  #Request 5000MB per process (CPU) for the job
#BSUB -M 5000                #Set the per process enforceable memory limit to 5000MB.
#BSUB -o Example1Out.%J      #Send stdout and stderr to "Example1Out.[jobID]"


cd /path/to/your/makefile/dir && make -j 10
18 months ago

Use a workflow manager like Nextflow, e.g. (not tested):

params.bams = "NO_FILE"

workflow {
    ch1 = ST_COVERAGE(params, Channel.fromPath(params.bams).splitText().map{it.trim()})
    MAKE_LIST(params, ch1.output.collect())
}

process ST_COVERAGE {
    tag "${bam}"
    input:
        val(meta)
        val(bam)
    output:
        path("coverage.txt"), emit: output
    script:
"""
samtools coverage -o coverage.txt "${bam}"
"""
}

process MAKE_LIST {
    input:
        val(meta)
        val(L)
    output:
        path("all.list"), emit: output
    script:
"""
cat << EOF > all.list
${L.join("\n")}
EOF
"""
}

and run, something like

nextflow run -C lsf.config -resume --bams bam_paths.txt biostars9540675.nf
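The `lsf.config` passed above could be as simple as this (a sketch; the queue name is a placeholder for whatever your cluster uses, and the limits mirror the question's bsub headers):

```groovy
process {
    executor = 'lsf'
    queue    = 'normal'   // hypothetical queue name, ask your admin
    cpus     = 1
    memory   = '5 GB'
    time     = '2h'
}

executor {
    queueSize = 10        // submit at most 10 jobs at a time
}
```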

That's a nice suggestion, Pierre, I appreciate it. I am a bit behind on Nextflow, but your post motivates me to go in that direction even faster.
