Question: bbduk.sh exceeding requested threads in an alignment pipeline
ido.idobar wrote, 5 months ago:

Hi all, I'm running a pipeline that uses bbduk.sh for read trimming, then aligns the trimmed reads to a reference genome with bowtie2, pipes the alignment through samblaster to remove duplicates, and finally sorts, adds read groups and saves the result as BAM with Picard. Each job processes one paired-end sample on an HPC node. The problem is that despite specifying the number of requested threads, bbduk tries to use more threads than requested, causing the job to be terminated by the queue manager. This is my bbduk.sh command:

bbduk.sh -Xmx1g ref=/home/ibar/.pyenv/versions/miniconda-latest/envs/aDNA/opt/bbmap-38.86-0/resources/adapters.fa ktrim=r k=23 mink=11 hdist=1 qtrim=rl trimq=10 tpe tbo int minlen=30 ziplevel=9 ow threads=12 in=./D10_#.fastq.gz out=trimmed_D10_#.fastq.gz stats=D10.stats
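
For reference, the downstream part of the pipeline looks roughly like this (simplified; the bowtie2 index prefix, read-group fields and output file names below are placeholders rather than my exact commands):

# align, remove duplicates, then sort, add read groups and write BAM
bowtie2 -p 12 -x ref_index -1 trimmed_D10_1.fastq.gz -2 trimmed_D10_2.fastq.gz \
    | samblaster -r \
    | picard AddOrReplaceReadGroups I=/dev/stdin O=D10.sorted.bam SORT_ORDER=coordinate \
        RGID=D10 RGLB=lib1 RGPL=ILLUMINA RGPU=unit1 RGSM=D10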

And this is the output from the bbduk.sh step:

java -ea -Xmx1g -Xms1g -cp /home/ibar/.pyenv/versions/miniconda-latest/envs/aDNA/opt/bbmap-38.86-0/current/ jgi.BBDuk -Xmx1g ref=/home/ibar/.pyenv/versions/miniconda-latest/envs/aDNA/opt/bbmap-38.86-0/resources/adapters.fa ktrim=r k=23 mink=11 hdist=1 qtrim=rl trimq=10 tpe tbo int minlen=30 ziplevel=9 ow threads=12 in=./D10_#.fastq.gz out=trimmed_D10_#.fastq.gz stats=D10.stats ow
Executing jgi.BBDuk [-Xmx1g, ref=/home/ibar/.pyenv/versions/miniconda-latest/envs/aDNA/opt/bbmap-38.86-0/resources/adapters.fa, ktrim=r, k=23, mink=11, hdist=1, qtrim=rl, trimq=10, tpe, tbo, int, minlen=30, ziplevel=9, ow, threads=12, in=./D10_#.fastq.gz, out=trimmed_D10_#.fastq.gz, stats=D10.stats, ow]
Version 38.86

Set INTERLEAVED to true
Set threads to 12
maskMiddle was disabled because useShortKmers=true
Reset INTERLEAVED to false because paired input files were specified.
0.235 seconds.
Initial:
Memory: max=1029m, total=1029m, free=995m, used=34m

Added 217135 kmers; time:       0.457 seconds.
Memory: max=1029m, total=1029m, free=957m, used=72m

Input is being processed as paired
Started output streams: 0.229 seconds.
=>> PBS: job killed: ncpus 14.83 exceeded limit 12 (sum)
Exception in thread "Thread-16" Exception in thread "Thread-18"

Thanks, Ido

Tags: bbmap hpc
written 5 months ago by ido.idobar

This should not be happening. It looks like you are using PBS. Are you asking for a corresponding number of threads on the queue manager side? I would ask for 4 more cores (or reduce the number given to bbduk.sh), since it looks like the command has some overhead.
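
For example (a rough sketch, assuming a PBS Pro style resource request; adjust the directive to your site's syntax):

#PBS -l select=1:ncpus=16
# reserve 16 cores but cap bbduk's worker threads at 12
bbduk.sh -Xmx1g threads=12 ...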

written 5 months ago by GenoMax

Looks like it, but as you said, this shouldn't happen; I've used this approach many times before and never had issues with it. This is a new conda environment, so it might be related to a newer BBMap version (this one is version 38.86). I hope Brian Bushnell might have more insight on this. Thanks for your quick reply, I'll try adding a few extra cores to my request to the queue manager.

written 5 months ago by ido.idobar

This is the explanation from the online user guide:

Threads

Most BBTools are multithreaded, and will automatically detect and use all available threads. This is usually desirable when you have exclusive use of a computer, but may not be on a shared node. The number of threads can be capped at X with the flag "t=X" (threads=X). The total CPU usage may still go higher, though, due to several factors:

1) Input and output are handled in separate threads; "t=X" only regulates the number of worker threads.
2) Java uses additional threads for garbage collection and other virtual machine tasks.
3) When subprocesses (such as pigz) are spawned, they also individually obey the thread limit, but if you set "t=4" and the process spawns 3 pigz instances, you could still potentially use over 16 threads – 4 worker threads, 4 threads for each pigz process, plus other threads for the JVM and I/O.

If you have exclusive use of a computer, you don't need to worry about spawning too many threads; this is only an issue with regards to fairness on shared nodes.

Not very helpful...

written 5 months ago by ido.idobar

This may also be related to how your job manager is set up. I use threads with BBMap under SLURM and have never had this specific issue. If you are using pigz, turn it off with pigz=f on your command line and see if that fixes this. It will add some time to individual jobs, but that may be a safe compromise.
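
Something along these lines, for example (a sketch; keep the rest of your options as they are):

bbduk.sh -Xmx1g threads=12 pigz=f unpigz=f in=./D10_#.fastq.gz out=trimmed_D10_#.fastq.gz ...
# pigz=f / unpigz=f make BBDuk compress and decompress in Java instead of spawning pigz processes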

written 5 months ago by GenoMax

I am guessing that you are exceeding the limit because you are reading gzipped input and/or writing gzipped output. If I remember correctly (someone here will know this better), all BBTools programs use pigz when (un)compressing, and that program tends to be CPU-greedy. I suggest you try removing ziplevel=9 from your command and saving the files without .gz.

Also, it is generally a good idea to ask for fewer threads within the program than what you reserve in your job manager, as programs sometimes spill over their allotted number by a percent or so. In other words, follow @genomax's advice.
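
For example, something like this (a sketch only; the ref= path is shortened here, use your full adapters.fa path):

bbduk.sh -Xmx1g ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 qtrim=rl trimq=10 tpe tbo int minlen=30 ow threads=10 in=./D10_#.fastq.gz out=trimmed_D10_#.fastq stats=D10.stats

That drops ziplevel=9, writes uncompressed .fastq output, and keeps threads below the 12 cores you reserve.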

written 5 months ago by Mensur Dlakic

Thanks for your reply. I tried it again with pigz=f and had the same issue. I prefer to save the files compressed, since our workspace capacity is limited and monitored. I also tried it on another HPC cluster (also using PBS) and didn't encounter the problem with exactly the same command. It may be the older BBMap version installed there (v38.79) or differences in how the scheduler is set up. I'll try downgrading BBMap and see if it solves the problem.
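
Something like this should do it in that conda environment (assuming the bioconda channel and the aDNA env name from my paths above):

conda install -n aDNA -c bioconda bbmap=38.79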

written 5 months ago by ido.idobar

While anything is possible, I think this problem is not related to the BBMap version. It is possible that the cluster you are having issues with is set up with stricter limits than the other one where you did not. Did you try reducing the number of threads on the bbduk command line?

modified 5 months ago • written 5 months ago by GenoMax