What is wrong here with bbmap and java
6.8 years ago
seta ★ 1.7k

Hi everybody,

I'm trying to run BBDuk from the BBMap package on our server, but I get the following error, which is probably related to memory. However, I have already run this program successfully without any problem. Could you please tell me where the error comes from and how to solve it?

Exception in thread "Thread-4" java.lang.OutOfMemoryError: Java heap space
    at kmer.HashArray1D.resize(HashArray1D.java:145)
    at kmer.HashArray.setIfNotPresent(HashArray.java:187)
    at jgi.BBDukF$LoadThread.addToMap(BBDukF.java:1772)
    at jgi.BBDukF$LoadThread.addToMap(BBDukF.java:1676)
    at jgi.BBDukF$LoadThread.run(BBDukF.java:1605)
Exception in thread "Thread-7" java.lang.OutOfMemoryError: Java heap space
    at kmer.HashArray1D.resize(HashArray1D.java:145)
    at kmer.HashArray.setIfNotPresent(HashArray.java:187)
    at jgi.BBDukF$LoadThread.addToMap(BBDukF.java:1772)
    at jgi.BBDukF$LoadThread.addToMap(BBDukF.java:1676)
    at jgi.BBDukF$LoadThread.run(BBDukF.java:1605)


RNA-seq alignment java bbmap • 6.9k views
6.8 years ago
5heikki 10k

Did you run it successfully on the same node, with exactly the same input data and the same parameters, e.g. number of threads? Have you tried:

Java Parameters:
-Xmx    If running from the shellscript, include it with the rest of the arguments and it will be passed to Java to set memory usage, overriding the shellscript's automatic memory detection. -Xmx20g will specify 20 gigs of RAM, and -Xmx200m will specify 200 megs. The max allowed is typically 85% of physical memory.
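
Going by the examples in that help text, the number and unit are joined directly to -Xmx with no spaces. A minimal sketch of the format, assuming placeholder file names (reads_R1.fastq, ref.fa and so on are hypothetical) and a hypothetical helper called xmx_flag; it only prints the command instead of running BBDuk:

```shell
#!/bin/sh
# Build a -Xmx flag such as -Xmx40g from an amount and a unit (m or g).
# xmx_flag is a hypothetical helper for illustration, not part of BBMap.
xmx_flag() {
    amount=$1
    unit=$2
    case "$amount" in
        ''|*[!0-9]*) echo "amount must be a whole number" >&2; return 1 ;;
    esac
    case "$unit" in
        m|g) printf '%s%s%s\n' "-Xmx" "$amount" "$unit" ;;
        *) echo "unit must be m or g" >&2; return 1 ;;
    esac
}

# Placeholder file names; echo prints the command rather than running it.
echo ./bbduk.sh "$(xmx_flag 40 g)" in=reads_R1.fastq in2=reads_R2.fastq \
    outm=matched.fq outu1=unmatched_1.fq outu2=unmatched_2.fq \
    ref=ref.fa k=27 threads=7
```

The flag can sit anywhere among the other arguments, since the shellscript passes it through to Java.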

Yes, I successfully ran the same command with 5 threads, whereas now I'm trying 7 threads. I have to run the job on a server (256 GB RAM) through a PBS file, in which I requested 40 GB of memory and 7 threads. No, I have not tried -Xmx yet. Sorry, do I just put -Xmx 40 G at the beginning of the command?

You specified only 1 GB of RAM? No wonder it fails.

You mean the RAM is insufficient? I tried various amounts of RAM (1g, 20g, 40g, 80g), but the problem persists, even though I previously ran the program with 30g or 40g of memory without any error. I don't know what is happening here.


Hi seta,

The amount of memory BBDuk uses is roughly 20 bytes per reference base. So, for things like quality-trimming and adapter-trimming it needs under 100 MB of RAM, but when filtering against a large reference it may need a lot. Also, the hdist flag will massively increase the amount of memory needed - for example, hdist=1 at k=31 will increase the memory needed by a factor of 93. qhdist=1 will not increase the memory usage.
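
To make that arithmetic concrete, here is a rough sketch of the estimate. It takes the 20 bytes per reference base figure above as given. It also assumes the factor of 93 comes from the 3 * k single-substitution variants of each k-mer (3 * 31 = 93), which is an interpretation rather than something stated in this thread. The function names are hypothetical, and the 3 billion bases are only a stand-in for a reference file of about 3 GB.

```shell
#!/bin/sh
# Rough BBDuk memory estimate: ~20 bytes per reference base (figure from this thread).
# Function names are hypothetical, for illustration only.
BYTES_PER_BASE=20

# Number of k-mers that differ from a given k-mer by exactly one substitution.
hdist1_variants() {
    echo $(( 3 * $1 ))
}

# Estimated bytes for a reference of $1 bases, scaled by an optional factor $2.
estimate_bytes() {
    bases=$1
    factor=${2:-1}
    echo $(( bases * BYTES_PER_BASE * factor ))
}

# Stand-in reference size of 3 billion bases (an assumption, not a measured value).
echo "hdist=0:       $(estimate_bytes 3000000000) bytes"
echo "hdist=1, k=31: $(estimate_bytes 3000000000 "$(hdist1_variants 31)") bytes"
```

Without hdist this comes to 60,000,000,000 bytes, i.e. about 60 GB, which is far more than a 4g heap.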

Could you perhaps post your full command line, and also the size of the reference files?


Hi Brian,

Thanks for your comment. The reference file is about 3g, and with the -Xmx4g flag the problem was solved. However, it seems strange to me, as I had already run the script successfully using 30g or 40g with the same input data.


Hi Brian,

Please help me understand why the program isn't reproducible in my work. Although I could successfully run the bbduk script with -Xmx4g, as I mentioned in my previous post, it gives me an error for another input data set. I use the same command with the same reference file; it works for one input but gives the above-mentioned error for another. All input data were generated on a HiSeq 2000 (100 bp PE). I would greatly appreciate any suggestions.


What version of Java are you using?


It's "1.7.0_79", installed on Ubuntu.


That should be fine then. Are you running the jobs under a job scheduler or directly on a server?


We have to run under a job scheduler, although running directly is also possible. The error appeared in both situations, under the job scheduler and directly on the server.


I will assume that you are allocating resources that match your bbduk command (e.g. threads/cores, RAM, etc.) when using the scheduler. The only thing I can think of is that the jobs are somehow running out of resources (which shouldn't happen if the scheduler is configured correctly), and it seems to happen randomly, as you have stated above. I would suggest working with your sys admins to see if they can figure out something from the system/scheduler logs.


Thanks for your comment. That's right about the scheduler. I arranged with our sys admin to run directly on the server, but the problem still occurs randomly. So it may go beyond just the scheduler.


seta,

Could you post the full command line you used, and all of the console output (rather than just the error message)?


Hi Brian,

The command is:

./bbduk.sh -Xmx4g in=3_R1.fastq in2=3_R2.fastq outm=3_matched.fq outu1=3_unmatched_1.fq outu2=3_unmatched_2.fq ref=1.fa,2.fa,3.fa,4.fa,5.fa,6.fa,7.fa,8.fa,9.fa,10.fa,11.fa k=27 stats=stats1.txt threads=4


and the output:

java -Djava.library.path=/home/seta/software/bbmap/jni/ -ea -Xmx4g -Xms4g -cp /home/seta/software/bbmap/current/ jgi.BBDukF -Xmx4g in=3_R1.fastq in2=3_R2.fastq outm=3_matched.fq outu1=3_unmatched_1.fq outu2=3_unmatched_2.fq ref=1.fa,2.fa,3.fa,4.fa,5.fa,6.fa,7.fa,8.fa,9.fa,10.fa,11.fa k=27 stats=stats1.txt threads=4
Executing jgi.BBDukF [-Xmx4g, in=3_R1.fastq, in2=3_R2.fastq, outm=3_matched.fq, outu1=3_unmatched_1.fq, outu2=3_unmatched_2.fq, ref=1.fa,2.fa,3.fa,4.fa,5.fa,6.fa,7.fa,8.fa,9.fa,10.fa,11.fa, k=27, stats=stats1.txt, threads=4]

BBDuk version 35.06
Initial:
Memory: free=4052m, used=64m

    at kmer.HashArray1D.resize(HashArray1D.java:145)
    at kmer.HashArray.setIfNotPresent(HashArray.java:187)
    at jgi.BBDukF$LoadThread.addToMap(BBDukF.java:1772)
    at jgi.BBDukF$LoadThread.addToMap(BBDukF.java:1676)
    at jgi.BBDukF$LoadThread.run(BBDukF.java:1605)
Exception in thread "Thread-4" java.lang.OutOfMemoryError: Java heap space
    at kmer.HashArray1D.resize(HashArray1D.java:145)
    at kmer.HashArray.setIfNotPresent(HashArray.java:187)
    at jgi.BBDukF$LoadThread.addToMap(BBDukF.java:1772)
    at jgi.BBDukF$LoadThread.addToMap(BBDukF.java:1676)
    at jgi.BBDukF$LoadThread.run(BBDukF.java:1605)



Hi seta,

You are definitely running out of memory, but it's not clear why that only happens sometimes. The easiest solution is to increase the -Xmx value. Note that the amount of memory used is based on the reference sequences, not the reads. You can increase the memory usage efficiency somewhat by adding the prealloc flag, and you can reduce the number of kmers stored by adding the flag rskip=4 (which will only store 1/4 of the reference kmers, reducing the amount of memory by 75%).

You can find out approximately how many reference kmers there are like this:

cat 1.fa 2.fa 3.fa 4.fa 5.fa 6.fa 7.fa 8.fa 9.fa 10.fa 11.fa | loglog.sh in=stdin.fa


The amount of memory needed (in bytes) will be around 20 times that number, but you can use the rskip flag to reduce it to a fraction of that.
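
As a rough sketch of that calculation: take the k-mer count reported by loglog.sh, multiply by 20 bytes, and divide by the rskip value if one is used. The k-mer count below is a made-up example rather than a measured value, and heap_gb is a hypothetical helper name. The result covers only the k-mer table, so it is probably wise to leave some headroom when choosing -Xmx.

```shell
#!/bin/sh
# Estimate the BBDuk k-mer table size in whole gigabytes, rounded up.
# Assumes ~20 bytes per stored k-mer and that rskip=N stores 1/N of the k-mers,
# as described in this thread. heap_gb is a hypothetical helper name.
heap_gb() {
    kmers=$1
    rskip=${2:-1}
    bytes=$(( kmers * 20 / rskip ))
    # Round up to the next whole GB (1 GB taken as 10^9 bytes here).
    echo $(( (bytes + 999999999) / 1000000000 ))
}

# Made-up example count; substitute the number reported by loglog.sh.
KMERS=2200000000
echo "all k-mers: about $(heap_gb "$KMERS") GB"
echo "rskip=4:    about $(heap_gb "$KMERS" 4) GB"
```

With this example count the estimate is 44 GB for all k-mers and 11 GB with rskip=4.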

Note that BBMap uses less memory than BBDuk, at 6 bytes per reference base, or 3 bytes in low-memory mode.


Many thanks for your comment. Based on what you kindly told me, I calculated the required memory to be about 44g. I allocated this amount along with the prealloc flag and the problem was solved; I hope it goes well for the other input data. BBDuk runs much faster than BBMap, so I prefer it. I still find it curious that my previous run with the same reference file finished successfully with only -Xmx4g.


Actually 1g is sufficient for bbduk. It is one of the programs in BBMap that does not require a lot of RAM.