Question: What is wrong here with bbmap and java
seta (Sweden) wrote, 3.7 years ago:

Hi everybody,

I'm trying to run bbduk from the bbmap package on our server, but I get the following error, which is probably related to memory. However, I have already run this program successfully without any problem. Could you please tell me where the error comes from and how to solve it?

Exception in thread "Thread-4" java.lang.OutOfMemoryError: Java heap space
        at kmer.HashArray1D.resize(HashArray1D.java:145)
        at kmer.HashArray.setIfNotPresent(HashArray.java:187)
        at jgi.BBDukF$LoadThread.addToMap(BBDukF.java:1772)
        at jgi.BBDukF$LoadThread.addToMap(BBDukF.java:1676)
        at jgi.BBDukF$LoadThread.run(BBDukF.java:1605)
Exception in thread "Thread-7" java.lang.OutOfMemoryError: Java heap space
        at kmer.HashArray1D.resize(HashArray1D.java:145)
        at kmer.HashArray.setIfNotPresent(HashArray.java:187)
        at jgi.BBDukF$LoadThread.addToMap(BBDukF.java:1772)
        at jgi.BBDukF$LoadThread.addToMap(BBDukF.java:1676)
        at jgi.BBDukF$LoadThread.run(BBDukF.java:1605)
Exception in thread "Thread-2" java.lang.OutOfMemoryError: Java heap space

Thanks in advance.

rna-seq bbmap alignment java • 3.4k views
modified 3.7 years ago by 5heikki • written 3.7 years ago by seta

Set the -Xmx argument: http://opcodesolutions.com/tech/solve-java-lang-outofmemoryerror-java-heap-space/

written 3.7 years ago by Pierre Lindenbaum
5heikki (Finland) wrote, 3.7 years ago:

Did you run it successfully on the same node, with exactly the same input data and the same parameters (e.g. number of threads)? Have you tried:

Java Parameters:
-Xmx       		If running from the shellscript, include it with the rest of the arguments and it will be passed to Java to set memory usage, overriding the shellscript's automatic memory detection.  -Xmx20g will specify 20 gigs of RAM, and -Xmx200m will specify 200 megs.  The max allowed is typically 85% of physical memory.
written 3.7 years ago by 5heikki

Yes, I successfully ran the same command with 5 threads, whereas now I'm trying 7 threads. I have to run our jobs on a server (256 GB RAM) through a PBS file, in which I requested 40 GB of memory and 7 threads. No, I have not tried that. Sorry, do I just include -Xmx 40 G at the beginning of the command?

modified 3.7 years ago • written 3.7 years ago by seta
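
On the syntax asked about here: the JVM expects the size attached directly to the flag, with no space, as in the -Xmx20g and -Xmx200m examples in the help text above. A minimal sketch of normalizing a loosely written size follows; the helper name to_xmx is made up for illustration and is not part of BBMap.

```shell
# Hypothetical helper (not part of BBMap): turn a size such as "40 G"
# into the compact form used in the help text, e.g. "-Xmx40g".
to_xmx() {
  # Remove all whitespace and lowercase the unit letter.
  size=$(printf '%s' "$1" | tr -d '[:space:]' | tr 'GMK' 'gmk')
  # Accept only digits followed by exactly one unit letter (k, m or g).
  if printf '%s\n' "$size" | grep -Eq '^[0-9]+[kmg]$'; then
    printf '%s\n' "-Xmx$size"
  else
    echo "unrecognized memory size: $1" >&2
    return 1
  fi
}

to_xmx "40 G"   # prints -Xmx40g
```

The resulting flag is then passed to bbduk.sh along with the rest of the arguments, as the help text describes.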

I understood your comment. I added the -Xmx1g flag to the command, but the problem remains. Please tell me what I should do.

written 3.7 years ago by seta

You specified only 1 GB of RAM? No wonder it fails.

written 3.7 years ago by 5heikki

You mean the RAM is insufficient. I tried various amounts of RAM (1g, 20g, 40g, 80g), but the problem persists, even though I previously ran the program with 30 or 40g of memory without any error. I don't know what happened here.

written 3.7 years ago by seta

Hi seta,

The amount of memory BBDuk uses is roughly 20 bytes per reference base.  So, for things like quality-trimming and adapter-trimming it only needs under 100 MB of ram, but when filtering against a large reference it may need a lot.  Also, the "hdist" flag will massively increase the amount of memory needed - for example, hdist=1 at K=31 will increase the memory needed by a factor of 93.  "qhist=1" will not increase the memory usage.

Could you perhaps post your full command line, and also the size of the reference files?

written 3.7 years ago by Brian Bushnell
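
The rule of thumb above (roughly 20 bytes per reference base) can be turned into a quick estimate before choosing an -Xmx value. The sketch below assumes plain, uncompressed FASTA files and no hdist setting; the function names are made up for illustration and are not BBMap tools.

```shell
# Count sequence characters across FASTA files, skipping ">" header lines.
count_fasta_bases() {
  awk '!/^>/ { n += length($0) } END { printf "%.0f\n", n }' "$@"
}

# Rough BBDuk heap estimate in bytes: about 20 bytes per reference base.
# This is only the rule of thumb quoted above, not an exact figure.
estimate_bbduk_bytes() {
  bases=$(count_fasta_bases "$@")
  echo $(( bases * 20 ))
}
```

For example, `estimate_bbduk_bytes 1.fa 2.fa` prints the estimated number of bytes for those two reference files. With hdist set, the real requirement would be many times larger, as noted above.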

Hi Brian,

Thanks for your comment. The reference file is about 3 GB, and with the flag -Xmx4g the problem was solved. However, this seemed strange to me, as I had already run the script successfully using 30 or 40g with the same input data.

written 3.7 years ago by seta

Hi Brian,

Please help me understand why the program isn't reproducible for my work! While I could successfully run the bbduk script with -Xmx4g, as I mentioned in my previous post here, it gives me an error for other input data. I use the same command with the same reference file; it works for one input but gives the above-mentioned error for another. All input data were generated on a HiSeq 2000 (100 bp PE). I would highly appreciate any suggestions.

written 3.7 years ago by seta

What version of Java are you using?

written 3.7 years ago by genomax

It's "1.7.0_79", installed on Ubuntu.

modified 3.7 years ago • written 3.7 years ago by seta

That should be fine then. Are you running the jobs under a job scheduler or directly on a server?

written 3.7 years ago by genomax

We have to run under a job scheduler, although running directly is also possible. Actually, the error appeared in both situations: running under the job scheduler and directly on the server.

written 3.7 years ago by seta

I will assume that you are allocating resources that match your bbduk command (e.g. threads/cores, RAM etc.) when using the scheduler. The only thing I can think of is that the jobs are somehow running out of resources (which shouldn't happen if the scheduler is configured correctly), and it seems to happen randomly, as you have stated above. I would suggest working with your sys admins to see if they can figure something out from the system/scheduler logs.

modified 3.7 years ago • written 3.7 years ago by genomax

Thanks for your comment. You're right about the scheduler. Actually, I talked with our sys admin about running directly on the server, but the problem still occurs randomly. So it may go beyond the scheduler.

written 3.7 years ago by seta

seta,

Could you post the full command line you used, and all of the console output (rather than just the error message)?

written 3.7 years ago by Brian Bushnell

Hi Brian,

The command is:

 ./bbduk.sh -Xmx4g in=3_R1.fastq in2=3_R2.fastq outm=3_matched.fq outu1=3_unmatched_1.fq outu2=3_unmatched_2.fq ref=1.fa,2.fa,3.fa,4.fa,5.fa,6.fa,7.fa,8.fa,9.fa,10.fa,11.fa k=27 stats=stats1.txt threads=4

and the output:

java -Djava.library.path=/home/seta/software/bbmap/jni/ -ea -Xmx4g -Xms4g -cp /home/seta/software/bbmap/current/ jgi.BBDukF -Xmx4g in=3_R1.fastq in2=3_R2.fastq outm=3_matched.fq outu1=3_unmatched_1.fq outu2=3_unmatched_2.fq ref=1.fa,2.fa,3.fa,4.fa,5.fa,6.fa,7.fa,8.fa,9.fa,10.fa,11.fa k=27 stats=stats1.txt threads=4

Executing jgi.BBDukF [-Xmx4g, in=3_R1.fastq, in2=3_R2.fastq, outm=3_matched.fq, outu1=3_unmatched_1.fq, outu2=3_unmatched_2.fq, ref=1.fa,2.fa,3.fa,4.fa,5.fa,6.fa,7.fa,8.fa,9.fa,10.fa,11.fa, k=27, stats=stats1.txt, threads=4]

BBDuk version 35.06
Set threads to 4

Initial:
Memory: free=4052m, used=64m

Exception in thread "Thread-3" java.lang.OutOfMemoryError: Java heap space
        at kmer.HashArray1D.resize(HashArray1D.java:145)
        at kmer.HashArray.setIfNotPresent(HashArray.java:187)
        at jgi.BBDukF$LoadThread.addToMap(BBDukF.java:1772)
        at jgi.BBDukF$LoadThread.addToMap(BBDukF.java:1676)
        at jgi.BBDukF$LoadThread.run(BBDukF.java:1605)
Exception in thread "Thread-4" java.lang.OutOfMemoryError: Java heap space
        at kmer.HashArray1D.resize(HashArray1D.java:145)
        at kmer.HashArray.setIfNotPresent(HashArray.java:187)
        at jgi.BBDukF$LoadThread.addToMap(BBDukF.java:1772)
        at jgi.BBDukF$LoadThread.addToMap(BBDukF.java:1676)
        at jgi.BBDukF$LoadThread.run(BBDukF.java:1605)

Thanks in advance for all your help and support.

modified 3.7 years ago • written 3.7 years ago by seta

Hi seta,

You are definitely running out of memory, but it's not clear why that only happens sometimes.  The easiest solution is to increase the -Xmx value.  Note that the amount of memory used is based on the reference sequences, not the reads.  You can increase the memory usage efficiency somewhat by adding the "prealloc" flag, and you can reduce the number of kmers stored by adding the flag rskip=4 (which will only store 1/4 of the reference kmers, reducing the amount of memory by 75%).

You can find out approximately how many reference kmers there are like this:

cat 1.fa 2.fa 3.fa 4.fa 5.fa 6.fa 7.fa 8.fa 9.fa 10.fa 11.fa | loglog.sh in=stdin.fa

The amount of memory needed (in bytes) will be around 20 times that number, but you can use the "rskip" flag to reduce it to a fraction of that.

Note that BBMap uses less memory than BBDuk, at 6 bytes per reference base, or 3 bytes in low-memory mode.

modified 3.7 years ago • written 3.7 years ago by Brian Bushnell
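
Once loglog.sh has reported an approximate k-mer count, the arithmetic described above (about 20 bytes per k-mer, divided by the rskip value) can be sketched as follows. The function name is made up for illustration, the count is typed in by hand rather than parsed from loglog.sh output, and the result should be read as a rough lower bound rather than an exact requirement.

```shell
# Estimated heap in GiB, rounded up, for a given reference k-mer count.
# Usage: bbduk_heap_gib KMERS [RSKIP]   (RSKIP defaults to 1, i.e. no skipping)
bbduk_heap_gib() {
  kmers=$1
  rskip=${2:-1}
  # About 20 bytes per stored k-mer; rskip=N keeps roughly 1/N of the k-mers.
  bytes=$(( kmers * 20 / rskip ))
  gib=1073741824
  # Integer ceiling division to whole GiB.
  echo $(( (bytes + gib - 1) / gib ))
}

# Illustrative round number, not a measurement from this thread.
bbduk_heap_gib 1000000000      # prints 19
bbduk_heap_gib 1000000000 4    # prints 5
```

The printed number is a starting point for -Xmx (for example, 19 suggests at least -Xmx19g); leaving some headroom above it is prudent, since the estimate is approximate.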

Many thanks for your comment. Based on what you kindly told me, I calculated the required memory to be about 44g. I allocated this amount together with the "prealloc" flag and the problem was solved; I hope it goes well for the other input data. BBDuk works much faster than BBMap, so I prefer it. It is still puzzling to me how my previous run with the same reference file finished successfully with only -Xmx4g.

modified 3.7 years ago • written 3.7 years ago by seta

Actually 1g is sufficient for bbduk. It is one of the programs in BBMap that does not require a lot of RAM.

written 3.7 years ago by genomax