How does bbduk calculate its memory requirements?
4.9 years ago
Lina F ▴ 200

I ran bbduk using:

/opt/bbmap/bbduk.sh in1=R1.fq.gz in2=R2.fq.gz out1=cleaned_R1.fq.gz out2=cleaned_R2.fq.gz ref=reference.fa k=31 hdist=1 stats=stats.txt overwrite=true

Which bbduk translated to:

java -ea -Xmx299m -Xms299m -cp /opt/bbmap/current/ jgi.BBDuk in1=R1.fq.gz in2=R2.fq.gz out1=cleaned_R1.fq.gz out2=cleaned_R2.fq.gz ref=reference.fa k=31 hdist=1 stats=stats.txt overwrite=true

Executing jgi.BBDuk [in1=R1.fq.gz, in2=R2.fq.gz, out1=cleaned_R1.fq.gz, out2=cleaned_R2.fq.gz, ref=reference.fa, k=31, hdist=1, stats=stats.txt, overwrite=true]
Version 38.44

0.011 seconds.
Initial:
Memory: max=301m, total=301m, free=282m, used=19m

java.lang.OutOfMemoryError
    at kmer.HashArray1D.resize(HashArray1D.java:216)
    at kmer.HashArray.setIfNotPresent(HashArray.java:210)
    at jgi.BBDuk$LoadThread.mutate(BBDuk.java:2251)
    at jgi.BBDuk$LoadThread.mutate(BBDuk.java:2272)
    at jgi.BBDuk$LoadThread.addToMap(BBDuk.java:2226)
    at jgi.BBDuk$LoadThread.addToMap(BBDuk.java:2124)
    at jgi.BBDuk$LoadThread.run(BBDuk.java:2033)

This program ran out of memory.
Try increasing the -Xmx flag and using tool-specific memory-related parameters.

As you can see, the autodetected -Xmx and -Xms values are quite low (299 MB each).

I am running this in a docker container and this is the memory I have available:

$> cat /proc/meminfo
MemTotal:        2046748 kB
MemFree:         1261068 kB
MemAvailable:    1536428 kB
Buffers:          186136 kB
Cached:           191120 kB
SwapCached:          800 kB
...

In the BBDuk reference on the webpage, I read the following regarding calculating Xmx and Xms:

BBDuk's shellscript will try to autodetect the available memory and use about half of it. You can override this with the -Xmx flag, e.g. "bbduk.sh -Xmx1g in=reads.fq". That command will force it to use 1 GB. Most operations such as adapter-trimming and quality-trimming need only a tiny amount of memory. Only processing large references, or using a high value of "hdist" or "edist", actually need a lot of memory. The only factor determining how much memory BBDuk needs is the number of reference kmers stored, which is linearly proportional to the size of the reference. So, if you are not going to be using a reference, or only a small reference, you can add the flag -Xmx1g. If you will be using a large reference, modify that flag to be around 85% of the machine's physical memory – for example, -Xmx27g on a 32GB machine. The actual maximum you can use depends on the operating system's configuration.

Specifically, it says "...autodetect the available memory and use about half of it...". Based on /proc/meminfo, I have much more memory available than roughly 2 x 300 MB (MemAvailable is about 1.5 GB).

Any ideas?

If possible I would like to take advantage of the "autodetection" feature because then I don't have to hard-code memory values.
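One way to avoid hard-coding a value while not relying on the built-in autodetection is to compute the heap size in a wrapper and pass it explicitly with -Xmx. The following is a minimal sketch, not bbduk.sh's own detection logic: the helper names `mem_available_kb` and `heap_mb` are made up for illustration, and the 50% default simply mirrors the "about half" figure quoted from the documentation.

```shell
# Sketch only: derive a JVM heap size from MemAvailable instead of hard-coding it.
# This is NOT how bbduk.sh itself detects memory; both helpers are hypothetical.

# Print MemAvailable (in kB) from a meminfo-style file (default: /proc/meminfo).
mem_available_kb() {
    awk '/^MemAvailable:/ {print $2}' "${1:-/proc/meminfo}"
}

# Print the given percentage (default 50) of a kB value as whole megabytes.
heap_mb() {
    local kb=$1 percent=${2:-50}
    echo $(( kb * percent / 100 / 1024 ))
}

# Example use (commented out so the sketch has no side effects):
#   xmx="$(heap_mb "$(mem_available_kb)" 50)m"
#   /opt/bbmap/bbduk.sh -Xmx"$xmx" in1=R1.fq.gz in2=R2.fq.gz ref=reference.fa k=31 hdist=1
```

With the MemAvailable value shown above (1536428 kB), `heap_mb 1536428 50` yields 750, i.e. -Xmx750m.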

Thanks for any suggestions!

bbduk docker memory • 2.2k views

Lina F: bbduk.sh's memory needs are usually very low. A couple of gigabytes (-Xmx2g) is normally sufficient if you are just scanning for the presence of primers/adapters.

The autodetect feature is designed to work on standalone servers, which may not carry over to VMs. What is the size of your reference.fa? Depending on that, we can adjust the memory specification.


It's actually pretty small:

$> ls -lah reference.fa
-rwxrwxrwx 1 1002 1002 115K Oct 14  2016 reference.fa

I thought I could avoid setting Xmx and Xms manually. Maybe that is not the case.

Thanks for any suggestions you might have!


Based on the info above, you have assigned only 2 GB of RAM to this VM. Is it possible to use more? If not, I would try setting -Xmx1g and see if that works. Your reference is small, but the memory needs depend on how many unique k-mers it generates.
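To get a feel for why a 115K reference can still produce many k-mers with hdist=1, here is a rough back-of-the-envelope sketch. It relies on the fact that a k-mer has 3*k neighbours at Hamming distance 1 (k positions, 3 alternative bases each). The assumptions are loud ones: the reference is treated as a single sequence of about 115000 bases (the file size, ignoring headers and newlines), duplicates are not removed (so this is an upper bound), and the bytes-per-k-mer figure is a placeholder, not a number taken from BBDuk.

```shell
# Sketch only: upper bound on k-mers stored for one sequence with hdist=1.
# Each k-mer contributes itself plus 3*k single-substitution neighbours.
kmers_with_hdist1() {
    local bases=$1 k=$2
    echo $(( (bases - k + 1) * (1 + 3 * k) ))
}

# Rough table size in MB for a given k-mer count.
# bytes_per_kmer is a hypothetical per-entry cost, not a documented BBDuk value.
table_mb() {
    local kmers=$1 bytes_per_kmer=$2
    echo $(( kmers * bytes_per_kmer / 1024 / 1024 ))
}

# Example use:
#   kmers_with_hdist1 115000 31      # about 10.8 million k-mers
#   table_mb 10807180 20             # size in MB at an assumed 20 bytes per k-mer
```

Under these assumptions the count is 114970 * 94 = 10807180 k-mers, roughly 94 times more than with hdist=0, which is the same order of magnitude as the 299 MB heap that ran out.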


I increased the Docker container's memory to 8 GB and it worked! bbduk then used -Xmx1267m -Xms1267m.

It seems like I underestimated how much memory the kmers take up and maybe the autodetect feature really does work differently in a docker container.
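Inside a container, the memory limit imposed by the runtime can also be read from the cgroup filesystem, which may differ from what /proc/meminfo reports. The sketch below is an illustration under assumptions, not something bbduk.sh is known to do: the function name `cgroup_limit_mb` is hypothetical, and the two paths in the usage comment are the usual cgroup v2 and v1 locations, which may not exist on every setup.

```shell
# Sketch only: print the container memory limit in MB from the first readable
# cgroup file given as an argument. Prints nothing if no file is readable or
# if cgroup v2 reports "max" (no limit). cgroup_limit_mb is a hypothetical helper.
cgroup_limit_mb() {
    local f limit
    for f in "$@"; do
        [ -r "$f" ] || continue
        limit=$(cat "$f")
        # cgroup v2 writes the literal string "max" when no limit is set
        [ "$limit" = "max" ] && return 0
        echo $(( limit / 1024 / 1024 ))
        return 0
    done
}

# Example use (typical locations; adjust for your system):
#   cgroup_limit_mb /sys/fs/cgroup/memory.max \
#                   /sys/fs/cgroup/memory/memory.limit_in_bytes
```

On cgroup v1 an unlimited container shows a very large number rather than "max", so a wrapper would want to take the smaller of this value and the one from /proc/meminfo before choosing -Xmx.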

Thanks for helping me debug this!!
