Qualimap Settings For A Very Large Bam, Chunk Size And Number Of Windows
11.3 years ago
William ★ 5.3k

Does it help to adjust the chunk size and/or the number of windows for Qualimap when processing a very large BAM (150 GB, 3 billion reads)?

What I normally do is just increase the Java maximum heap size, but that is now set almost to the machine maximum (on a very big machine).

Qualimap processes the default 400 windows very fast until the maximum memory is reached; then processing slows down and it no longer does much multithreading.


Out of curiosity, how long does it take for you to run this BAM file?


Processing didn't finish within 3 days on a 48-core machine with 500 GB of memory. I also tried to run it for a week on an 8-core machine with 32 GB, and then I split the BAM per chromosome and ran Qualimap for each chromosome. Per chromosome worked fine.
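
The per-chromosome workaround could be scripted roughly as below. This is only a sketch: the post does not say how the BAM was split, so using `samtools view` with a region (which needs an indexed BAM) is an assumption, and the function name, output layout and chromosome names are hypothetical. It only builds the command lines; running them is left to the caller.

```python
def per_chromosome_commands(bam_path, chromosomes, out_root="qc_by_chrom"):
    """Build (split, qc) command pairs, one pair per chromosome.

    Assumes bam_path is coordinate-sorted and indexed, so that samtools
    can extract a single reference sequence by name.
    """
    jobs = []
    for chrom in chromosomes:
        chrom_bam = f"{out_root}/{chrom}.bam"
        # Extract the reads mapped to one chromosome into their own BAM.
        split = ["samtools", "view", "-b", "-o", chrom_bam, bam_path, chrom]
        # Run Qualimap bamqc on the smaller per-chromosome BAM.
        qc = ["qualimap", "bamqc", "-bam", chrom_bam,
              "-outdir", f"{out_root}/{chrom}_qc"]
        jobs.append((split, qc))
    return jobs


# Hypothetical input file and chromosome names, for illustration only.
jobs = per_chromosome_commands("big.bam", ["chr1", "chr2"])
for split, qc in jobs:
    print(" ".join(split))
    print(" ".join(qc))
```

Each pair can then be passed to `subprocess.run(..., check=True)`, split first and QC second. The trade-off is that the statistics come out per chromosome rather than for the whole genome.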

11.1 years ago
Fennan ▴ 30

Hi William,

Short answer:

Yes it helps.

Long answer:

It is not easy to tune these parameters for optimal performance.

The effect that you are seeing has to do with Java's garbage collector.

To improve speed, you can play with the following parameters:

--java-mem-size -> The bigger the faster (the garbage collector will work less often).

-nw (number of windows) -> The bigger, the slower, but with better resolution in the results and less Java memory needed. The smaller, the faster, but with lower resolution in the results and more Java memory needed. Basically, every time the end of a window is reached, we launch a thread that computes the statistics for that window (for more info, please read http://qualimap.bioinfo.cipf.es/doc_html/analysis.html#advanced-parameters). Unfortunately, there is no general rule for setting an optimal value, since it depends on the data being analysed (peaks of coverage mean that a single window needs more memory and time).

-nt (number of threads) -> The bigger the faster

-nr (number of reads in the chunk) -> It controls how many reads are stored in RAM before they are processed. We do this to keep the threads busy as much of the time as possible, since I/O from disk is very time-consuming. It should therefore be tuned together with the number of threads, the Java memory size and the number of windows, in order to minimize the time that threads sit idle.
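
To make it easier to try different combinations of the four parameters above, here is a small sketch that assembles a `qualimap bamqc` command line. The helper name, the file names and all the values are only illustrative assumptions, not recommended settings; the right numbers depend on your data and machine.

```python
import shlex


def build_bamqc_command(bam_path, out_dir, java_mem, n_windows, n_threads, chunk_reads):
    """Assemble a qualimap bamqc command from the tuning parameters."""
    return [
        "qualimap", "bamqc",
        "-bam", bam_path,
        "-outdir", out_dir,
        "-nw", str(n_windows),    # number of windows
        "-nt", str(n_threads),    # number of threads
        "-nr", str(chunk_reads),  # number of reads per chunk
        f"--java-mem-size={java_mem}",
    ]


# Illustrative values only; tune them against your own BAM and hardware.
cmd = build_bamqc_command(
    "sample.bam", "qc_out",
    java_mem="64G", n_windows=800, n_threads=16, chunk_reads=5000,
)
print(shlex.join(cmd))
```

Timing a few such runs on a subset of the BAM (for example one chromosome) is probably the cheapest way to see which combination keeps the threads busy before committing to the full file.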

Please note that the garbage collector together with the I/O from disk are the two bottlenecks in Qualimap performance. It is a priority for us to improve this in future versions. Any idea is most welcome :-)


Rewrite in Scala, or maybe build on top of the GATK map reduce framework? The GATK map reduce framework is totally free with an MIT license and should be able to handle really big BAM files.
