Question: Qualimap Settings for a Very Large BAM, Chunk Size and Number of Windows
Asked 5.6 years ago by William (Europe):

Does it help to adjust the chunk size and/or the number of windows in Qualimap when processing a very large BAM (150 GB, 3 billion reads)?

What I normally do is just increase the Java maximum heap size, but that is now set almost to the machine maximum (on a very big machine).

Qualimap processes the default 400 windows very fast until the maximum memory is reached. After that, processing slows down and it barely uses multithreading anymore.

Tags: bam, qc

Out of curiosity, how long does Qualimap take to run on this BAM file?

Comment by Fennan, 5.5 years ago

Processing didn't finish within 3 days on a 48-core machine with 500 GB of memory. I also tried running it for a week on an 8-core machine with 32 GB, and then I split the BAM by chromosome and ran Qualimap on each chromosome separately. The per-chromosome runs worked fine.

Reply by William, 5.5 years ago
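The per-chromosome workaround described in the reply above can be scripted. This is a minimal sketch, assuming `samtools` and `qualimap` are on the PATH and the input BAM is coordinate-sorted and indexed (`samtools view` needs the index to extract a region). It parses the tab-separated output of `samtools idxstats` (name, length, mapped, unmapped) and builds one extraction command and one Qualimap command per contig that has mapped reads. The function names and the output naming scheme (`chr1.bam`, `qc_chr1`) are hypothetical choices, not part of either tool.

```python
def contigs_with_reads(idxstats_text):
    """Return contig names that have at least one mapped read.

    Expects `samtools idxstats` output: name, length, mapped, unmapped
    (tab-separated). The final '*' line counts unplaced unmapped reads
    and is skipped.
    """
    names = []
    for line in idxstats_text.splitlines():
        if not line.strip():
            continue
        name, _length, mapped, _unmapped = line.split("\t")
        if name != "*" and int(mapped) > 0:
            names.append(name)
    return names


def per_contig_commands(bam, contigs):
    """Build (extract, qc) argument lists for each contig.

    The per-contig file and directory names are hypothetical.
    """
    commands = []
    for contig in contigs:
        sub_bam = f"{contig}.bam"
        # Extract one region into its own BAM (requires a .bai index on `bam`).
        extract = ["samtools", "view", "-b", "-o", sub_bam, bam, contig]
        # Run Qualimap's BAM QC on the smaller file.
        qc = ["qualimap", "bamqc", "-bam", sub_bam, "-outdir", f"qc_{contig}"]
        commands.append((extract, qc))
    return commands
```

Feed `contigs_with_reads` the captured output of `samtools idxstats sample.bam` (for example via `subprocess.run([...], capture_output=True, text=True)`), then run each command pair in order. Because every contig is independent, the pairs can also be distributed across machines or a job scheduler.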
Answer by Fennan, 5.5 years ago:

Hi William,

Short answer:

Yes, it helps.

Long answer:

It is not easy to tune these parameters for optimal performance.

The effect you are seeing has to do with Java's garbage collector.

To improve speed, you can play with the following parameters:

--java-mem-size -> The bigger, the faster (the garbage collector runs less often).

-nw (number of windows) -> The bigger, the slower, with better resolution in the results and less Java memory needed. The smaller, the faster, with lower resolution and more Java memory needed. Basically, every time the end of a window is reached, we launch a thread that computes the statistics for that window (for more info, please read http://qualimap.bioinfo.cipf.es/doc_html/analysis.html#advanced-parameters). Unfortunately, there is no general rule for an optimal value, since it depends on the data being analysed (coverage peaks mean more memory and time are needed for a single window).

-nt (number of threads) -> The bigger, the faster.

-nr (number of reads in a chunk) -> Controls how many reads are stored in RAM before they are processed. We do this to keep the threads busy as much of the time as possible, since disk I/O is very time-consuming. It should therefore be tuned together with the number of threads, the Java memory size, and the number of windows, so that threads spend as little time idle as possible.
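As a starting point for experimenting with these four parameters, the sketch below assembles a `qualimap bamqc` argument list and estimates the average number of reads per window. The numeric defaults in the sketch are placeholders, not tuned recommendations, and the helper names are hypothetical. The average assumes reads are spread evenly along the genome, which coverage peaks violate, so real windows vary widely. The `--java-mem-size=VALUE` form (no space before the value) is assumed from the Qualimap documentation; check it against `qualimap bamqc --help` for your installed version.

```python
def bamqc_args(bam, outdir, java_mem="64G", threads=8, windows=400, chunk_reads=1000):
    """Build a `qualimap bamqc` command line with the tuning options above.

    All default values here are placeholders for experimentation.
    """
    return [
        "qualimap", "bamqc",
        "-bam", bam,
        "-outdir", outdir,
        "-nt", str(threads),            # number of worker threads
        "-nw", str(windows),            # number of windows along the reference
        "-nr", str(chunk_reads),        # reads held in RAM per chunk
        f"--java-mem-size={java_mem}",  # Java heap size (assumed syntax)
    ]


def mean_reads_per_window(total_reads, windows):
    """Average reads per window, assuming uniform coverage (a simplification)."""
    return total_reads / windows
```

With the question's 3 billion reads and the default 400 windows, `mean_reads_per_window(3_000_000_000, 400)` gives 7,500,000 reads per window on average; raising `-nw` to 4000 lowers that to 750,000, which matches the trade-off described for `-nw` (more windows, less memory per window, more overhead).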

Please note that the garbage collector and disk I/O are the two main bottlenecks in Qualimap's performance. Improving this is a priority for us in future versions. Any ideas are most welcome :-)


Maybe rewrite it in Scala, or build it on top of the GATK MapReduce framework? The GATK MapReduce framework is free under the MIT license and should be able to handle really big BAM files.

Reply by William, 5.5 years ago