How to estimate peak memory usage of SOAPdenovo
8.8 years ago
billzt ▴ 20

Hello everyone. Since I need to rent servers to run SOAPdenovo, I need to know how much physical memory is required, given my genome size (~2G), my data (Illumina PE, ~200G) and my kmer size (63).

I am glad to share other parameters if needed. Thank you!

Assembly SOAPdenovo • 3.2k views
8.8 years ago

You can count the number of unique kmers using the BBMap package:

khist.sh in=reads.fq khist=khist.txt peaks=peaks.txt -Xmx30g bits=8

That will count 31-mers, but the number of 63-mers should be similar (within a factor of 2 for all unique kmers, and almost the same for unique kmers above 5x or so). The amount of memory SOAPdenovo needs should be proportional to the number of unique kmers. Therefore, if you know how much memory was needed for a dataset with 100M unique kmers, you should be able to predict quite accurately that a dataset with 20G unique kmers will need 200x as much.
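The proportional scaling described above can be sketched in a few lines of Python. This is only an illustration: the function name and the 1.5 GB reference figure are hypothetical placeholders, not measured SOAPdenovo numbers. Substitute the peak memory you actually observed on a smaller run and the unique kmer counts you get from khist.

```python
def estimate_peak_memory_gb(ref_memory_gb, ref_unique_kmers, target_unique_kmers):
    """Scale a measured memory figure linearly by the ratio of unique kmer counts.

    Assumes memory is proportional to the number of unique kmers,
    as suggested in the answer above.
    """
    if ref_unique_kmers <= 0:
        raise ValueError("ref_unique_kmers must be positive")
    return ref_memory_gb * (target_unique_kmers / ref_unique_kmers)


# Hypothetical reference run: 100M unique kmers needed 1.5 GB (placeholder value).
ref_memory_gb = 1.5
estimate = estimate_peak_memory_gb(ref_memory_gb, 100_000_000, 20_000_000_000)
print(f"Estimated peak memory: {estimate:.0f} GB")
```

Because the 31-mer count may differ from the 63-mer count by up to a factor of 2, it may be prudent to treat the result as a rough figure and leave some headroom when choosing a server.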

khist (which uses BBNorm) has the neat property of never running out of memory due to the size of the dataset. If you set -Xmx30g, it will use approximately 30GB of RAM (probably about 32GB total), whether the input is a bacterium or a plant. What changes instead is that the accuracy of the histogram declines slightly when less memory is available relative to the dataset, which is not really important for this use case. The unique kmer estimate will still be very accurate.
