Question: Problem with minia and bad_alloc, low memory limits?

jorgegiacomelli (Argentina) wrote, 3.4 years ago:

Hi, I'm processing a large dataset, approx. 234 GB of gzipped FASTA paired-end Illumina reads, and after a week it terminates with a bad_alloc error. It seems to be a memory problem, but my server has 16 GB and the Bloom filters take less than 6 GB, so what's the problem?

I'm using 3 Gb as the genome size; perhaps I should increase that to cover the memory? The estimate at the beginning is approx. 2 GB of memory; is the error because I'm over that limit?

Any suggestions?

Thank you in advance

 

-------------------Debloom time Wallclock  142447 s
binary pass
Insert solid Kmers in Bloom 5235620000
Inserted 5235629138 solid kmers in the bloom structure.
Insert false positive T4 256778974
Size of the Bloom table (B1)  : 3766.27 MB
Size of the Bloom table (B2)  : 1225.75 MB
Size of the Bloom table (B3)  : 210.44 MB
Size of the Bloom table (B4)  : 68.49 MB
Size of the FP table (T4)     : 29.43 MB
      Total 5300.37 MB for 5235629138 solid kmers  ==>  8.49 bits / solid kmer

______________________________________________________
___________ Assemble from bloom filter _______________
______________________________________________________

Extrapolating the number of branching kmers from the first 3M kmers: 388193061
Indexing branching kmers 536870500 / ~388191870
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc

estimated values: nbits Bloom 18103109632, nb FP 43535668, max memory 2158 MB
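As a sanity check on the figures in the log above, the Bloom table sizes and the solid k-mer count are consistent with the reported 8.49 bits per solid k-mer (a sketch; all constants are copied from the log, and MB is taken as 2^20 bytes):

```python
# Back-of-envelope check of the memory figures in the Minia log above.
# All constants are copied from the log; MB is taken as 2**20 bytes.

solid_kmers = 5235629138
bloom_mb = 3766.27 + 1225.75 + 210.44 + 68.49  # Bloom tables B1..B4
fp_mb = 29.43                                   # FP table T4
total_mb = bloom_mb + fp_mb                     # matches "Total 5300.37 MB" up to rounding

bits_per_kmer = total_mb * 2**20 * 8 / solid_kmers

print(f"total: {total_mb:.2f} MB, {bits_per_kmer:.2f} bits / solid kmer")
```

The totals line up, so the Bloom structures themselves are not the part that blew past 16 GB.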

Tags: bad_alloc, minia, memory
Rayan Chikhi (CNRS, Lille, France) wrote, 3.4 years ago:

The memory estimate you saw in the first step of Minia (k-mer counting) is only valid during that step. During the assembly phase, Minia will use as much memory as it needs to store the de Bruijn graph, regardless of the first step's estimate.

In addition to allocating memory for a Bloom filter, Minia also uses memory to store the set of branching nodes in the graph. Normally, on a mammalian dataset with reasonable parameters, this set is not very large. Apparently, in your case, the set of branching k-mers exceeds the available memory: the log you posted extrapolates 388 million branching k-mers, and the indexing counter had already passed 536 million when the allocation failed. That is quite unusual (the graph probably contains many branches due to sequencing errors).
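For scale, here is a rough sketch of why indexing hundreds of millions of branching 31-mers can exhaust a 16 GB machine. The per-entry size and the overhead factor below are illustrative assumptions, not Minia's actual internals:

```python
# Rough estimate of memory needed to index branching 31-mers.
# Assumptions (illustrative, not Minia internals): each 31-mer packs into
# one 64-bit word (2 bits/base * 31 = 62 bits), and the indexing structure
# carries roughly a 2.5x overhead on top of the raw payload.

branching_kmers = 536_870_500  # count reached in the log before the crash
bytes_per_kmer = 8             # one 64-bit word per 31-mer (assumption)
overhead = 2.5                 # index/hash overhead factor (assumption)

raw_gb = branching_kmers * bytes_per_kmer / 2**30
indexed_gb = raw_gb * overhead
bloom_gb = 5300.37 / 1024      # Bloom + FP tables, from the log

print(f"raw: {raw_gb:.1f} GB, indexed: ~{indexed_gb:.1f} GB, "
      f"plus {bloom_gb:.1f} GB of Bloom tables")
```

Even under these generous assumptions, the branching-k-mer index plus the Bloom tables lands right around the 16 GB mark, which matches the bad_alloc during the indexing step.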

What parameters did you give to Minia? (kmer size, and more importantly, min_abundance?)

I see from the log excerpt that the total number of solid k-mers (5235629138) is much larger than your expected genome size. This generally happens when you set too low a value for min_abundance. My best guess is that you should increase min_abundance so that the number of solid k-mers is closer to your expected genome size.
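To make the mismatch concrete (the numbers come from the log and the genome size stated in the question; the comparison itself is just arithmetic):

```python
# Ratio of solid k-mers to the expected genome size. For a 3 Gb genome,
# the distinct solid k-mer count should land in the same ballpark as the
# genome size; a large excess usually means error k-mers survived the
# min_abundance cutoff.

solid_kmers = 5235629138     # from the log
genome_size = 3_000_000_000  # 3 Gb, as stated in the question

excess = solid_kmers / genome_size
print(f"{excess:.2f}x the expected genome size")
```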

In the future, you might want to run KmerGenie prior to running Minia, as it helps choose k and min_abundance.
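The idea behind this kind of cutoff selection can be illustrated on a toy k-mer abundance histogram: error k-mers pile up at low abundance, genomic k-mers form a peak near the coverage, and a reasonable min_abundance sits at the valley between them. The helper and numbers below are hypothetical, not KmerGenie's actual algorithm:

```python
# Toy illustration of picking min_abundance from a k-mer abundance
# histogram: choose the valley between the error spike at low abundance
# and the genomic coverage peak. Not KmerGenie's actual method.

def pick_min_abundance(hist):
    """hist[i] = number of distinct k-mers seen exactly (i+1) times.
    Return the abundance (1-based) of the first local minimum."""
    for i in range(1, len(hist) - 1):
        if hist[i] <= hist[i - 1] and hist[i] <= hist[i + 1]:
            return i + 1
    return 2  # fallback: a common conservative default

# Hypothetical histogram: error spike at abundance 1-2, genomic peak near 14.
hist = [900, 300, 60, 20, 8, 5, 6, 9, 15, 25, 40, 60, 75, 80, 70]
print(pick_min_abundance(hist))
```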

Anyhow, Minia should work on your dataset. Please let us know if you have further issues.

jorgegiacomelli (Argentina) replied, 3.4 years ago:

Thanks for your answers!

I used k = 31 and a min abundance of 3. It's true, I forgot KmerGenie... I'll try your suggestions.

Thank you again. Great software!
lh3 (United States) wrote, 3.4 years ago:

Probably yes. You'd better use a machine with more RAM; 16 GB is small by today's standards. The actual in-memory representation of the data is often larger than the up-front estimates suggest.
