Question

Problem with minia and bad_alloc, low memory limits?

9.8 years ago

jorgegiacomelli ▴ 10

Hi, I'm processing a large dataset, approx. 234 GB of fasta.gz paired-end Illumina reads, and after a week Minia terminates with a bad_alloc error. It seems to be a memory problem, but my server has 16 GB and the Bloom filters take less than 6 GB, so what's the problem?

I'm using 3 Gb as the genome size. Should I increase it to account for that memory? The estimate at the beginning is approx. 2 GB of memory. Is the error because I'm over that limit?

Any suggestions?

Thank you in advance

-------------------Debloom time Wallclock  142447 s
binary pass
Insert solid Kmers in Bloom 5235620000
Inserted 5235629138 solid kmers in the bloom structure.
Insert false positive T4 256778974
Size of the Bloom table (B1)  : 3766.27 MB
Size of the Bloom table (B2)  : 1225.75 MB
Size of the Bloom table (B3)  : 210.44 MB
Size of the Bloom table (B4)  : 68.49 MB
Size of the FP table (T4)     : 29.43 MB
      Total 5300.37 MB for 5235629138 solid kmers  ==>  8.49 bits / solid kmer

______________________________________________________
___________ Assemble from bloom filter _______________
______________________________________________________

Extrapolating the number of branching kmers from the first 3M kmers: 388193061
Indexing branching kmers 536870500 / ~388191870
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
estimated values: nbits Bloom 18103109632, nb FP 43535668, max memory 2158 MB
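
For reference, the Bloom filter figures in this log are self-consistent: summing the five tables and dividing by the number of solid k-mers reproduces the reported 8.49 bits per solid k-mer, assuming the log's "MB" means 2^20 bytes (an assumption; with 10^6 bytes the result would be about 8.10). A short Python check using only numbers copied from the log:

```python
# Table sizes as printed in the Minia log (assumed to be MiB, i.e. 2**20 bytes).
table_sizes_mb = {
    "B1": 3766.27,
    "B2": 1225.75,
    "B3": 210.44,
    "B4": 68.49,
    "T4": 29.43,
}
solid_kmers = 5_235_629_138  # "Inserted ... solid kmers in the bloom structure"

# Total size of all Bloom/FP tables, then convert MB -> bits and divide per k-mer.
total_mb = sum(table_sizes_mb.values())
bits_per_kmer = total_mb * 2**20 * 8 / solid_kmers

print(f"total Bloom/FP tables: {total_mb:.2f} MB")
print(f"bits per solid k-mer: {bits_per_kmer:.2f}")
```

The sum prints 5300.38 rather than 5300.37 only because each table size is already rounded in the log. Either way, the structures listed here come to about 5.2 GB, which on their own do not exhaust 16 GB.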

minia bad_alloc memory

written 9.8 years ago by jorgegiacomelli • updated 2.5 years ago by Ram

Thanks for your answers!

I used k = 31 and a min abundance of 3. It's true, I forgot about Kmergenie... I'll try your suggestions.

Thank you again. Great software!

written 9.8 years ago by jorgegiacomelli • updated 2.5 years ago by Ram

Ram · Answer 1 · 2014-07-03

The memory estimate that you saw in the first step of Minia (k-mer counting) is only valid for that step. During the assembly phase, Minia will use as much memory as needed to store the de Bruijn graph, regardless of the first step's estimate.
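One way to see this in the log: the "max memory 2158 MB" estimate matches the size of a single Bloom filter of the estimated "nbits Bloom" (18103109632 bits), converted with MB = 2^20 bytes, while the Bloom tables actually built for assembly total 5300.37 MB. The sketch below only does that arithmetic on numbers from the log; it does not reproduce how Minia computes its estimate internally.

```python
# Values copied from the log in the question.
nbits_bloom_estimate = 18_103_109_632  # "nbits Bloom" from "estimated values"
assembly_bloom_tables_mb = 5300.37     # "Total ... MB" from the assembly phase

# Convert bits -> bytes -> MB (assuming MB = 2**20 bytes).
estimate_mb = nbits_bloom_estimate / 8 / 2**20

print(f"estimated Bloom size: {estimate_mb:.0f} MB")
print(f"assembly-phase Bloom tables: {assembly_bloom_tables_mb:.0f} MB "
      f"({assembly_bloom_tables_mb / estimate_mb:.1f}x the estimate)")
```

The assembly-phase tables alone are roughly 2.5 times the first-step estimate, before any memory for branching k-mers is counted.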

In addition to allocating memory for the Bloom filters, Minia also uses memory to store the set of branching nodes in the graph. Normally, on a mammalian dataset with reasonable parameters given to Minia, this set is not very large. Apparently, in your case, the set of branching k-mers exceeded the available memory. Indeed, the log you posted shows an extrapolated ~388 million branching k-mers, and Minia had already indexed more than 536 million of them when the allocation failed, which is quite unusual (the graph probably contains many branches caused by sequencing errors).
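To get a feel for the size of the branching-node set, here is a rough lower bound on its raw storage. It assumes, hypothetically, that each k-mer (k = 31, the value the poster reports) is packed at 2 bits per base into one 64-bit word, and it ignores any per-entry overhead of Minia's actual index, which is not described in this thread:

```python
K = 31             # k-mer size reported by the poster
BITS_PER_BASE = 2  # A/C/G/T packed in 2 bits (assumption)
WORD_BITS = 64     # k-mers stored in 64-bit words (assumption)

# Number of 64-bit words per k-mer (ceiling division), then bytes per k-mer.
words_per_kmer = -(-K * BITS_PER_BASE // WORD_BITS)
bytes_per_kmer = words_per_kmer * WORD_BITS // 8


def raw_storage_gib(n_kmers: int) -> float:
    """Lower bound on memory for n_kmers packed k-mers, in GiB (no index overhead)."""
    return n_kmers * bytes_per_kmer / 2**30


extrapolated = 388_193_061  # "Extrapolating the number of branching kmers"
indexed = 536_870_500       # count reached before bad_alloc

for label, n in [("extrapolated", extrapolated), ("indexed so far", indexed)]:
    print(f"{label}: {n:,} k-mers -> at least {raw_storage_gib(n):.2f} GiB")
```

Added to the ~5.2 GB of Bloom tables from the log, that is already around 9 GB before any hashing or bookkeeping overhead, and indexing was still in progress when the allocation failed.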

What parameters did you give Minia? (k-mer size and, more importantly, min_abundance?)

I see from the log excerpt that the total number of solid k-mers (5235629138) is much larger than your expected genome size (3 Gb). This generally happens when min_abundance is set too low. My best guess is that you should increase min_abundance so that the number of solid k-mers is closer to your expected genome size.
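The comparison above is simple arithmetic. With the 3 Gb genome size the poster used, the solid k-mer count is about 1.75 times the genome size, i.e. roughly 2.2 billion more solid k-mers than a 3 Gb genome could contribute if each position gave about one distinct k-mer (a simplification that ignores repeats and heterozygosity). A minimal sketch:

```python
genome_size = 3_000_000_000  # genome size given to Minia (3 Gb)
solid_kmers = 5_235_629_138  # "Inserted ... solid kmers in the bloom structure"

# Ratio of solid k-mers to expected genome size, and the absolute excess.
ratio = solid_kmers / genome_size
excess = solid_kmers - genome_size

print(f"solid k-mers / genome size = {ratio:.2f}")
# The excess is plausibly dominated by erroneous k-mers that passed min_abundance.
print(f"excess solid k-mers: {excess:,}")
```

Raising min_abundance should push the ratio back toward 1, which is the target the answer suggests.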

In the future, you might want to run Kmergenie before running Minia, as it helps to choose k and min_abundance.

Anyhow, Minia should work on your dataset. Please let us know if you have further issues.

Ram · Answer 2 · 2014-07-03

9.8 years ago

lh3 33k

Probably yes. You'd better use a machine with more RAM; 16 GB is tiny by today's standards. The actual in-memory data representation is at times larger than the estimate.

written 9.8 years ago by lh3 • updated 2.5 years ago by Ram