Question

LoRDEC correction insufficient memory error

0

7.9 years ago

Medhat 9.7k

Hi,

When I run LoRDEC to correct PacBio reads, I get this error while creating the graph:

creating the graph from file(s): ./files_name.txt
[DSK: Collecting stats on files_name ] 100 % elapsed: 8 min 7 sec remaining: 0 min 0 sec cpu: 218.3 % mem: [14814, 14814, 14814] MB
[DSK: Pass 1/1, Step 2: counting kmers ] 56 % elapsed: 92 min 6 sec remaining: 72 min 18 sec cpu: 403.4 % mem: [13617, 13633, 14815] MB
EXCEPTION: Pool allocation failed for 90748512 bytes (bank ids alloc). Current usage is 2021217248 and capacity is 2097152000

Command used:

./LoRDEC-0.6/lordec-correct -t 5 -b 200 -e 0.4 -2 ./files_name.txt -k 23 -s 3 -i ./filtered_subreads.fasta -T 10 -S stat -o ./245_pacbio_corrected.fa

It runs on a mainframe computer with sufficient memory. Any help is appreciated.

gene software error Assembly genome • 3.3k views

ADD COMMENT • link 7.9 years ago by Medhat 9.7k

0

You should have sufficient memory, but what was the memory usage (overall) on the machine when you ran it?

ADD REPLY • link 7.9 years ago by pld 5.1k

0

The machine has 15 TB of memory.

ADD REPLY • link 7.9 years ago by Medhat 9.7k

score 1 · Accepted Answer · 2016-05-26

1

7.9 years ago

edrezen ▴ 730

The issue occurs during the k-mer counting step of the GATB library; I think this bug has been fixed in the latest version of the library, 1.2.0.

However, GATB 1.2.0 has some API changes, so it cannot be used as is with LoRDEC.

Your best option right now is to patch lordec-correct.cpp at line 1511 (the line that begins with "graph = Graph::create...") by adding " -max-memory 8000" after "-nb-cores %d". Here 8000 means 8 GB, so you could try 16000 or more (but not too much) if your server does have a lot of memory.

Then recompile lordec-correct and see whether the issue persists.
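
The numbers in the EXCEPTION line of the question show why a memory option is relevant even on a machine with terabytes of RAM. The following is a small sketch (the function name `pool_report` is made up for illustration) that only redoes the arithmetic from that error message. It shows that the pool capacity is exactly 2000 MiB and that the failed request would have pushed usage just past it. Whether that 2000 MiB cap is what -max-memory changes is an assumption based on the advice above, not something the log itself states.

```python
MIB = 1024 * 1024  # bytes per mebibyte

# Values copied from the EXCEPTION line in the question.
REQUESTED = 90_748_512
IN_USE = 2_021_217_248
CAPACITY = 2_097_152_000


def pool_report(requested, in_use, capacity):
    """Summarise a failed pool allocation (hypothetical helper, not part of GATB)."""
    needed = in_use + requested
    return {
        "capacity_mib": capacity / MIB,       # size of the pool
        "needed_mib": needed / MIB,           # usage if the request had succeeded
        "shortfall_bytes": needed - capacity, # how far over the cap it would go
    }


report = pool_report(REQUESTED, IN_USE, CAPACITY)
for key, value in report.items():
    print(key, value)
```

The shortfall is about 14 MiB, so the allocation failed against a fixed pool size and not because the server ran out of physical memory.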

ADD COMMENT • link 7.9 years ago by edrezen ▴ 730

0

Thanks a lot, this should be the right answer to my question. Details:

First I was using GATB 1.1.0, as suggested by the software's installation document. When I tried GATB 1.2, it did not compile, so I followed the other suggestion and set -max-memory 20000. So far it works fine.

ADD REPLY • link 7.9 years ago by Medhat 9.7k

0

I appreciate that you posted the question and answer, since I was at a loss.

However, this did not solve my problem, and I still get a segmentation fault even though I have 24 GB of memory (-max-memory 22000). Is this how line 1511 should read?

graph = Graph::create (b, (const char *)"-kmer-size %d -abundance-min %d -bloom cache -debloom original -debloom-impl basic -nb-cores %d -max-memory 22000", kmer_len, solid_kmer_thr, threads);

I tried this with both GATB 1.0.6 and GATB 1.1.0, without success.

This is to correct an E. coli genome of about 5.3 Mbp. The PacBio reads (as 9 contigs) are each under 2 Mbp.

ADD REPLY • link 7.7 years ago by jrchase • 0

0

I advise that you work with GATB 1.1.0, as suggested by the software, and also try using only 10000, so the line would read:

graph = Graph::create (b, (const char *)"-kmer-size %d -abundance-min %d -bloom cache -debloom original -debloom-impl basic -nb-cores %d -max-memory 10000", kmer_len, solid_kmer_thr, threads);

Then recompile and run it.

ADD REPLY • link 7.7 years ago by Medhat 9.7k

0

Thank you.

Sadly, none of 10000, 16000, or 20000 for -max-memory avoided the segmentation fault.

Is there any chance that LoRDEC will be updated to use the newer GATB library without this bug?

ADD REPLY • link 7.7 years ago by jrchase • 0

0

I asked them about the bug, but there was no answer. Maybe they will fix it soon; I have no idea. There are a couple of possible solutions, but first: how many threads do you use?

ADD REPLY • link 7.7 years ago by Medhat 9.7k

0

I just use the default of all threads...

ADD REPLY • link 7.7 years ago by jrchase • 0

0

Change it to 1 or 2 so that it uses less RAM.

ADD REPLY • link 7.7 years ago by Medhat 9.7k

0

It appears that the length of the PacBio contigs is the issue. I broke the assembled unitigs into pieces of at most 5000 lines each, and LoRDEC was then easily able to use short-read (Roche 454) data to correct the PacBio E. coli genome.
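
For anyone who wants to try the same workaround, here is a minimal sketch of one way to do the splitting. It is an illustration, not the script actually used in this thread. It assumes a multi-line FASTA file and reads "5000 lines" as at most 5000 sequence lines per record. The function name `split_fasta_records` and the `_partN` header suffix are made up for this example.

```python
def split_fasta_records(lines, max_lines=5000):
    """Yield FASTA lines so that no record has more than max_lines sequence lines.

    Each long record is cut into consecutive pieces named <id>_part1, <id>_part2, ...
    (the naming scheme is an assumption made for this sketch).
    """
    name = None   # identifier of the record currently being read
    part = 0      # index of the current piece within that record
    count = 0     # sequence lines already written to the current piece
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else "unnamed"
            part, count = 1, 0
            yield f">{name}_part{part}"
        else:
            if name is None:
                raise ValueError("sequence line before first FASTA header")
            if count == max_lines:
                # Current piece is full: start a new piece of the same record.
                part, count = part + 1, 0
                yield f">{name}_part{part}"
            yield line
            count += 1


# Tiny in-memory example with max_lines=2 so the split is visible.
example = [">u1 desc", "ACGT", "TTGA", "CC", ">u2", "GG"]
pieces = list(split_fasta_records(example, max_lines=2))
print("\n".join(pieces))
```

To apply it to a real file, pass an open file handle as `lines` and write each yielded line, followed by a newline, to a new FASTA file, then give that file to lordec-correct with -i. The corrected pieces would still need to be joined back together afterwards, which this sketch does not do.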