Question: LoRDEC correction insufficient memory error
3.5 years ago by
Medhat8.5k
Texas
Medhat8.5k wrote:

Hi,

When I run LoRDEC to correct PacBio reads, I get this error while it is creating the graph:

creating the graph from file(s): ./files_name.txt
[DSK: Collecting stats on files_name ] 100 % elapsed: 8 min 7 sec remaining: 0 min 0 sec cpu: 218.3 % mem: [14814, 14814, 14814] MB
[DSK: Pass 1/1, Step 2: counting kmers ] 56 % elapsed: 92 min 6 sec remaining: 72 min 18 sec cpu: 403.4 % mem: [13617, 13633, 14815] MB
EXCEPTION: Pool allocation failed for 90748512 bytes (bank ids alloc). Current usage is 2021217248 and capacity is 2097152000
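
The numbers in the EXCEPTION line can be checked by hand. The sketch below only does that arithmetic, using the three values from the log; the helper names are hypothetical and nothing here calls LoRDEC or GATB. The capacity works out to exactly 2000 MiB, which suggests (as an assumption, not something the log states) that the pool has a fixed budget independent of the machine's physical RAM.

```cpp
#include <cassert>
#include <cstdint>

// Values copied from the EXCEPTION line in the log.
constexpr std::int64_t kRequested = 90748512;   // bytes asked for
constexpr std::int64_t kUsage     = 2021217248; // bytes already in use
constexpr std::int64_t kCapacity  = 2097152000; // pool capacity in bytes

// Bytes by which a request would overflow the pool; a value <= 0 means it fits.
// Hypothetical helper, for illustration only.
std::int64_t pool_shortfall(std::int64_t requested, std::int64_t usage,
                            std::int64_t capacity) {
    assert(requested >= 0 && usage >= 0 && capacity >= 0);
    return usage + requested - capacity;
}

// Convert a byte count to whole MiB (1 MiB = 1024 * 1024 bytes).
std::int64_t to_mib(std::int64_t bytes) {
    return bytes / (1024 * 1024);
}
```

With the logged values, `pool_shortfall(kRequested, kUsage, kCapacity)` is 14813760 bytes and `to_mib(kCapacity)` is 2000, so the allocation fails by roughly 14 MiB against a 2000 MiB pool.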

Command used:

./LoRDEC-0.6/lordec-correct -t 5 -b 200 -e 0.4 -2 ./files_name.txt -k 23 -s 3 -i ./filtered_subreads.fasta -T 10 -S stat -o ./245_pacbio_corrected.fa

It runs on a mainframe computer with sufficient memory. Any help is appreciated.

modified 3.5 years ago • written 3.5 years ago by Medhat8.5k

You should have sufficient memory, but what was the memory usage (overall) on the machine when you ran it?

written 3.5 years ago by pld4.8k

The machine has 15 TB of memory.

written 3.5 years ago by Medhat8.5k
3.5 years ago by
edrezen720
France
edrezen720 wrote:

The issue occurs during the k-mer counting algorithm of the GATB library; I think this bug has been fixed in the latest version of the library (1.2.0).

However, there are some API changes in GATB 1.2.0, so it is not usable as is with LoRDEC.

The best shot you have right now is to patch the lordec-correct.cpp file at line 1511 (the line that begins with "graph = Graph::create...") by adding " -max-memory 8000" after "-nb-cores %d". Here 8000 means 8 GB, so you could try 16000 or more (but not too much) if your server does indeed have a lot of memory.

Then you can compile lordec-correct again and see if you still have the issue.

written 3.5 years ago by edrezen720

Thanks a lot. This should be the right answer to my question. Details:

At first I was using GATB 1.1.0, as suggested by the software's installation document. When I tried GATB 1.2, it did not compile, so I followed the other suggestion and set -max-memory 20000. So far it works fine.

written 3.5 years ago by Medhat8.5k

I appreciate that you posted the question and answer, since I was at a loss.

However, this did not solve my problem, and I still get the segmentation fault error even though I have 24 GB of memory (-max-memory 22000). Is this how line 1511 should read?

graph = Graph::create (b, (const char *)"-kmer-size %d -abundance-min %d -bloom cache -debloom original -debloom-impl basic -nb-cores %d -max-memory 22000", kmer_len, solid_kmer_thr, threads);

I tried this with both GATB 1.0.6 and GATB 1.1.0, without success.

This is to correct an E. coli genome of about 5.3 Mbp. The PacBio reads (as 9 contigs) are each under 2 Mbp.

modified 3.2 years ago • written 3.2 years ago by jrchase0

I would advise working with GATB 1.1.0, as suggested by the software, and also trying only 10000, so the line would read:

graph = Graph::create (b, (const char *)"-kmer-size %d -abundance-min %d -bloom cache -debloom original -debloom-impl basic -nb-cores %d -max-memory 10000", kmer_len, solid_kmer_thr, threads);

Recompile it and run it again.
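
Since each new -max-memory value means editing the string and recompiling, the value could instead be turned into a parameter. The following is only a sketch under assumptions: it reuses the option string quoted in this thread, while `build_graph_options` and `max_memory_mb` are hypothetical names that do not exist in LoRDEC or GATB. The resulting string would still have to be handed to Graph::create in lordec-correct.cpp, and where the value comes from (for example a new command-line flag) is left open.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Build the graph option string with a configurable memory limit (in MB)
// instead of a hard-coded one. Hypothetical helper, not part of LoRDEC or GATB.
std::string build_graph_options(int kmer_len, int solid_kmer_thr,
                                int threads, int max_memory_mb) {
    assert(max_memory_mb > 0);
    char buf[512];
    const int n = std::snprintf(
        buf, sizeof(buf),
        "-kmer-size %d -abundance-min %d -bloom cache -debloom original "
        "-debloom-impl basic -nb-cores %d -max-memory %d",
        kmer_len, solid_kmer_thr, threads, max_memory_mb);
    // snprintf returns the length it wanted to write; make sure it fit.
    assert(n > 0 && static_cast<std::size_t>(n) < sizeof(buf));
    return std::string(buf, static_cast<std::size_t>(n));
}
```

For example, `build_graph_options(23, 3, 5, 10000)` yields the same options as the line above, with the k-mer size, abundance threshold, and thread count from the original command filled in.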

written 3.2 years ago by Medhat8.5k

Thank you.

Sadly, none of 10000, 16000, or 20000 for -max-memory avoided the segmentation fault error.

Is there any chance that LoRDEC will be updated to use the newer GATB library without this bug?

written 3.2 years ago by jrchase0

I asked them about the bug, but there was no answer. Maybe they will fix it soon; I have no idea. There are a couple of possible solutions, but first: how many threads do you use?

written 3.2 years ago by Medhat8.5k

I just use the default of all threads...

written 3.2 years ago by jrchase0

Change it to 1 or 2, so that it uses less RAM.

written 3.2 years ago by Medhat8.5k

It appears that the length of the PacBio contigs is the issue. I broke the assembled unitigs into chunks of 5000 lines (at most), and LoRDEC was then easily able to use short-read (Roche 454) data to correct the PacBio E. coli genome.
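
The thread does not say how the unitigs were split, so the following is only an illustration of the general idea of cutting one long sequence into bounded pieces. It assumes splitting by number of bases rather than by FASTA lines, and `split_sequence` is a hypothetical helper; reading and writing FASTA records is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Cut a sequence into consecutive chunks of at most max_len characters.
// The last chunk may be shorter; an empty input gives no chunks.
std::vector<std::string> split_sequence(const std::string& seq,
                                        std::size_t max_len) {
    assert(max_len > 0);
    std::vector<std::string> chunks;
    for (std::size_t pos = 0; pos < seq.size(); pos += max_len) {
        // substr clamps the length at the end of the string.
        chunks.push_back(seq.substr(pos, max_len));
    }
    return chunks;
}
```

Each chunk would then be written out as its own record before running lordec-correct. Note that a plain cut like this loses the link between neighbouring pieces, so the corrected chunks have to be rejoined or reassembled afterwards.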

written 3.2 years ago by jrchase0

Are you correcting reads or contigs?

written 3.2 years ago by Medhat8.5k