Question: What Is The Most Memory-Efficient De Novo Assembler?
8
8.0 years ago by
toshnam620
Seoul, Republic of Korea
toshnam620 wrote:

Hi all,

I need to assemble a HiSeq 2000 read set (558 million PE reads) on a Linux server with 16 cores and 128 GB of RAM.

I had thought SOAPdenovo was the most memory-efficient de novo assembler, but my server can't complete the assembly with it. I guess the RAM capacity is not sufficient.

What is the most memory-efficient de novo assembler for a eukaryotic genome?

Thanks in advance.

assembly memory hiseq • 8.3k views
modified 8.0 years ago by Shaldenby10 • written 8.0 years ago by toshnam620
5
8.0 years ago by
Neilfws48k
Sydney, Australia
Neilfws48k wrote:

It's difficult to compare different assemblers in a fair, meaningful way.

However, Shen et al. made a good attempt and recently published: "A Practical Comparison of De Novo Genome Assembly Software Tools for Next-Generation Sequencing Technologies". If you can cope with the horrible 3D plots in their paper, Figures 2 and 3 indicate memory usage for 9 assemblers.

written 8.0 years ago by Neilfws48k
4
8.0 years ago by
Haibao Tang3.0k
Mountain View, CA
Haibao Tang3.0k wrote:

CLC de novo assembler. In their example, it used 21 GB of RAM for a 38X human genome.

written 8.0 years ago by Haibao Tang3.0k


I second this. I've recently worked with it and was surprised at just how low the memory usage was. It was only a small bacterial genome, but even that would make another approach use 10 GB of RAM, whereas the CLC assembler used around 1 GB or less.

written 7.6 years ago by Istvan Albert ♦♦ 80k

I am happy with CLC's performance. I also found that it gave me the best contig N50 compared to Velvet and SOAPdenovo. However, a big issue with the CLC de novo assembler is that it doesn't do scaffolding, so I am stuck with small contigs. For the genomes I work with, I would like to grow the contigs as large as possible.

written 7.6 years ago by Haibao Tang3.0k
3
8.0 years ago by
Wageningen, NL
Jan Van Haarst300 wrote:

One of the most important steps in limiting your RAM consumption is filtering your input data.

Every k-mer that your dataset produces will take up space in the de Bruijn graph, so removing k-mers that are created by read errors will shrink your memory requirements tremendously.

In our lab we could cut the used memory in half by filtering the input data.

We used Jellyfish with some scripts of our own, but other available packages include Quake, khmer and some tools from BGI.
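To make the idea concrete, here is a minimal sketch of k-mer based read filtering in plain Python. It is not our Jellyfish pipeline and not how Quake or khmer work internally; the function names (`count_kmers`, `filter_reads`) and the `min_count` threshold are made up for illustration, and it assumes uppercase reads held in memory, so it is only suitable for toy data. It counts canonical k-mers (a k-mer and its reverse complement are treated as the same), then drops every read that contains a k-mer seen fewer than `min_count` times.

```python
from collections import Counter

# Translation table for reverse-complementing uppercase DNA.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_kmers(seq, k):
    """Yield the canonical form of each k-mer in seq, skipping k-mers with N."""
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if "N" in kmer:
            continue
        rc = revcomp(kmer)
        # The canonical k-mer is the lexicographically smaller strand.
        yield kmer if kmer <= rc else rc


def count_kmers(reads, k):
    """Count canonical k-mers over all reads (hypothetical helper)."""
    counts = Counter()
    for read in reads:
        counts.update(canonical_kmers(read, k))
    return counts


def filter_reads(reads, k, min_count):
    """Keep reads whose k-mers all occur at least min_count times.

    Reads that yield no k-mers at all (too short, or only N-containing
    k-mers) are dropped as well.
    """
    counts = count_kmers(reads, k)
    kept = []
    for read in reads:
        kmers = list(canonical_kmers(read, k))
        if kmers and all(counts[km] >= min_count for km in kmers):
            kept.append(read)
    return kept


if __name__ == "__main__":
    # Three identical reads plus one read with a single-base "error".
    reads = ["ACGTACGTAC"] * 3 + ["ACGTTCGTAC"]
    print(filter_reads(reads, k=5, min_count=2))
```

In this toy example the read with the substitution introduces several k-mers that occur only once, so it is removed while the three consistent reads are kept. On a real HiSeq data set the counting step would have to be done by a dedicated k-mer counter, since a Python dictionary of all k-mers would itself not fit in memory.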

written 8.0 years ago by Jan Van Haarst300

How did you go about extracting reads from the Jellyfish output? Say you wanted to ignore k-mers with counts of 1-6? I wrote something myself but it is very slow. It's a shame Jellyfish loses the read info from the k-mers.

written 7.7 years ago by Louis Letourneau790
3
8.0 years ago by
Benm710
Benm710 wrote:

I think SOAPdenovo performs well for de novo assembly of short paired-end reads, and 128 GB may be sufficient to run it on 558 million PE reads. The most important thing is to do error correction before you run the contig construction and scaffolding programs. After error correction the assembly should run fine, and you will find that it uses less memory. There is an error correction tool on the SOAPdenovo download website, or you can choose a third-party tool such as Euler-SR. If your reads are a mixed set, there is a recent reference you may follow: Leena Salmela, Correction of sequencing errors in a mixed set of reads. Bioinformatics, Vol. 26 no. 10 2010, pages 1284–1290.

written 8.0 years ago by Benm710
1

Do you mean "Correction tool for SOAPdenovo (Version 20090703)" on the homepage (http://soap.genomics.org.cn/index.html)? I'm going to run "KmerFreq", "Corrector", "merge_pair.pl", and "merge_pair_list.pl" as you suggested.

written 8.0 years ago by toshnam620

I've stopped using SOAPCorrector because KmerFreq crashes on my FASTQ read files! I am currently trying Quake...

written 7.3 years ago by Frédéric Bigey280
1
8.0 years ago by
Kevin610
Kevin610 wrote:

http://kevin-gattaca.blogspot.com/2010/10/de-novo-assembly-of-large-genomes.html You did not mention your genome size, but Cortex might be the kind of software you are looking for if you do not have CLC bio.

written 8.0 years ago by Kevin610
1
7.6 years ago by
Shaldenby10
Cambridge, United Kingdom
Shaldenby10 wrote:

I think that the CLC assembler is pretty much the leader at the moment.

written 7.6 years ago by Shaldenby10
0
8.0 years ago by
Rm7.8k
Danville, PA
Rm7.8k wrote:

What about IDBA: A Practical Iterative De Bruijn Graph De Novo Assembler?

http://code.google.com/p/hku-idba/

Has anyone worked with this tool?

written 8.0 years ago by Rm7.8k