What Is The Most Memory-Efficient De Novo Assembler?
7
8
10.7 years ago
toshnam ▴ 650

Hi all,

I need to assemble a HiSeq 2000 read set (558 million PE reads) on a Linux server with 16 cores and 128 GB of RAM.

I had thought SOAPdenovo was the most memory-efficient de novo assembler, but it cannot complete the assembly on my server. I suspect there is not enough RAM.

What is the most memory-efficient de novo assembler for a eukaryotic genome?

Thanks in advance.

assembly hiseq memory • 9.8k views
5
10.7 years ago
Neilfws 49k

It's difficult to compare different assemblers in a fair, meaningful way.

However, Shen et al. made a good attempt and recently published: "A Practical Comparison of De Novo Genome Assembly Software Tools for Next-Generation Sequencing Technologies". If you can cope with the horrible 3D plots in their paper, Figures 2 and 3 indicate memory usage for 9 assemblers.

4
10.7 years ago

CLC de novo assembler. In their example, it used 21 GB of RAM for a 38X human genome.

0

I second this. I was surprised how low the memory usage was - admittedly it was a small bacterial genome of 4Gb, but even that would make another approach use 10 GB of RAM, whereas the CLC assembler used around 1 GB or less.

0

I second this, I've recently worked with it and I was surprised just how low the memory usage was - now it was a small bacterial genome but even that would make another approach use 10Gb ram whereas the CLC assembler was around 1Gb or less.

0

I am happy with CLC's performance. I also found that it gave me the best contig N50 compared to Velvet/SOAP. However, a big issue with the CLC de novo assembler is that it doesn't do scaffolding, so I am stuck with small contigs. For the genomes I work with, I would like to grow the contigs as large as possible.
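
For anyone comparing assemblers by contig N50 as above, here is a minimal sketch of how that number can be computed from a list of contig lengths. The function name `n50` and the example lengths are made up for illustration; they do not come from any of the assemblers mentioned in this thread.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty input: no contigs, so no meaningful N50


# Made-up contig lengths, for illustration only
contigs = [100, 200, 300, 400, 500]
print(n50(contigs))
```

Note that N50 only summarizes contiguity; it says nothing about whether the contigs are correct, so it is best read alongside other checks.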

3
10.7 years ago

One of the most important steps in limiting your RAM consumption is filtering your input data.

Every k-mer that your dataset produces will take up space in the de Bruijn graph, so removing k-mers that exist only because of read errors will shrink your memory requirements tremendously.

In our lab we could cut the used memory in half by filtering the input data.

We have used Jellyfish with some scripts of our own, but other available packages are Quake, khmer and some tools from BGI.
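
To illustrate why read errors inflate the graph, here is a toy sketch in plain Python. It is not the lab scripts mentioned above and it does not use Jellyfish; the sequence, the value of `K`, and the helper names are all invented for the example. A single substitution in one read can introduce up to k new k-mers, each seen only once, and dropping those singletons recovers the error-free k-mer set.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every k-mer (substring of length k) in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_kmers(reads, k):
    """Count k-mer occurrences across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


K = 5  # toy value; real assemblies use a much larger k
genome = "ACGTTGCATGGACCTAGGCTA"            # made-up sequence
clean_reads = [genome] * 10                  # 10 error-free copies
bad_read = genome[:10] + "A" + genome[11:]   # one substitution (G -> A) at index 10
noisy_reads = clean_reads + [bad_read]

clean = count_kmers(clean_reads, K)
noisy = count_kmers(noisy_reads, K)

# Keep only k-mers seen at least twice; singletons are likely errors here.
solid = {kmer: n for kmer, n in noisy.items() if n >= 2}

print("distinct k-mers, clean reads:  ", len(clean))
print("distinct k-mers, with 1 error: ", len(noisy))
print("distinct k-mers after filtering:", len(solid))
```

On real data the count threshold depends on coverage, and a threshold that is too aggressive can also discard genuine low-coverage sequence.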

0

How did you go about extracting reads from the Jellyfish output? Say you wanted to ignore k-mers with counts 1-6? I wrote something myself, but it is very slow. It's a shame Jellyfish loses the read information for the k-mers.
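
One possible approach, sketched under assumptions: dump the counts as two-column text (`kmer count` per line, which is what `jellyfish dump -c` is expected to produce), load the k-mers at or above the threshold into a set, then make a second pass over the reads and keep those whose k-mers are all in the set. The helper names, the `min_fraction` parameter, and the three-line count table below are hypothetical. The sketch also assumes canonical counting, where each k-mer is stored as the lexicographically smaller of itself and its reverse complement.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return kmer if kmer <= rc else rc


def load_solid_kmers(lines, min_count):
    """Collect k-mers whose count is >= min_count from 'kmer count' lines."""
    solid = set()
    for line in lines:
        kmer, count = line.split()
        if int(count) >= min_count:
            solid.add(kmer)
    return solid


def keep_read(seq, solid, k, min_fraction=1.0):
    """Keep a read if at least min_fraction of its k-mers are solid."""
    total = len(seq) - k + 1
    if total <= 0:
        return False  # read shorter than k
    hits = sum(canonical(seq[i:i + k]) in solid for i in range(total))
    return hits / total >= min_fraction


# Made-up count table; a real one would be read from the dump file.
table = ["ACGTA 12", "CGTAC 9", "GGTAC 1"]
solid = load_solid_kmers(table, min_count=7)  # ignores counts 1-6

for read in ["ACGTAC", "ACGTACC"]:
    print(read, keep_read(read, solid, k=5))
```

Set lookups are constant time, so the cost is one pass over the reads, but the set itself has to fit in RAM. For a large genome that may be the limiting factor, in which case a dedicated tool such as khmer or Quake is probably the better route.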

3
10.7 years ago
Benm ▴ 710

I think SOAPdenovo performs well for de novo assembly of short paired-end reads, and 128 GB may be sufficient for 558 million PE reads. The most important thing is to run error correction before you construct contigs and scaffolds; after error correction, the assembly should need less memory. There is an error correction tool on the SOAPdenovo download website, or you can choose a third-party tool such as Euler-SR. If your reads are a mixed set, there is a recent reference you could follow: Leena Salmela, "Correction of sequencing errors in a mixed set of reads", Bioinformatics, Vol. 26 no. 10, 2010, pages 1284–1290.

1

Do you mean "Correction tool for SOAPdenovo (Version 20090703)" on the homepage (http://soap.genomics.org.cn/index.html)? I'm going to run "KmerFreq", "Corrector", "merge_pair.pl", and "merge_pair_list.pl" as you suggested.

0

I've stopped using SOAPCorrector because KmerFreq crashes on my FASTQ read files! I am currently trying Quake.

1
10.7 years ago
Kevin ▴ 640

http://kevin-gattaca.blogspot.com/2010/10/de-novo-assembly-of-large-genomes.html

You did not mention your genome size, but Cortex might be the kind of software you are looking for if you do not have CLC bio.

1
10.3 years ago
Shaldenby ▴ 10

I think that the CLC assembler is pretty much the leader at the moment.

0
10.7 years ago
Rm 8.1k

What about IDBA: A Practical Iterative De Bruijn Graph De Novo Assembler?

http://code.google.com/p/hku-idba/

Has anyone worked with this tool?
