Question: Best software to assemble bacterial genomes
11
4.6 years ago by
fhsantanna440
Brazil
fhsantanna440 wrote:

I have sequencing data from five bacteria, generated on an Illumina MiSeq. Four of them were sequenced using a paired-end 2x300 protocol and one was sequenced using the Nextera mate-pair protocol.

My question is: which software do you recommend for assembling these genomes (the largest is almost 8 Mbp)?

I have access to CLC Workbench. It seems quite easy to use, but I don't know if it is the best option. Most of the papers I found that evaluate the performance of assemblers are from two or three years ago.

I should also mention that I have two i7 PCs with 8 threads each available for this (one with 32 GB and the other with 8 GB of RAM).

Thanks in advance.

modified 16 months ago by Biostar ♦♦ 20 • written 4.6 years ago by fhsantanna440
1

Not exactly what you're looking for, but this guide was very useful for me: Beginner’s guide to comparative bacterial genome analysis using next-generation sequence data.

modified 4.6 years ago • written 4.6 years ago by PoGibas4.8k

I have already read this paper. Very good. They recommend Velvet, but I believe there are better options. Thank you anyway.

written 4.6 years ago by fhsantanna440

I like using CLC Workbench. Even if you don't use their assembler, you can do the last steps in CLC, which is much more convenient. Try SPAdes and SOAPdenovo, and compare their results to those of the CLC built-in assembler.

written 4.6 years ago by marina.v.yurieva480
8
4.6 years ago by
iraun3.6k
Norway
iraun3.6k wrote:

You should check the literature to find out which one is best for your specific data. Here is a nice, recent paper (2014) comparing some assembly tools: http://genomebiology.com/2014/15/3/R42

In my opinion, SOAPdenovo2 and SGA are good choices. Bambus is quite difficult to install and to understand. SSPACE is also nice, but if you want to use the latest version you have to pay, so...

Hope it helps.

written 4.6 years ago by iraun3.6k
8
4.6 years ago by
rtliu2.0k
New Zealand
rtliu2.0k wrote:

For bacterial genomes, the GAGE-B paper (2013) compares 8 genome assemblers:

  •     ABySS v1.3.4
  •     CABOG v7.0
  •     MIRA v3.4.0
  •     MSRCA v1.8.3
  •     SGA v0.9.34
  •     SOAPdenovo2 v2.04 + GapCloser v1.12
  •     SPAdes v2.3.0
  •     Velvet v1.2.08

All GAGE-B data and assembly recipes are available at http://ccb.jhu.edu/gage_b/index.html

 

For a more recent comparison of genome assemblers, have a look at http://nucleotid.es/

As each bacterial genome differs in size and GC%, you need to check these reproducible benchmarks against genomes similar to yours.
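To pick the closest benchmark genome, it helps to know the rough size and GC% of your own data. Here is a minimal sketch in plain Python that computes both from FASTA-formatted lines (for example, a draft assembly or a related reference). The function names `read_fasta` and `size_and_gc` are just illustrative, and ambiguous bases such as N are assumed to be excluded from the GC denominator.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def size_and_gc(lines):
    """Return (total_bases, gc_percent) over all records.

    Assumption: GC% is computed over unambiguous A/C/G/T bases only,
    so N and other ambiguity codes count towards the size but not GC%.
    """
    total = gc = acgt = 0
    for _, seq in read_fasta(lines):
        seq = seq.upper()
        total += len(seq)
        g_c = seq.count("G") + seq.count("C")
        a_t = seq.count("A") + seq.count("T")
        gc += g_c
        acgt += g_c + a_t
    gc_percent = 100.0 * gc / acgt if acgt else 0.0
    return total, gc_percent


if __name__ == "__main__":
    # Tiny in-memory example; for a real file use: size_and_gc(open(path))
    example = [">contig1", "ATGC", "GGCC", ">contig2", "ATATNN"]
    print(size_and_gc(example))
```

For a real file you can pass the open file handle directly, since it iterates line by line.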

 

modified 4.6 years ago • written 4.6 years ago by rtliu2.0k
1

I will second the recommendation of http://nucleotid.es/

It compares numerous assemblers on microbial genomes, using objective metrics as reported by QUAST (a tool for evaluating assemblies), and it includes both peak memory usage and CPU time.
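One of the contiguity metrics QUAST reports is N50. In case it is unfamiliar, here is a small sketch of how it can be computed from a list of contig lengths. This is only an illustration of the definition, not QUAST's own code, and QUAST filters out short contigs by default, so its reported value can differ from a naive calculation.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


if __name__ == "__main__":
    # Hypothetical contig lengths, just for illustration
    print(n50([80, 70, 50, 40, 30, 20]))
```

A higher N50 only means longer contigs; it says nothing about misassemblies, which is why the benchmarks report error metrics alongside it.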

written 4.6 years ago by Brian Bushnell16k
9
4.6 years ago by
lexnederbragt1.2k
Oslo, Norway
lexnederbragt1.2k wrote:

All the articles mentioned conclude that there is no single best assembler for bacterial genomes; it depends on the genome and the data. So you'll have to try a few, then validate the assemblies using tools such as FRCBam, REAPR or one of the likelihood methods. If you don't care about all this, use SPAdes. If you want a tool that automates most of this, look at iMetAMOS: www.cbcb.umd.edu/software/imetamos

written 4.6 years ago by lexnederbragt1.2k

Yes, SPAdes performs very well and it's robust. I would recommend using the --careful option which, according to the nucleotid.es benchmarks, reduces the number of errors while keeping the same N50.
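As a rough sketch, a paired-end SPAdes run with --careful could be launched from Python like this. The file names and output directory are placeholders, and the thread and memory values are assumptions based on the 8-thread, 32 GB machine mentioned in the question. The options used (-1, -2, -o, -t, -m, --careful) are standard spades.py options, but check `spades.py --help` for your installed version.

```python
import shlex
import subprocess


def spades_command(r1, r2, outdir, threads=8, memory_gb=32, careful=True):
    """Build the argument list for a paired-end SPAdes run.

    threads and memory_gb are assumptions taken from the hardware in
    the question; adjust them to the machine you actually use.
    """
    cmd = [
        "spades.py",
        "-1", r1,               # forward reads
        "-2", r2,               # reverse reads
        "-o", outdir,           # output directory
        "-t", str(threads),     # number of threads
        "-m", str(memory_gb),   # memory limit in GB
    ]
    if careful:
        cmd.append("--careful")  # mismatch/short-indel correction
    return cmd


if __name__ == "__main__":
    # Placeholder file names, not real data
    cmd = spades_command("reads_R1.fastq.gz", "reads_R2.fastq.gz", "spades_out")
    print(shlex.join(cmd))
    # Uncomment to run for real (requires spades.py on your PATH):
    # subprocess.run(cmd, check=True)
```

On the 8 GB machine you would lower `memory_gb` accordingly, since SPAdes stops if it exceeds the limit given with -m.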

written 4.5 years ago by mgalactus720
4
4.6 years ago by
moorem220
United Kingdom
moorem220 wrote:

As mentioned, SPAdes is great. Alternatively, check out the A5 assembly pipeline. Following the full GAGE-B paper it has produced better QUAST results than SPAdes for MiSeq data. A lot depends on your organism: how repetitive its genome is, its GC content, etc.


http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0042304

EDIT: Also, check out Abacas for scaffolding if you have a closely related reference genome. 

http://abacas.sourceforge.net/

modified 4.6 years ago • written 4.6 years ago by moorem220
1
4.6 years ago by
HG1.1k
Germany
HG1.1k wrote:

I would like to add that SPAdes may be a better choice for bacterial genome assembly.

written 4.6 years ago by HG1.1k

SPAdes works well if you have uneven read lengths.

written 2.6 years ago by arya10
1
4.5 years ago by
dago2.5k
Germany
dago2.5k wrote:

Check out this work, which compares different assembly tools. The authors introduce a new tool, QUAST, to check the quality of an assembly.

modified 4.5 years ago • written 4.5 years ago by dago2.5k
0
4.6 years ago by
Whoknows740
Tehran,Iran
Whoknows740 wrote:

You can also try the GATK package; it has numerous features for genome analysis.

modified 4.6 years ago • written 4.6 years ago by Whoknows740