Do you use a specific assembler that you would like to recommend? Is there a trend toward a specific class of assemblers (for example, de Bruijn-based)? Is there an assembler that runs on 32-bit OSs, so that I can experiment at small scale (on my desktop) before going to real scale (on the server)?
I think you can have a look at this link: http://seqanswers.com/forums/showthread.php?t=43
This is an exhaustive list of free and commercial solutions for NGS data assembly.
To answer the initial question more specifically: I agree with Eric, CLC Genomics Workbench is a very interesting integrated solution. You can also try MIRA3 (Linux, http://www.chevreux.org/mira_downloads.html).
Here http://bit.ly/9CLset is the latest review of various nextgen assemblers.
If you plan on using Sanger sequencing and want to do some test runs then you may get real sequencing data including SCF files from: http://bit.ly/bkQCFG
Get some data for several species and run phrap or cap3 on them. To do this in a GUI, use Staden or Consed. Keep in mind that this route is close to being a thing of the past.
I've tried Arachne, Newbler and WGS on 454 datasets, with varying results. In metagenomes from soil, with many species and relatively low coverage per species, you will get more of your reads assembled into contigs than in metagenomes with a small number of different species. In such a dataset, contigs tend to break on variations between the different but similar species (WGS) or include ambiguities (Newbler). Binning may improve things. I'm curious about what options others have tried.
Just happened to read this thread. A comment on CLC: it is damn expensive (while there are open source alternatives), super fast (that was amazing), does not use quality data so far (absolutely not acceptable; yes, the qualities of NGS reads are quite good nowadays, but still), and its output information is minimal...
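On the point about quality data, here is a minimal sketch of what "using qualities" can mean in practice. It assumes reads in FASTQ format with Phred+33 encoding (an assumption: 454 reads are often delivered in other formats, and older Illumina pipelines used an offset of 64). The function names and the threshold of 20 are made up for illustration and are not taken from any of the assemblers mentioned in this thread.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into a list of Phred scores.

    offset=33 is the Sanger/Phred+33 convention (assumed here).
    """
    return [ord(ch) - offset for ch in quality_line]


def error_probability(q):
    """Phred definition: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def trim_3prime(seq, qual, min_q=20, offset=33):
    """Drop trailing bases whose quality is below min_q (hypothetical threshold)."""
    scores = phred_scores(qual, offset)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


# Hypothetical read: eight good bases followed by two poor ones.
read_seq = "ACGTACGTAC"
read_qual = "IIIIIIII#!"
print(trim_3prime(read_seq, read_qual))
```

An assembler that ignores qualities treats the last two bases of this read exactly like the first eight, which is the behaviour being criticised above.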
WGS and MIRA are free; I prefer MIRA as it is highly tunable. And if you are lucky, you can get Newbler for free when you use 454 technology.
We are using a non-free solution, the CLC Genomics Workbench. This software has MANY capabilities. In the role you are asking about, it would easily assemble millions and millions of short reads (given enough RAM and, of course, a 64-bit system to run it on). You can easily put in data from different taxa, specify different criteria for the assembly, or alternatively use a reference genome to assemble your data against, so as to potentially get a less messy result (less influenced by sequence divergence, paralogy...).
This software could be somewhat pricey for a small lab, but in the context of a research group, I have found that it was worth its price many times over just in time saved on student projects.
The software also has A LOT of features for biologists working with sequences, not only for assembling NGS data.
DISCLAIMER (just in case...): I am IN NO WAY connected to this company. I just happen to be a happy user :)
I have never tried it, but there's also the Genovo de novo assembler, specifically designed for assembling metagenomes, which interestingly uses a Bayesian approach.
They compared it against Velvet, EULER-SR, and Newbler, and it seems to have performed better.
From their abstract in their manual, which can be found on the Genovo link:
We compare the performance of Genovo to three other short read assembly programs across one synthetic dataset and eight metagenomic datasets created using the 454 platform, the largest of which has 311k reads. Genovo’s reconstructions cover more bases and recover more genes than the other methods, and yield a higher assembly
But maybe these assemblers aren't the best comparison in terms of metagenomics assembly performance.
If you try it, let me know how it performs.
Metagenome assembly is different from genome or transcriptome assembly, because of the differences in counts of various samples (http://www.homolog.us/Tutorials/index.php?p=6.6&s=1).
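To make the point about uneven counts concrete, here is a rough back-of-the-envelope sketch. It uses the usual expected-coverage estimate (number of reads times read length divided by genome size) and assumes reads are sampled in proportion to the amount of DNA each species contributes. All names and numbers below are hypothetical and chosen only for illustration.

```python
def expected_coverage(total_reads, read_length, community):
    """Estimate per-species coverage in a mixed sample.

    community maps a species name to (relative_abundance, genome_size_bp).
    Assumption: a species' share of the reads is proportional to
    abundance * genome size, i.e. to the DNA it contributes.
    """
    dna_share = {name: ab * size for name, (ab, size) in community.items()}
    total_dna = sum(dna_share.values())
    coverage = {}
    for name, (_, size) in community.items():
        reads_for_species = total_reads * dna_share[name] / total_dna
        coverage[name] = reads_for_species * read_length / size
    return coverage


# Hypothetical 454-like run: 300,000 reads of 400 bp, two species of equal genome size.
community = {
    "dominant": (0.90, 4_000_000),
    "rare": (0.10, 4_000_000),
}
for name, cov in expected_coverage(300_000, 400, community).items():
    print(f"{name}: about {cov:.0f}x")
```

With these made-up numbers the dominant species ends up at roughly 27x and the rare one at roughly 3x, so a single coverage assumption cannot fit both, which is not a situation a single-genome assembly normally has to deal with.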
Regarding the programs involved, most researchers I know, who are doing metagenome assembly every day, currently use Ray-Meta for its scalability to large samples. That is just an anecdotal observation and not a recommendation for one program versus another.
In this paper they use Newbler to assemble the reads, and they even got strain specificity: http://www.pnas.org/content/108/3/1128. If you're working (or plan to work) with 454 data, maybe you could try Newbler first; it's quite easy to use.
The AMOS package is very useful. According to their publication, the Minimo assembler pipeline can be used for metagenomic assembly.
Another package called metAMOS (https://github.com/treangen/metAMOS), which depends heavily on AMOS, SOAP, Newbler and other tools, is also available and looks promising.
I am also curious to know which assembly programs to use for metagenomic samples. Most of the programs I know, like EULER, Velvet, Arachne, etc., are designed for assembling a genome from a single species. Can these genome assemblers be used for metagenome assembly of Illumina reads? I also know about MetaSim, but I am not using it for now.
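As a rough illustration of why mixed samples are harder for de Bruijn-based assemblers such as Velvet, here is a toy sketch. It is not how any particular assembler is implemented: it builds a bare de Bruijn graph from error-free sequences and lists the nodes where a simple contig walk would have to stop. The two "strain" sequences and the choice of k=5 are invented for the example.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, each k-mer adds one edge."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branching_nodes(graph):
    """Nodes with more than one successor, where an unambiguous walk ends."""
    return sorted(node for node, successors in graph.items() if len(successors) > 1)


# Two hypothetical strains that differ by a single substitution.
strain_a = "ACGGATTCAGCTTGA"
strain_b = "ACGGATTAAGCTTGA"

print(branching_nodes(de_bruijn([strain_a], 5)))
print(branching_nodes(de_bruijn([strain_a, strain_b], 5)))
```

The single-strain graph is a simple path with no branching node, while the mixed graph branches at GATT, just before the variant position. A real assembler has to decide whether such a branch is a sequencing error, a repeat, or a second organism, which is where tools tuned for single genomes can go wrong on metagenomes.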