Question: Which Assembler To Use For Metagenomic Sequences?
9
Panos (1.6k), Geneva, Switzerland, wrote 8.9 years ago:

Is there a specific assembler you would recommend? Is there a trend toward a particular class of assemblers (for example, de Bruijn graph-based ones)? And is there an assembler that runs on 32-bit operating systems, so that I can experiment at small scale on my desktop before moving to full scale on the server?

assembly metagenomics • 12k views
modified 5.5 years ago by ugly.betty77 (1.0k) • written 8.9 years ago by Panos (1.6k)
1

You should say 'short-read assembler'; in computer science, 'assembler' has a different meaning.

written 8.9 years ago by Giovanni M Dall'Olio (26k)

To say which assembler you should use, we really need more information.

What kind of data do you have?

How many organisms/species are in the sample you want to sequence?

What do you want to do with the resulting assemblies?

written 8.9 years ago by Panos (1.6k)

To begin with, I haven't worked with genome assembly before and I'm just trying to understand how the various tools in a metagenomics workflow work...

At present I don't have real data; I'm still 'playing' with simulated datasets generated by MetaSim that contain only bacterial sequences (both Sanger and 454). In the next stage I'll add some fungi, too. As for the number of species, I've started with only 2 bacterial genomes! Finally, the end goal is to perform gene calling, taxonomic profiling, etc.

written 8.9 years ago by Panos (1.6k)

@Jan van Haarst Hope I did the comments as you told me! If I didn't, let me know! Thank you for your time!

written 8.9 years ago by Panos (1.6k)

Edit your post (there is a link to edit it) then add this information into the question.

written 8.9 years ago by Istvan Albert ♦♦ (79k)

Have a look at this other question: http://biostar.stackexchange.com/questions/137/what-methods-do-you-use-for-short-read-mapping

written 8.9 years ago by Giovanni M Dall'Olio (26k)
6
Bioch'Ti (1000), France (Avignon), wrote 8.8 years ago:

Hi Guys,

I think you can have a look at this link: http://seqanswers.com/forums/showthread.php?t=43

This is an exhaustive list of free and commercial solutions for NGS data assembly.

More specifically to the initial question, I agree with Eric: CLC Genomics Workbench is a very interesting integrated solution. You can also try MIRA3 (Linux, http://www.chevreux.org/mira_downloads.html).

Regards.

written 8.8 years ago by Bioch'Ti (1000)
4
Darked89 (4.2k), Barcelona, Spain, wrote 8.9 years ago:

Here is the latest review of various next-gen assemblers: http://bit.ly/9CLset

If you plan on using Sanger sequencing and want to do some test runs, you can get real sequencing data, including SCF files, from: http://bit.ly/bkQCFG

Get some data for several species and run phrap or cap3 on them. To do this in a GUI, use Staden or Consed. Keep in mind that this route is close to being a thing of the past.

written 8.9 years ago by Darked89 (4.2k)
1

Hi darked89, your first link no longer seems to work - do you remember where it was supposed to point? Thanks.

written 8.3 years ago by Bio_X2Y (3.6k)
1

@Bio_X2Y: sorry about the dead link. It was most likely: Jason R. Miller, Sergey Koren and Granger Sutton, "Assembly algorithms for next-generation sequencing data", Genomics, Volume 95, Issue 6, June 2010, Pages 315-327, doi:10.1016/j.ygeno.2010.03.001

written 8.3 years ago by Darked89 (4.2k)

Which route is close to being a thing of the past? Using the specific programs or the assembly itself?

written 8.9 years ago by Panos (1.6k)

What is getting into the "outdated" zone: Sanger sequencing (SS) for large projects, plus the pipelines used for processing such data. IMHO it is still great to do some DNA quality checking using SS before loading your DNA into Illumina/454, or for assembly finishing/improvement. 454 or Illumina paired reads will give you way more data.

written 8.9 years ago by Darked89 (4.2k)

@darked89: thanks for the update!

written 8.3 years ago by Bio_X2Y (3.6k)
4
Blackbox (40) wrote 8.9 years ago:

I've tried Arachne, Newbler and WGS on 454 datasets, with varying results. In soil metagenomes with many species and relatively low coverage per species, you will get more of your reads assembled into contigs than in metagenomes with only a few different species. In such a dataset, contigs tend to break on variations between the different but similar species (WGS) or include ambiguities (Newbler). Binning may improve things. I'm curious about what options others have tried.
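
As a rough illustration of why contigs can break at variation between similar species, here is a minimal Python sketch. It uses a toy de Bruijn graph purely for simplicity; it is not how WGS or Newbler work internally. The two "strain" sequences, the function names and the choice of k=5 are all made up for the example.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branch_nodes(graph):
    """Return nodes with more than one successor, where a simple contig walk has to stop."""
    return sorted(node for node, successors in graph.items() if len(successors) > 1)


# Two hypothetical strains that differ by a single base (G vs T at one position).
strain_a = "ATGGCGTACGTTAGC"
strain_b = "ATGGCGTACTTTAGC"

single = de_bruijn_edges([strain_a], k=5)
mixed = de_bruijn_edges([strain_a, strain_b], k=5)

print("branch points, one strain: ", branch_nodes(single))
print("branch points, two strains:", branch_nodes(mixed))
```

With one strain the graph is a single unbranched path. Once the second strain is added, the node just before the differing base gains two successors, so an assembler that walks unambiguous paths would either end the contig there or have to encode the ambiguity.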

written 8.9 years ago by Blackbox (40)
3
Cyz70 (30) wrote 8.4 years ago:

Just happened to read this thread. A comment on CLC: it is damn expensive (while there are open-source alternatives), super fast (that was amazing), does not use quality data so far (absolutely not acceptable; yes, the qualities of NGS are quite good nowadays, but still), and the output information is minimal...

WGS and MIRA are free; I prefer MIRA as it is highly tunable. And if you are lucky, you can get Newbler for free when you use 454 technology.

written 8.4 years ago by Cyz70 (30)

Hey, have you tried MIRA for metagenomic data? Which kind of data did you have (Illumina, 454...)? I've used it for genome assembly, but I don't know how it works with metagenomes. Thanks!

written 8.0 years ago by Marina Manrique (1.3k)
2
Eric Normandeau (10k), Quebec, Canada, wrote 8.9 years ago:

We are using a non-free solution, the CLC Genomics Workbench. This software has MANY capabilities. For the role you are asking about, it can easily assemble millions and millions of short reads (given enough RAM and, of course, a 64-bit system to run it on). You can easily put in data from different taxa, specify different criteria for the assembly, or alternatively use a reference genome to assemble your data against, so as to potentially get a less messy result (less influenced by sequence divergence, paralogy...).

This software could be somewhat pricey for a small lab, but in the context of a research group, I have found it to be worth many times its price just in time saved on student projects.

The software also has A LOT of features for biologists working with sequences, not only for assembling NGS data.

DISCLAIMER (just in case...): I am IN NO WAY connected to this company. I just happen to be a happy user :)

Cheers.

modified 8.8 years ago • written 8.9 years ago by Eric Normandeau (10k)

You can also buy just the terminal program clc_novo_assemble. I have version 3.0.2b, which uses SIMD instructions, meaning it's very, very fast. My only beef is that it doesn't give you any coverage information, only a FASTA file.

written 8.8 years ago by Science_Robot (1.1k)

(It also supports paired ends for Illumina data)

written 8.8 years ago by Science_Robot (1.1k)
2
Manu Prestat (3.9k), Marseille, France, wrote 7.2 years ago:

A "meta" version of Velvet is newly available. I haven't tried it yet. http://metavelvet.dna.bio.keio.ac.jp/

written 7.2 years ago by Manu Prestat (3.9k)
2
Random (160) wrote 7.2 years ago:

I've never tried it, but there's also the Genovo de novo assembler, designed specifically for assembling metagenomes, which interestingly uses a Bayesian approach.

They compared it against Velvet, EULER-SR, and Newbler, and it seems to have performed better.

From the abstract in their manual, which can be found on the Genovo link:

We compare the performance of Genovo to three other short read assembly programs across one synthetic dataset and eight metagenomic datasets created using the 454 platform, the largest of which has 311k reads. Genovo’s reconstructions cover more bases and recover more genes than the other methods, and yield a higher assembly

But maybe these assemblers aren't the best comparison in terms of metagenomics assembly performance.

If you try it let me know how it performs.

modified 5 months ago by RamRS (20k) • written 7.2 years ago by Random (160)
2
ugly.betty77 (1.0k), United States, wrote 5.5 years ago:

Metagenome assembly is different from genome or transcriptome assembly, because of the differences in counts of various samples (http://www.homolog.us/Tutorials/index.php?p=6.6&s=1).
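
One way to read "differences in counts" is uneven abundance of the organisms in a sample, which translates into uneven sequencing coverage. The arithmetic can be sketched as below, assuming reads are sampled in proportion to each organism's share of the DNA. The community, read count, read length and genome sizes are invented for illustration, and the function name is hypothetical.

```python
def expected_coverage(total_reads, read_length, abundances, genome_sizes):
    """Expected per-organism coverage.

    Assumes each organism receives a fraction of the reads equal to its
    abundance (its share of the sequenced DNA), so
    coverage = reads_for_organism * read_length / genome_size.
    """
    return {
        name: total_reads * abundances[name] * read_length / genome_sizes[name]
        for name in abundances
    }


# Hypothetical two-organism community with equal genome sizes.
abundances = {"dominant": 0.90, "rare": 0.10}
genome_sizes = {"dominant": 5_000_000, "rare": 5_000_000}

coverage = expected_coverage(
    total_reads=500_000,
    read_length=400,
    abundances=abundances,
    genome_sizes=genome_sizes,
)

for name, depth in coverage.items():
    print(f"{name}: about {depth:.0f}x coverage")
```

With these made-up numbers the dominant organism ends up at roughly 36x and the rare one at roughly 4x from the same run, which is the kind of imbalance a single-genome assembler, tuned for fairly uniform coverage, does not expect.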

Regarding the programs involved, most researchers I know, who are doing metagenome assembly every day, currently use Ray-Meta for its scalability to large samples. That is just an anecdotal observation and not a recommendation for one program versus another.

modified 5.5 years ago • written 5.5 years ago by ugly.betty77 (1.0k)
1

Ray Meta is also recommended in this useful tutorial: http://perso.eleves.bretagne.ens-cachan.fr/~chikhi/2013-evomics-assembly.pdf

written 5.5 years ago by Mikael Huss (4.6k)
1
Marina Manrique (1.3k), Granada, wrote 8.0 years ago:

In this paper (http://www.pnas.org/content/108/3/1128) they use Newbler to assemble the reads, and they even got strain specificity. If you're working (or plan to work) with 454 data, maybe you could try Newbler first; it's quite easy to use.

written 8.0 years ago by Marina Manrique (1.3k)
1
Urchgene (10) wrote 7.2 years ago:

The AMOS package is very useful. According to their publication, the Minimo assembler pipeline can be used for metagenomic assembly.

Another package, metAMOS (https://github.com/treangen/metAMOS), which depends heavily on AMOS, SOAP, Newbler and other tools, is also available and looks promising.

written 7.2 years ago by Urchgene (10)
1
Martin A Hansen (3.0k), Denmark, wrote 7.2 years ago:

MetaIDBA works really well.

written 7.2 years ago by Martin A Hansen (3.0k)
0
Rks (20), European Union, wrote 8.8 years ago:

I am also curious to know which assembly program to use for metagenomic samples. Most of the programs I know, like EULER, Velvet, Arachne, etc., are designed for assembling the genome of a single species. Can these genome assemblers be used for metagenome assembly of Illumina reads? I know about MetaSim, but I am not using it for now.

written 8.8 years ago by Rks (20)