Assembly of whole metagenomics data
9.5 years ago

Hi all,

I would like to know which assembly tool works best for whole-metagenome Illumina paired-end data (101 bp reads), and which parameters I should check to judge whether a metagenome assembly is good.

Assembly • 4.0k views
9.5 years ago

I've found that Ray (Ray Meta to be specific) works well for me in this type of scenario. Also IDBA-UD and SPAdes have given consistently good results.

9.5 years ago
iraun 6.2k

The thread "Denovo Assembly Of Paired And Mate Paired Reads" may give you an idea of where to start. I suggest starting with SOAPdenovo: I'm working on a metagenomics pipeline, and after reading several papers in the literature this is the tool I've chosen.

Hope it helps.


I used SOAPdenovo for metagenomic assembly but never got good results. I tried reassembling a set of HMP samples with it, but my assemblies were not as good as the HMP assemblies. How does it work for you?

9.5 years ago
5heikki 11k

In our case, Meta-IDBA worked relatively well with LR-trimmed 100 bp paired-end reads. As I recall, the k range was something like 20-75.
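To get a feel for what a k range like 20-75 means for 100 bp reads, here is a minimal sketch using the standard relation between base coverage and k-mer coverage, Ck = C * (L - k + 1) / L. The two coverage values (50x and 5x) and the step of 5 are assumptions chosen for illustration, not settings from the run described above.

```python
def kmer_coverage(base_coverage, read_length, k):
    """Expected k-mer coverage: Ck = C * (L - k + 1) / L."""
    if k > read_length:
        raise ValueError("k cannot exceed the read length")
    return base_coverage * (read_length - k + 1) / read_length


READ_LENGTH = 100  # bp, as in the reads mentioned above

# Hypothetical community: one abundant genome (50x) and one rare genome (5x).
for k in range(20, 76, 5):  # the step of 5 is an assumption
    high = kmer_coverage(50, READ_LENGTH, k)
    low = kmer_coverage(5, READ_LENGTH, k)
    print(f"k={k:2d}  50x genome -> {high:5.1f}x   5x genome -> {low:4.2f}x")
```

The larger k gets, the fewer k-mers each read contributes, so a rare genome can drop to a k-mer coverage that is too thin to assemble while an abundant genome is still well covered. That is one reason to iterate over a range of k values rather than pick a single one.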

9.5 years ago
Skeletor ▴ 90

I would recommend playing around with some simulated data, generated with a tool such as MetaSim, in order to benchmark some of the assemblers out there.

There is also a tool called MetaQUAST that will help you evaluate your assemblies.

Unfortunately, it is hard to evaluate metagenome assemblies on real-world data, as we don't know what should be in our samples. Metrics such as the N50 size aren't meaningful if your assembly isn't accurate. This is especially problematic for metagenomes, as you are dealing with multiple organisms in your sample. Parameter optimization is also tricky because the genomes in the sample have different coverage levels. For example, you might find that a certain k-mer value works well for the genomes with higher coverage but not for those with lower coverage.
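To make the N50 point concrete, here is a small sketch of how N50 is computed, followed by two toy assemblies. The contig lengths are invented for illustration; the second list assumes an assembler wrongly joined three contigs into one.

```python
def n50(contig_lengths):
    """Return the N50: the contig length at which the running total,
    taken over contigs sorted from longest to shortest, first reaches
    half of the total assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly


# Toy numbers (assumptions): same 300 kb of sequence in both cases.
correct = [100, 80, 60, 40, 20]   # kb, contigs kept separate
misjoined = [240, 40, 20]         # kb, first three contigs wrongly merged

print("N50 correct assembly:  ", n50(correct))
print("N50 misjoined assembly:", n50(misjoined))
```

The misjoined assembly scores a higher N50 even though it is less accurate, which is why N50 on its own says little without a reference (simulated data) or a tool that checks for misassemblies.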
