Question: Metagenomics assembly comparison
ARich wrote:

Dear Biostars users,

I have a question regarding metagenomics assembly stats. I ran MEGAHIT and metaSPAdes on one paired-end sample, then ran metaQUAST to see which of the two assemblers gives better statistics.

The output is a bit confusing, and I cannot tell which assembler I should use for all samples.
Below are tables with some of the metaQUAST results.

num_contigs.xlsx

| Assemblies                       | megahit | SPAdes |
|----------------------------------|---------|--------|
| Bacteroides_acidifaciens         | 179     | 112    |
| Dorea_sp._5_2                    | 107     | 78     |
| Lactobacillus_johnsonii          | 24      | 14     |
| Lactobacillus_johnsonii_DPC_6026 | 26      | 21     |
| Lactobacillus_johnsonii_FI9785   | 13      | 6      |
| Lactobacillus_murinus            | 20      | 11     |
| Lactobacillus_reuteri            | 38      | 23     |
| Lactobacillus_reuteri_TD1        | 27      | 12     |


Misassembled_contigs_length

| Assemblies                       | megahit | SPAdes  |
|----------------------------------|---------|---------|
| Bacteroides_acidifaciens         | 2082038 | 1727134 |
| Dorea_sp._5_2                    | 173212  | 129559  |
| Lactobacillus_johnsonii          | 208152  | 153022  |
| Lactobacillus_johnsonii_DPC_6026 | 231804  | 205378  |
| Lactobacillus_johnsonii_FI9785   | 126758  | 96387   |
| Lactobacillus_murinus            | 24519   | 12355   |
| Lactobacillus_reuteri            | 93062   | 50979   |
| Lactobacillus_reuteri_TD1        | 67949   | 39834   |


Largest_contig.txt

| Assemblies                       | megahit | SPAdes |
|----------------------------------|---------|--------|
| Bacteroides_acidifaciens         | 349575  | 117688 |
| Dorea_sp._5_2                    | 150971  | 199679 |
| Lactobacillus_johnsonii          | 46811   | 54841  |
| Lactobacillus_johnsonii_DPC_6026 | 46811   | 54841  |
| Lactobacillus_johnsonii_FI9785   | 46811   | 54841  |
| Lactobacillus_murinus            | 6067    | 5492   |
| Lactobacillus_reuteri            | 26761   | 25341  |
| Lactobacillus_reuteri_TD1        | 26761   | 13863  |
| not_aligned                      | 244235  | 132156 |


I think MEGAHIT performed better in terms of contig length, but I would like feedback from an expert.

Looking forward to your feedback. Thank you in advance!

assembly • 112 views
modified 5 weeks ago by h.mon • written 5 weeks ago by ARich

Unfortunately, your question is not that simple to answer. For largest contig length, the "best" assembler depends on the species. For misassembled contig length, however, SPAdes performs better than MEGAHIT for all species, so looking at just these two metrics, I would say SPAdes is better. You should also evaluate other quality metrics, such as the number of genes annotated, the percentage of reads mapping back to the assembly, and so on. Two papers to help you out:

Critical Assessment of Metagenome Interpretation—a benchmark of metagenomics software

Comparing and Evaluating Metagenome Assembly Tools from a Microbiologist’s Perspective - Not Only Size Matters!
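
As a rough way to tally the per-species comparisons above, here is a minimal Python sketch. The numbers are copied from the metaQUAST tables in the question and the `not_aligned` row is left out. The helper name `count_wins` and the choice to treat fewer contigs as better are assumptions made for illustration, not part of metaQUAST.

```python
# Values copied from the tables in the question: (megahit, SPAdes) per reference.
num_contigs = {
    "Bacteroides_acidifaciens": (179, 112),
    "Dorea_sp._5_2": (107, 78),
    "Lactobacillus_johnsonii": (24, 14),
    "Lactobacillus_johnsonii_DPC_6026": (26, 21),
    "Lactobacillus_johnsonii_FI9785": (13, 6),
    "Lactobacillus_murinus": (20, 11),
    "Lactobacillus_reuteri": (38, 23),
    "Lactobacillus_reuteri_TD1": (27, 12),
}
misassembled_length = {
    "Bacteroides_acidifaciens": (2082038, 1727134),
    "Dorea_sp._5_2": (173212, 129559),
    "Lactobacillus_johnsonii": (208152, 153022),
    "Lactobacillus_johnsonii_DPC_6026": (231804, 205378),
    "Lactobacillus_johnsonii_FI9785": (126758, 96387),
    "Lactobacillus_murinus": (24519, 12355),
    "Lactobacillus_reuteri": (93062, 50979),
    "Lactobacillus_reuteri_TD1": (67949, 39834),
}
largest_contig = {
    "Bacteroides_acidifaciens": (349575, 117688),
    "Dorea_sp._5_2": (150971, 199679),
    "Lactobacillus_johnsonii": (46811, 54841),
    "Lactobacillus_johnsonii_DPC_6026": (46811, 54841),
    "Lactobacillus_johnsonii_FI9785": (46811, 54841),
    "Lactobacillus_murinus": (6067, 5492),
    "Lactobacillus_reuteri": (26761, 25341),
    "Lactobacillus_reuteri_TD1": (26761, 13863),
}


def count_wins(table, higher_is_better):
    """Count the references on which each assembler has the better value.

    Hypothetical helper: `table` maps reference name -> (megahit, SPAdes).
    """
    wins = {"megahit": 0, "SPAdes": 0, "tie": 0}
    for megahit, spades in table.values():
        if megahit == spades:
            wins["tie"] += 1
        elif (megahit > spades) == higher_is_better:
            wins["megahit"] += 1
        else:
            wins["SPAdes"] += 1
    return wins


# Assumption: larger "largest contig" is better; fewer contigs and
# less misassembled length are better.
print("largest contig:     ", count_wins(largest_contig, higher_is_better=True))
print("misassembled length:", count_wins(misassembled_length, higher_is_better=False))
print("number of contigs:  ", count_wins(num_contigs, higher_is_better=False))
```

On these numbers the largest-contig comparison is split evenly between the two assemblers, while SPAdes has the lower misassembled length and the lower contig count for every reference. Keep in mind that the three Lactobacillus_johnsonii references report identical largest-contig values, so they are probably not independent observations.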

modified 5 weeks ago • written 5 weeks ago by h.mon

Thank you for these papers. They were really helpful. I have another question, about workflows. I am always confused because every publication seems to use its own. Can you suggest something more standard for taxonomic and functional classification?

From the literature, I have identified two workflows. Workflow 1: QC --> Contamination removal --> Assembly --> Remapping to get coverage, and gene prediction (Prodigal) --> Binning (MaxBin, CONCOCT) --> Taxonomic classification (Kraken, mOTU, MetaPhlAn2) and functional classification (HUMAnN2)

Workflow 2: QC --> Contamination removal --> Taxonomic and functional classification.

For workflow 1: why do we do binning? And can you suggest something for functional profiling? I am not clear about functional classification: what are the inputs, which tools are used, and so on?

For workflow 2: can we do binning directly after contamination removal and then perform classification, or is the workflow as written the normal way?

Thank you in advance.

written 9 days ago by ARich