15 months ago
predeus ★ 1.8k

Hello all,

I was wondering: what's the current state-of-the-art pipeline for metagenome comparison with multiple samples and groups? E.g. I have several "patients" and several "controls" in the form of WGS reads, and I want to compare them (at some level of taxonomic resolution) to identify the species or other taxonomic clusters that differ most between the groups.

I'm not experienced in metagenomics at all; this is a random project I wanted to try. So a few-sentence intro to the state of things in the field would be much appreciated :)

15 months ago
Mensur Dlakic ★ 21k

What I am suggesting is not necessarily a complete pipeline, but it should get the job done.

It starts with metagenome binning, for which you can use MetaBAT, CONCOCT, VizBin, or one of the more recent packages such as Vamb. The output may look something like this:
[Figure: example binning output]
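For intuition about what the binners work with: composition-based tools such as MetaBAT and CONCOCT describe each contig by its k-mer profile (typically tetranucleotide frequencies, with each k-mer merged with its reverse complement) and combine that with per-sample coverage. The sketch below computes only the composition part, assuming contigs are plain nucleotide strings. It is an illustration, not code from any of these packages, and the function names are hypothetical.

```python
from itertools import product

# Translation table used to complement A<->T and C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def canonical_kmers(k: int = 4) -> list[str]:
    """All k-mers collapsed with their reverse complements (136 for k=4)."""
    seen = set()
    result = []
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        canon = min(kmer, reverse_complement(kmer))
        if canon not in seen:
            seen.add(canon)
            result.append(canon)
    return result


def tetranucleotide_frequencies(contig: str, k: int = 4) -> dict[str, float]:
    """Normalized canonical k-mer frequencies of one contig.

    Windows containing anything other than A/C/G/T (e.g. N) are skipped.
    """
    contig = contig.upper()
    counts = {kmer: 0 for kmer in canonical_kmers(k)}
    total = 0
    for i in range(len(contig) - k + 1):
        kmer = contig[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[min(kmer, reverse_complement(kmer))] += 1
            total += 1
    return {kmer: (c / total if total else 0.0) for kmer, c in counts.items()}


profile = tetranucleotide_frequencies("ACGTACGTTTGACCA")
print(len(profile), round(sum(profile.values()), 6))
```

Contigs from the same genome tend to have similar profiles, which is why clustering these vectors (together with coverage across samples) can recover bins.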

Next, you assess the quality of those bins with CheckM; the output may look like this:

Bin Id      Marker lineage          # genomes  # markers  # marker sets  0   1    2   3   4  5+  Completeness  Contamination  Strain heterogeneity
--------------------------------------------------------------------------------------------------------------------------------------------------
group_001   k__Bacteria (UID203)    5449       104        58             6   81   17  0   0  0   94.36         5.80           5.88
group_003   k__Bacteria (UID203)    5449       104        58             4   12   46  42  0  0   93.97         7.19           19.77
group_015   k__Bacteria (UID2495)   2993       147        91             6   133  8   0   0  0   93.96         6.59           0.00
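A common next step is to keep only bins that pass quality thresholds. Below is a minimal sketch, assuming the MIMAG draft-genome criteria (high quality: >90% completeness and <5% contamination; medium quality: >=50% completeness and <10% contamination) and using the three example rows above. The class and function names are hypothetical. The full MIMAG high-quality tier also requires rRNA and tRNA genes, which this check ignores.

```python
from dataclasses import dataclass


@dataclass
class BinQuality:
    bin_id: str
    completeness: float   # percent, as reported by CheckM
    contamination: float  # percent, as reported by CheckM


def mimag_tier(q: BinQuality) -> str:
    """Approximate MIMAG tier from completeness/contamination only.

    rRNA/tRNA requirements are not checked; anything below the
    medium-quality bar is reported as "low" here.
    """
    if q.completeness > 90 and q.contamination < 5:
        return "high"
    if q.completeness >= 50 and q.contamination < 10:
        return "medium"
    return "low"


# Values copied from the CheckM table above.
bins = [
    BinQuality("group_001", 94.36, 5.80),
    BinQuality("group_003", 93.97, 7.19),
    BinQuality("group_015", 93.96, 6.59),
]

for b in bins:
    print(b.bin_id, mimag_tier(b))
```

With these cut-offs all three example bins come out as medium quality, because their contamination is above 5%.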


The next step is taxonomic assignment, which can be done with the GTDB Toolkit (GTDB-Tk); the output may look like this:

user_genome     classification
group_001       d__Bacteria;p__Acidobacteriota;c__Aminicenantia;o__UBA2199;f__UBA2199;g__UBA2199;s__
group_003       d__Bacteria;p__Acidobacteriota;c__Aminicenantia;o__Aminicenantales;f__RBG-16-66-30;g__;s__
group_015       d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Steroidobacterales;f__Steroidobacteraceae;g__RPQJ01;s__
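To compare groups at a chosen taxonomic level, the GTDB-Tk classification strings have to be split into ranks. Bins that share a name at that rank can then be summed. Below is a minimal sketch using the three example classifications above. The per-bin abundance values and all function names are hypothetical.

```python
from typing import Optional

RANKS = ("domain", "phylum", "class", "order", "family", "genus", "species")

# Example classifications copied from the GTDB-Tk output above.
TAXONOMY = {
    "group_001": "d__Bacteria;p__Acidobacteriota;c__Aminicenantia;o__UBA2199;f__UBA2199;g__UBA2199;s__",
    "group_003": "d__Bacteria;p__Acidobacteriota;c__Aminicenantia;o__Aminicenantales;f__RBG-16-66-30;g__;s__",
    "group_015": "d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Steroidobacterales;f__Steroidobacteraceae;g__RPQJ01;s__",
}


def parse_gtdb(classification: str) -> dict[str, Optional[str]]:
    """Split 'd__X;p__Y;...' into {rank: name}; empty names such as 'g__' become None."""
    parsed: dict[str, Optional[str]] = {}
    for rank, field in zip(RANKS, classification.strip().split(";")):
        name = field.split("__", 1)[1] if "__" in field else field
        parsed[rank] = name or None
    return parsed


def lowest_named_rank(classification: str) -> tuple[str, str]:
    """Deepest rank that carries a name, e.g. for labelling bins without a species."""
    parsed = parse_gtdb(classification)
    for rank in reversed(RANKS):
        name = parsed.get(rank)
        if name:
            return rank, name
    raise ValueError(f"no named rank in {classification!r}")


def collapse(abundance: dict[str, float], taxonomy: dict[str, str], rank: str) -> dict[str, float]:
    """Sum one sample's per-bin abundances at the given rank; unnamed ranks -> 'unclassified'."""
    totals: dict[str, float] = {}
    for bin_id, value in abundance.items():
        name = parse_gtdb(taxonomy[bin_id]).get(rank) or "unclassified"
        totals[name] = totals.get(name, 0.0) + value
    return totals


# Hypothetical relative abundances of the three bins in one sample.
sample = {"group_001": 0.2, "group_003": 0.3, "group_015": 0.5}
print(collapse(sample, TAXONOMY, "phylum"))
print(collapse(sample, TAXONOMY, "genus"))
print(lowest_named_rank(TAXONOMY["group_003"]))
```

Unnamed ranks (such as the empty genus of group_003) are treated as missing, so at genus level that bin is counted as "unclassified" instead of being silently merged with another bin.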


Alternatively, if you just want to visualize the differences, you can bin the two groups together in the first step and then show them as two separate plots:
[Figure: the two groups shown in two separate plots]
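If you have per-sample abundances for the jointly binned genomes (for example, reads mapped to each bin, normalized per sample), a quick way to get such a picture is a PCA of the samples x bins matrix. Below is a minimal numpy sketch. The centered log-ratio (CLR) transform, the pseudocount, and the abundance values are assumptions for illustration, not part of the workflow described above. scikit-learn's TSNE could be swapped in for the projection if you prefer t-SNE.

```python
import numpy as np


def clr(abundances: np.ndarray, pseudocount: float = 1e-6) -> np.ndarray:
    """Centered log-ratio transform of each row (sample); the pseudocount avoids log(0)."""
    logged = np.log(abundances + pseudocount)
    return logged - logged.mean(axis=1, keepdims=True)


def pca_2d(matrix: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Project samples (rows) onto the first two principal components via SVD.

    Returns the 2-D coordinates and the fraction of variance explained by PC1 and PC2.
    """
    centered = matrix - matrix.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    coords = u[:, :2] * s[:2]
    explained = (s ** 2) / np.sum(s ** 2)
    return coords, explained[:2]


# Hypothetical relative abundances: rows = samples, columns = bins.
samples = ["patient_1", "patient_2", "patient_3", "control_1", "control_2", "control_3"]
abundance = np.array([
    [0.50, 0.30, 0.15, 0.05],
    [0.45, 0.35, 0.10, 0.10],
    [0.55, 0.25, 0.15, 0.05],
    [0.10, 0.20, 0.30, 0.40],
    [0.15, 0.15, 0.35, 0.35],
    [0.05, 0.25, 0.30, 0.40],
])

coords, explained = pca_2d(clr(abundance))
for name, (pc1, pc2) in zip(samples, coords):
    print(f"{name}\t{pc1:.3f}\t{pc2:.3f}")
print("variance explained by PC1, PC2:", explained)
```

Plotting PC1 against PC2 with one colour per group (patients vs controls) gives the kind of figure described above.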

Thank you very much, much appreciated! I came across MetaWRAP, which seems to do everything needed in (more or less) one go: https://github.com/bxlab/metaWRAP

The t-SNE/PCA idea is a good one, though; definitely worth doing, even if only for a pretty picture.

