Question

WGS bacterial metagenomic pipeline - group comparison

2.7 years ago

predeus ★ 1.9k

Hello all,

I was wondering what the current state-of-the-art pipeline is for comparing metagenomes across multiple samples and groups. E.g. I have several "patients" and several "controls" in the form of WGS reads, and I want to run a comparison (at some level of taxonomic resolution) to identify the species or other taxonomic clusters that differ most between them.

I'm not experienced in metagenomics at all; this is a random project I wanted to try. So a few-sentence intro to the state of the field would be much appreciated :)

metagenomics WGS pipeline bacterial • 948 views

score 1 · Answer 1 · 2021-08-22

What I am suggesting is not necessarily a complete pipeline, but it should get the job done.

It starts with metagenome binning, for which you can use MetaBAT, CONCOCT, VizBin, or one of the more recent packages such as Vamb. The output may look something like this:

[Figure: example binning output]

Next, you assess the quality of those bins using CheckM, whose output may look like this:

  Bin Id               Marker lineage            # genomes   # markers   # marker sets    0     1     2    3    4    5+   Completeness   Contamination   Strain heterogeneity
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  group_001         k__Bacteria (UID203)            5449        104            58         6     81    17   0    0    0       94.36            5.80               5.88
  group_003         k__Bacteria (UID203)            5449        104            58         4     12    46   42   0    0       93.97            7.19              19.77
  group_015         k__Bacteria (UID2495)           2993        147            91         6    133    8    0    0    0       93.96            6.59               0.00
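
If you want to keep only the better bins before moving on, a table like the one above can be filtered on its Completeness and Contamination columns. What follows is only a minimal sketch: the function names are made up for illustration, the parser assumes the plain-text layout shown above (columns separated by two or more spaces), and the default cutoffs of 90% completeness and 5% contamination are my assumption, not something CheckM enforces, so adjust them to your needs.

```python
import re

# The three example rows from the CheckM table above, pasted as-is.
EXAMPLE = """\
  Bin Id               Marker lineage            # genomes   # markers   # marker sets    0     1     2    3    4    5+   Completeness   Contamination   Strain heterogeneity
------------------------------------------------------------------------------------------
  group_001         k__Bacteria (UID203)            5449        104            58         6     81    17   0    0    0       94.36            5.80               5.88
  group_003         k__Bacteria (UID203)            5449        104            58         4     12    46   42   0    0       93.97            7.19              19.77
  group_015         k__Bacteria (UID2495)           2993        147            91         6    133    8    0    0    0       93.96            6.59               0.00
"""


def parse_checkm_table(text):
    """Yield (bin_id, completeness, contamination) for each data row.

    Hypothetical helper: assumes columns are separated by 2+ spaces,
    as in the plain-text table shown above.
    """
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, the dashed ruler, and the header row.
        if not line or line.startswith("-") or line.startswith("Bin Id"):
            continue
        fields = re.split(r"\s{2,}", line)
        # The last three columns are completeness, contamination,
        # and strain heterogeneity, in that order.
        yield fields[0], float(fields[-3]), float(fields[-2])


def filter_bins(text, min_completeness=90.0, max_contamination=5.0):
    """Return the ids of bins passing both cutoffs (cutoffs are assumptions)."""
    return [
        bin_id
        for bin_id, completeness, contamination in parse_checkm_table(text)
        if completeness >= min_completeness and contamination <= max_contamination
    ]


if __name__ == "__main__":
    print(filter_bins(EXAMPLE, max_contamination=7.0))
```

With the default 5% contamination cutoff none of the three example bins would pass, which is why the usage line relaxes it to 7%.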

The next step is taxonomic assignment, which can be done using GTDB-Tk (the GTDB Toolkit) and may look like this:

user_genome     classification
group_001       d__Bacteria;p__Acidobacteriota;c__Aminicenantia;o__UBA2199;f__UBA2199;g__UBA2199;s__
group_003       d__Bacteria;p__Acidobacteriota;c__Aminicenantia;o__Aminicenantales;f__RBG-16-66-30;g__;s__
group_015       d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Steroidobacterales;f__Steroidobacteraceae;g__RPQJ01;s__
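
Since the question asks about comparing at "some level of taxonomic resolution", it may help to split those classification strings into ranks. Here is a rough sketch, assuming the semicolon-separated `d__`/`p__`/.../`s__` format shown above; the function names are hypothetical and not part of GTDB-Tk. An empty rank such as `s__` is treated as unassigned.

```python
# Rank prefixes as they appear in the classification strings above.
RANKS = {
    "d": "domain",
    "p": "phylum",
    "c": "class",
    "o": "order",
    "f": "family",
    "g": "genus",
    "s": "species",
}


def parse_gtdb_classification(classification):
    """Map rank name -> taxon name, with None for unassigned ranks."""
    taxa = {}
    for part in classification.split(";"):
        prefix, _, name = part.partition("__")
        taxa[RANKS[prefix]] = name or None
    return taxa


def lowest_assigned_rank(classification):
    """Return (rank, taxon) for the most specific rank that has a name."""
    lowest = None
    for part in classification.split(";"):
        prefix, _, name = part.partition("__")
        if name:
            lowest = (RANKS[prefix], name)
    return lowest


if __name__ == "__main__":
    # group_003 from the example output above.
    example = (
        "d__Bacteria;p__Acidobacteriota;c__Aminicenantia;"
        "o__Aminicenantales;f__RBG-16-66-30;g__;s__"
    )
    print(lowest_assigned_rank(example))
```

Grouping bins by, say, `parse_gtdb_classification(...)["family"]` would then let you compare patients and controls at whichever rank most of your bins actually reach.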

Or, if you just want to visualize the differences, you can bin the two groups together in the first step and then separate them into two plots:

[Figure: the jointly binned groups shown as two separate plots]