WGS bacterial metagenomic pipeline - group comparison
1
0
Entering edit mode
2.7 years ago
predeus ★ 1.9k

Hello all,

I was wondering what the current state-of-the-art pipeline is for metagenome comparison across multiple samples and groups. E.g. I have several "patients" and several "controls" in the form of WGS reads, and I want to run a comparison (at some level of taxonomic resolution) to identify the species or other taxonomic clusters that differ most between the two groups.

I'm not experienced in metagenomics at all; this is a random project I wanted to try. So a few sentences of introduction to the state of things in the field would be much appreciated :)

metagenomics WGS pipeline bacterial • 945 views
2.7 years ago
Mensur Dlakic ★ 27k

What I am suggesting is not necessarily a complete pipeline, but it should get the job done.

It starts with metagenome binning of the assembled contigs, for which you can use MetaBAT, CONCOCT, VizBin, or one of the more recent packages such as Vamb. The output may look something like this:

[Figure: example binning output; image not included]

Next, you assess the quality of those bins using CheckM, whose output may look like this:

  Bin Id               Marker lineage            # genomes   # markers   # marker sets    0     1     2    3    4    5+   Completeness   Contamination   Strain heterogeneity
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  group_001         k__Bacteria (UID203)            5449        104            58         6     81    17   0    0    0       94.36            5.80               5.88
  group_003         k__Bacteria (UID203)            5449        104            58         4     12    46   42   0    0       93.97            7.19              19.77
  group_015         k__Bacteria (UID2495)           2993        147            91         6    133    8    0    0    0       93.96            6.59               0.00
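
As a minimal sketch of how a table like this could be used to pick bins for downstream analysis, the snippet below parses the three rows shown above and keeps the bins that meet a completeness and a contamination cut-off. The function names and the thresholds are illustrative assumptions, not part of CheckM or of the answer; choose cut-offs that suit your own project.

```python
# Rows copied from the CheckM table above (header and separator omitted).
CHECKM_ROWS = """\
group_001         k__Bacteria (UID203)            5449        104            58         6     81    17   0    0    0       94.36            5.80               5.88
group_003         k__Bacteria (UID203)            5449        104            58         4     12    46   42   0    0       93.97            7.19              19.77
group_015         k__Bacteria (UID2495)           2993        147            91         6    133    8    0    0    0       93.96            6.59               0.00
"""


def parse_checkm_row(line):
    """Extract the bin ID and the three trailing quality columns from one row.

    The marker lineage contains a space, so the quality columns are read
    from the right-hand end of the row instead of by fixed position.
    """
    fields = line.split()
    return {
        "bin_id": fields[0],
        "completeness": float(fields[-3]),
        "contamination": float(fields[-2]),
        "strain_heterogeneity": float(fields[-1]),
    }


def filter_bins(text, min_completeness, max_contamination):
    """Return the IDs of bins that meet both (user-chosen) thresholds."""
    kept = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        row = parse_checkm_row(line)
        if (row["completeness"] >= min_completeness
                and row["contamination"] <= max_contamination):
            kept.append(row["bin_id"])
    return kept


# Hypothetical thresholds, for illustration only.
print(filter_bins(CHECKM_ROWS, min_completeness=90.0, max_contamination=6.0))
```

With the three rows above and a 6% contamination limit, only group_001 passes; relaxing the limit to 10% keeps all three bins.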

The next step is taxonomic assignment, which can be done using the GTDB Toolkit (GTDB-Tk) and may look like this:

user_genome     classification
group_001       d__Bacteria;p__Acidobacteriota;c__Aminicenantia;o__UBA2199;f__UBA2199;g__UBA2199;s__
group_003       d__Bacteria;p__Acidobacteriota;c__Aminicenantia;o__Aminicenantales;f__RBG-16-66-30;g__;s__
group_015       d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Steroidobacterales;f__Steroidobacteraceae;g__RPQJ01;s__
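
To compare groups at a chosen taxonomic level, the classification strings have to be split into ranks first. The following is a rough sketch of one way to do that, assuming the semicolon-separated `d__;p__;...;s__` layout shown above; the helper names are hypothetical and not part of GTDB-Tk. An empty rank (such as `s__`) is treated as unassigned.

```python
# Rank prefixes as they appear in the classification strings above.
RANKS = {
    "d": "domain",
    "p": "phylum",
    "c": "class",
    "o": "order",
    "f": "family",
    "g": "genus",
    "s": "species",
}


def parse_lineage(classification):
    """Split a classification string into a {rank: name} dict.

    Ranks with no name after the prefix (e.g. 's__') map to None.
    """
    lineage = {}
    for part in classification.split(";"):
        prefix, _, name = part.partition("__")
        lineage[RANKS[prefix]] = name or None
    return lineage


def lowest_assigned_rank(lineage):
    """Return (rank, name) for the most specific rank that has a name."""
    for rank in reversed(list(RANKS.values())):
        if lineage.get(rank):
            return rank, lineage[rank]
    return None


example = ("d__Bacteria;p__Acidobacteriota;c__Aminicenantia;"
           "o__Aminicenantales;f__RBG-16-66-30;g__;s__")
lineage = parse_lineage(example)
print(lineage["phylum"])
print(lowest_assigned_rank(lineage))
```

For group_003 this reports the family as the most specific assigned rank, since both genus and species are empty. Once every bin is mapped to a rank of interest, per-sample counts or abundances at that rank can be tabulated for the patient and control groups.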

Or if you just want to visualize the differences, you can bin the two groups together in the first step and separate them into two plots:

[Figure: the two groups binned together and shown as two separate plots; image not included]


Thank you very much - much appreciated! I came across MetaWRAP, which seems to do everything needed in (more or less) one go: https://github.com/bxlab/metaWRAP

The t-SNE/PCA is a good idea though, definitely worth doing even if only for a pretty picture.
