Hello 👋🏼
I am a computational biology PhD student, and I just wanted to share a metagenomics data analysis pipeline that some users may find useful. It is called metaGEM
💎 and the main idea was to create a reproducible workflow that generates high quality metagenome assembled genomes (MAGs), from which genome scale metabolic models (GEMs) can be reconstructed. Communities of GEMs can then be simulated to predict metabolic interactions within metagenomic samples.
In our pre-print, we apply metaGEM
to samples from small lab cultures, human gut, plant-associated, bulk soil, and ocean metagenomes. Through pangenome analysis and species metabolic interaction analysis, we showed that the workflow generates phenotype-relevant and context-specific models. I have also applied metaGEM
to kefir cultures, at-home gut microbiome sequencing test kits, and fossilized human poop with great success!
Although the workflow is meant to run on a high performance computer cluster (largely due to computational resources required for assembly), many steps can be run on a standard laptop. You can find code, documentation, and tutorials on GitHub! I am always looking for ways to improve the workflow's performance, accessibility, user experience, etc. so please let me know if you have any suggestions!