Hello everyone,
I have a set of MAGs annotated with Prokka. I am interested in clustering genes by functional category (e.g. cell division, amino acid metabolism, carbon fixation) to better understand the metabolic capabilities of these microbes.
However, Prokka provides primarily gene names and product descriptions but lacks detailed functional classifications. Additionally, many annotations include hundreds of hypothetical proteins, which makes analysis challenging.
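A quick first step is to measure how much of each MAG is hypothetical. The sketch below assumes Prokka's per-genome `.tsv` output with a header row that includes `ftype` and `product` columns (recent Prokka versions write `locus_tag`, `ftype`, `length_bp`, `gene`, `EC_number`, `COG`, `product`; older versions may differ). The function name and script name are hypothetical.

```python
import csv
import sys
from collections import Counter


def summarize_prokka_tsv(handle):
    """Count CDS features and hypothetical proteins in a Prokka .tsv table.

    Assumes a tab-separated file (or any iterable of lines) whose header row
    contains 'ftype' and 'product' columns, as written by recent Prokka versions.
    """
    # QUOTE_NONE: product descriptions may contain quote characters
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    counts = Counter()
    for row in reader:
        if row.get("ftype") != "CDS":
            continue  # skip tRNA, rRNA, tmRNA, etc.
        counts["CDS"] += 1
        product = (row.get("product") or "").strip().lower()
        if product == "hypothetical protein":
            counts["hypothetical"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: python summarize_prokka.py MAG1.tsv MAG2.tsv ...
    for path in sys.argv[1:]:
        with open(path, newline="") as fh:
            c = summarize_prokka_tsv(fh)
        frac = c["hypothetical"] / c["CDS"] if c["CDS"] else 0.0
        print(f"{path}\t{c['CDS']}\t{c['hypothetical']}\t{frac:.2%}")
```

This prints one line per MAG (file, CDS count, hypothetical count, fraction), which makes it easy to see which bins most need additional annotation.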
I have performed metabolic reconstructions using METABOLIC, but I notice that in many articles involving MAGs, researchers use several tools for similar purposes. Therefore, I would appreciate recommendations on the best tools or workflows for metabolic reconstruction from MAGs, as well as strategies to interpret and analyze Prokka gene predictions effectively.
Thank you very much for any suggestions!
Best regards, Alla
Prokka is just an annotation pipeline: it relies on and combines predictions generated by a number of other tools, and it doesn't do much beyond decent annotation.
You can use tools like CD-HIT and even Roary to cluster genes into orthologue groups, on which you can then run various gene ontology-type analyses. I'm personally a bit out of touch with the metagenome space, so I don't know what the best tools are these days, but what you describe sounds like a clustering and annotation-refinement task.
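To make the CD-HIT suggestion a bit more concrete: CD-HIT writes cluster membership to a `.clstr` file in which each `>Cluster N` line is followed by member lines such as `0\t312aa, >MAG1_00042... *`, where `*` marks the representative sequence. The minimal sketch below assumes that layout and assumes locus tags carry a MAG-specific prefix before the last underscore (as Prokka's `--locustag` option produces), so clusters can be turned into a cluster-by-MAG presence table. Function names are hypothetical. By default CD-HIT truncates sequence names in the `.clstr` file; running it with `-d 0` keeps the full name up to the first whitespace.

```python
import re
from collections import defaultdict

# Member lines look like "1\t298aa, >MAG2_00107... at 95.30%"
MEMBER_RE = re.compile(r">(\S+?)\.\.\.")


def parse_clstr(lines):
    """Return ({cluster_id: [member ids]}, {cluster_id: representative id})."""
    clusters, representatives = {}, {}
    current = None
    for line in lines:
        line = line.rstrip()
        if line.startswith(">Cluster"):
            current = int(line.split()[1])
            clusters[current] = []
            continue
        match = MEMBER_RE.search(line)
        if current is None or match is None:
            continue
        seq_id = match.group(1)
        clusters[current].append(seq_id)
        if line.endswith("*"):  # '*' marks the cluster representative
            representatives[current] = seq_id
    return clusters, representatives


def presence_by_mag(clusters):
    """Map each cluster to the set of MAGs it occurs in.

    Assumes IDs like 'MAG1_00042', i.e. MAG prefix + '_' + gene number.
    """
    table = defaultdict(set)
    for cid, members in clusters.items():
        for seq_id in members:
            table[cid].add(seq_id.rsplit("_", 1)[0])
    return dict(table)
```

Hypothetical usage: `with open("all_mags.clstr") as fh: clusters, reps = parse_clstr(fh)`, then look up the Prokka or eggNOG annotation of each representative to label the cluster.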
If you're only interested in metabolic pathways, then you're probably already on the right track looking at what is commonly used in the literature at the moment.
I am still confused about how to place all predicted genes into metabolic pathways. I ran eggNOG-mapper on the Prokka output, but I can't find a tool that simply maps those KOs onto KEGG pathways. I don't want to use KEGG Mapper because it's a web tool; I need something that runs on the cluster and gives me a simple table of all full pathways in my MAGs.
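One way to get a per-MAG pathway table on a cluster without a web tool is a short script over eggNOG-mapper's `.emapper.annotations` output. The sketch below assumes one annotations file per MAG, a tab-separated header line starting with a single `#` (e.g. `#query` in eggNOG-mapper 2.1), and a `KEGG_ko` column with values like `ko:K00845,ko:K01810` and `-` when missing. It also needs a pathway-to-KO membership list as input (`pathway_kos`), which you would have to obtain from KEGG yourself; that input and all function names are assumptions. "Completeness" here is only the fraction of a pathway's KOs detected, which is crude: KEGG maps pool alternative routes and enzymes from many organisms, so curated pathway or KEGG module definitions (as used by tools like KEGGDecoder) are usually more meaningful.

```python
def read_emapper_kos(lines):
    """Return {query: set of KOs} from an eggNOG-mapper annotations file."""
    header = None
    kos_by_query = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # blank lines and run-metadata comments
        fields = line.split("\t")
        if line.startswith("#"):
            header = [f.lstrip("#") for f in fields]  # column header row
            continue
        if header is None:
            continue
        row = dict(zip(header, fields))
        kos = set()
        for token in row.get("KEGG_ko", "-").split(","):
            token = token.strip()
            if token.startswith("ko:"):
                token = token[3:]  # "ko:K00001" -> "K00001"
            if token.startswith("K"):
                kos.add(token)
        if kos:
            kos_by_query[fields[0]] = kos
    return kos_by_query


def pathway_completeness(mag_kos, pathway_kos):
    """{pathway: {mag: fraction of the pathway's KOs present in that MAG}}."""
    table = {}
    for pathway, members in pathway_kos.items():
        if not members:
            continue  # avoid dividing by zero for empty definitions
        table[pathway] = {
            mag: len(members & kos) / len(members) for mag, kos in mag_kos.items()
        }
    return table


def to_tsv(table, mags):
    """Render a pathway x MAG table of completeness fractions as TSV text."""
    rows = ["pathway\t" + "\t".join(mags)]
    for pathway in sorted(table):
        values = (f"{table[pathway].get(mag, 0.0):.2f}" for mag in mags)
        rows.append(pathway + "\t" + "\t".join(values))
    return "\n".join(rows) + "\n"


# Hypothetical usage (file names and pathway_kos are placeholders):
# mag_kos = {}
# for mag, path in {"MAG1": "MAG1.emapper.annotations"}.items():
#     with open(path) as fh:
#         mag_kos[mag] = set().union(*read_emapper_kos(fh).values())
# print(to_tsv(pathway_completeness(mag_kos, pathway_kos), sorted(mag_kos)), end="")
```

Reading columns by header name rather than position keeps the script working if the annotations file has extra or reordered columns.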
In some articles, people just build their own pipelines and custom scripts, or simply say they used the KEGG database. I have found MinPath, but it only reports a minimal set of pathways, and KEGGDecoder, but it needs KofamScan output as input. I'm just looking for something like METABOLIC, but for Prokka + eggNOG-mapper output files!
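Since KEGGDecoder expects KofamScan output, one workaround is to reshape the eggNOG-mapper KO assignments into a two-column, tab-separated gene/KO table with one KO per line, which is the general shape of KofamScan's `-f mapper` output. Whether KEGGDecoder accepts this as-is, and how it expects gene IDs to encode the genome name, are assumptions to check against its documentation; the optional `genome_prefix` argument below is a hypothetical hook for that, and the function and script names are hypothetical too. The parser assumes the same eggNOG-mapper layout as before: a `#`-prefixed header row and a `KEGG_ko` column with values like `ko:K00001,ko:K00002`.

```python
import sys


def emapper_gene_ko(lines, genome_prefix=""):
    """Yield (gene, KO) pairs from an eggNOG-mapper annotations file.

    genome_prefix is a hypothetical option to prepend a MAG name to each gene ID.
    """
    header = None
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # blank lines and run-metadata comments
        fields = line.split("\t")
        if line.startswith("#"):
            header = [f.lstrip("#") for f in fields]  # column header row
            continue
        if header is None:
            continue
        row = dict(zip(header, fields))
        gene = genome_prefix + fields[0]
        for token in row.get("KEGG_ko", "-").split(","):
            token = token.strip()
            if token.startswith("ko:"):
                token = token[3:]  # "ko:K00001" -> "K00001"
            if token.startswith("K"):
                yield gene, token


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python emapper_to_ko.py MAG1.emapper.annotations > MAG1.ko.tsv
    with open(sys.argv[1]) as fh:
        for gene, ko in emapper_gene_ko(fh):
            print(f"{gene}\t{ko}")
```

Genes without any KO are simply omitted, so the output contains only annotated genes.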
I would appreciate any help. Thanks