Hello everyone,
I have MAGs and their annotations generated using Prokka. I am interested in clustering genes based on their functional categories—such as cell division, amino acid metabolism, carbon fixation, etc.—to better understand the metabolic capabilities of these microbes.
However, Prokka provides primarily gene names and product descriptions but lacks detailed functional classifications. Additionally, many annotations include hundreds of hypothetical proteins, which makes analysis challenging.
I have performed metabolic reconstructions using METABOLIC, but I notice that in many articles involving MAGs, researchers use several tools for similar purposes. Therefore, I would appreciate recommendations on the best tools or workflows for metabolic reconstruction from MAGs, as well as strategies to interpret and analyze Prokka gene predictions effectively.
Thank you very much for any suggestions!
Best regards, Alla
Prokka is just an annotation pipeline: it relies on and amalgamates predictions generated by a number of other tools, and it doesn't do any more than decent annotation.
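To get a feel for how much of each MAG Prokka actually managed to name, something like the following might help. It is only a rough sketch: it assumes the Prokka `.tsv` output has the `ftype`, `EC_number` and `product` columns (recent versions do, but check your own header), the function name `summarize_prokka_tsv` is made up for this post, and the rows in `example` are invented toy data rather than real output.

```python
import csv
import io
from collections import Counter


def summarize_prokka_tsv(handle):
    """Count CDS features by whether they carry an informative product name.

    Assumes a tab-separated table with 'ftype', 'EC_number' and 'product'
    columns, as in Prokka's .tsv output (an assumption; check your header).
    """
    counts = Counter()
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        # Skip tRNA, rRNA and other non-coding features.
        if row.get("ftype") != "CDS":
            continue
        product = (row.get("product") or "").strip().lower()
        if not product or product == "hypothetical protein":
            counts["hypothetical"] += 1
        else:
            counts["annotated"] += 1
        # An EC number is a useful hook for later pathway mapping.
        if (row.get("EC_number") or "").strip():
            counts["with_EC"] += 1
    return counts


# Invented toy rows, only to show the expected layout.
example = (
    "locus_tag\tftype\tlength_bp\tgene\tEC_number\tCOG\tproduct\n"
    "MAG1_00001\tCDS\t1350\tdnaA\t\t\tChromosomal replication initiator protein DnaA\n"
    "MAG1_00002\tCDS\t1284\tgltA\t2.3.3.16\t\tCitrate synthase\n"
    "MAG1_00003\tCDS\t300\t\t\t\thypothetical protein\n"
    "MAG1_00004\tCDS\t450\t\t\t\thypothetical protein\n"
    "MAG1_00005\ttRNA\t76\t\t\t\ttRNA-Ala(tgc)\n"
)

summary = summarize_prokka_tsv(io.StringIO(example))
print(dict(summary))
```

For a real file you would pass `open("your_mag.tsv", newline="")` instead of the `io.StringIO` wrapper. The hypothetical fraction per MAG gives a quick idea of how much a second annotation pass might be worth.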
You can use tools like CD-HIT and even roary to cluster genes together into orthologue groups, on which you can then do various gene ontology-type analyses.

I'm personally a bit too out of touch with the metagenome space to know what the best tools are these days, but what you describe sounds like a clustering and annotation refinement kind of task.
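If you go the CD-HIT route, the clusters end up in a `.clstr` text file that is easy to read back in. Below is a minimal sketch of a parser, assuming the usual layout (a `>Cluster N` header followed by one member per line, with the sequence name between `>` and `...`). The function name `parse_clstr` and the locus tags in `example_clstr` are made up for illustration, and CD-HIT may truncate long sequence names in this file, so check that the IDs still match your Prokka locus tags.

```python
import io
import re
from collections import defaultdict

# Member lines look like: "0\t450aa, >MAG1_00001... *"
MEMBER_RE = re.compile(r">(.+?)\.\.\.")


def parse_clstr(handle):
    """Map each CD-HIT cluster name to the list of its member sequence IDs."""
    clusters = defaultdict(list)
    current = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            current = line[1:]  # e.g. "Cluster 0"
            continue
        match = MEMBER_RE.search(line)
        if match and current is not None:
            clusters[current].append(match.group(1))
    return dict(clusters)


# Invented toy input in the .clstr layout.
example_clstr = (
    ">Cluster 0\n"
    "0\t450aa, >MAG1_00001... *\n"
    "1\t448aa, >MAG2_00417... at 97.10%\n"
    ">Cluster 1\n"
    "0\t120aa, >MAG1_00002... *\n"
)

clusters = parse_clstr(io.StringIO(example_clstr))
for name, members in clusters.items():
    print(name, len(members), members)
```

Once you have the cluster membership, you can join it to the Prokka products (or to any functional categories from another annotation tool) by locus tag and summarise per cluster instead of per gene.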
If you're only interested in metabolic pathways, then you're probably already on the right track looking at what is commonly used in the literature at the moment.