Clustering Prokka output genes by function
11 weeks ago
shevch2009 ▴ 20

Hello everyone,

I have MAGs and their annotations generated using Prokka. I am interested in clustering genes based on their functional categories—such as cell division, amino acid metabolism, carbon fixation, etc.—to better understand the metabolic capabilities of these microbes.

However, Prokka provides primarily gene names and product descriptions but lacks detailed functional classifications. Additionally, many annotations include hundreds of hypothetical proteins, which makes analysis challenging.

I have performed metabolic reconstructions using METABOLIC, but I notice that in many articles involving MAGs, researchers use several tools for similar purposes. Therefore, I would appreciate recommendations on the best tools or workflows for metabolic reconstruction from MAGs, as well as strategies to interpret and analyze Prokka gene predictions effectively.

Thank you very much for any suggestions!

Best regards, Alla

data prokka shotgun • 5.8k views

Prokka is just an annotation pipeline and relies on/amalgamates predictions generated by a number of other tools - it doesn't do any more than decent annotation.

You can use tools like CD-HIT or even Roary to cluster genes into orthologue groups, on which you can then run various gene ontology-type analyses.
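To work with those clusters afterwards, the CD-HIT `.clstr` file can be read into a plain mapping of cluster to member genes. Below is a minimal sketch, assuming the usual `.clstr` layout (a `>Cluster N` header followed by one member line per sequence); the gene IDs in the example are made up, and long IDs may appear truncated depending on CD-HIT's description-length setting.

```python
import re
from collections import defaultdict

# Matches the sequence ID in member lines such as: "0\t297aa, >MAG1_00012... *"
MEMBER_RE = re.compile(r">(\S+?)\.\.\.")


def parse_clstr(lines):
    """Return {cluster_number: [sequence_id, ...]} from CD-HIT .clstr lines."""
    clusters = defaultdict(list)
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            # Header line, e.g. ">Cluster 12"
            current = int(line.split()[1])
        else:
            match = MEMBER_RE.search(line)
            if match and current is not None:
                clusters[current].append(match.group(1))
    return dict(clusters)


# Made-up example input in the .clstr layout.
example = """>Cluster 0
0\t297aa, >MAG1_00012... *
1\t295aa, >MAG2_00345... at 98.31%
>Cluster 1
0\t150aa, >MAG1_00099... *
""".splitlines()

clusters = parse_clstr(example)
for number, members in clusters.items():
    print(number, len(members), ",".join(members))
```

For a real run, pass an open file handle instead of `example`. Each cluster can then be joined to the Prokka product or GO annotation of its members.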

I'm personally too out of touch with the metagenome space to know what the best tools are these days, but what you describe sounds like a clustering and annotation refinement kind of task.

If you're only interested in metabolic pathways, then you're probably already on the right track looking at what is commonly used in the literature at the moment.


I am still confused about how to place all predicted genes into metabolic pathways. I ran eggnog-mapper on the Prokka output, but I can't find a tool that simply maps those KOs onto KEGG pathways. I don't want to use KEGG Mapper because it is a web tool; I need something that runs on a cluster and gives me a simple table of all complete pathways in my MAGs...

In some articles people just build their own pipelines and custom scripts, or say they used the KEGG database. I have found MinPath, but it reports only a minimal set of pathways, and KEGGDecoder, but it needs kofam_scan output as input. I am just looking for something like METABOLIC, but for Prokka/eggnog-mapper output files!
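For what it's worth, a first-pass table of this kind can be tallied directly from the eggnog-mapper annotations file. The sketch below is only an illustration, not an existing tool: it assumes the tab-separated output has a `#query` header line with `KEGG_ko` and `KEGG_Pathway` columns and uses `-` for empty values (as in recent eggnog-mapper versions), and the three example rows are made up. It counts which KOs were seen per pathway; it does not judge whether a pathway is complete, which would need KEGG module definitions.

```python
import csv
from collections import defaultdict


def pathway_table(annotation_lines):
    """Map KEGG pathway ID -> set of KOs found in one MAG's annotations."""
    # Drop "##" comment lines but keep the "#query ..." header line.
    rows = [line for line in annotation_lines if not line.startswith("##")]
    reader = csv.DictReader(rows, delimiter="\t")
    pathways = defaultdict(set)
    for row in reader:
        kos = row.get("KEGG_ko") or "-"
        paths = row.get("KEGG_Pathway") or "-"
        if kos == "-" or paths == "-":
            continue
        ko_ids = [k.replace("ko:", "") for k in kos.split(",")]
        for pathway in paths.split(","):
            # Each pathway is listed as both koNNNNN and mapNNNNN; keep one.
            if pathway.startswith("map"):
                # Approximation: every KO of the gene is credited to
                # every pathway listed for that gene.
                pathways[pathway].update(ko_ids)
    return dict(pathways)


# Made-up rows; real files contain many more columns.
example = (
    "## example header comment\n"
    "#query\tKEGG_ko\tKEGG_Pathway\n"
    "MAG1_00001\tko:K00844\tko00010,map00010\n"
    "MAG1_00002\tko:K01810\tko00010,ko00030,map00010,map00030\n"
    "MAG1_00003\t-\t-\n"
).splitlines()

table = pathway_table(example)
for pathway in sorted(table):
    print(pathway, len(table[pathway]), ",".join(sorted(table[pathway])), sep="\t")
```

Running this once per MAG and stacking the results gives a simple pathway-by-MAG table that works offline on a cluster.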

I would appreciate any help. Thanks

18 hours ago
Mensur Dlakic ★ 30k

It is easy to install kofam_scan and get the output:

https://github.com/takaram/kofam_scan

What is the impediment to using METABOLIC? It can start from either (meta)genomes or (meta)proteomes, so just use the latter option if you want to keep the existing Prokka annotations. The same goes for DRAM, which also takes either DNA or protein files:
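If the Prokka runs live in one directory per MAG, the protein files can be gathered into a single input folder first. This is a small sketch under assumptions: the function name and directory layout are hypothetical, Prokka is assumed to have written a `<prefix>.faa` file in each output directory, and the destination is assumed to sit outside the Prokka root. Check each tool's README for the exact input it expects.

```python
import shutil
from pathlib import Path


def collect_faa(prokka_root, dest):
    """Copy every Prokka protein FASTA (*.faa) under prokka_root into dest.

    Returns the copied file names. Files that share a name overwrite each
    other, so each MAG should have been run with its own Prokka prefix.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for faa in sorted(Path(prokka_root).rglob("*.faa")):
        shutil.copy(faa, dest / faa.name)
        copied.append(faa.name)
    return copied


# Hypothetical usage:
# collect_faa("prokka_out", "proteins_for_annotation")
```

Because the locus tags in the `.faa` headers are unchanged, results from the downstream tool can be joined back to the Prokka annotations by gene ID.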

https://github.com/WrightonLabCSU/DRAM


Thanks, I had completely forgotten that METABOLIC works with protein sequences. That will work :)

I was just trying to figure out how people handle Prokka annotations.

I also tried kofam_scan, and it gave me multiple definitions for one sequence. It was a single enzyme that appears in a couple of systems, so I think I would need to choose the hit with the best e-value or score. eggnog-mapper, however, gave me just one system name. That is why I was looking for a tool to work with this data further.


Many proteins have multiple domains, and are correctly classified with multiple KO numbers. That's not a bug in kofam_scan but rather a reality of protein functions. If you look through individual KO annotations, many of them will be found in multiple pathways.
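One way to handle this in practice is to keep every KO that passes its own threshold for a gene, and only pick a single top-scoring KO where a one-per-gene table is really needed. The sketch below assumes the hits have already been parsed into `(gene, KO, score, threshold)` tuples; the function name, gene IDs, and numbers are all made up for illustration.

```python
from collections import defaultdict


def significant_kos(hits):
    """Group hits by gene, keeping every KO whose score meets its threshold."""
    by_gene = defaultdict(list)
    for gene, ko, score, threshold in hits:
        # Hits without a threshold (None) are skipped in this sketch.
        if threshold is not None and score >= threshold:
            by_gene[gene].append((ko, score))
    # Order each gene's KOs from highest to lowest score.
    return {
        gene: sorted(kos, key=lambda item: item[1], reverse=True)
        for gene, kos in by_gene.items()
    }


# Made-up hits: (gene, KO, score, threshold)
hits = [
    ("MAG1_00010", "K00001", 350.2, 300.0),
    ("MAG1_00010", "K00002", 410.5, 280.0),
    ("MAG1_00010", "K00003", 120.0, 250.0),  # below threshold, dropped
    ("MAG1_00011", "K00004", 95.0, 90.0),
]

kos = significant_kos(hits)
# Optional one-per-gene view: the top-scoring KO for each gene.
best = {gene: ranked[0][0] for gene, ranked in kos.items()}
print(kos)
print(best)
```

Keeping the full list preserves multi-domain proteins, while `best` gives the single assignment if a simpler table is preferred.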
