Question

Clustering Prokka output genes by function

10 weeks ago

shevch2009

Hello everyone,

I have MAGs and their annotations generated with Prokka. I am interested in clustering the genes by functional category (e.g. cell division, amino acid metabolism, carbon fixation) to better understand the metabolic capabilities of these microbes.
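
A minimal sketch of the grouping step, assuming each gene already has COG functional-category letters. One possible source is the COG_category column reported by eggNOG-mapper v2. Prokka's own .tsv only has a COG ID column, and it is filled for just some genes. The locus tags below are made up, and the lookup table covers only a few COG categories. Letters outside it fall into an "Other" group.

```python
from collections import defaultdict

# A few COG functional categories (one-letter codes); extend as needed.
COG_CATEGORIES = {
    "C": "Energy production and conversion",
    "D": "Cell cycle control, cell division, chromosome partitioning",
    "E": "Amino acid transport and metabolism",
    "G": "Carbohydrate transport and metabolism",
    "S": "Function unknown",
}


def group_by_category(gene_to_letters):
    """Return {category name: sorted locus tags}.

    gene_to_letters maps a locus tag to a string of COG category letters;
    a gene with several letters (e.g. "EG") is listed under each category.
    """
    groups = defaultdict(list)
    for gene, letters in gene_to_letters.items():
        for letter in letters:
            name = COG_CATEGORIES.get(letter, f"Other ({letter})")
            groups[name].append(gene)
    return {name: sorted(genes) for name, genes in groups.items()}


if __name__ == "__main__":
    # Hypothetical locus tags and category assignments.
    example = {"MAG1_00001": "D", "MAG1_00002": "EG", "MAG1_00003": "E"}
    for category, genes in sorted(group_by_category(example).items()):
        print(category, genes)
```

Counting the genes in each group then gives a per-MAG functional profile that can be compared across MAGs.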

However, Prokka mainly provides gene names and product descriptions, not detailed functional classifications. In addition, each MAG typically has hundreds of genes annotated only as "hypothetical protein", which makes the analysis challenging.
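
To see how much of each MAG is hypothetical, you can tally the Prokka feature table directly. This sketch assumes the tab-separated .tsv written by Prokka 1.14, whose header row includes ftype, EC_number, COG and product columns. The demo rows are invented.

```python
import csv
import io


def summarize_prokka_tsv(handle):
    """Count CDS features, hypothetical proteins, and CDSs with EC numbers or COG IDs.

    Assumes a Prokka-style tab-separated table whose header includes the
    columns 'ftype', 'EC_number', 'COG' and 'product'.
    """
    summary = {"CDS": 0, "hypothetical": 0, "with_EC": 0, "with_COG": 0}
    # QUOTE_NONE: treat quote characters in product names as plain text.
    for row in csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if row.get("ftype") != "CDS":
            continue  # skip tRNA, rRNA, tmRNA and other feature types
        summary["CDS"] += 1
        if (row.get("product") or "").strip() == "hypothetical protein":
            summary["hypothetical"] += 1
        if (row.get("EC_number") or "").strip():
            summary["with_EC"] += 1
        if (row.get("COG") or "").strip():
            summary["with_COG"] += 1
    return summary


if __name__ == "__main__":
    # Invented rows in the Prokka 1.14 .tsv layout.
    demo = (
        "locus_tag\tftype\tlength_bp\tgene\tEC_number\tCOG\tproduct\n"
        "MAG1_00001\tCDS\t1161\tftsZ\t\tCOG0206\tCell division protein FtsZ\n"
        "MAG1_00002\tCDS\t312\t\t\t\thypothetical protein\n"
        "MAG1_00003\ttRNA\t87\t\t\t\ttRNA-Leu(cag)\n"
    )
    print(summarize_prokka_tsv(io.StringIO(demo)))
```

For a real file, pass a handle opened with `open(path, newline="")`.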

I have performed metabolic reconstructions with METABOLIC, but I notice that many articles on MAGs combine several tools for similar purposes. I would therefore appreciate recommendations on the best tools or workflows for metabolic reconstruction from MAGs, as well as strategies for interpreting and analyzing Prokka gene predictions effectively.
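
Pathway tools differ in their curated definitions. One common underlying idea is step-based completeness: a pathway counts as more complete when more of its required steps have at least one gene present. The sketch below uses a toy two-step definition built from Calvin cycle marker genes (RuBisCO large subunit and phosphoribulokinase) with Prokka-style gene symbols. It is not METABOLIC's actual method or a curated pathway module.

```python
def step_completeness(genome_genes, steps):
    """Return (fraction of steps present, indices of missing steps).

    genome_genes: set of gene symbols found in one MAG.
    steps: list of sets; each set holds interchangeable genes for one step,
    and a step counts as present if any of them is in genome_genes.
    """
    if not steps:
        return 0.0, []
    missing = [i for i, alternatives in enumerate(steps)
               if not alternatives & genome_genes]
    return (len(steps) - len(missing)) / len(steps), missing


# Toy marker definition, NOT a curated pathway module:
# step 0 = RuBisCO large subunit, step 1 = phosphoribulokinase.
CALVIN_MARKERS = [{"rbcL", "cbbL"}, {"prkB", "cbbP"}]

if __name__ == "__main__":
    mag_genes = {"rbcL", "ftsZ", "gltA"}  # hypothetical gene set for one MAG
    print(step_completeness(mag_genes, CALVIN_MARKERS))
```

Real tools usually match genes with HMMs against KEGG orthologs rather than gene symbols. Gene symbols are fragile because many Prokka CDSs have no gene name at all.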

Thank you very much for any suggestions!

Best regards, Alla

Tags: data, prokka, shotgun

Last updated 8 days ago by Kevin Blighe.

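Prokka is just an annotation pipeline. It relies on, and combines, predictions from a number of other tools (e.g. Prodigal for gene calling, Aragorn for tRNAs, and BLAST+ and HMMER for similarity searches), so it doesn't do much beyond giving you a decent first-pass annotation.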
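
To see which underlying tools and databases contributed to each call, you can count the evidence sources listed in the inference attribute of Prokka's GFF3 output. This sketch makes a few assumptions. Inference values are taken to be comma-separated strings of the form "ab initio prediction:Prodigal:<version>" or "similar to AA sequence:UniProtKB:<accession>", and the field after the first colon is treated as the source name. Parsing stops at the ##FASTA section that Prokka appends to the GFF file. The demo lines are invented.

```python
from collections import Counter


def inference_sources(gff_lines):
    """Count evidence sources named in the 'inference' attribute of CDS features."""
    counts = Counter()
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # the genome sequence follows; no more features
        if not line or line.startswith("#"):
            continue  # skip blank lines and directives/comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        for attr in cols[8].split(";"):
            key, _, value = attr.partition("=")
            if key != "inference":
                continue
            for evidence in value.split(","):
                parts = evidence.split(":")
                if len(parts) >= 2:
                    counts[parts[1]] += 1  # e.g. 'Prodigal' or 'UniProtKB'
    return counts


if __name__ == "__main__":
    demo = [
        "##gff-version 3",
        "c1\tProdigal\tCDS\t1\t900\t.\t+\t0\tID=A_1;inference=ab initio prediction:Prodigal:2.6,"
        "similar to AA sequence:UniProtKB:Q00001;product=Example protein",
        "c1\tProdigal\tCDS\t950\t1250\t.\t+\t0\tID=A_2;inference=ab initio prediction:Prodigal:2.6;"
        "product=hypothetical protein",
        "##FASTA",
    ]
    print(inference_sources(demo))
```

CDSs whose only source is the gene caller are the ones that ended up as "hypothetical protein".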

You can use tools like CD-HIT, or even Roary (which takes Prokka's GFF3 files as input), to cluster genes into orthologue groups, and then run various gene ontology-type analyses on those groups.
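
For intuition about what CD-HIT does, here is a toy version of greedy incremental clustering. Sequences are processed longest first. Each one joins the first existing representative it matches at or above a threshold, and otherwise starts a new cluster. This mirrors CD-HIT's default fast mode. difflib's SequenceMatcher ratio is only a crude stand-in for the alignment-based identity CD-HIT actually computes. Use this for tiny examples, not real MAG proteomes.

```python
from difflib import SequenceMatcher


def identity(a, b):
    """Crude similarity in [0, 1]; a stand-in for alignment-based identity."""
    return SequenceMatcher(None, a, b).ratio()


def greedy_cluster(seqs, threshold=0.9):
    """Greedy incremental clustering in the spirit of CD-HIT.

    seqs: dict of id -> protein sequence.
    Returns dict of representative id -> list of member ids.
    """
    clusters = {}
    # Longest sequences first (ties broken by id) so they become representatives.
    for sid in sorted(seqs, key=lambda s: (-len(seqs[s]), s)):
        for rep in clusters:
            if identity(seqs[sid], seqs[rep]) >= threshold:
                clusters[rep].append(sid)  # join the first matching cluster
                break
        else:
            clusters[sid] = [sid]  # no match: new representative
    return clusters


if __name__ == "__main__":
    # Invented toy sequences.
    toy = {"a": "MKTAYIAKQR", "b": "MKTAYIAKQK", "c": "GGGGGGGG"}
    print(greedy_cluster(toy, threshold=0.8))
```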

I'm personally a bit too out of touch with the metagenomics space to know what the best tools are these days, but what you describe sounds like a clustering and annotation-refinement task.
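
For the annotation-refinement side, one simple pattern is to keep Prokka's product where it is informative and fill in "hypothetical protein" entries from a second annotation source. This sketch assumes the second source has already been parsed into a dict keyed by the same locus tags. That input is hypothetical, and each tool's output would need its own parser. "-" is treated as a missing value, as eggNOG-mapper uses it.

```python
def refine_products(prokka_products, secondary):
    """Fill 'hypothetical protein' entries from a second annotation source.

    prokka_products: dict locus_tag -> Prokka product string.
    secondary: dict locus_tag -> description from another tool.
    Returns (refined dict, number of entries updated).
    """
    refined = dict(prokka_products)  # leave the input untouched
    updated = 0
    for tag, product in prokka_products.items():
        candidate = (secondary.get(tag) or "").strip()
        if product == "hypothetical protein" and candidate and candidate != "-":
            refined[tag] = candidate
            updated += 1
    return refined, updated


if __name__ == "__main__":
    # Hypothetical locus tags and descriptions.
    prokka = {"A_1": "hypothetical protein", "A_2": "Cell division protein FtsZ"}
    other = {"A_1": "ABC transporter permease", "A_2": "tubulin-like GTPase"}
    print(refine_products(prokka, other))
```

Tracking how many entries change per MAG gives a quick measure of how much the second tool adds.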

If you're only interested in metabolic pathways, then you're probably already on the right track by using METABOLIC and checking what is commonly used in the literature at the moment.

10 weeks ago by Joe