Question

Hypothetical proteins from Prokka and mapping them to KEGG

3.0 years ago

Jonathan Yoou ▴ 60

Hi all,

I'm analyzing the WGS data of a strain with Prokka, which gave me .gff and .faa (protein FASTA of the translated CDS sequences) files. I'm not sure whether what I'm doing is right...

As I understand it, many of the "hypothetical protein" annotations from Prokka mean "well.. I predict this is a protein, but I can't tell what it is from my databases", right? So if I map the protein amino acid sequences to KEGG using BlastKOALA and get specific pathway and protein annotations, is that because those hypothetical proteins actually do have identified functions and names in the database KEGG uses?

I'd like to be able to answer the question: "If you map hypothetical proteins, how do you know they take part in different KEGG pathways, and are they all actually annotated?"

Thank you in advance :)

kegg blastkoala hypothetical protein prokka • 1.1k views

updated 3.0 years ago by Mensur Dlakic ★ 27k • written 3.0 years ago by Jonathan Yoou ▴ 60

score 0 · Answer 1 · 2021-05-14

If you don't have the HMM databases properly installed for Prokka, most if not all of your proteins will come out as hypothetical. It is normal for 30-40% of them to have that designation, but a majority should be annotated after a Prokka run. The program's GitHub page explains how to install the databases; you will have to do that manually, as I think only a HAMAP database comes standard with Prokka.
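To check which side of that 30-40% figure a run falls on, the headers of the .faa file can be counted. The following is only a minimal sketch: it assumes each header has the form `>ID product description` and that unannotated entries carry the product `hypothetical protein`. The function name `hypothetical_fraction` is made up for illustration.

```python
from typing import Iterable, Tuple


def hypothetical_fraction(faa_lines: Iterable[str]) -> Tuple[int, int, float]:
    """Return (hypothetical, total, fraction) counted from FASTA header lines.

    Assumes headers look like '>ID product description'.
    """
    total = 0
    hypothetical = 0
    for line in faa_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        total += 1
        # Split off the ID; everything after the first whitespace is the product
        parts = line[1:].strip().split(maxsplit=1)
        product = parts[1] if len(parts) > 1 else ""
        if product.lower() == "hypothetical protein":
            hypothetical += 1
    fraction = hypothetical / total if total else 0.0
    return hypothetical, total, fraction
```

It accepts any iterable of lines, so an open file handle for the .faa can be passed directly. A fraction far above 0.4 would point towards the database problem described above.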

It is unlikely, though possible, that KEGG will annotate many proteins that Prokka can't. It will happen here and there, but if many of your proteins lack Prokka annotations while KEGG can assign them a function, see my explanation above: the likely cause is missing databases. KEGG annotations are generally reliable, so you can usually trust them even if Prokka designated the proteins as hypothetical.
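To see how often this actually happens in a given genome, the hypothetical IDs from the .faa file can be cross-referenced with the KO assignments. This is a rough sketch under stated assumptions: the .faa headers are `>ID product description`, and the BlastKOALA result is available as a tab-separated list of `gene_id<TAB>K number`, with the second column empty or absent for unassigned genes. All function names here are hypothetical.

```python
from typing import Dict, Iterable, Set


def hypothetical_ids(faa_lines: Iterable[str]) -> Set[str]:
    """Collect IDs whose product is exactly 'hypothetical protein'."""
    ids = set()
    for line in faa_lines:
        if line.startswith(">"):
            parts = line[1:].strip().split(maxsplit=1)
            if len(parts) == 2 and parts[1].lower() == "hypothetical protein":
                ids.add(parts[0])
    return ids


def ko_assignments(ko_lines: Iterable[str]) -> Dict[str, str]:
    """Map gene ID -> K number, skipping genes without an assignment.

    Assumes tab-separated lines: gene ID, then an optional K number.
    """
    assigned = {}
    for line in ko_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[1].strip():
            assigned[fields[0]] = fields[1].strip()
    return assigned


def rescued_by_kegg(faa_lines: Iterable[str], ko_lines: Iterable[str]) -> Dict[str, str]:
    """Return hypothetical proteins that nevertheless received a K number."""
    hypo = hypothetical_ids(faa_lines)
    kos = ko_assignments(ko_lines)
    return {gene: ko for gene, ko in kos.items() if gene in hypo}
```

If the resulting dictionary covers a large share of the genome rather than a handful of genes, that again suggests the Prokka databases were not set up correctly.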