Hypothetical proteins from Prokka and mapping them on KEGG
3.0 years ago

Hi all,

I'm annotating the whole-genome sequence (WGS) of a strain with Prokka, which gave me a .gff file and a .faa file (protein FASTA of the translated CDS sequences). I'm not sure whether what I'm doing is right...

As I understand it, many of the proteins Prokka labels "hypothetical protein" are ones where it is saying "well... I know this is hypothetically a protein, but I can't tell exactly what it is from my database", right? Then, when I map the proteins' amino acid sequences to KEGG with BlastKOALA and get specifically annotated pathways and proteins, is that because those hypothetical proteins actually do have identified functions and names in the database KEGG uses?
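If you want to submit only the hypothetical proteins to BlastKOALA, you can pull them out of the .faa first. Below is a minimal sketch, assuming Prokka's .faa headers have the form `>locus_tag product` (e.g. `>X_00002 hypothetical protein`); the script name and file paths in the usage comment are hypothetical.

```python
import sys
from typing import Iterable, Iterator, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header without '>', joined sequence) pairs from FASTA lines."""
    header: Optional[str] = None
    seq = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif line:
            seq.append(line.strip())
    if header is not None:
        yield header, "".join(seq)


def hypothetical_records(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Keep records whose product (header text after the ID) is 'hypothetical protein'.

    Assumes the Prokka-style header layout '<locus_tag> <product>'.
    """
    for header, seq in read_fasta(lines):
        parts = header.split(maxsplit=1)
        product = parts[1] if len(parts) > 1 else ""
        if product.strip().lower() == "hypothetical protein":
            yield parts[0], seq


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python extract_hypo.py prokka_out/strain.faa > hypo.faa
    with open(sys.argv[1]) as fh:
        for locus_tag, seq in hypothetical_records(fh):
            print(f">{locus_tag}")
            print(seq)
```

Only the locus tag is written to the new headers, so the IDs that come back from BlastKOALA can be matched directly against the Prokka GFF.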

I'd like to be able to answer the question: "If you map hypothetical proteins, how do you know they are involved in different KEGG pathways and that they are actually annotated?"

Thank you in advance :)

kegg blastkoala hypothetical protein prokka
3.0 years ago
Mensur Dlakic ★ 27k

If you don't have the HMM databases properly installed for Prokka, most if not all of your proteins will come out as hypothetical. It is normal for 30-40% of them to get that designation, but the majority should be annotated after a Prokka run. The program's GitHub page explains how to install the databases, and you will have to do that manually, as I think only the HAMAP database comes standard with Prokka.
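A quick way to see where your run stands relative to that 30-40% figure is to count CDS features whose `product` is "hypothetical protein" in the Prokka GFF. This is a minimal sketch, assuming standard GFF3 with nine tab-separated columns, `key=value` attributes separated by `;`, and Prokka's `##FASTA` section at the end of the file; the file path is whatever you pass on the command line.

```python
import sys
from typing import Iterable, Tuple
from urllib.parse import unquote


def hypothetical_fraction(gff_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (hypothetical CDS count, total CDS count) from Prokka GFF3 lines."""
    hypo = total = 0
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # contig sequences follow this marker; no more features
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        # Column 9 holds attributes such as ID=...;locus_tag=...;product=...
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        total += 1
        # GFF3 may percent-encode attribute values, so decode before comparing
        if unquote(attrs.get("product", "")).strip().lower() == "hypothetical protein":
            hypo += 1
    return hypo, total


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        h, t = hypothetical_fraction(fh)
    pct = 100 * h / t if t else 0.0
    print(f"{h}/{t} CDS ({pct:.1f}%) annotated as hypothetical protein")
```

If the percentage is far above the 30-40% range mentioned above, that points to the missing-database problem rather than to genuinely uncharacterized proteins.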

It is unlikely, though possible, that KEGG will annotate many proteins that Prokka can't. It will happen here and there, but if many of your proteins lack Prokka annotations while KEGG can assign them a function, see my explanation above about the databases. KEGG annotations are generally reliable, so you can usually trust them even when Prokka designated the proteins as hypothetical.
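To check how many of Prokka's hypothetical proteins KEGG actually assigned a function to, you can cross-reference the two outputs. This is a minimal sketch, assuming (1) Prokka .faa headers of the form `>locus_tag product`, and (2) the KO assignment list downloaded from BlastKOALA is a tab-separated file with the query ID in the first column and the KO identifier (possibly empty) in the second. Treat both formats as assumptions and check them against your own files; the file names in the usage comment are hypothetical.

```python
import sys
from typing import Dict, Iterable, Set


def hypothetical_ids_from_faa(lines: Iterable[str]) -> Set[str]:
    """IDs of Prokka .faa records whose product is 'hypothetical protein'."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            # Assumed header layout: '>locus_tag product'
            parts = line[1:].strip().split(maxsplit=1)
            if len(parts) == 2 and parts[1].lower() == "hypothetical protein":
                ids.add(parts[0])
    return ids


def read_ko_assignments(lines: Iterable[str]) -> Dict[str, str]:
    """Parse assumed 'query<TAB>KO' lines; queries without a KO are skipped."""
    assigned = {}
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 2 and cols[1].strip():
            assigned[cols[0].strip()] = cols[1].strip()
    return assigned


def rescued_by_kegg(hypothetical_ids: Set[str], ko_lines: Iterable[str]) -> Dict[str, str]:
    """Hypothetical proteins (by Prokka ID) that received a KO from KEGG."""
    kos = read_ko_assignments(ko_lines)
    return {pid: ko for pid, ko in kos.items() if pid in hypothetical_ids}


if __name__ == "__main__" and len(sys.argv) > 2:
    # Hypothetical usage: python rescued.py strain.faa user_ko.txt
    with open(sys.argv[1]) as fh:
        hypo_ids = hypothetical_ids_from_faa(fh)
    with open(sys.argv[2]) as fh:
        hits = rescued_by_kegg(hypo_ids, fh)
    print(f"{len(hits)} of {len(hypo_ids)} hypothetical proteins received a KO")
    for pid, ko in sorted(hits.items()):
        print(pid, ko, sep="\t")
```

A small number of hits here is consistent with the "here and there" case above; a large number suggests Prokka ran without its full set of databases.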
