I've searched quite a lot for an answer to this question, but I couldn't find a clear one.
I've run several tools for gene prediction and annotation like Prokka, Prodigal, Anvi'o, and eggNOG mapper.
I found out that all of them rely on Prokka for gene prediction, yet they all produce different results even though they use the same algorithm. Can you explain how this happens and which one I should choose for further analysis? So far I have used Prokka rather than the other tools, because they use Prokka anyway.
Prokka doesn't predict genes itself; it uses Prodigal to do it. Prokka is a nice piece of software, but in my opinion it has a major flaw. It comes with only a HAMAP database preinstalled, and it doesn't state clearly that its predictive capabilities will be severely limited unless other HMM databases are installed. At the very least, I think the Pfam and TIGRFAMs databases should be installed for proper functionality. I also add COG and KOG HMM databases, but those may not be widely available. I suggest you scroll down the Prokka GitHub page for information on how to install other databases.
It is unlikely that one will ever get identical annotations in all respects from different tools, because they use different underlying methodologies and different databases, and most likely their threshold cutoffs differ as well. There will be agreement for most proteins if the analyses are done properly, which for Prokka includes installing additional databases.
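To make the point about cutoffs concrete, here is a minimal sketch of how the same set of hits yields a different number of annotated genes under two e-value thresholds. The gene IDs, database entries, e-values, and both cutoffs are invented for illustration and are not the defaults of any of the tools discussed here.

```python
# Toy search hits: (gene_id, database_entry, e_value).
# All values are made up for illustration only.
hits = [
    ("gene_001", "PF00005", 1e-40),
    ("gene_002", "PF00072", 1e-8),
    ("gene_003", "PF00126", 1e-4),
]


def annotated_genes(hits, evalue_cutoff):
    """Return the set of gene IDs with at least one hit at or below the cutoff."""
    return {gene for gene, _entry, evalue in hits if evalue <= evalue_cutoff}


# Two hypothetical tools applying different cutoffs to identical hits.
strict = annotated_genes(hits, 1e-9)
lenient = annotated_genes(hits, 1e-3)

print(f"strict cutoff:  {len(strict)} annotated gene(s)")
print(f"lenient cutoff: {len(lenient)} annotated gene(s)")
```

Even with identical gene calls and identical databases, the two counts differ, so a disagreement in the number of annotated genes does not by itself mean one tool is wrong.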
You may want to search Biostars for the keyword "prokka" to find older posts on this subject. There are some links on the right-hand side of this page.
I don't have a clear answer, but I know Prokka and Prodigal are not equivalent. Prokka is a sort of wrapper program: it starts by using Prodigal to predict coding sequences, and then searches these sequences against different databases in turn, using a hierarchy of approaches. It starts by BLASTing against a user-provided database (if one is given). All sequences without a hit are then BLASTed against a UniProt database, and finally the remaining ones are searched against HMM libraries using HMMER.
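The hierarchy described above can be sketched roughly as follows. This is only an illustration of the "first source that gives a hit wins" idea, not Prokka's actual code: the dictionaries stand in for real BLAST and HMMER searches, and every ID and product name in them is hypothetical.

```python
def annotate(protein_id, sources):
    """Try each (name, lookup) source in order and return the first hit.

    `sources` is an ordered list of (source_name, dict) pairs, where each
    dict maps protein IDs to product names and stands in for a real search.
    """
    for name, lookup in sources:
        product = lookup.get(protein_id)
        if product is not None:
            return name, product
    # Nothing matched in any source, so fall back to a generic label.
    return None, "hypothetical protein"


# Hypothetical search results, ordered from most to least trusted.
sources = [
    ("user_db", {"p1": "DNA gyrase subunit A"}),
    ("uniprot", {"p1": "gyrase-like protein", "p2": "ABC transporter"}),
    ("hmm", {"p3": "response regulator domain protein"}),
]

for pid in ["p1", "p2", "p3", "p4"]:
    source, product = annotate(pid, sources)
    print(pid, source, product)
```

Because the first matching source decides the label, two tools that query the same databases in a different order, or that lack one of them, can label the same protein differently.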
In comparison, eggNOG-mapper uses orthology assignment to predict function. I'm not sure I can explain correctly why this is a different approach, but to quote eggNOG:
The use of orthology predictions for functional annotation permits a
higher precision than traditional homology searches (i.e. BLAST
searches), as it avoids transferring annotations from close paralogs
(duplicate genes with a higher chance of being involved in functional
I am not familiar with Anvi'o's approach, but looking into how it profiles functions might give clues as to why it diverges as well.
Hopefully my answer helps at least a little!
EDIT : Mensur Dlakic you answered while I was typing and went on lunch break. Your understanding of Prokka is better than mine! Cheers
Hi, I have had the same problem when annotating bacterial genomes. The eggNOG-mapper results differ a little from the Prokka results, particularly in the number of annotated genes. This has been a huge problem when I want to assign COG categories or GO categories. In the beginning, I tried to merge the Prokka, eggNOG-mapper, and Prodigal results into one file, but I realized that this is double work.
What I do now is take the Prokka protein output (proteins.faa) and submit that file to a KEGG KOALA tool or to KEGG KAAS for functional annotation. I noticed that in eggNOG-mapper some genes were assigned to the wrong COG category. Anvi'o is also really good, but it is better to choose a single tool as the basis of the workflow, so that the gene identifiers stay the same throughout.
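If you do keep one set of gene identifiers, checking where two annotation sources disagree becomes simple. Here is a small sketch, assuming you have already parsed each tool's output into a dictionary from gene ID to COG category. The parsing step is left out, and the IDs and categories below are invented.

```python
def compare_annotations(a, b):
    """Compare two {gene_id: category} dicts that share one ID scheme."""
    shared = a.keys() & b.keys()
    return {
        "only_a": sorted(a.keys() - b.keys()),
        "only_b": sorted(b.keys() - a.keys()),
        "agree": sorted(g for g in shared if a[g] == b[g]),
        "disagree": sorted(g for g in shared if a[g] != b[g]),
    }


# Hypothetical COG assignments from two tools run on the same protein file.
tool_a = {"GENE_00001": "J", "GENE_00002": "K", "GENE_00003": "E"}
tool_b = {"GENE_00001": "J", "GENE_00002": "T", "GENE_00004": "C"}

report = compare_annotations(tool_a, tool_b)
for key, genes in report.items():
    print(key, genes)
```

The "disagree" list is the one worth inspecting by hand, since those are the genes both tools annotated but placed in different categories.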
There are many options; however, I'm not aware of how each tool predicts genes. Best regards