Prokka Annotation or NCBI Annotation
22 months ago
Optimist

Dear All,

I have two sets of annotation files for 15 bacterial genomes: one from NCBI (RefSeq) and the other from Prokka, which I ran on my local machine.

Which of the two is advisable for all downstream analyses?

Thank You

22 months ago
Mensur Dlakic

If I remember correctly, prokka comes with only the HAMAP database of HMMs, which will produce poor annotations on prokaryotic genomes, with many proteins left as hypothetical. To get good annotations you would need to install at least Pfam and TIGRFAMs. I don't know whether you have done that, but you can find out by looking at prokka's annotations: if many proteins are hypothetical in prokka's output where the NCBI files have meaningful annotations, chances are that you don't have any extra prokka HMM databases. If your genomes are literally identical to the ones in RefSeq, it may be better to go with the NCBI annotations.
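One way to run that check is to count, in each annotation, how many CDS features are labelled "hypothetical protein". This is a sketch that assumes both annotations are available as GFF3 files (the .gff file in prokka's output directory, and the *_genomic.gff file for the RefSeq assembly) and that the product name is stored as a product= attribute in column 9; hypothetical_stats is a made-up name for illustration.

```shell
# Hypothetical helper: print "<hypothetical CDS><TAB><total CDS>" for a GFF3 file.
# Assumes the product name is stored as a product= attribute in column 9.
hypothetical_stats() {
  awk -F'\t' '
    /^##FASTA/ { exit }    # Prokka appends the genome sequence after this line
    $3 == "CDS" {
      cds++
      # Wrap column 9 in semicolons so only the whole attribute value matches
      if ((";" $9 ";") ~ /;product=hypothetical protein;/) hyp++
    }
    END { printf "%d\t%d\n", hyp, cds }
  ' "$1"
}
```

Running it on the prokka GFF and on the RefSeq GFF for the same genome gives two pairs of counts; a much larger hypothetical fraction on the prokka side is the pattern described above.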

I have just checked my Prokka installation, and it has the following databases:

Looking for databases in: /home/bio2/miniconda3/envs/prokka_env/db

• Kingdoms: Archaea Bacteria Mitochondria Viruses
• Genera: Enterococcus Escherichia Staphylococcus
• HMMs: HAMAP
• CMs: Archaea Bacteria Viruses

As you rightly pointed out, it doesn't have Pfam or TIGRFAMs.

Is there a way to add these databases to my Prokka installation?

For now, since there are only 15 bacterial genomes, I can download the annotated GenBank files from NCBI. But if I want to add more genomes in the future, I will have to fall back on Prokka for large-scale annotation.

Thank You

It should be enough to download the databases and place them in prokka's hmm directory (for you that seems to be /home/bio2/miniconda3/envs/prokka_env/db/hmm).

After gunzipping, I suggest you rename the databases to specify the order in which they will be searched during annotation:

mv TIGRFAMs_15.0_HMM.LIB 1-TIGRFAMs_15.0.hmm
mv Pfam-A.hmm 2-Pfam-A.hmm
mv HAMAP.hmm 3-HAMAP.hmm
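The gunzip and rename steps above can be wrapped in one small function. This is a sketch that assumes the compressed TIGRFAMs and Pfam-A libraries have already been downloaded (their paths are passed in as arguments) and that the bundled HAMAP file is named HAMAP.hmm, as in the mv commands; install_prokka_hmms is a hypothetical name.

```shell
# Hypothetical helper: unpack downloaded HMM libraries into prokka's hmm
# directory with numeric prefixes, so the search order is explicit.
install_prokka_hmms() {
  local hmmdir=$1 tigrfam_gz=$2 pfam_gz=$3
  # Decompress straight to the final, ordered file names
  gunzip -c "$tigrfam_gz" > "$hmmdir/1-TIGRFAMs_15.0.hmm" || return 1
  gunzip -c "$pfam_gz"    > "$hmmdir/2-Pfam-A.hmm"        || return 1
  # Rename the bundled HAMAP library so it is searched last
  if [ -e "$hmmdir/HAMAP.hmm" ]; then
    mv "$hmmdir/HAMAP.hmm" "$hmmdir/3-HAMAP.hmm"
  fi
}
```

For example: install_prokka_hmms /home/bio2/miniconda3/envs/prokka_env/db/hmm TIGRFAMs_15.0_HMM.LIB.gz Pfam-A.hmm.gz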


When all is done, run:

prokka --setupdb
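As far as I know, prokka --setupdb indexes each .hmm file with HMMER's hmmpress, which writes four companion files (.h3f, .h3i, .h3m, .h3p) next to it. Assuming that, a quick sanity check is to look for those files; check_hmm_indexes is a hypothetical helper.

```shell
# Hypothetical helper: report any .hmm file in a directory that lacks the
# four hmmpress index files; returns non-zero if anything is missing.
check_hmm_indexes() {
  local hmmdir=$1 hmm ext missing=0
  for hmm in "$hmmdir"/*.hmm; do
    [ -e "$hmm" ] || continue          # skip the literal pattern when nothing matches
    for ext in h3f h3i h3m h3p; do
      if [ ! -e "$hmm.$ext" ]; then
        echo "missing: $hmm.$ext"
        missing=1
      fi
    done
  done
  return "$missing"
}
```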

Should I modify my command for prokka to use the three databases, or does it pick them up automatically?

I have already run 'prokka --setupdb'.

This is the command I generally use to run prokka:

prokka GCA_000168335.1_ASM16833v1_genomic.fasta --outdir GCA_000168335.1_ASM16833v1_prokka_compliant_out_29-10-20 --prefix GCA_000168335.1_ASM16833v1_prokka --genus Pseudomonas --species aeruginosa --kingdom Bacteria --usegenus --compliant --cpus 64 --rfam

Kindly give your valuable suggestions and feedback.

Thank You
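Since more genomes may be added later, the same command can be looped over a directory of FASTA files. This sketch keeps the flags from the command above but assumes every genome in the directory shares one genus and species, drops the date suffix from --outdir, and defaults to 64 CPUs; prokka_batch and the DRY_RUN variable are hypothetical names.

```shell
# Hypothetical helper: run the same prokka command on every *.fasta in a directory.
# Set DRY_RUN=1 to print the commands instead of running them.
prokka_batch() {
  local indir=$1 genus=$2 species=$3 cpus=${4:-64}
  local fasta base
  for fasta in "$indir"/*.fasta; do
    [ -e "$fasta" ] || continue        # skip the literal pattern when nothing matches
    base=$(basename "$fasta" .fasta)
    ${DRY_RUN:+echo} prokka "$fasta" \
      --outdir "${base}_prokka_out" --prefix "${base}_prokka" \
      --genus "$genus" --species "$species" --kingdom Bacteria \
      --usegenus --compliant --cpus "$cpus" --rfam
  done
}
```

For example, DRY_RUN=1 prokka_batch genomes/ Pseudomonas aeruginosa lists the commands first, and dropping DRY_RUN runs them. Output directories are named after each FASTA file so runs on different genomes do not collide.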

If the newly added databases are listed after running prokka --setupdb, you should be able to run everything as intended. That particular command may not give you anything different on Pseudomonas whether or not you invoke --usegenus, as I think prokka has genus-specific databases only for a few genera (Enterococcus, Escherichia and Staphylococcus in your listing); prokka --listdb will show you which databases are installed.
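To make that check scriptable, the "HMMs:" line of the prokka --listdb output can be searched for each expected name. This assumes the output contains such a line, as in the listing earlier in the thread, and that the renamed files still contain "TIGRFAMs" and "Pfam-A"; listdb_has_hmms is a hypothetical helper, and 2>&1 is there in case prokka writes the listing to stderr.

```shell
# Hypothetical helper: read `prokka --listdb` output on stdin and check that
# every name given as an argument appears on the "HMMs:" line.
listdb_has_hmms() {
  local hmm_line name
  hmm_line=$(grep 'HMMs:')
  for name in "$@"; do
    case "$hmm_line" in
      *"$name"*) ;;                            # listed, keep checking
      *) echo "not listed: $name"; return 1 ;;
    esac
  done
  return 0
}
```

Usage: prokka --listdb 2>&1 | listdb_has_hmms TIGRFAMs Pfam-A HAMAP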

Is prokka not designed and optimised for prokaryotic genome annotation?

If annotating with Pfam, how would you do that?