Question: Prokka Annotation or NCBI Annotation
3 months ago, Optimist (160) wrote:

Dear All,

I have two sets of annotation files for 15 bacterial genomes: one set from NCBI (RefSeq annotations) and the other from Prokka, which I ran on my local machine.

Which one is advisable to use between the two for all downstream analysis?

Awaiting your valuable feedback

Thank You

genbank prokka ncbi • 353 views
modified 3 months ago by Mensur Dlakic (9.1k) • written 3 months ago by Optimist (160)

3 months ago, Mensur Dlakic (9.1k) wrote:

If I remember correctly, prokka comes with only the HAMAP database of HMMs, which on its own will produce terrible annotations on prokaryotic genomes. To get good annotations you would need to install at least Pfam and TIGRFAMs. I don't know whether you have done that, but you can find out by looking at prokka's annotations: if prokka reports many hypothetical proteins where the NCBI files have meaningful annotations, chances are that you don't have any extra prokka HMM databases. If you are comparing literally identical genomes, it may be better to go with the NCBI annotations.
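
The comparison suggested above can be sketched in shell. This is a minimal sketch, not part of the original answer: it assumes both annotation sets are available as GFF3 files whose CDS rows carry a `product=` attribute, and the function name `count_hypothetical` is made up for illustration.

```shell
# Hypothetical helper: print "<hypothetical CDS count><TAB><total CDS count>"
# for one GFF3 file. Assumes CDS rows are tab-separated with the attributes
# in column 9; sequence lines after ##FASTA contain no tabs and are skipped
# because their third field is empty.
count_hypothetical() {
  awk -F'\t' '
    $3 == "CDS" {
      total++
      if ($9 ~ /product=hypothetical protein/) hyp++
    }
    END { printf "%d\t%d\n", hyp, total }
  ' "$1"
}
```

Running it on the Prokka GFF and on the matching RefSeq GFF for the same genome gives two pairs of numbers to compare; a much larger first number for Prokka would point to missing HMM databases.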

written 3 months ago by Mensur Dlakic (9.1k)

I have just checked my Prokka installation and it has the following databases:

Looking for databases in: /home/bio2/miniconda3/envs/prokka_env/db

  • Kingdoms: Archaea Bacteria Mitochondria Viruses
  • Genera: Enterococcus Escherichia Staphylococcus
  • CMs: Archaea Bacteria Viruses

As you rightly pointed out, it doesn't have Pfam and TIGRfams.

Is there a way to add these databases to my prokka?

Currently, since there are only 15 bacterial genomes, I can download the GenBank files with annotations from NCBI. But if I want to add more genomes in the future, I will have to fall back on Prokka for large-scale annotation.

Thank You

modified 3 months ago • written 3 months ago by Optimist (160)

It should be enough to download the databases and place them in prokka's hmm directory (for you that seems to be /home/bio2/miniconda3/envs/prokka_env/db/hmm).

After gunzipping, I suggest you rename the databases to specify the order in which they will be searched during annotation:

mv TIGRFAMs_15.0_HMM.LIB 1-TIGRFAMs_15.0.hmm
mv Pfam-A.hmm 2-Pfam-A.hmm
mv HAMAP.hmm 3-HAMAP.hmm

When all is done, run:

prokka --setupdb
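
The renaming step above could be wrapped in a small function so it can be repeated on another machine. This is only a sketch under assumptions: `order_hmm_dbs` and `rename_if_present` are illustrative names, and the function expects the gunzipped files to already sit in the hmm directory passed as its argument.

```shell
# Hypothetical helper: rename a file only if it exists, so the function
# can be re-run safely after a partial install.
rename_if_present() {
  if [ -f "$1" ]; then
    mv -- "$1" "$2"
  fi
}

# Hypothetical helper: prefix the HMM databases with numbers so that they
# are searched in the order TIGRFAMs, Pfam-A, HAMAP (file names as above).
order_hmm_dbs() {
  hmm_dir=$1
  rename_if_present "$hmm_dir/TIGRFAMs_15.0_HMM.LIB" "$hmm_dir/1-TIGRFAMs_15.0.hmm"
  rename_if_present "$hmm_dir/Pfam-A.hmm"            "$hmm_dir/2-Pfam-A.hmm"
  rename_if_present "$hmm_dir/HAMAP.hmm"             "$hmm_dir/3-HAMAP.hmm"
}
```

For the installation discussed here the call would be `order_hmm_dbs /home/bio2/miniconda3/envs/prokka_env/db/hmm`, followed by `prokka --setupdb`.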
modified 3 months ago • written 3 months ago by Mensur Dlakic (9.1k)

Should I modify my command so that prokka uses the 3 databases, or does it pick them up automatically?

I have run 'prokka --setupdb'.

This is the command I generally use to run prokka:

prokka GCA_000168335.1_ASM16833v1_genomic.fasta --outdir GCA_000168335.1_ASM16833v1_prokka_compliant_out_29-10-20 --prefix GCA_000168335.1_ASM16833v1_prokka --genus Pseudomonas --species aeruginosa --kingdom Bacteria --usegenus --compliant --cpus 64 --rfam
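
For a larger batch of genomes, a command like the one above could be generated per FASTA file. The sketch below only prints the command (a dry run) and uses the same flags; the function name `prokka_cmd`, the output naming scheme, and the default of 8 CPUs are assumptions made for illustration.

```shell
# Hypothetical helper: print a prokka command for one genome.
# Arguments: FASTA path, genus, species, optional CPU count (default 8).
prokka_cmd() {
  fasta=$1
  genus=$2
  species=$3
  cpus=${4:-8}
  base=$(basename "$fasta")
  base=${base%.*}   # strip the final extension, e.g. ".fasta"
  printf 'prokka %s --outdir %s --prefix %s --genus %s --species %s --kingdom Bacteria --usegenus --compliant --cpus %s --rfam\n' \
    "$fasta" "${base}_prokka_out" "${base}_prokka" "$genus" "$species" "$cpus"
}
```

A loop such as `for f in *.fasta; do prokka_cmd "$f" Pseudomonas aeruginosa 64; done` prints one command per genome for review; paths containing spaces would need extra quoting before the printed lines are executed.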

Kindly give your valuable suggestions & feedback

Thank You

written 3 months ago by Optimist (160)

If the newly added databases were listed after running prokka --setupdb, you should be able to run everything as intended. That particular command may not give you anything different for Pseudomonas whether or not you invoke --usegenus, as I think that prokka has genus-specific information for only a few genera (prokka --listdb will give you database information).
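
The check described above can be scripted. This is a sketch under assumptions: it presumes the database listing contains a line with `HMMs:` followed by the database names without the `.hmm` extension, which is not shown in the listing earlier in this thread, and `listed_hmm_db` is an illustrative name.

```shell
# Hypothetical helper: read a database listing on stdin and succeed only
# if the given name appears as a whole word on a line containing "HMMs:".
listed_hmm_db() {
  awk -v want="$1" '
    /HMMs:/ { for (i = 1; i <= NF; i++) if ($i == want) found = 1 }
    END { exit(found ? 0 : 1) }
  '
}
```

It could be used as `prokka --listdb 2>&1 | listed_hmm_db 2-Pfam-A && echo "Pfam-A is installed"`; the `2>&1` is there in case the listing is written to standard error.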

written 3 months ago by Mensur Dlakic (9.1k)

Is prokka not designed and optimised for prokaryotic genome annotation?

If annotating with Pfam, how would you do that?

modified 12 weeks ago • written 3 months ago by robert.murphy (30)