Question: Annotation with Prokka or RAST.
16 months ago, lokraj2003 wrote:

I was experimenting with the Prokka and RAST annotation tools, so I took a well-annotated swinepox virus genome from GenBank (NCBI Reference Sequence: NC_003389.1).

I ran that sequence through Prokka and the RAST SEED server at the same time. Only a few of the genes (maybe around 1%) were annotated; most were predicted as "hypothetical protein". The results were comparable between Prokka and RAST.

I assumed that these tools search NCBI for similar sequences and report the best-matching protein, but that does not seem to be the case. If it were, they should be able to find the well-annotated swinepox virus genome in GenBank and predict most of the proteins.

Also, if almost all the genes are predicted as hypothetical proteins, then there is not much difference between a gene prediction tool like GeneMark and a genome annotation tool. Are there any better annotation tools? Or is this what we get? Or have I misunderstood the concept of annotation? Please, could someone help me understand this?

I have attached an image for comparison. On the left is the swinepox genome in .gb format, and on the right is the same genome annotated with Prokka.

Image link : https://github.com/lrjoshi/sample/blob/master/docs/annotation.PNG



For reference: How to add images to a Biostars post

Which databases did you use with Prokka?

Prokka uses a variety of databases when trying to assign function to the predicted CDS features. It takes a hierarchical approach to make it fast.

The initial core databases are derived from UniProtKB; there is one per "kingdom" supported. To qualify for inclusion, a protein must (1) be from Bacteria (or Archaea or Viruses); (2) not be a "Fragment" entry; and (3) have an evidence level ("PE") of 2 or lower, which corresponds to experimental mRNA or proteomics evidence.
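As a rough illustration of those three criteria, here is a minimal Python sketch that checks a UniProt-style FASTA header. It is not Prokka's actual database build script. It assumes that the header carries a `PE=` field and that fragment entries have "(Fragment)" or "(Fragments)" in the protein name. The `qualifies` function and its `kingdom` argument (which would have to be looked up from taxonomy elsewhere) are hypothetical names for this example.

```python
import re

# Protein-existence field in a UniProt-style FASTA header, e.g. "PE=1".
PE_FIELD = re.compile(r"\bPE=(\d)\b")


def qualifies(header, kingdom, wanted_kingdom="Viruses", max_pe=2):
    """Return True if a record passes the three criteria described above.

    header  -- UniProt-style FASTA header line
    kingdom -- top-level taxon of the record (hypothetical input,
               not parsed from the header)
    """
    # (1) the record must come from the kingdom being annotated
    if kingdom != wanted_kingdom:
        return False
    # (2) fragment entries are excluded; "(Fragment" also covers "(Fragments)"
    if "(Fragment" in header:
        return False
    # (3) evidence level must be present and no higher than max_pe
    match = PE_FIELD.search(header)
    return match is not None and int(match.group(1)) <= max_pe
```

With a made-up header such as ">sp|X00001|DPOL_EXMPL DNA polymerase OS=Example virus OX=1 PE=1 SV=1" and kingdom "Viruses", the function returns True. The same header with PE=4 or with "(Fragment)" in the name would be rejected.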
written 16 months ago by genomax

I used Prokka on the Galaxy server. I selected "Viruses" as the kingdom and left the remaining parameters at their defaults, so I am not sure which database it used.

written 16 months ago by lokraj2003

I was able to get a good annotation from Prokka with the following command:

prokka --proteins reference.gb --outdir annotation --prefix myprotein contigs.fa

All the reading frames were annotated. In fact, the ones that are hypothetical in the reference genome are annotated as well.
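One way to put a number on this kind of before/after comparison is to count how many CDS features are still labelled "hypothetical protein" in Prokka's tab-separated output. The sketch below assumes a recent Prokka version whose .tsv file has `ftype` and `product` columns. `hypothetical_fraction` is a hypothetical helper name, not part of Prokka.

```python
import csv


def hypothetical_fraction(tsv_lines):
    """Return the fraction of CDS rows whose product is 'hypothetical protein'.

    tsv_lines -- an open file or any iterable of tab-separated lines with a
                 header row containing 'ftype' and 'product' columns
                 (assumed layout of Prokka's .tsv output)
    """
    reader = csv.DictReader(tsv_lines, delimiter="\t")
    # Only protein-coding features are relevant; skip tRNA, rRNA, and so on.
    cds = [row for row in reader if row["ftype"] == "CDS"]
    if not cds:
        return 0.0
    hypothetical = sum(
        1 for row in cds
        if row["product"].strip().lower() == "hypothetical protein"
    )
    return hypothetical / len(cds)
```

For the command above, the table should be at annotation/myprotein.tsv, so something like `with open("annotation/myprotein.tsv", newline="") as fh: print(hypothetical_fraction(fh))` would report the fraction for that run.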

Thanks @h.mon.

written 16 months ago by lokraj2003
16 months ago, h.mon (Brazil) wrote:

Also, if almost all the genes are predicted as hypothetical protein then there is not much difference between gene prediction tool like Genemark and genome annotation tool.

Genome annotation is a two-step process. First you have to predict where the genes are, which is what tools like Augustus and GeneMark do. Then you have to assign function to the predicted genes, usually by similarity searches against good-quality databases. Prokka and RAST do both steps, integrated in a pipeline.

I don't know how RAST works, but Prokka uses several programs to predict features (protein-coding genes, non-coding RNA, tRNA, rRNA, and more) from the genome and then tries to annotate them by searching the available databases. Prokka uses Prodigal for protein-coding gene prediction, and I don't know whether it is appropriate for viral gene prediction. But the most likely reason your annotation came up mostly as hypothetical proteins is that you don't have an appropriate database installed for annotating viral genomes. You can pass one at run time with the --proteins option, which takes precedence over the other installed databases.
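The precedence idea can be pictured with a toy Python sketch: databases are consulted in priority order and the first one with a hit decides the product name. This is only an illustration, not how Prokka is implemented. Each "database" here is a plain dictionary of precomputed hits, whereas a real pipeline would run similarity searches. The `annotate` function and all the data are hypothetical.

```python
def annotate(cds_id, databases, default="hypothetical protein"):
    """Return (product, source) from the first database with a hit for cds_id.

    databases -- ordered list of (name, {cds_id: product}) pairs,
                 highest priority first (toy stand-in for real searches)
    """
    for name, hits in databases:
        product = hits.get(cds_id)
        if product:
            return product, name
    # No database had a hit, so the feature keeps the generic label.
    return default, None


# Toy data: the user-supplied protein set is listed first, so it wins
# whenever both sources have a hit for the same CDS.
user_proteins = {"cds_1": "DNA polymerase"}
core_database = {"cds_1": "putative polymerase", "cds_2": "thymidine kinase"}
databases = [("user proteins", user_proteins), ("core database", core_database)]
```

In this toy setup, `annotate("cds_1", databases)` takes its name from the user-supplied set, `annotate("cds_2", databases)` falls through to the core database, and a CDS with no hit anywhere stays "hypothetical protein".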


I used Prokka on the Galaxy server. I chose "Viruses" as the kingdom and kept all the other parameters at their defaults. I will try running it on Linux with a custom database. Thanks!

written 16 months ago by lokraj2003