Annotation with Prokka or RAST.
5.8 years ago
lokraj2003 ▴ 120

I was experimenting with the Prokka and RAST annotation tools, so I took a well-annotated swinepox virus genome from GenBank (NCBI Reference Sequence: NC_003389.1).

I ran the sequence through Prokka and the RAST SEED server at the same time. Only a few of the genes (maybe around 1%) were annotated; most were predicted as hypothetical proteins. The results were comparable between Prokka and RAST.

I would have assumed that these tools look for similar sequences in NCBI and report the best-matching protein, but that does not seem to be the case. They should be able to find the well-annotated swinepox virus genome in GenBank and name most of the proteins.

Also, if almost all the genes are predicted as hypothetical proteins, then there is not much difference between a gene prediction tool like GeneMark and a genome annotation tool. Are there any better annotation tools? Or is this what we get? Or have I misunderstood the concept of annotation? Could someone please help me understand this?

I have attached an image for comparison. On the left is the swinepox genome in .gb format, and on the right is the same genome annotated with Prokka.

Image link : https://github.com/lrjoshi/sample/blob/master/docs/annotation.PNG


prokka annotation • 8.1k views

For reference: How to add images to a Biostars post

Which databases did you use with Prokka?

Prokka uses a variety of databases when trying to assign function to the predicted CDS features. It takes a hierarchical approach to make it fast.

The initial core databases are derived from UniProtKB; there is one per "kingdom" supported. To qualify for inclusion, a protein must be (1) from Bacteria (or Archaea or Viruses); (2) not be "Fragment" entries; and (3) have an evidence level ("PE") of 2 or lower, which corresponds to experimental mRNA or proteomics evidence.
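The three inclusion criteria above can be sketched as a simple filter. This is only an illustration, not Prokka's actual build code: the `UniProtEntry` record, its field names, and the accessions are hypothetical. The one thing taken from UniProt itself is the "PE" scale, which runs from 1 (evidence at protein level) to 5 (uncertain).

```python
from dataclasses import dataclass


@dataclass
class UniProtEntry:
    # Hypothetical, simplified record; real UniProtKB entries carry many more fields.
    accession: str
    kingdom: str             # e.g. "Bacteria", "Archaea", "Viruses"
    is_fragment: bool
    protein_existence: int   # UniProt "PE" level: 1 (protein evidence) to 5 (uncertain)


def qualifies(entry, kingdom, max_pe=2):
    """Return True if the entry passes the three criteria quoted above."""
    return (
        entry.kingdom == kingdom
        and not entry.is_fragment
        and entry.protein_existence <= max_pe
    )


# Made-up entries, one per way of failing the filter.
entries = [
    UniProtEntry("A0001", "Viruses", False, 1),
    UniProtEntry("A0002", "Viruses", True, 1),    # fragment: excluded
    UniProtEntry("A0003", "Viruses", False, 4),   # PE 4 (predicted): excluded
    UniProtEntry("A0004", "Bacteria", False, 1),  # wrong kingdom: excluded
]

kept = [e.accession for e in entries if qualifies(e, "Viruses")]
print(kept)
```

A filter this strict keeps only proteins with direct experimental support, so the core database for a given kingdom may be much smaller than the full set of annotated proteins in GenBank.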

I used Prokka on the Galaxy server and selected viruses as the kingdom. The remaining parameters were left at their defaults, so I am not sure which databases it used.


I was able to get a good annotation with Prokka using the following command.

prokka --proteins reference.gb --outdir annotation --prefix myprotein contigs.fa

All the reading frames were annotated. In fact, the ones that are hypothetical in the reference genome were annotated as well.
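One way to check a result like this is to count how many CDS features are still labelled "hypothetical protein" in Prokka's tab-separated output. The sketch below assumes the `.tsv` file has a header row with `ftype` and `product` columns, as recent Prokka versions write; older versions may differ. The locus tags and products in the example are made up.

```python
import csv
import io


def hypothetical_fraction(tsv_text):
    """Return (n_hypothetical, n_cds) for a Prokka-style TSV given as text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    n_cds = 0
    n_hyp = 0
    for row in reader:
        if row["ftype"] != "CDS":
            continue  # skip tRNA, rRNA and other non-CDS features
        n_cds += 1
        if row["product"].strip().lower() == "hypothetical protein":
            n_hyp += 1
    return n_hyp, n_cds


# Made-up rows in the assumed column layout.
example = (
    "locus_tag\tftype\tlength_bp\tgene\tEC_number\tCOG\tproduct\n"
    "X_00001\tCDS\t450\t\t\t\thypothetical protein\n"
    "X_00002\tCDS\t1200\t\t\t\tDNA polymerase\n"
    "X_00003\tCDS\t300\t\t\t\thypothetical protein\n"
)

n_hyp, n_cds = hypothetical_fraction(example)
print(f"{n_hyp} of {n_cds} CDS are hypothetical")
```

With the command above, the file to read would be `annotation/myprotein.tsv`, for example via `hypothetical_fraction(open("annotation/myprotein.tsv").read())`.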

Thanks @h.mon.

5.8 years ago
h.mon 35k

Also, if almost all the genes are predicted as hypothetical protein then there is not much difference between gene prediction tool like Genemark and genome annotation tool.

Genome annotation is a two-step process. First you have to predict where the genes are, which is what tools like Augustus and GeneMark do. Then you have to assign a function to the predicted genes, usually by similarity searches against good-quality databases. Prokka and RAST do both, integrated into a pipeline.
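To make the two steps concrete, here is a deliberately naive toy. It is not how GeneMark, Prodigal, or Prokka work internally: real gene finders use statistical models and both strands, and real annotation uses similarity searches rather than exact matches. The sequences, the `reference` table, and the product name are all invented for illustration.

```python
STOPS = {"TAA", "TAG", "TGA"}


def predict_orfs(dna, min_codons=3):
    """Step 1 (toy): forward-strand ORFs from ATG to the first in-frame stop."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] != "ATG":
                i += 3
                continue
            j = i + 3
            while j + 3 <= len(dna) and dna[j:j + 3] not in STOPS:
                j += 3
            if j + 3 > len(dna):
                break  # no in-frame stop before the end of the sequence
            if (j - i) // 3 >= min_codons:
                orfs.append(dna[i:j + 3])
            i = j + 3
    return orfs


def assign_function(orf, reference):
    """Step 2 (toy): exact lookup standing in for a similarity search."""
    return reference.get(orf, "hypothetical protein")


orf_a = "ATGAAACCCTAG"
orf_b = "ATGTTTGGGCCCTAA"
genome = "CC" + orf_a + "GG" + orf_b + "TT"

# Only orf_a is "known"; orf_b has no match in the reference.
reference = {orf_a: "toy polymerase"}

annotations = {orf: assign_function(orf, reference) for orf in predict_orfs(genome)}
for orf, product in annotations.items():
    print(orf, "->", product)
```

Both ORFs are found in step 1, but only the one present in the reference gets a name in step 2. That is the situation in the question: the gene predictions are there, and the missing piece is a database that contains matching proteins.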

I don't know how RAST works, but Prokka uses several programs to predict features (protein-coding genes, non-coding RNA, tRNA, rRNA, and more) from the genome and then tries to annotate these predictions by searching the available databases. Prokka uses Prodigal for protein-coding gene prediction, and I don't know whether it is appropriate for viral gene prediction. But the most likely reason your annotation came out mostly as hypothetical proteins is that you don't have an appropriate database installed for annotating viral genomes. You can pass one at run time with the --proteins option, which takes precedence over the other installed databases.
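The idea of precedence can be sketched as an ordered search that stops at the first database with a hit. This is a simplified model, not Prokka's code: exact dictionary lookup stands in for the similarity search, and the peptide strings, database contents, and names `user_proteins` and `core_db` are hypothetical.

```python
def annotate(protein, databases):
    """Search databases in order; return (product, source) from the first hit."""
    for name, db in databases:
        product = db.get(protein)
        if product is not None:
            return product, name
    return "hypothetical protein", None


# Stands in for a file passed with --proteins (made-up content).
user_proteins = {"MKTAYIAK": "DNA ligase"}
# Stands in for an installed core database (made-up content).
core_db = {"MKTAYIAK": "ligase-like protein", "MSLNQERT": "thymidine kinase"}

# Order matters: the user-supplied set is searched first.
databases = [("user", user_proteins), ("core", core_db)]

for query in ["MKTAYIAK", "MSLNQERT", "MAAAAAAA"]:
    print(query, annotate(query, databases))
```

The first query matches both databases but takes its product from the user-supplied set, the second falls through to the core database, and the third matches nothing and stays a hypothetical protein.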


I used Prokka on the Galaxy server. I chose "viruses" under kingdom and kept all the other parameters at their defaults. I will try running it on Linux with a custom database. Thanks!
