I was experimenting with the Prokka and RAST annotation tools, so I took a well-annotated swinepox virus genome from GenBank (NCBI Reference Sequence: NC_003389.1).
I ran the sequence through Prokka and the RAST SEED server at the same time. Only a few of the genes (maybe around 1%) were annotated; most were predicted as hypothetical proteins. The results were comparable between Prokka and RAST.
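To put a number on "around 1%", the share of hypothetical proteins can be counted directly from the annotated GenBank output. This is only a minimal sketch: the function names `product_names` and `hypothetical_fraction` and the inline `sample` text are made up for illustration, and it assumes that every CDS carries a `/product="..."` qualifier and that product values contain no embedded quotes.

```python
import re


def product_names(genbank_text):
    """Return the /product qualifier values found in GenBank-format text."""
    # Qualifier values can wrap across lines, so match up to the closing quote.
    raw = re.findall(r'/product="([^"]*)"', genbank_text)
    # Collapse the line-wrap whitespace inside each value.
    return [" ".join(value.split()) for value in raw]


def hypothetical_fraction(genbank_text):
    """Fraction of /product values that are 'hypothetical protein'."""
    products = product_names(genbank_text)
    if not products:
        return 0.0
    hypothetical = sum(
        1 for name in products if "hypothetical protein" in name.lower()
    )
    return hypothetical / len(products)


# Tiny made-up feature table, only to show the input shape.
sample = '''
     CDS             1..300
                     /product="hypothetical protein"
     CDS             400..900
                     /product="DNA polymerase"
     CDS             1000..1500
                     /product="hypothetical
                     protein"
'''

print(hypothetical_fraction(sample))
```

Running the same function on the text of the original .gb file and on the Prokka output would give two fractions that can be compared side by side.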
I assumed that these tools search NCBI for similar sequences and report the best-matching protein, but that does not seem to be the case. I would expect them to find the well-annotated swinepox virus genome in GenBank and assign functions to most of the proteins.
Also, if almost all the genes are predicted as hypothetical proteins, then there is not much difference between a gene prediction tool like GeneMark and a genome annotation tool. Are there any better annotation tools? Or is this what we get? Or have I misunderstood the concept of annotation? Could someone please help me understand this?
I have attached an image for comparison. On the left is the swinepox genome in .gb format, and on the right is the same genome annotated with Prokka.