Question

Gene annotation pipeline for bacteria

0


7.3 years ago

bird77 ▴ 80

I have a draft genome of an alphaproteobacterium (genome size about 8 Mbp) and I want to perform gene prediction and annotation.

Until now, I have used the RAST server for this task, but the proportion of predicted genes annotated as "hypothetical protein" is very high (about 60%).

What would you recommend for a prokaryotic gene prediction and annotation workflow? I am happy to combine different tools for gene prediction and annotation, but I do not know what the current standard is.

Thank you so much for your assistance.

annotation genome • 2.4k views

updated 6.3 years ago by predeus ★ 1.9k • written 7.3 years ago by bird77 ▴ 80

0


6.3 years ago

predeus ★ 1.9k

Prokka is probably the most widely used tool in the field now. Using --proteins with a specific database (e.g. a species-specific one) seems like a good way to annotate most of the ORFs. Just make sure the reference proteins are in the right format; the FASTA headers need to look something like this:

>gene_id ~~~gene_name~~~putative protein function

There are a few scripts that make your protein annotations look like this. The problem is discussed in one of the issues on the Prokka GitHub repository.
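For illustration, here is a minimal sketch of what such a reformatting script could look like. It is not one of the scripts from the GitHub issue. It assumes, purely hypothetically, that your input FASTA headers are tab-separated as `>gene_id<TAB>gene_name<TAB>product`; adjust the parsing to whatever your headers actually contain. The function names are made up for this example, and the empty slot before the first `~~~` is, as far as I understand the Prokka documentation, where an EC number can optionally go.

```python
def prokka_header(gene_id, gene_name="", product="hypothetical protein", ec_number=""):
    """Build a header in the '>gene_id EC~~~gene_name~~~product' layout."""
    return f">{gene_id} {ec_number}~~~{gene_name}~~~{product}"


def reformat_fasta(lines):
    """Rewrite headers; sequence lines are passed through unchanged.

    Assumption: input headers look like '>gene_id<TAB>gene_name<TAB>product'.
    """
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            fields = line[1:].split("\t")
            # Pad missing fields so headers with only an ID still convert.
            fields += [""] * (3 - len(fields))
            gene_id, gene_name, product = fields[:3]
            out.append(prokka_header(gene_id, gene_name, product or "hypothetical protein"))
        else:
            out.append(line)
    return out
```

Write the returned lines to a new FASTA file and pass that file to Prokka via --proteins.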


score 5 · Accepted Answer · 2017-01-22

5


7.3 years ago

colindaven 6.4k

Have a look at Prokka by Torsten Seemann, or the recent Genix.

Other places also have web-based annotation pipelines; the NCBI one probably still exists.
