Hi, this may be an old question, but I haven't found a real answer. Does anyone know of an annotation pipeline (automatic or not) for bacterial species? In my case, there is no reference genome close to my species.
We (at Oh no sequences!) have developed an annotation system designed specifically for bacterial genomes and NGS data. It's called BG7, and probably the most interesting feature for you is that it doesn't need a close reference genome.
Unlike other annotation pipelines, such as those based on ORF prediction with Glimmer, where the annotation depends strongly on having a close reference genome, the BG7 system works very well even when you don't have one. You just need a set of what we call 'reference proteins' to guide the annotation. These proteins don't need to be very similar to the proteins you expect to find in your genome, so it's no problem if you don't have a close reference. We've tested it on lots of genomes (some of them with no similar sequences) and are very happy with the results.
The system is open source (AGPLv3 license), so you can use it freely.
We're about to launch its website. Meanwhile, you can take a look at these slides describing it and at the result files for the E. coli Germany outbreak, which we published in this GitHub repository (the system also gives the annotations in other formats, such as GBK and EMBL; this is just one example).
Please let me know if you want to know anything else. @pablopareja is the main developer, so you can also ask him.
EDIT: We've just launched the BG7 website at http://bg7.ohnosequences.com/ so please feel free to try it (any feedback is highly appreciated) :)
RAST works really well.
RAST (Rapid Annotation using Subsystem Technology) is a fully-automated service for annotating bacterial and archaeal genomes. It provides high quality genome annotations for these genomes across the whole phylogenetic tree.
The GMOD project has several alternatives, of which MAKER (mentioned above) is one, though it leans a little towards the eukaryotes. Another option, designed for prokaryotes, is DIYA (though looking at that page now, it looks like SourceForge is messing with our wiki page). There is also Ergatis, which was designed by the people at TIGR/JCVI for bacterial annotation, something they know how to do very well (they are now at the University of Maryland). Ergatis is by far the most powerful, but it is overkill to install if you are only doing one genome. In that case, you might want to look at CloVR, which I am pretty sure is powered by Ergatis but comes inside a virtual machine that you can download and run (I think they have options for running it on the cloud too, but I haven't talked to them in a while).
This thread seems to have died despite this not being a solved problem. One could also check Prodigal. It predicts protein-coding genes very quickly, in something like 10 seconds. It is a single binary to download, and it runs fast since bacterial genomes are small. If it doesn't work, then not much time is lost.
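For anyone who wants to try the Prodigal route from a script, here is a minimal sketch of how a run could be wrapped in Python. The file names (`contigs.fna`, `my_genome`) and the helper functions are made up for illustration, and the flags (`-i`, `-o`, `-f`, `-a`, `-p`) are the ones I remember from Prodigal 2.x, so check them against `prodigal -h` for your version before relying on this.

```python
import shutil
import subprocess


def build_prodigal_command(genome_fasta, out_prefix, metagenome=False):
    """Build the argument list for one Prodigal run without executing it.

    Hypothetical helper: the output names are simply out_prefix plus an
    extension, which is a convention chosen for this sketch.
    """
    mode = "meta" if metagenome else "single"
    return [
        "prodigal",
        "-i", str(genome_fasta),      # input contigs/genome in FASTA format
        "-o", f"{out_prefix}.gff",    # predicted gene coordinates
        "-f", "gff",                  # write the coordinates as GFF
        "-a", f"{out_prefix}.faa",    # protein translations of the predicted genes
        "-p", mode,                   # "single" for one genome, "meta" for metagenomes
    ]


def run_prodigal(genome_fasta, out_prefix, metagenome=False):
    """Run Prodigal if the binary is on PATH and return the command used."""
    if shutil.which("prodigal") is None:
        raise FileNotFoundError("prodigal binary not found on PATH")
    cmd = build_prodigal_command(genome_fasta, out_prefix, metagenome)
    subprocess.run(cmd, check=True)   # raises CalledProcessError on failure
    return cmd


if __name__ == "__main__":
    # Only print the command here, so the sketch works without Prodigal installed.
    print(" ".join(build_prodigal_command("contigs.fna", "my_genome")))
```

Keeping the command construction separate from the execution makes it easy to inspect or log the exact call first. Note that the resulting `.faa` file contains predicted proteins only; assigning functions to them is a separate step.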