How to go from a fasta file to an annotated genome assembly in genbank format?
12 months ago
vixelaa ▴ 20

Does anyone know the best/fastest way to create an annotated bacterial genome assembly (GenBank format) starting from a FASTA file containing the whole genome sequence? (A GFF3 file with the gene annotations is available.) Thank you.

Can we assume this is a new bacterial species (as in: one that has not been annotated before)?

If so, you will have to go through a step called genome annotation. That is not always an easy process, but luckily for most bacterial species it is not that hard. I can suggest Prokka for this, though there are many other tools around (have you looked around/googled for any?).

Sorry, my question was probably not entirely clear. The sequence I would like to annotate is an ancestral genome of a bacterial species that is already annotated. In fact I did some mapping of new isolates against this ancestor, so it's also already assembled. Now I'm wondering if there is a fast way to take the annotation from the current annotated bacterial genome and "paste" it onto the ancestral one...

12 months ago
Joe 19k

You could use something like RATT, but I think this requires that the genomes be very similar, which it sounds like yours might not be. In that case, you can still use Prokka and provide a database of 'trusted' proteins (its --proteins option) from which to start the annotation.

Indeed, in this case I would also recommend something like RATT. A recent alternative that I think is worth a try is Liftoff (https://www.biorxiv.org/content/10.1101/2020.06.24.169680v1).
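
For what it's worth, a basic Liftoff run takes the reference GFF3 plus the two FASTA files. Below is a rough Python sketch that only assembles the command line; the file names and the `build_liftoff_command` helper are made up for illustration, and the flags (`-g`, `-o`, `-u`) should be checked against `liftoff -h` for your installed version.

```python
import subprocess


def build_liftoff_command(target_fasta, reference_fasta, reference_gff,
                          output_gff="lifted.gff3",
                          unmapped="unmapped_features.txt"):
    """Return the argument list for a basic Liftoff run (hypothetical helper)."""
    return [
        "liftoff",
        "-g", reference_gff,  # annotation of the already-annotated genome
        "-o", output_gff,     # where the lifted-over annotation is written
        "-u", unmapped,       # features that could not be placed on the target
        target_fasta,         # positional: genome to annotate (e.g. the ancestor)
        reference_fasta,      # positional: genome the GFF3 belongs to
    ]


def run_liftoff(*args, **kwargs):
    """Run Liftoff; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_liftoff_command(*args, **kwargs), check=True)


if __name__ == "__main__":
    # Placeholder file names; this only prints the command, it does not run it.
    print(" ".join(build_liftoff_command(
        "ancestor.fasta", "reference.fasta", "reference.gff3")))
```

Calling `run_liftoff("ancestor.fasta", "reference.fasta", "reference.gff3")` would then execute it, assuming Liftoff is installed and on your PATH.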

Thank you! I'll try both.

Just FYI, I used Liftoff and it worked almost perfectly: I got a GFF file that only missed around 10 genes, which can be added manually. I then converted this, together with the FASTA file, to GenBank format.
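
In case it helps anyone else, one way to list which genes did not make it across is to compare the gene IDs in the reference GFF3 with those in the lifted one. This is a minimal standard-library sketch, assuming the lifted file keeps the reference `ID` attributes for `gene` features; the function names and the tiny inline GFF3 records are invented for illustration.

```python
def gene_ids(gff3_text):
    """Collect the ID attribute of every 'gene' feature in GFF3 text."""
    ids = set()
    for line in gff3_text.splitlines():
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        if "ID" in attrs:
            ids.add(attrs["ID"])
    return ids


def missing_genes(reference_gff3, lifted_gff3):
    """Gene IDs present in the reference annotation but absent after lift-over."""
    return sorted(gene_ids(reference_gff3) - gene_ids(lifted_gff3))


# Toy example with made-up records (real files would be read with open(...).read()).
reference = (
    "##gff-version 3\n"
    "chr\tsrc\tgene\t1\t90\t.\t+\t.\tID=geneA;Name=dnaA\n"
    "chr\tsrc\tCDS\t1\t90\t.\t+\t0\tID=cdsA;Parent=geneA\n"
    "chr\tsrc\tgene\t200\t500\t.\t-\t.\tID=geneB\n"
)
lifted = (
    "##gff-version 3\n"
    "anc\tLiftoff\tgene\t5\t94\t.\t+\t.\tID=geneA;Name=dnaA\n"
)
print(missing_genes(reference, lifted))  # ['geneB']
```

The resulting list is what you would then go through and add by hand.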

Thanks for the feedback (appreciated), and good to hear Liftoff is promising.