Question: Merging gff/gbk to create a 'full' gbk
Asked 3.7 years ago by Joe (United Kingdom):


Apologies if there is an existing answer to this, but I'm not exactly sure what I'm looking for...


I have a number of genomes that are annotated by the bacterial annotation program prokka. The output of the program is a number of sequence files:

-rw-r--r-- 1 wms_joe wms_joe 834K Jan  4 14:21 PLJXUR_01042016.err
-rw-r--r-- 1 wms_joe wms_joe 1.6M Jan  4 14:21 PLJXUR_01042016.faa
-rw-r--r-- 1 wms_joe wms_joe 4.3M Jan  4 14:21 PLJXUR_01042016.ffn
-rw-r--r-- 1 wms_joe wms_joe 5.3M Jan  4 14:06 PLJXUR_01042016.fna
-rw-r--r-- 1 wms_joe wms_joe 5.3M Jan  4 14:21 PLJXUR_01042016.fsa
-rw-r--r-- 1 wms_joe wms_joe  12M Jan  4 14:21 PLJXUR_01042016.gbk
-rw-r--r-- 1 wms_joe wms_joe 6.9M Jan  4 14:21 PLJXUR_01042016.gff
-rw-r--r-- 1 wms_joe wms_joe  60K Jan  4 14:21 PLJXUR_01042016.log
-rw-r--r-- 1 wms_joe wms_joe  18M Jan  4 14:21 PLJXUR_01042016.sqn
-rw-r--r-- 1 wms_joe wms_joe 1.2M Jan  4 14:21 PLJXUR_01042016.tbl
-rw-r--r-- 1 wms_joe wms_joe  151 Jan  4 14:21 PLJXUR_01042016.txt

As you might be able to tell from the extensions, there are fasta feature files, plaintext files, tabular files and a gff and gbk (among others).

For some reason (a quirk of the program I guess, or maybe an option I'm missing), the gbk file contains no annotations, so when browsing in the Artemis genome browser (from the Sanger Institute) the sequence is available but no genes are shown. Consequently, I use the gff for examining the genomes. I'm not actually sure why this works, since as far as I'm aware the gff is supposed to be only a feature file and contain no sequence.

So my actual question is:

Can I somehow combine the annotation information into the genbank file, to create a 'full' genbank as you might get from NCBI, or combine the sequence information into the gff?


I just want one file that can be browsed that has both the sequence and annotation. Does anyone know of any scripts or programs that already take care of that?


sequence annotation • 1.9k views

GFF3 files can contain sequence information in a trailing sequence section, which is introduced by a `##FASTA` directive.

That would explain why it works in the genome browser. Alternatively, it works because you loaded the sequence first and then added the annotation file in the same session.
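To check which of the two explanations applies, one can look for an embedded sequence section in the gff itself. The following is only a minimal sketch using the Python standard library: the function name `split_gff3` and the toy record are made up for illustration, and it only recognises the explicit `##FASTA` directive.

```python
def split_gff3(text):
    """Split GFF3 text into (feature_lines, sequences).

    feature_lines: non-comment lines found before the ##FASTA directive.
    sequences: dict mapping sequence ID to the sequence string found after it.
    Hypothetical helper for illustration; it is not part of Prokka or Artemis.
    """
    features = []
    chunks = {}
    in_fasta = False
    current_id = None
    for line in text.splitlines():
        if not in_fasta:
            if line.strip() == "##FASTA":
                in_fasta = True  # everything after this line is FASTA
            elif line and not line.startswith("#"):
                features.append(line)
            continue
        if line.startswith(">"):
            # The sequence ID is the first word of the header line
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            chunks[current_id] = []
        elif current_id is not None and line.strip():
            chunks[current_id].append(line.strip())
    sequences = {seq_id: "".join(parts) for seq_id, parts in chunks.items()}
    return features, sequences


# Toy example (made-up data, not real Prokka output)
example = "\n".join([
    "##gff-version 3",
    "contig1\tprokka\tCDS\t1\t9\t.\t+\t0\tID=gene1",
    "##FASTA",
    ">contig1",
    "ATGAAA",
    "TAA",
])
features, sequences = split_gff3(example)
print(len(features), "feature line(s);", len(sequences), "embedded sequence(s)")
```

If `sequences` comes back empty for a real file, the gff carries annotations only and the browser must be getting the sequence from somewhere else.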

If you want to focus on making a GenBank file from the GFF, the answer can already be found here: Converting Gff/Gtf + Reference To Embl Or Genbank ...Any Tools Available?
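For the other direction mentioned in the question (putting the sequence into the gff rather than the annotations into the GenBank file), a rough sketch is to append the FASTA after a `##FASTA` directive. The function name `embed_fasta` and the toy data are hypothetical, and the sketch assumes the FASTA headers use the same sequence IDs as column 1 of the gff.

```python
def embed_fasta(gff_text, fasta_text):
    """Return GFF3 text with fasta_text appended after a ##FASTA directive.

    If the GFF3 text already has a ##FASTA directive it is returned unchanged.
    Hypothetical helper for illustration; sequence IDs are not validated.
    """
    gff_lines = gff_text.rstrip("\n").splitlines()
    if any(line.strip() == "##FASTA" for line in gff_lines):
        return gff_text  # sequence already embedded, nothing to do
    merged_lines = gff_lines + ["##FASTA", fasta_text.strip("\n")]
    return "\n".join(merged_lines) + "\n"


# Toy example (made-up data, not real Prokka output)
gff = "##gff-version 3\ncontig1\tprokka\tCDS\t1\t9\t.\t+\t0\tID=gene1\n"
fasta = ">contig1\nATGAAATAA\n"
merged = embed_fasta(gff, fasta)
print(merged)
```

In practice the two inputs would be read from the `.gff` and `.fna` files, and the result written to a new file so the originals are left untouched.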

written 3.7 years ago by Michael Dondrup

Ah excellent, they do indeed seem to include the sequence. I must have had errors claiming no sequence from trying to use them with programs that do not yet support GFF.

I know I wasn't adding the annotations after loading the sequence as it would work with just the gffs alone. Thanks for the links.

written 3.7 years ago by Joe