Hi,
Apologies if there is an existing answer to this, but I'm not exactly sure what I'm looking for...
I have a number of genomes annotated with the bacterial annotation program Prokka. For each genome, the program outputs a set of files:
-rw-r--r-- 1 wms_joe wms_joe 834K Jan 4 14:21 PLJXUR_01042016.err
-rw-r--r-- 1 wms_joe wms_joe 1.6M Jan 4 14:21 PLJXUR_01042016.faa
-rw-r--r-- 1 wms_joe wms_joe 4.3M Jan 4 14:21 PLJXUR_01042016.ffn
-rw-r--r-- 1 wms_joe wms_joe 5.3M Jan 4 14:06 PLJXUR_01042016.fna
-rw-r--r-- 1 wms_joe wms_joe 5.3M Jan 4 14:21 PLJXUR_01042016.fsa
-rw-r--r-- 1 wms_joe wms_joe 12M Jan 4 14:21 PLJXUR_01042016.gbk
-rw-r--r-- 1 wms_joe wms_joe 6.9M Jan 4 14:21 PLJXUR_01042016.gff
-rw-r--r-- 1 wms_joe wms_joe 60K Jan 4 14:21 PLJXUR_01042016.log
-rw-r--r-- 1 wms_joe wms_joe 18M Jan 4 14:21 PLJXUR_01042016.sqn
-rw-r--r-- 1 wms_joe wms_joe 1.2M Jan 4 14:21 PLJXUR_01042016.tbl
-rw-r--r-- 1 wms_joe wms_joe 151 Jan 4 14:21 PLJXUR_01042016.txt
As you might be able to tell from the extensions, there are FASTA sequence and feature files, plain-text files, tabular files, and a GFF and a GBK (among others).
For some reason (a quirk of the program I guess, or maybe an option I'm missing), the GBK file contains no annotations, so when browsing in the Artemis genome browser (from the Sanger Institute) the sequence is available, but no genes are shown. Consequently, I use the GFF for examining the genomes. I'm not actually sure why this works, since as far as I'm aware a GFF is supposed to be a feature file only and contain no sequence.
So my actual question is:
Can I somehow combine the annotation information into the GenBank file, to create a full GenBank file as you might get from NCBI, or combine the sequence information into the GFF?
I just want one file that can be browsed that has both the sequence and annotation. Does anyone know of any scripts or programs that already take care of that?
GFF3 files can contain sequence information in a sequence section, see http://gmod.org/wiki/GFF3#GFF3_Sequence_Section
That would explain why it works in the genome browser. Alternatively, it works because you loaded the sequence first and then added the annotation file in the same session.
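To check whether a given GFF carries its own sequence, you can look for the `##FASTA` directive that starts the sequence section. Below is a minimal sketch using only the Python standard library; the function name `split_gff3` and the tiny example record are made up for illustration, and it only handles the explicit `##FASTA` directive.

```python
import io


def split_gff3(handle):
    """Split a GFF3 stream into feature rows and embedded FASTA sequences.

    Returns (features, sequences): features is a list of tab-split columns,
    sequences maps each FASTA record ID to its sequence string.
    """
    features = []
    chunks = {}
    in_fasta = False
    current_id = None
    for raw in handle:
        line = raw.rstrip("\r\n")
        if not in_fasta:
            if line.strip() == "##FASTA":
                # Everything after this directive is FASTA, not features.
                in_fasta = True
            elif line and not line.startswith("#"):
                features.append(line.split("\t"))
            continue
        if line.startswith(">"):
            # The record ID is the first word of the header line.
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            chunks[current_id] = []
        elif current_id is not None and line.strip():
            chunks[current_id].append(line.strip())
    sequences = {seq_id: "".join(parts) for seq_id, parts in chunks.items()}
    return features, sequences


# Hypothetical two-line example, not real Prokka output.
example = (
    "##gff-version 3\n"
    "contig_1\tprediction\tCDS\t1\t12\t.\t+\t0\tID=gene_1\n"
    "##FASTA\n"
    ">contig_1\n"
    "ATGAAA\n"
    "TTTTAA\n"
)
features, sequences = split_gff3(io.StringIO(example))
print(len(features), "feature(s);", len(sequences), "sequence(s)")
```

If `sequences` comes back empty for your file, the GFF is annotation-only and the browser must be getting the sequence from somewhere else.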
If you focus on making a GenBank file from the GFF, the answer can already be found here: Converting Gff/Gtf + Reference To Embl Or Genbank ...Any Tools Available?
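For the other direction asked about, putting the sequence into a GFF that lacks it, the idea is just to append a `##FASTA` line followed by the FASTA records. Here is a rough sketch under a few assumptions: the function names are hypothetical, the inputs are passed as plain strings, and the FASTA record IDs are expected to match column 1 of the GFF (otherwise a browser cannot pair features with sequence).

```python
def gff_with_sequence(gff_text, fasta_text):
    """Return GFF3 text with the FASTA records appended after ##FASTA.

    If the GFF already has a ##FASTA section, it is returned unchanged.
    """
    gff_lines = gff_text.splitlines()
    if any(line.strip() == "##FASTA" for line in gff_lines):
        return gff_text
    # Drop blank lines so the sequence section stays compact.
    fasta_lines = [line for line in fasta_text.splitlines() if line.strip()]
    return "\n".join(gff_lines + ["##FASTA"] + fasta_lines) + "\n"


def missing_sequence_ids(gff_text, fasta_text):
    """Return seqids used by GFF features that have no FASTA record."""
    seqids = set()
    for line in gff_text.splitlines():
        if line.strip() == "##FASTA":
            break  # stop at the sequence section
        if line.strip() and not line.startswith("#"):
            seqids.add(line.split("\t")[0])
    fasta_ids = {
        line[1:].split()[0]
        for line in fasta_text.splitlines()
        if line.startswith(">") and line[1:].split()
    }
    return sorted(seqids - fasta_ids)


# Hypothetical inputs, not real Prokka output.
gff_only = (
    "##gff-version 3\n"
    "contig_1\tprediction\tCDS\t1\t12\t.\t+\t0\tID=gene_1\n"
)
fasta = ">contig_1\nATGAAATTTTAA\n"

combined = gff_with_sequence(gff_only, fasta)
print(combined)
print("unmatched seqids:", missing_sequence_ids(gff_only, fasta))
```

With real files you would read the `.gff` and the matching contig FASTA into strings first and write `combined` to a new file, leaving the originals untouched.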
Ah excellent, they do indeed seem to include the sequence. The errors I saw claiming there was no sequence must have come from trying to use the files with programs that do not yet support GFF.
I know I wasn't adding the annotations after loading the sequence, as it would work with just the GFFs alone. Thanks for the links.