Genbank File Format Question
11.8 years ago
Lee Katz

Hi, I am making a script to concatenate contigs in GenBank format. There are many annotations in the file. Between each pair of contigs is a linker sequence that is bounded by Ns and internally has starts and stops in every coding frame. The objective is to be able to view a single contig in a genome browser such as Apollo.
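To make the setup concrete, here is a minimal sketch of the concatenation step, not the actual script. The `LINKER` string is a hypothetical placeholder and `join_contigs` is an illustrative name. It joins the contigs, records the offset to add to each contig's existing feature coordinates, and records where each linker lands so it can be annotated.

```python
# Hypothetical placeholder linker (Ns on both ends); substitute the real one.
LINKER = "NNNNNCATTCCATTCATTAATTAATTAATGAATGAATGNNNNN"


def join_contigs(contigs, linker=LINKER):
    """Join contig sequences with a linker between each adjacent pair.

    Returns (sequence, contig_offsets, linker_spans):
      - contig_offsets: 0-based shift to add to each contig's own
        feature coordinates after concatenation
      - linker_spans: 1-based inclusive (start, end) pairs, the
        convention used in GenBank feature locations
    """
    parts, offsets, spans = [], [], []
    pos = 0  # number of bases written so far
    for i, seq in enumerate(contigs):
        if i > 0:
            # A linker goes before every contig except the first.
            spans.append((pos + 1, pos + len(linker)))
            parts.append(linker)
            pos += len(linker)
        offsets.append(pos)
        parts.append(seq)
        pos += len(seq)
    return "".join(parts), offsets, spans


if __name__ == "__main__":
    seq, offsets, spans = join_contigs(["ACGT", "GG"], linker="NNN")
    print(seq, offsets, spans)
```

The linker spans are what need a feature key; the offsets are only there so the existing annotations can be shifted consistently.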

My question is, how would I correctly annotate the artificial linker sequence in GenBank? I found several fields that are allowed in GenBank format as feature keys, but none seem to qualify as "artificial linker sequence." I found the "unsure" feature key which is as good as any, but is there one that Apollo will recognize and that is allowed in GenBank format?

For reference:

11.7 years ago
Torst

I tend to use "unsure" for stuff like that. But you could use "assembly_gap", which, while not strictly true to the INSDC definition, is in the spirit of what it is. I think "misc_feature" is also valid here. I've even seen "-" as a feature type in some GenBank files.
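For what it's worth, here is a rough sketch of how such a feature could be written into the feature table. It assumes the usual flat-file layout (key starting in column 6, location and qualifiers in column 22) and uses a `/note` qualifier to say what the region is. The function name `genbank_feature` and the note text are illustrative, not part of any library. INSDC feature keys are spelled with underscores (`misc_feature`, `assembly_gap`).

```python
def genbank_feature(start, end, key="misc_feature",
                    note="artificial linker sequence"):
    """Format one GenBank feature-table entry with a /note qualifier.

    start and end are 1-based inclusive coordinates on the
    concatenated sequence.
    """
    # 5 leading spaces, key padded to 16 characters, then the location,
    # so the location starts in column 22.
    lines = [f"     {key:<16}{start}..{end}"]
    # Qualifiers are indented 21 spaces so they also start in column 22.
    lines.append(" " * 21 + f'/note="{note}"')
    return "\n".join(lines)


if __name__ == "__main__":
    print(genbank_feature(5001, 5043))
    print(genbank_feature(5001, 5043, key="unsure"))
```

Swapping the key is just a matter of passing `key="unsure"` or similar. Some keys, such as `assembly_gap`, may need additional qualifiers of their own, so it is worth checking the INSDC feature table definition before using them.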

If you are happy to use GFF3 + SOFA feature types, then "assembly_component" is appropriate.
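A GFF3 version might look something like the sketch below. It assumes the standard nine tab-separated columns and the underscore spelling of the SOFA term (`assembly_component`). The function name, the `source` value, and the note text are made up for illustration.

```python
from urllib.parse import quote


def gff3_linker_line(seqid, start, end, source="linker_script",
                     note="artificial linker sequence"):
    """Build one GFF3 line describing a linker region.

    Columns: seqid, source, type, start, end, score, strand, phase,
    attributes. Coordinates are 1-based inclusive.
    """
    # Percent-encode characters that are reserved in the attributes
    # column (such as ';', '=', '&', ','); plain spaces are left alone.
    attributes = "Note=" + quote(note, safe=" ")
    columns = [seqid, source, "assembly_component",
               str(start), str(end), ".", ".", ".", attributes]
    return "\t".join(columns)


if __name__ == "__main__":
    print("##gff-version 3")
    print(gff3_linker_line("concat1", 5001, 5043))
```

Score, strand, and phase are left as "." since none of them really applies to an artificial linker.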


