Genbank File Format Question
11.9 years ago
Lee Katz ★ 3.1k

Hi, I am making a script to concatenate contigs in GenBank format. There are many annotations in the file. Between each pair of contigs is a linker sequence that is bounded by Ns and internally has starts and stops in every coding frame. The objective is to be able to view everything as a single contig in a genome browser such as Apollo.
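To make the setup concrete, here is a minimal sketch of the concatenation step that also records where each linker lands, since those coordinates are what would need annotating. The function name `join_contigs` and the `LINKER` string are hypothetical placeholders, not the actual script or linker from this question, and the sketch ignores the existing annotations, whose coordinates would also need shifting.

```python
def join_contigs(contigs, linker):
    """Join contig sequences with `linker` between each adjacent pair.

    Returns the concatenated sequence and a list of 1-based, inclusive
    (start, end) coordinates, one per inserted linker copy.
    """
    parts = []
    linker_spans = []
    pos = 0  # number of bases written so far
    for i, seq in enumerate(contigs):
        if i > 0:
            # The linker occupies the next len(linker) bases.
            linker_spans.append((pos + 1, pos + len(linker)))
            parts.append(linker)
            pos += len(linker)
        parts.append(seq)
        pos += len(seq)
    return "".join(parts), linker_spans


# Placeholder linker for illustration only (an assumption, not the real one).
LINKER = "NNNNNTTAATTAATTAANNNNN"

combined, spans = join_contigs(["ACGT", "GGCC", "TTAA"], LINKER)
print(spans)
```

The 1-based inclusive coordinates match the convention GenBank feature locations use, so each span can go straight into a feature location.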

My question is, how would I correctly annotate the artificial linker sequence in GenBank? I found several fields that are allowed in GenBank format as feature keys, but none seem to qualify as "artificial linker sequence." I found the "unsure" feature key which is as good as any, but is there one that Apollo will recognize and that is allowed in GenBank format?

For reference:

genbank format • 3.2k views
11.7 years ago
Torst ▴ 980

I tend to use "unsure" for stuff like that. But you could use "assembly_gap", which, while not strictly true to the INSDC definition, is in the spirit of what it is! I think "misc_feature" is also valid here. I've even seen "-" as a feature type in some GenBank files.
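As a rough illustration, a feature-table entry for one linker could be written out like this. It is only a sketch: the helper name `genbank_feature`, the coordinates, and the note text are made up for the example, and it uses "misc_feature" with a /note qualifier because that combination needs no other qualifiers. The layout follows the GenBank flat-file convention of the feature key starting in column 6 and the location and qualifiers in column 22.

```python
def genbank_feature(key, start, end, note):
    """Format one feature-table entry in GenBank flat-file layout.

    The feature key starts in column 6 and the location in column 22;
    the qualifier goes on the next line, also starting in column 22.
    """
    head = " " * 5 + key.ljust(16) + f"{start}..{end}"
    qual = " " * 21 + f'/note="{note}"'
    return head + "\n" + qual


# Illustrative coordinates and note text (assumptions for this example).
print(genbank_feature("misc_feature", 5, 26, "artificial linker sequence"))
```

Swapping the key for "unsure" works the same way. Whether Apollo displays either key the way you want is something to check in the browser itself.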

If you are happy to use GFF3 + SOFA feature types, then "assembly_component" is appropriate.
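For the GFF3 route, a single linker record might be produced along these lines. This is a sketch under assumptions: the helper name `gff3_line`, the sequence ID, the feature ID, and the note are hypothetical, and the note is assumed to contain no characters that need escaping in the attributes column (tab, newline, `%`, `;`, `=`, `&`, `,`).

```python
def gff3_line(seqid, start, end, feature_id, note, so_type="assembly_component"):
    """Build one tab-separated, nine-column GFF3 record.

    Source, score, strand and phase are left as "." (undefined).
    """
    attributes = f"ID={feature_id};Note={note}"
    fields = [seqid, ".", so_type, str(start), str(end), ".", ".", ".", attributes]
    return "\t".join(fields)


# Illustrative values only (assumptions for this example).
print("##gff-version 3")
print(gff3_line("concat1", 5, 26, "linker1", "artificial linker sequence"))
```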

