Question: GenBank format guidelines
jamietmorton (United States) wrote, 5 months ago:

Not sure if this is the right forum for this.

Basically, I'm having trouble finding formatting guidelines for GenBank. I'm trying to build a parser for GenBank files, and in the lambda virus genome here there is a location string complement(<23231).

What exactly does this mean? Does it refer to the complement of the region between coordinate 1 and 23231? Also, is the reverse possible, i.e. complement(>23231)?


NCBI GenBank uses a flat file format. Details can be found in the following link. Also, check the Biopython GenBank module.
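Because the feature table in a GenBank flat file is column-based, splitting it into individual features takes very little code. Below is a minimal sketch that assumes the usual layout: the feature key starts at column 6, and the location and qualifiers start at column 22. It handles only the FEATURES block and skips qualifiers. `iter_features` is a hypothetical helper name, not part of Biopython.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def iter_features(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (feature_key, location_string) pairs from a GenBank FEATURES table.

    Assumes the fixed-column layout: feature key in columns 6-20,
    location/qualifiers starting at column 22 (0-based index 21).
    Qualifier lines (starting with '/') and their continuations are skipped.
    """
    in_features = False
    key: Optional[str] = None
    loc_parts: List[str] = []
    in_location = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if not in_features:
            continue
        # A non-indented line (e.g. ORIGIN or CONTIG) ends the feature table.
        if line and not line.startswith(" "):
            break
        if len(line) > 5 and line[5] != " ":
            # A new feature key begins here: emit the previous feature first.
            if key is not None:
                yield key, "".join(loc_parts)
            key = line[5:21].strip()
            loc_parts = [line[21:].strip()]
            in_location = True
        else:
            text = line[21:].strip()
            if text.startswith("/"):
                in_location = False  # qualifiers follow; the location is complete
            elif in_location:
                loc_parts.append(text)  # a long location wrapped onto this line
    if key is not None:
        yield key, "".join(loc_parts)
```

For real-world files, Biopython's `SeqIO.parse(handle, "genbank")` already does this and much more, so a hand-written parser mainly makes sense for learning the format.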

(Comment by arup, written 5 months ago.)
lieven.sterck (VIB, Ghent, Belgium) wrote, 4 months ago:

You can find good and ample information on the specification of the GenBank (EMBL/DDBJ) formats here.

On your specific question: the '<' (or '>') sign in a location means that the true coordinate lies before (or beyond) the given position. For example, if a gene is incompletely annotated (or part of the genomic sequence is missing), you can denote its start as <100. This means you know the gene is truncated and that its true start lies somewhere upstream of position 100 on that sequence.
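Here is a minimal sketch that shows how a parser might apply the '<'/'>' convention. It handles only simple locations: a single base or a start..end range, optionally wrapped in complement(...). The names `parse_simple_location` and `ParsedLocation` are hypothetical, and join(), order() and between-base (^) locations are deliberately left out. The sketch assumes, as I read the INSDC convention, that coordinates inside complement(...) are still given in forward-strand numbering. Under that assumption, '<' always qualifies the lower number and '>' the higher one.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ParsedLocation:
    start: int          # 1-based lower coordinate, forward-strand numbering
    end: int            # 1-based upper coordinate, inclusive
    strand: int         # +1 for plain locations, -1 for complement(...)
    start_before: bool  # '<': the true lower end lies before `start`
    end_after: bool     # '>': the true upper end lies beyond `end`


_SPAN = re.compile(r"(<)?(\d+)\.\.(>)?(\d+)")  # e.g. <100..500, 1..>200
_SINGLE = re.compile(r"([<>])?(\d+)")           # e.g. 23231, <23231, >23231


def parse_simple_location(text: str) -> ParsedLocation:
    """Parse a simple INSDC location; raise ValueError for anything else."""
    s = text.replace(" ", "")
    strand = 1
    if s.startswith("complement(") and s.endswith(")"):
        strand = -1
        s = s[len("complement("):-1]
    m = _SPAN.fullmatch(s)
    if m:
        return ParsedLocation(int(m.group(2)), int(m.group(4)), strand,
                              m.group(1) == "<", m.group(3) == ">")
    m = _SINGLE.fullmatch(s)
    if m:
        pos = int(m.group(2))
        return ParsedLocation(pos, pos, strand,
                              m.group(1) == "<", m.group(1) == ">")
    raise ValueError(f"unsupported location (join/order/^ not handled): {text!r}")
```

Under this reading, complement(<23231) parses to a one-base location on the minus strand whose true extent continues below position 23231. It does not describe a region running from 1 to 23231. complement(>23231) would parse the same way, with the flag on the other side.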
