Question: Genbank format guidelines
jamietmorton (United States) wrote 5 months ago:

Not sure if this is the right forum for this.

Basically, I'm having trouble finding formatting guidelines for GenBank. I'm trying to build a parser for GenBank files, and looking at the lambda virus genome here, I see the location string complement(<23231).

What exactly does this mean? Does it refer to the complement between coordinates 1 and 23231? Also, is the reverse possible, i.e. complement(>23231)?

Tags: genbank • modified 4 months ago by lieven.sterck • written 5 months ago by jamietmorton

NCBI GenBank uses a flat file format. Details can be found at the following link. Also, check the Biopython GenBank module.

https://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html
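For anyone writing their own parser, here is a minimal sketch of pulling the location strings out of the FEATURES table using only the Python standard library. It assumes the usual flat file layout (feature keys indented by a few spaces, locations and qualifiers starting in column 22, qualifiers beginning with "/"). The function name iter_feature_locations and the SAMPLE record are made up for illustration and are not part of any library or real entry.

```python
def iter_feature_locations(flatfile_text):
    """Yield (feature_key, location_string) pairs from GenBank flat file text.

    Assumption: a line whose first non-blank character falls before
    column 22 starts a new feature; deeper-indented lines continue it.
    """
    in_features = False
    key, loc_parts, in_qualifier = None, [], False
    for line in flatfile_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if not in_features or not line.strip():
            continue
        # Any other top-level keyword (ORIGIN, CONTIG, //) ends the table.
        if not line.startswith(" "):
            break
        indent = len(line) - len(line.lstrip())
        body = line.strip()
        if indent < 21:
            # New feature line: emit the previous feature, then start over.
            if key is not None:
                yield key, "".join(loc_parts)
            parts = body.split(None, 1)
            key = parts[0]
            loc_parts = [parts[1].strip()] if len(parts) > 1 else []
            in_qualifier = False
        elif key is not None:
            if body.startswith("/"):
                in_qualifier = True      # qualifiers are ignored in this sketch
            elif not in_qualifier:
                loc_parts.append(body)   # location wrapped onto another line
    if key is not None:
        yield key, "".join(loc_parts)


# Hypothetical record, trimmed to the parts this sketch looks at.
SAMPLE = """\
LOCUS       EXAMPLE                 120 bp    DNA     linear   SYN 01-JAN-2000
FEATURES             Location/Qualifiers
     source          1..120
                     /organism="synthetic construct"
     CDS             complement(<1..90)
                     /product="hypothetical protein"
     gene            join(10..20,
                     30..>120)
                     /gene="demo"
ORIGIN
//
"""

if __name__ == "__main__":
    for feature_key, location in iter_feature_locations(SAMPLE):
        print(feature_key, location)
```

This only extracts the raw location text. Interpreting operators such as complement() and join(), or the '<' and '>' markers, is a separate step.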

Reply written 5 months ago by arup
lieven.sterck (VIB, Ghent, Belgium) wrote 4 months ago:

You can find good and ample information on the spec of the GenBank (EMBL/DDBJ) formats here.

On your specific question: a '<' (or '>') sign in a coordinate means that the true coordinate lies before (or beyond) this position. E.g. if a gene is incompletely annotated (or part of the genomic sequence is missing), you can write its start as <100, meaning that you know the gene is truncated and that the correct start is located somewhere upstream of position 100 on that sequence.
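To make this concrete for a parser, here is a rough sketch of how the simplest location strings could be read: a single position or a start..end range, optionally wrapped in complement(), with '<' and '>' recorded as flags. The names SimpleLocation and parse_simple_location are hypothetical, and the sketch deliberately rejects join(), order() and other operators rather than guessing at them.

```python
import re
from dataclasses import dataclass

# One position or "start..end", each side optionally marked as partial.
_SIMPLE = re.compile(
    r"^(?P<m1>[<>]?)(?P<start>\d+)(?:\.\.(?P<m2>>?)(?P<end>\d+))?$"
)


@dataclass
class SimpleLocation:
    start: int
    end: int
    strand: int     # +1 for the given strand, -1 for complement(...)
    before: bool    # '<' present: true coordinate lies before `start`
    beyond: bool    # '>' present: true coordinate lies beyond `end`


def parse_simple_location(text):
    """Parse e.g. '<100..250' or 'complement(<23231)' (illustrative only)."""
    text = text.strip()
    strand = 1
    if text.startswith("complement(") and text.endswith(")"):
        strand = -1
        text = text[len("complement("):-1]
    match = _SIMPLE.match(text)
    if match is None:
        raise ValueError(f"unsupported location in this sketch: {text!r}")
    start = int(match["start"])
    # A single position is treated as a range of length one.
    end = int(match["end"]) if match["end"] else start
    markers = match["m1"] + (match["m2"] or "")
    return SimpleLocation(start, end, strand, "<" in markers, ">" in markers)


if __name__ == "__main__":
    print(parse_simple_location("complement(<23231)"))
    print(parse_simple_location("<100..250"))
```

Read this way, complement(<23231) comes out as position 23231 on the complementary strand with the "before" flag set, rather than as the range 1..23231, and complement(>23231) parses the same way with the "beyond" flag set instead.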
