Genbank format guidelines
4.6 years ago
jamietmorton ▴ 10

Not sure if this is the right forum for this.

Basically, I'm having trouble finding formatting guidelines for GenBank. I'm trying to build a parser for GenBank files, and in the lambda virus genome here there is a location string complement(<23231).

What exactly does this mean? Does it refer to the complement of the region between coordinates 1 and 23231? Also, is the reverse possible, i.e. complement(>23231)?

genbank • 1.2k views

NCBI GenBank uses a flat file format. Details can be found at the following link. Also, check the Biopython GenBank module.

https://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html

4.6 years ago

You can find good and ample info on the spec of the GenBank (EMBL/DDBJ) formats here.

On your specific question: the '<' (or '>') sign in a location means that the true coordinate lies before (or beyond) the stated position. E.g. if a gene is incompletely annotated (or the genomic sequence is missing), you can write its start as <100, meaning that you know the gene is truncated and that the correct start is located somewhere upstream of position 100 on that sequence.
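For the parser side, here is a minimal sketch of how those markers could be captured, using only the Python standard library. The names `Location` and `parse_location` are made up for illustration, and the sketch deliberately handles only a small subset of location strings (a single position or a `start..end` span, optionally wrapped in `complement(...)`); it does not cover `join`, `order`, `^` or remote references, so treat it as a starting point rather than a full implementation of the spec.

```python
import re
from typing import NamedTuple


class Location(NamedTuple):
    """Hypothetical container for one simple GenBank location."""
    start: int             # first coordinate as written (1-based)
    end: int               # last coordinate as written (1-based)
    strand: int            # +1 as written, -1 if wrapped in complement(...)
    extends_before: bool   # a '<' was present: true boundary lies before `start`
    extends_after: bool    # a '>' was present: true boundary lies beyond `end`


# Assumption: a position may carry an optional '<' or '>' prefix, and a
# location is either one position or two positions separated by '..'.
_SIMPLE = re.compile(r"^([<>]?)(\d+)(?:\.\.([<>]?)(\d+))?$")


def parse_location(text: str) -> Location:
    text = text.strip()
    strand = 1
    if text.startswith("complement(") and text.endswith(")"):
        strand = -1
        text = text[len("complement("):-1]

    match = _SIMPLE.match(text)
    if match is None:
        raise ValueError(f"unsupported location string: {text!r}")

    sym1, num1, sym2, num2 = match.groups()
    start = int(num1)
    end = int(num2) if num2 is not None else start
    symbols = (sym1, sym2 or "")
    return Location(
        start=start,
        end=end,
        strand=strand,
        extends_before="<" in symbols,
        extends_after=">" in symbols,
    )


print(parse_location("complement(<23231)"))
print(parse_location("1..>888"))
```

Read this way, complement(<23231) is not the range 1..23231: it is a feature on the complementary strand anchored at position 23231 whose true boundary lies somewhere before that coordinate. Keeping the partial flags separate from the integers lets downstream code decide how to treat truncated features.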
