Not sure if this is the right forum for this.

Basically, I'm having trouble finding formatting guidelines for GenBank. I'm trying to build a parser for GenBank files, and in the lambda virus genome here there is a location string complement(<23231).

What exactly does this mean? Does it refer to the complement between coordinates 1 and 23231? Also, is the reverse possible, i.e. complement(>23231)?

NCBI GenBank uses a flat file format. Details can be found in the following link. Also, check the Biopython GenBank module.

written 5 months ago by arup
You can find ample information on the spec of the GenBank (EMBL/DDBJ) formats here.

On your specific question: the '<' (or '>') sign in a location means that the true coordinate lies before (or beyond) the given position. For example, if a gene is incompletely annotated (or part of the genomic sequence is missing), you can write its start as <100, meaning that you know the gene is truncated and that the correct start lies somewhere upstream of position 100 on that sequence.
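Since you are writing your own parser, here is a minimal sketch of how the '<' and '>' markers could be captured. It is not taken from any library: the names Location and parse_simple_location are made up for illustration, and it only handles a single position or a start..end range, optionally wrapped in complement(...). Operators such as join(...) are deliberately rejected.

```python
import re
from typing import NamedTuple


class Location(NamedTuple):
    """Hypothetical container for one simple GenBank location."""
    start: int
    end: int
    strand: int           # +1 for forward, -1 for complement(...)
    before_partial: bool  # True if a '<' marker was present
    after_partial: bool   # True if a '>' marker was present


# Optional marker, digits, then optionally "..", optional marker, digits.
_SIMPLE = re.compile(r"(<|>)?(\d+)(?:\.\.(<|>)?(\d+))?")


def parse_simple_location(text: str) -> Location:
    """Parse e.g. '<100..200' or 'complement(<23231)'.

    Assumption: only simple locations are supported; anything else
    (join, order, remote references) raises ValueError.
    """
    text = text.strip()
    strand = 1
    if text.startswith("complement(") and text.endswith(")"):
        strand = -1
        text = text[len("complement("):-1]

    match = _SIMPLE.fullmatch(text)
    if match is None:
        raise ValueError(f"unsupported location: {text!r}")

    first_mark, first, second_mark, second = match.groups()
    start = int(first)
    # A single position is stored with start == end.
    end = int(second) if second is not None else start
    marks = {first_mark, second_mark}
    return Location(start, end, strand, "<" in marks, ">" in marks)
```

With this sketch, parse_simple_location("complement(<23231)") gives position 23231 on the complement strand with the "before" flag set, so the parser keeps the information that the real boundary is unknown instead of silently dropping the marker.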

