Tool: Goodbye, Genbank: A Python package that salvages feature annotations from GenBank records
While building a parts library for internal use, I ran into the quirks of the GenBank format, and into the fact that almost no GenBank file is up to spec. I started building a tool to iron out those quirks and salvage only the usable parts of GenBank feature annotations for use elsewhere. It turned into a larger task than I initially anticipated, and I thought other people might find it useful or wish to contribute to it, so I made it open source:

In summary:

  • It is a Python package for use with Biopython.
  • It maps GenBank feature keys (and in some cases qualifiers) to Sequence Ontology terms.
  • It fixes/normalizes GenBank feature qualifiers (annotations) and discards qualifiers that cannot be fixed. This is customizable to allow for adding your own salvaging code for certain qualifiers.
  • The output is a set of clean, predictable features that can be used elsewhere.
  • Masochists can also use this package to simply clean up GenBank feature annotations into valid GenBank.
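
To make the list above concrete, here is a minimal sketch of the general idea: map a feature key to a Sequence Ontology term, run each qualifier through a pluggable "salvager", and drop whatever cannot be fixed. This is not the package's actual API. Every name in it (SO_TERMS, SALVAGERS, salvage_feature, and the salvager functions) is hypothetical, the mapping table is only a small illustrative subset, and plain dicts stand in for Biopython feature objects so the example runs on its own. The qualifiers use the same shape Biopython does, a dict of lists of strings.

```python
import re
from typing import Callable, Dict, List, Optional

# Illustrative subset only: GenBank feature key -> (SO accession, SO term name).
SO_TERMS = {
    "CDS": ("SO:0000316", "CDS"),
    "gene": ("SO:0000704", "gene"),
    "promoter": ("SO:0000167", "promoter"),
    "terminator": ("SO:0000141", "terminator"),
}

# A salvager takes the raw values of one qualifier and returns cleaned
# values, or None if nothing usable is left.
Salvager = Callable[[List[str]], Optional[List[str]]]


def collapse_whitespace(values: List[str]) -> Optional[List[str]]:
    """Join wrapped lines and trim padding; drop values that end up empty."""
    cleaned = [re.sub(r"\s+", " ", v).strip() for v in values]
    cleaned = [v for v in cleaned if v]
    return cleaned or None


def salvage_codon_start(values: List[str]) -> Optional[List[str]]:
    """Keep the first value that is a valid codon_start (1, 2 or 3)."""
    for v in values:
        if v.strip() in {"1", "2", "3"}:
            return [v.strip()]
    return None


# Hypothetical registry: qualifiers without an entry are discarded.
# Adding an entry here is the "bring your own salvaging code" hook.
SALVAGERS: Dict[str, Salvager] = {
    "label": collapse_whitespace,
    "note": collapse_whitespace,
    "product": collapse_whitespace,
    "codon_start": salvage_codon_start,
}


def salvage_feature(
    key: str,
    qualifiers: Dict[str, List[str]],
    salvagers: Dict[str, Salvager] = SALVAGERS,
) -> Optional[dict]:
    """Return a normalized feature dict, or None if the key is unmapped."""
    term = SO_TERMS.get(key)
    if term is None:
        return None
    kept = {}
    for name, values in qualifiers.items():
        salvager = salvagers.get(name)
        if salvager is None:
            continue  # unknown qualifier: discard
        fixed = salvager(values)
        if fixed is not None:
            kept[name] = fixed
    return {"so_id": term[0], "so_name": term[1], "qualifiers": kept}


messy = {
    "label": ["  lacZ\n alpha "],      # wrapped line with stray padding
    "codon_start": ["0"],              # invalid value, cannot be fixed
    "ApEinfo_fwdcolor": ["#ff0000"],   # editor-specific, not salvaged
}
clean = salvage_feature("CDS", messy)
print(clean)
```

In this sketch only the label survives, cleaned up to "lacZ alpha"; the invalid codon_start and the editor-specific color qualifier are dropped. Discarding by default keeps the output predictable, at the cost of losing anything nobody has written a salvager for.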

(A GFF3 exporter is also planned, but it may be a while.)

