BioPython error parsing standard GenBank file
8.1 years ago
morgan.beeby ▴ 10

Dear all,

I've recently resurrected a bioinformatics pipeline I put together a few years ago to load bacterial genome GenBank files into a MySQL database.

Having downloaded *.gbk from the genomes/Bacteria directory on ftp.ncbi.nih.gov, however, I get an error with many of the files:

ValueError: Expected CONTIG continuation line, got:
ORIGIN

It seems that there's an additional CONTIG field immediately before the nucleotide sequence (for example, this occurs with NC_019435.gbk).

Can anyone shed any light on the reason for this error now arising? I don't have any 'old versions' of these files around, but assume that the file format has been modified by GenBank?

many thanks,
Morgan

biopython • 2.5k views

A "standard GenBank file" is an oxymoron: there might be a standard, but people do what they please with it; that is part of its definition.


Yeah, but it is an NCBI format so what they do is usually the standard ;)

8.1 years ago
Peter 6.0k

The GenBank format evolves over time, and newer versions of Biopython cope with this change fine. You appear to have an older copy of Biopython installed.
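A quick way to see which Biopython is installed, as a minimal sketch: `Bio.__version__` holds the package's version string, while the helper name `installed_biopython_version` is only illustrative. It returns None if Biopython cannot be imported in the current environment.

```python
def installed_biopython_version():
    """Return the installed Biopython version string, or None if it cannot be imported."""
    try:
        import Bio
    except ImportError:
        # Biopython is not available in this Python environment
        return None
    # Fall back to None if the attribute is missing for any reason
    return getattr(Bio, "__version__", None)


version = installed_biopython_version()
if version is None:
    print("Biopython is not installed in this Python environment")
else:
    print("Biopython version:", version)
```

If this reports an old release, upgrading (for example with `pip install --upgrade biopython`) and re-running the pipeline is probably the first thing to try.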

8.1 years ago

I just downloaded NC_019435.gbk.gz from here

and loaded it up in Biopython:

>>> from Bio import SeqIO
>>> for r in SeqIO.parse("NC_019435.gbk", "genbank"):
...     print(r.id)
...
NC_019435.1

It doesn't seem to have a problem. Perhaps you can share your GenBank file and I can have a look?
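To find out which local files carry a CONTIG line ahead of ORIGIN without involving any parser, something like the following standard-library sketch might help. The function name `has_contig_before_origin` and the toy record are made up for illustration; the only assumption about the format is that top-level keywords such as CONTIG and ORIGIN start in the first column and that records end with a `//` line.

```python
import io


def has_contig_before_origin(handle):
    """Return True if any record has a CONTIG line ahead of its ORIGIN line."""
    seen_contig = False
    for line in handle:
        if line.startswith("//"):
            # End of record: reset state for the next one
            seen_contig = False
        elif line.startswith("CONTIG"):
            seen_contig = True
        elif line.startswith("ORIGIN") and seen_contig:
            return True
    return False


# Toy record, not real data: just enough lines to exercise the check
sample = io.StringIO(
    "LOCUS       EXAMPLE\n"
    "CONTIG      join(EXAMPLE:1..10)\n"
    "ORIGIN\n"
    "        1 acgtacgtac\n"
    "//\n"
)
print(has_contig_before_origin(sample))
```

On a real file you would pass an open handle, e.g. `with open("NC_019435.gbk") as handle: ...`, and compare the files flagged this way with the ones that raise the ValueError.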
