Question: Bio.GenBank.LocationParserError
User 6115 wrote, 8.3 years ago:

Hi all,

I'm scanning through all of GenBank's bacterial genomes using Biopython.

Recently I've been getting an occasional error when parsing location data. Specifically:


  File "/usr/lib/pymodules/python2.7/Bio/SeqIO/__init__.py", line 525, in parse
    for r in i:
  File "/usr/lib/pymodules/python2.7/Bio/GenBank/Scanner.py", line 437, in parse_records
    record = self.parse(handle, do_features)
  File "/usr/lib/pymodules/python2.7/Bio/GenBank/Scanner.py", line 420, in parse
    if self.feed(handle, consumer, do_features):
  File "/usr/lib/pymodules/python2.7/Bio/GenBank/Scanner.py", line 392, in feed
    self._feed_feature_table(consumer, self.parse_features(skip=False))
  File "/usr/lib/pymodules/python2.7/Bio/GenBank/Scanner.py", line 344, in _feed_feature_table
    consumer.location(location_string)
  File "/usr/lib/pymodules/python2.7/Bio/GenBank/__init__.py", line 975, in location
    raise LocationParserError(location_line)
  Bio.GenBank.LocationParserError: order(join(649703..649712,649751..649752),650047..650049)

My code is a simple loop through all filenames I feed in at the command line:


        [...]

        try:
            contig = SeqIO.parse(open(gb_file,"r"), "genbank")
        except:
            sys.stderr.write("ERROR: Parsing gbk file "+gb_file+"!\n")
            sys.exit(1)
        sys.stderr.write("Loading genome " + str(counter) + " of "+str(len(sys.argv)-1)+" ("+gb_file+")\n")

        for gb_record in contig:

           [...]

This is in the Aeropyrum pernix K1 genome, NC_000854.gbk. I don't see anything wrong with the location data. Can anyone help?

Thanks, -Morgan

Brad Chapman wrote, 8.3 years ago:

This looks like a case of the issue discussed here:

https://redmine.open-bio.org/issues/3197

In that issue, order and join are combined in a single location, as in your error:

order(join(649703..649712,649751..649752),650047..650049)

According to the GenBank specification this should not be allowed.
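
To see which files are affected before running the full scan, a plain-text check on the location string may be enough. The sketch below is not part of Biopython; `has_nested_compound` is a hypothetical helper that reports whether a join or order operator appears inside another join or order, which is the pattern in the error above.

```python
import re

# Matches either an operator name followed by "(" (the name may be empty),
# or a closing ")".
_TOKEN = re.compile(r"([A-Za-z]*)\(|\)")
_COMPOUND = {"join", "order"}


def has_nested_compound(location):
    """Return True if join(...) or order(...) occurs inside another one."""
    stack = []  # operators that are currently open
    for match in _TOKEN.finditer(location):
        if match.group(0) == ")":
            if stack:
                stack.pop()
            continue
        name = match.group(1)
        if name in _COMPOUND and any(op in _COMPOUND for op in stack):
            return True
        stack.append(name)
    return False


location = "order(join(649703..649712,649751..649752),650047..650049)"
print(has_nested_compound(location))  # True
```

A join wrapped only in complement(...) is not flagged, since that combination is an ordinary location.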

Peter posted a fix in that discussion but decided not to check it in, as NCBI had identified the files in question as problematic. You can try Peter's fix if you just need to get through these files. As a more permanent solution, you should e-mail NCBI and ask whether this is allowed or will be fixed. If these locations are reported as valid, then please do reopen that bug discussion and lobby for including a more permanent change.
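
If skipping the affected files is acceptable in the meantime, note that SeqIO.parse returns an iterator, so the error is raised inside the for loop (as the traceback shows) and not at the SeqIO.parse call itself. A minimal sketch, assuming it is fine to drop a whole file when any of its records fails: `records_or_skip` and its `parse` parameter are hypothetical names, and Exception is caught broadly because the base class of LocationParserError may differ between Biopython versions.

```python
import sys


def records_or_skip(gb_file, parse=None):
    """Return all records parsed from gb_file, or [] if reading or parsing fails.

    parse is a hypothetical hook: a callable that takes an open handle and
    returns an iterator of records. It defaults to Biopython's GenBank parser.
    """
    if parse is None:
        from Bio import SeqIO  # imported lazily; requires Biopython

        def parse(handle):
            return SeqIO.parse(handle, "genbank")

    records = []
    try:
        with open(gb_file, "r") as handle:
            # Parsing is lazy, so errors surface here, during iteration.
            for record in parse(handle):
                records.append(record)
    except Exception as err:  # deliberately broad; narrow it if you can
        sys.stderr.write("WARNING: skipping %s: %s\n" % (gb_file, err))
        return []
    return records
```

In the loop over the command-line filenames, `for gb_record in records_or_skip(gb_file):` would then move on to the next genome instead of exiting.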


Peter commented, 8.3 years ago:

I noted this on the Biopython bug report, and emailed the NCBI.