Biopython: Entrez.efetch causes UnboundLocalError
9.0 years ago

Hello,

I am working with Biopython and trying to find the pathways in which certain proteins are involved. For this I use the following code:

from Bio import Entrez
handle = Entrez.efetch(id = "1134002", db = "biosystems", retmode = "xml")
data = Entrez.read(handle)
handle.close()

This raises the following error:

File "/home/jens/Desktop/pathways.py", line 19, in <module>
   data = Entrez.read(handle)
File "/usr/lib/python2.7/dist-packages/Bio/Entrez/__init__.py", line 372, in read
   record = handler.read(handle)
File "/usr/lib/python2.7/dist-packages/Bio/Entrez/Parser.py", line 187, in read
   self.parser.ParseFile(handle)
File "/usr/lib/python2.7/dist-packages/Bio/Entrez/Parser.py", line 486, in externalEntityRefHandler
   self.dtd_urls.append(url)
UnboundLocalError: local variable 'url' referenced before assignment

I took a look at Parser.py and found this:

def externalEntityRefHandler(self, context, base, systemId, publicId):
    """The purpose of this function is to load the DTD locally, instead
    of downloading it from the URL specified in the XML. Using the local
    DTD results in much faster parsing. If the DTD is not found locally,
    we try to download it. If new DTDs become available from NCBI,
    putting them in Bio/Entrez/DTDs will allow the parser to see them."""
    urlinfo = _urlparse(systemId)
    #Following attribute requires Python 2.5+
    #if urlinfo.scheme=='http':
    if urlinfo[0]=='http':
        # Then this is an absolute path to the DTD.
        url = systemId
    elif urlinfo[0]=='':
        # Then this is a relative path to the DTD.
        # Look at the parent URL to find the full path.
        try:
            url = self.dtd_urls[-1]
        except IndexError:
            # Assume the default URL for DTDs if the top parent
            # does not contain an absolute path
            source = "http://www.ncbi.nlm.nih.gov/dtd/"
        else:
            source = os.path.dirname(url)
        # urls always have a forward slash, don't use os.path.join
        url = source.rstrip("/") + "/" + systemId
    self.dtd_urls.append(url)

I did a little debugging and found the cause. In my case, urlinfo[0] contains "ftp", which neither branch of the if/elif construct handles, so the local variable url is never assigned before self.dtd_urls.append(url) uses it.
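The fall-through can be reproduced without any network access. The sketch below is a simplified, standalone copy of the branch logic quoted above. The function name `resolve_dtd_url` and the ftp URL are hypothetical, and it assumes Python 3's `urllib.parse.urlparse` in place of Biopython's internal `_urlparse`. An ftp system ID matches neither branch, so `url` is never bound:

```python
import os
from urllib.parse import urlparse


def resolve_dtd_url(system_id, dtd_urls):
    """Hypothetical standalone copy of the quoted (buggy) branch logic."""
    urlinfo = urlparse(system_id)
    if urlinfo[0] == "http":
        # Absolute path to the DTD.
        url = system_id
    elif urlinfo[0] == "":
        # Relative path: resolve against the parent DTD URL or NCBI's default.
        try:
            url = dtd_urls[-1]
        except IndexError:
            source = "http://www.ncbi.nlm.nih.gov/dtd/"
        else:
            source = os.path.dirname(url)
        url = source.rstrip("/") + "/" + system_id
    # No branch for "ftp" (or "https"), so url can be unbound here.
    dtd_urls.append(url)
    return url


try:
    # Hypothetical FTP system ID, used only to trigger the missing branch.
    resolve_dtd_url("ftp://ftp.example.org/dtd/example.dtd", [])
except UnboundLocalError as exc:
    print("UnboundLocalError:", exc)
```

The exact wording of the exception message differs between Python versions, but the exception type is the same UnboundLocalError seen in the traceback.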

Is this a bug in Biopython, or am I doing something wrong?

9.0 years ago
Peter

It's a bug, reported at about the same time as your question here:

https://github.com/biopython/biopython/issues/527
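For illustration only, here is a minimal sketch of how the branch could be patched: treat `ftp` (and `https`) system IDs as absolute, like `http`, and raise a clear error for any other scheme instead of falling through. The function `resolve_dtd_url_fixed` and the example URLs are hypothetical; this is not the code of any Biopython release, and the upstream fix is tracked in the issue above. It assumes Python 3.

```python
import os
from urllib.parse import urlparse

NCBI_DTD_BASE = "http://www.ncbi.nlm.nih.gov/dtd/"


def resolve_dtd_url_fixed(system_id, dtd_urls):
    """Hypothetical patched resolver: every scheme either binds url or raises."""
    scheme = urlparse(system_id).scheme
    if scheme in ("http", "https", "ftp"):
        # Absolute URL to the DTD, whatever the transfer protocol.
        url = system_id
    elif scheme == "":
        # Relative path: resolve against the most recent parent DTD URL,
        # falling back to NCBI's default DTD directory.
        source = os.path.dirname(dtd_urls[-1]) if dtd_urls else NCBI_DTD_BASE
        # URLs always use forward slashes, so don't use os.path.join.
        url = source.rstrip("/") + "/" + system_id
    else:
        # Fail loudly instead of hitting UnboundLocalError later.
        raise ValueError("Unexpected DTD URL scheme %r in %r" % (scheme, system_id))
    dtd_urls.append(url)
    return url


stack = []
print(resolve_dtd_url_fixed("ftp://ftp.example.org/dtd/top.dtd", stack))
print(resolve_dtd_url_fixed("child.dtd", stack))  # resolved relative to the ftp parent
```

Raising ValueError for unknown schemes is a design choice: it keeps the failure close to its cause rather than surfacing as an unrelated-looking UnboundLocalError.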
