Entrez parse function failing to parse PubMed publication records
7.3 years ago
apiljic • 0

I am trying to use Entrez to search and parse publication records from PubMed. The parse function used to work until recently, but a few days ago it started failing. I started getting the following error:

File "/venv/lib/python2.7/site-packages/Bio/Entrez/Parser.py", line 296, in parse
raise ValueError("The XML file does not represent a list. Please use Entrez.read instead of Entrez.parse")
ValueError: The XML file does not represent a list. Please use Entrez.read instead of Entrez.parse

Following the example listed in the source code (http://biopython.org/DIST/docs/api/Bio.Entrez-pysrc.html) gives the same error:

from Bio import Entrez
Entrez.email = "Your.Name.Here@example.org"
handle = Entrez.efetch("pubmed", id="19304878,14630660", retmode="xml")
records = Entrez.parse(handle)
for record in records:
    print(record['MedlineCitation']['Article']['ArticleTitle'])
handle.close()

Does somebody have an idea what might be going wrong here?

biopython pubmed entrez • 3.1k views

The Biopython developers are discussing this issue on GitHub: https://github.com/biopython/biopython/issues/1027

7.3 years ago

The error message is quite clear. Instead of

records = Entrez.parse(handle)

you would need

records = Entrez.read(handle)

That works, although it's not what the example shows and not what I would intuitively expect.
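Since read() hands back the whole result at once rather than one record at a time, the loop over records may need a small adapter. Here is a minimal sketch of one, assuming (I have not checked this against every Biopython version) that the result is either a plain list of records or a dict-like object with the article list under a "PubmedArticle" key. The helper names iter_pubmed_articles and article_title are made up for illustration.

```python
def iter_pubmed_articles(parsed):
    """Yield article records from a parsed PubMed efetch result.

    Assumption: `parsed` is either a list of records, or a dict-like
    object whose "PubmedArticle" key holds that list.
    """
    if hasattr(parsed, "keys"):
        # Dict-like result: pull out the article list, if there is one
        parsed = parsed["PubmedArticle"] if "PubmedArticle" in parsed else []
    for record in parsed:
        yield record


def article_title(record):
    """Return the title using the same key path as the question's example."""
    return record["MedlineCitation"]["Article"]["ArticleTitle"]
```

With that in place the original loop would become something like: for record in iter_pubmed_articles(Entrez.read(handle)): print(article_title(record)).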


Yes, read indeed works. But parse is supposed to work with multiple records, so I'd like to keep using it if possible.


Then a try-except block could be useful:

from Bio import Entrez
Entrez.email = "Your.Name.Here@example.org"

handle = Entrez.efetch("pubmed", id="19304878,14630660", retmode="xml")
try:
    # The ValueError is raised while iterating, so the loop stays inside the try
    records = Entrez.parse(handle)
    for record in records:
        print(record['MedlineCitation']['Article']['ArticleTitle'])
except ValueError:
    # Close the partly consumed handle before fetching again for read()
    handle.close()
    handle = Entrez.efetch("pubmed", id="19304878,14630660", retmode="xml")
    records = Entrez.read(handle)
    for record in records:
        print(record)
handle.close()

Clunky, ugly, but works.
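A slightly less clunky variant might download the XML once and retry on the same bytes, so NCBI is not queried twice. This is only a sketch: parse_with_fallback is a hypothetical helper, and it assumes the parser functions accept any file-like object, such as io.BytesIO. The parsers are passed in as arguments so the helper can be tried out without a network connection.

```python
import io


def parse_with_fallback(raw, parse, read):
    """Try `parse` on the downloaded bytes; fall back to `read` on ValueError.

    `raw` is the full response body (for example handle.read()).
    `parse` and `read` are callables taking a file-like object; the
    intended pair is Entrez.parse and Entrez.read (an assumption here).
    """
    try:
        # list() forces iteration, so an error raised lazily by a
        # generator-style parser is caught here rather than later
        return list(parse(io.BytesIO(raw)))
    except ValueError:
        # Fresh buffer: the first one may have been partly consumed
        return read(io.BytesIO(raw))
```

Usage would be along the lines of: raw = handle.read(); handle.close(); records = parse_with_fallback(raw, Entrez.parse, Entrez.read). Note that the fallback returns whatever read() returns, which may not have the same shape as the list produced by parse().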


I think I'll have to switch to read() then and see how that works. There is a bit more code that needs to change. The reason why I am hesitating is that parse() worked for ages without any issues for all records. If this issue now arises due to a planned change at PubMed, then all is fine, but it might also be some sort of bug they are not aware of yet.
