Question

Entrez parse function failing to parse PubMed publication records

7.3 years ago

apiljic • 0

I am trying to use Bio.Entrez to search PubMed and parse the publication records. Entrez.parse() worked fine until a few days ago, when it started failing with the following error:

File "/venv/lib/python2.7/site-packages/Bio/Entrez/Parser.py", line 296, in parse
raise ValueError("The XML file does not represent a list. Please use Entrez.read instead of Entrez.parse")
ValueError: The XML file does not represent a list. Please use Entrez.read instead of Entrez.parse

Running the example from the Bio.Entrez source documentation (http://biopython.org/DIST/docs/api/Bio.Entrez-pysrc.html) gives the same error:

from Bio import Entrez 
Entrez.email = "Your.Name.Here@example.org"
handle = Entrez.efetch("pubmed", id="19304878,14630660", retmode="xml")    
records = Entrez.parse(handle) 
for record in records: 
    print(record['MedlineCitation']['Article']['ArticleTitle']) 
handle.close()

Does somebody have an idea what might be going wrong here?
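
To check whether the failure comes from a change on NCBI's side, one option is to compare the DOCTYPE declaration of a fresh efetch response with one saved before the breakage. A different DTD name or date would point at a format change rather than a Biopython bug. Below is a minimal standard-library sketch, assuming the raw XML has already been read from the handle; `doctype_of` is a hypothetical helper, not part of Biopython.

```python
import re

# Matches the <!DOCTYPE ...> declaration; assumes the document has no internal
# subset ("[...]"), which efetch output is assumed not to contain.
_DOCTYPE_RE = re.compile(r"<!DOCTYPE\s[^>]*>", re.IGNORECASE)


def doctype_of(xml):
    """Return the DOCTYPE declaration of an XML document, or None if absent.

    Accepts bytes (as read from a binary handle) or str.
    """
    if isinstance(xml, bytes):
        xml = xml.decode("utf-8", errors="replace")
    match = _DOCTYPE_RE.search(xml)
    return match.group(0) if match else None


if __name__ == "__main__":
    # Hypothetical usage against a live response (needs network access):
    #   from Bio import Entrez
    #   Entrez.email = "Your.Name.Here@example.org"
    #   raw = Entrez.efetch("pubmed", id="19304878", retmode="xml").read()
    #   print(doctype_of(raw))
    sample = '<?xml version="1.0"?>\n<!DOCTYPE Example SYSTEM "example.dtd">\n<Example/>'
    print(doctype_of(sample))  # prints <!DOCTYPE Example SYSTEM "example.dtd">
```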

biopython pubmed entrez

written 7.3 years ago by apiljic; link updated 11 months ago by Ram

The Biopython developers are discussing this issue on GitHub: https://github.com/biopython/biopython/issues/1027

reply 7.3 years ago by apiljic

score 0 · Answer 1 · 2016-12-22

7.3 years ago

WouterDeCoster 47k

The error message is quite clear. Instead of

records = Entrez.parse(handle)

you would need

records = Entrez.read(handle)

That works, although it is not what the example shows and not what I would intuitively expect.
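
With read(), the catch is the shape of the result. Since parse() reports that the top-level element is not a list, read() most likely returns a dict-like object, so `for record in records` walks over its keys rather than over the articles. The sketch below flattens either shape into a plain list. `articles_from_read` is a hypothetical helper, the `'PubmedArticle'`/`'PubmedBookArticle'` key names are assumptions worth checking with `records.keys()` on your own output, and the `isinstance(records, dict)` test relies on Biopython's dict-like records subclassing `dict`.

```python
def articles_from_read(records):
    """Return a flat list of article records from an Entrez.read() result.

    Handles two shapes (an assumption based on the error above):
    * a list of articles, i.e. what parse() would have iterated over;
    * a dict-like PubmedArticleSet whose articles are grouped under the
      'PubmedArticle' and 'PubmedBookArticle' keys.
    """
    if isinstance(records, dict):
        # Dict-like result: merge the per-type lists into one list.
        articles = []
        for key in ("PubmedArticle", "PubmedBookArticle"):
            articles.extend(records.get(key, []))
        return articles
    # Already a sequence of articles.
    return list(records)


# Hypothetical usage with Biopython:
#   from Bio import Entrez
#   Entrez.email = "Your.Name.Here@example.org"
#   handle = Entrez.efetch("pubmed", id="19304878,14630660", retmode="xml")
#   for record in articles_from_read(Entrez.read(handle)):
#       print(record['MedlineCitation']['Article']['ArticleTitle'])
#   handle.close()
```

This keeps the existing loop body unchanged, so only the line producing `records` has to be touched.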

written 7.3 years ago by WouterDeCoster

Yes, read() does work. But parse() is designed to iterate over multiple records one at a time, so I'd like to keep using it if possible.

reply 7.3 years ago by apiljic

Then a try-except block could be useful:

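from Bio import Entrez
Entrez.email = "Your.Name.Here@example.org"

try:
    handle = Entrez.efetch("pubmed", id="19304878,14630660", retmode="xml")
    # parse() is lazy: the ValueError is raised once the loop starts iterating
    records = Entrez.parse(handle)
    for record in records:
        print(record['MedlineCitation']['Article']['ArticleTitle'])
except ValueError:
    # the first handle has been partly consumed, so close it and fetch again
    handle.close()
    handle = Entrez.efetch("pubmed", id="19304878,14630660", retmode="xml")
    records = Entrez.read(handle)
    # read() returns the whole article set as one dict-like object;
    # the articles are assumed to sit under the 'PubmedArticle' key
    for record in records['PubmedArticle']:
        print(record['MedlineCitation']['Article']['ArticleTitle'])
handle.close()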

Clunky, ugly, but works.
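
The fallback above downloads the same records twice. A variant sketch reads the response into memory once and replays it from an `io.BytesIO` buffer: first through `parse()`, and through `read()` only if that raises ValueError. `records_with_fallback` is a hypothetical helper. `parse` and `read` are passed in (normally `Entrez.parse` and `Entrez.read`) so the logic can run without network access, and the sketch assumes both accept a binary file-like object, that `handle.read()` returns bytes, and that the dict-like result uses the key names shown.

```python
import io


def records_with_fallback(data, parse, read):
    """Yield article records from raw efetch XML bytes, downloading only once.

    Tries the streaming parse() first; if it raises ValueError before yielding
    anything, replays the same bytes through read() instead.
    """
    try:
        stream = iter(parse(io.BytesIO(data)))
        first = next(stream)  # parse() is lazy, so the ValueError surfaces here
    except StopIteration:
        return  # no records at all
    except ValueError:
        records = read(io.BytesIO(data))
        if isinstance(records, dict):
            # Assumed key names for the dict-like article set.
            records = [r for key in ("PubmedArticle", "PubmedBookArticle")
                       for r in records.get(key, [])]
        for record in records:
            yield record
        return
    # parse() worked: emit the peeked record, then the rest of the stream.
    yield first
    for record in stream:
        yield record


# Hypothetical usage with Biopython:
#   from Bio import Entrez
#   Entrez.email = "Your.Name.Here@example.org"
#   handle = Entrez.efetch("pubmed", id="19304878,14630660", retmode="xml")
#   data = handle.read()  # assumed to be bytes
#   handle.close()
#   for record in records_with_fallback(data, Entrez.parse, Entrez.read):
#       print(record['MedlineCitation']['Article']['ArticleTitle'])
```

Peeking at the first record keeps parse() lazy when it works, at the cost of holding the whole response in memory.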

reply 7.3 years ago by WouterDeCoster

I think I'll have to switch to read() then and see how that goes, although more of my code will need to change. I am hesitating because parse() worked for ages without any issues on all records. If the failure comes from a planned change at PubMed, then all is fine, but it might also be a bug they are not aware of yet.

reply 7.3 years ago by apiljic