Question: Entrez parse function failing to parse PubMed publication records
Asked 3.5 years ago by apiljic:

I am trying to use Entrez to search and parse publication records from PubMed. The parse() function worked until recently, but a few days ago it started failing with the following error:

  File "/venv/lib/python2.7/site-packages/Bio/Entrez/Parser.py", line 296, in parse
    raise ValueError("The XML file does not represent a list. Please use Entrez.read instead of Entrez.parse")
ValueError: The XML file does not represent a list. Please use Entrez.read instead of Entrez.parse

I looked at the source code (http://biopython.org/DIST/docs/api/Bio.Entrez-pysrc.html) and tried the example listed there, which fails with the same error:

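from Bio import Entrez
Entrez.email = "Your.Name.Here@example.org"  # NCBI asks for a contact address
handle = Entrez.efetch("pubmed", id="19304878,14630660", retmode="xml")
records = Entrez.parse(handle)  # returns an iterator over the records
for record in records:  # the ValueError above is raised here, on the first iteration
    print(record['MedlineCitation']['Article']['ArticleTitle'])
handle.close()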

Does somebody have an idea what might be going wrong here?

Tags: pubmed, entrez, biopython

The Biopython developers discussed this issue on GitHub: https://github.com/biopython/biopython/issues/1027

Comment by apiljic, 3.4 years ago.
Answer by WouterDeCoster (Belgium), 3.5 years ago:

The error message is quite clear. Instead of

records = Entrez.parse(handle)

you would need

records = Entrez.read(handle)

That works, although it's not what the example shows, nor what I would intuitively expect.

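Switching to read() also means handling whatever shape it returns. A minimal sketch, assuming that, depending on the Biopython version, Entrez.read gives either a plain list of article records or a dict-like object (Biopython's DictionaryElement subclasses dict) that keeps journal articles under 'PubmedArticle' and book records under 'PubmedBookArticle'. The helper names iter_pubmed_articles and article_titles are hypothetical.

```python
def iter_pubmed_articles(result):
    """Yield PubMed article records from the object returned by Entrez.read.

    Assumption: the result is either a list of article records or a dict-like
    object that groups them under 'PubmedArticle' and 'PubmedBookArticle'.
    """
    if isinstance(result, dict):
        for key in ("PubmedArticle", "PubmedBookArticle"):
            for article in result.get(key, []):
                yield article
    else:
        for article in result:
            yield article


def article_titles(result):
    """Return journal article titles, skipping records (such as book
    chapters) that have no MedlineCitation element."""
    return [
        article["MedlineCitation"]["Article"]["ArticleTitle"]
        for article in iter_pubmed_articles(result)
        if "MedlineCitation" in article
    ]
```

With a fresh efetch handle, article_titles(Entrez.read(handle)) should give the same titles the original parse() loop printed, whichever of the two shapes read() returns.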

Yes, read() does work. But parse() is meant for iterating over many records one at a time, so I'd like to keep using it if possible.

Comment by apiljic, 3.5 years ago.

Then a try-except block could be useful:

from Bio import Entrez
Entrez.email = "Your.Name.Here@example.org"

ids = "19304878,14630660"
handle = Entrez.efetch("pubmed", id=ids, retmode="xml")
try:
    # Entrez.parse returns a generator, so the ValueError is raised
    # when the loop starts, not when parse() is called
    for record in Entrez.parse(handle):
        print(record['MedlineCitation']['Article']['ArticleTitle'])
except ValueError:
    # Close the first response and fetch again, this time reading it all at once
    handle.close()
    handle = Entrez.efetch("pubmed", id=ids, retmode="xml")
    records = Entrez.read(handle)
    for record in records:
        print(record)
finally:
    handle.close()

Clunky, ugly, but works.

Comment by WouterDeCoster, 3.5 years ago.
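The try/except version sends the same request to NCBI twice. One way to avoid that, sketched here under some assumptions: read the efetch response into memory once, try the streaming parser on an in-memory copy, and fall back to a whole-document read on the same bytes. The function name extract_titles is hypothetical; parse and read are passed in and are expected to behave like Bio.Entrez.parse and Bio.Entrez.read, which also lets the helper be tested without network access. It also assumes read() returns either a list of records or a dict-like object with them under 'PubmedArticle'.

```python
import io


def extract_titles(xml_bytes, parse, read):
    """Return article titles from one efetch response, trying parse() first.

    xml_bytes is the raw response body, so the fallback re-reads it from
    memory instead of sending a second request to NCBI.
    """
    try:
        # list() forces the generator, so a ValueError surfaces here, before
        # any titles are returned (this gives up parse()'s streaming benefit)
        records = list(parse(io.BytesIO(xml_bytes)))
    except ValueError:
        result = read(io.BytesIO(xml_bytes))
        # Assumption: read() returns a list of records or a dict-like object
        # that keeps journal articles under 'PubmedArticle'
        if isinstance(result, dict):
            records = list(result.get("PubmedArticle", []))
        else:
            records = list(result)
    return [r["MedlineCitation"]["Article"]["ArticleTitle"] for r in records]
```

In use, this would be called as extract_titles(handle.read(), Entrez.parse, Entrez.read) after a single Entrez.efetch call, closing the handle afterwards. It assumes handle.read() returns bytes; if your Biopython version hands back a text handle, encode the string first.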

I think I'll have to switch to read() and see how that goes, although it means changing a bit more code. I'm hesitant because parse() worked for ages on every record without any issues. If this is caused by a planned change at PubMed, that's fine, but it could also be a bug they aren't aware of yet.

Comment by apiljic, 3.5 years ago.
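To tell whether something changed on PubMed's side, one option is to inspect the raw efetch XML directly: which DTD the DOCTYPE line points to, and which elements sit directly under the root. A minimal sketch using only the standard library; the function name describe_efetch_xml is hypothetical, and it assumes the DOCTYPE uses the PUBLIC "id" "url" form.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Matches <!DOCTYPE Root PUBLIC "public-id" "system-url"> and captures the URL
DOCTYPE_RE = re.compile(br'<!DOCTYPE\s+\S+\s+PUBLIC\s+"[^"]*"\s+"([^"]+)"')


def describe_efetch_xml(xml_bytes):
    """Return (root tag, DTD URL or None, Counter of top-level child tags)."""
    match = DOCTYPE_RE.search(xml_bytes)
    dtd_url = match.group(1).decode("utf-8") if match else None
    # ElementTree does not fetch the external DTD, it only parses the document
    root = ET.fromstring(xml_bytes)
    return root.tag, dtd_url, Counter(child.tag for child in root)
```

Feeding it the bytes from handle.read() of the failing efetch call, and comparing the DTD URL and child tags against an older response or the Biopython issue linked above, would show whether the document structure itself has changed.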
Powered by Biostar version 2.3.0