Biopython Class Instance - Output From Entrez.read: I Don't Know How To Manipulate The Output
12.3 years ago

I am trying to download some XML from PubMed. No problems there, Biopython is great. The problem is that I do not really know how to manipulate the output. I want to put most of the parsed XML into an SQL database, but I'm not familiar with the output. For some things I can index the parsed XML like a dictionary, but for others it doesn't seem that straightforward.

from Bio import Entrez
Entrez.email="xxxxxxxxxxxxx@gmail.com"
import sqlite3 as lite
handle=Entrez.efetch(db='pubmed',id='22737229', retmode='xml')
record = Entrez.read(handle)

If I want to find the title I can do this:

title=record[0]['MedlineCitation']['Article']['ArticleTitle']

But the type of the parsed object is a class:

>>> type(record)
<class 'Bio.Entrez.Parser.ListElement'>
>>> r = record[0]
>>> type(r)
<class 'Bio.Entrez.Parser.DictionaryElement'>
>>> r.keys()
[u'MedlineCitation', u'PubmedData']

Which makes me think there must be a much easier way of doing this than using it as a dictionary. But when I try:

>>> r.MedlineCitation

Traceback (most recent call last):
  File "<pyshell#67>", line 1, in <module>
    r.MedlineCitation
AttributeError: 'DictionaryElement' object has no attribute 'MedlineCitation'

It doesn't work. I can obviously use it as a dictionary, but then I run into problems later.

The real problem is trying to get certain information from the record when using it like a dictionary:

>>> record[0]['MedlineCitation']['PMID']
StringElement('22737229', attributes={u'Version': u'1'})

Which means that I can't just plop (that's a technical term ;) it into my SQL database but need to convert it first:

>>> t=record[0]['MedlineCitation']['PMID']
>>> t
StringElement('22737229', attributes={u'Version': u'1'})
>>> int(t)
22737229
>>> str(t)
'22737229'

All in all, I am glad for the depth of information that Entrez.read() provides, but I am not sure how to easily use the information in the resulting class instance.

Cheers

Wheaton

biopython • 4.9k views

I would use XSLT to insert PubMed XML into a database, not Python.

Sadly, I have no knowledge of XSLT, but I've used Python and the sqlite3 module before.

I can show a quick example. What's the schema of your database?

12.3 years ago
bow ▴ 790

The elements you see there are subclasses of Python's built-in types. You can use this information to check the type and perform any necessary casting prior to storing them in your database.

>>> pmid = record[0]['MedlineCitation']['PMID']
>>> pmid.__class__.__bases__
(<type 'str'>,)
>>> isinstance(pmid, basestring)
True

By the way, what do you mean by "I can obviously use it as a dictionary, but then I run into problems later."? The record[0] object you're using is a subclass of Python's built-in dictionary (in the same way that the PMID element above is a subclass of str), so it should be straightforward to treat it as a dictionary.
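
Because every element is a subclass of a built-in type, the type check can be applied recursively to turn a whole parsed record into plain dicts, lists and strings. The following is only a sketch under assumptions: the `Fake*` classes are hypothetical stand-ins that mimic the subclassing shown above so the example runs without Biopython or network access, `to_builtin` is an illustrative helper and not part of Biopython, and the code is written for Python 3 (where `basestring` no longer exists and `str` is used instead).

```python
# Hypothetical stand-ins for the parsed element classes (illustration only,
# not Biopython's real implementation).
class FakeStringElement(str):
    """A str subclass that carries an `attributes` dict."""
    def __new__(cls, value, attributes=None):
        obj = super().__new__(cls, value)
        obj.attributes = attributes or {}
        return obj


class FakeDictionaryElement(dict):
    pass


class FakeListElement(list):
    pass


def to_builtin(obj):
    """Recursively copy a parsed record into plain dict/list/str objects."""
    if isinstance(obj, dict):
        return {str(key): to_builtin(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [to_builtin(item) for item in obj]
    if isinstance(obj, str):
        return str(obj)  # str() on a str subclass returns an exact str
    return obj  # leave anything else untouched


# A tiny record shaped like the one in the question.
record = FakeListElement([
    FakeDictionaryElement({
        'MedlineCitation': FakeDictionaryElement({
            'PMID': FakeStringElement('22737229', {'Version': '1'}),
        }),
    }),
])

plain = to_builtin(record)
pmid = plain[0]['MedlineCitation']['PMID']
print(type(pmid).__name__, pmid)
```

Note that this sketch drops the `attributes` of each string element (for example the PMID version); if those matter, read them from the original object before converting.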

EDIT: There seems to be something wrong with the display. The line after pmid.__class__.__bases__ above is supposed to show (<type 'str'>,)

What I mean by "problems later" is this piece here:

>>> t=record[0]['MedlineCitation']['PMID'] 
>>> t
StringElement('22737229', attributes={u'Version': u'1'})
>>> int(t)
22737229
>>> str(t)
'22737229'

I can't just dump the value into SQL because of the attribute.

Hmm... you can convert them into a built-in Python type based on the parent class, right?

You've kinda already solved the problem. You can recast the object as a plain int or string and dump that into the SQL database.
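
As a rough illustration of that recast-then-insert step with the sqlite3 module, here is a minimal sketch. Everything about the schema is an assumption (the `article` table and its column names are made up, since the actual schema was never posted), `FakeStringElement` is a hypothetical stand-in for the parsed PMID element so the example runs without Biopython, and the title is a placeholder rather than the real article title.

```python
import sqlite3


# Hypothetical stand-in for a parsed string element (illustration only).
class FakeStringElement(str):
    def __new__(cls, value, attributes=None):
        obj = super().__new__(cls, value)
        obj.attributes = attributes or {}
        return obj


pmid = FakeStringElement('22737229', {'Version': '1'})
title = FakeStringElement('placeholder title')  # not the real article title

# In-memory database with an assumed, made-up schema.
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE article (pmid INTEGER PRIMARY KEY, pmid_version TEXT, title TEXT)'
)

# Cast each value to a plain built-in type before binding it; the attribute
# that lived on the element goes into its own column.
conn.execute(
    'INSERT INTO article (pmid, pmid_version, title) VALUES (?, ?, ?)',
    (int(pmid), str(pmid.attributes.get('Version', '')), str(title)),
)
conn.commit()

row = conn.execute('SELECT pmid, pmid_version, title FROM article').fetchone()
print(row)
conn.close()
```

Keeping the casts in one place, right at the insert, means the rest of the code can keep using the parsed record as an ordinary dictionary.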

True. I can do it, it just isn't pretty. I thought that since someone took the time to put it in this structure, there must be a really easy way of dealing with it. Maybe not, though.
