Question: Biopython Class Instance - Output From Entrez.read: I Don't Know How To Manipulate The Output
8.2 years ago, Wheaton Little wrote:

I am trying to download some XML from PubMed. No problems there; Biopython is great. The problem is that I do not really know how to manipulate the output. I want to put most of the parsed XML into a SQL database, but I'm not familiar with the output. For some things I can index the parsed XML like a dictionary, but for others it doesn't seem that straightforward.

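from Bio import Entrez
import sqlite3 as lite

Entrez.email = "xxxxxxxxxxxxx@gmail.com"  # contact address sent with each NCBI request
handle = Entrez.efetch(db='pubmed', id='22737229', retmode='xml')
record = Entrez.read(handle)  # parse the XML into nested list/dict-like elements
handle.close()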

If I want to find the title I can do this:

title=record[0]['MedlineCitation']['Article']['ArticleTitle']

But the parsed object is an instance of a Biopython class, not a plain list or dict:

>>> type(record)
<class 'Bio.Entrez.Parser.ListElement'>
>>> r = record[0]
>>> type(r)
<class 'Bio.Entrez.Parser.DictionaryElement'>
>>> r.keys()
[u'MedlineCitation', u'PubmedData']

Which makes me think there must be a much easier way of doing this than using it as a dictionary. But when I try:

>>> r.MedlineCitation

Traceback (most recent call last):
  File "<pyshell#67>", line 1, in <module>
    r.MedlineCitation
AttributeError: 'DictionaryElement' object has no attribute 'MedlineCitation'

It doesn't work. I can obviously use it as a dictionary, but then I run into problems later.
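
Because the parsed record is indexed with keys and positions, one possible way to make deep lookups less fragile is a small helper that walks a path and returns a default when a step is missing. This is only a minimal sketch: `dig` is a hypothetical helper (not part of Biopython), and the data below is a stand-in built from plain dicts and lists with a placeholder title, not real Entrez output.

```python
# Stand-in data shaped like the parsed record in the question. Plain dicts
# and lists are used instead of the Biopython element classes, and the
# title is a placeholder.
record = [{
    "MedlineCitation": {
        "PMID": "22737229",
        "Article": {"ArticleTitle": "Example title"},
    },
    "PubmedData": {},
}]


def dig(obj, *path, default=None):
    """Follow the keys/indexes in `path`; return `default` if a step is missing."""
    for step in path:
        try:
            obj = obj[step]
        except (KeyError, IndexError, TypeError):
            return default
    return obj


title = dig(record, 0, "MedlineCitation", "Article", "ArticleTitle")
missing = dig(record, 0, "MedlineCitation", "Article", "Abstract")
print(title)
print(missing)
```

A helper like this keeps the dictionary-style access but avoids a KeyError when a record lacks an optional field.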

The real problem is trying to get certain information from the record when using it like a dictionary:

>>> record[0]['MedlineCitation']['PMID']
StringElement('22737229', attributes={u'Version': u'1'})

Which means that I can't just plop (that's a technical term ;) it into my SQL database but need to convert it first:

>>> t=record[0]['MedlineCitation']['PMID']
>>> t
StringElement('22737229', attributes={u'Version': u'1'})
>>> int(t)
22737229
>>> str(t)
'22737229'
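
The conversion above can be reproduced without a network call. The sketch below uses a hypothetical stand-in class, `FakeStringElement`, that mimics what the repr suggests: a string value plus an attributes dictionary. It assumes the attributes are exposed as `.attributes`; check this against the Biopython version in use.

```python
class FakeStringElement(str):
    """Hypothetical stand-in for Bio.Entrez.Parser.StringElement:
    a str subclass that also carries an `attributes` dict."""

    def __new__(cls, value, attributes=None):
        obj = super().__new__(cls, value)
        obj.attributes = attributes or {}
        return obj


t = FakeStringElement("22737229", attributes={"Version": "1"})

pmid_text = str(t)                     # plain built-in str, attributes dropped
pmid_number = int(t)                   # plain built-in int
version = t.attributes.get("Version")  # keep the attribute separately if needed

print(type(pmid_text).__name__, pmid_text)
print(type(pmid_number).__name__, pmid_number)
print(version)
```

Casting discards the attributes, so anything worth keeping (such as the version) has to be read before or alongside the cast.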

All in all I am glad for the depth of information that Entrez.read() provides but I am not sure how to easily use the information in the resulting class instance.

Cheers

Wheaton


I would use XSLT to insert PubMed XML into a database, not Python.

written 8.2 years ago by Pierre Lindenbaum

Sadly I have no knowledge of XSLT, but I've used Python and the sqlite3 module before.

written 8.2 years ago by Wheaton Little

I can show a quick example. What's the schema of your database?

written 8.2 years ago by Pierre Lindenbaum

8.2 years ago, bow (Netherlands) wrote:

The elements you see there are subclasses of Python's built-in types. You can use this information to check the type and perform any necessary casting prior to storing them in your database.

>>> pmid = record[0]['MedlineCitation']['PMID']
>>> pmid.__class__.__bases__
(<type 'str'>,)
>>> isinstance(pmid, basestring)
True

By the way, what do you mean by "I can obviously use it as a dictionary, but then I run into problems later."? The r object you're using (record[0]) is a subclass of Python's built-in dictionary, in the same way that the PMID element above is a subclass of str, so it should be straightforward to treat it as a dictionary.

EDIT: There seems to be something wrong with the display. The line after pmid.__class__.__bases__ above is supposed to show (<type 'str'>,)
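
Checking the parent class can be applied to the whole record at once. The following is a rough sketch, written for Python 3 (where `basestring` no longer exists and `str` is checked instead). `to_plain` is a hypothetical helper, and `DictElement`, `ListElement` and `StrElement` are stand-ins for the Biopython classes so the example runs without Biopython or a network connection.

```python
class DictElement(dict):
    """Stand-in for a dict subclass such as DictionaryElement."""


class ListElement(list):
    """Stand-in for a list subclass such as ListElement."""


class StrElement(str):
    """Stand-in for a str subclass such as StringElement."""


def to_plain(obj):
    """Recursively rebuild `obj` using only built-in dict, list, str and int."""
    if isinstance(obj, dict):
        return {str(key): to_plain(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_plain(item) for item in obj]
    if isinstance(obj, str):
        return str(obj)
    if isinstance(obj, int) and not isinstance(obj, bool):
        return int(obj)
    return obj  # anything else is passed through unchanged


nested = ListElement([
    DictElement({
        "MedlineCitation": DictElement({"PMID": StrElement("22737229")}),
        "PubmedData": DictElement(),
    })
])

plain = to_plain(nested)
print(plain)
```

Note that this drops any attributes attached to the elements, so it only suits cases where the values alone are needed.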


What I mean by "problems later" is this piece here:

>>> t=record[0]['MedlineCitation']['PMID'] 
>>> t
StringElement('22737229', attributes={u'Version': u'1'})
>>> int(t)
22737229
>>> str(t)
'22737229'

I can't just dump the value into SQL because of the attribute.

written 8.2 years ago by Wheaton Little

Hmm... you can convert them to a built-in Python type based on the parent class, right?

written 8.2 years ago by bow

You've kinda already solved the problem. You can recast the object as scalar ints or strings and dump that into the SQL database.

written 8.2 years ago by Damian Kao
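
The "recast and dump" suggestion might look like the sketch below, using the standard-library sqlite3 module. The table name, columns and title are assumptions made up for illustration (the thread never states the actual schema), and an in-memory database is used so nothing is written to disk.

```python
import sqlite3

# Values as they might look after casting with int() and str().
# The title is a placeholder, not the real article title.
pmid = int("22737229")
title = str("Example title")

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE articles (pmid INTEGER PRIMARY KEY, title TEXT)")

# Parameter placeholders (?) let sqlite3 handle quoting of the values.
conn.execute("INSERT INTO articles (pmid, title) VALUES (?, ?)", (pmid, title))
conn.commit()

rows = conn.execute("SELECT pmid, title FROM articles").fetchall()
print(rows)
conn.close()
```

Casting just before the insert keeps the database layer independent of the parser's element classes.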

True. I can do it, it just isn't pretty. I thought that since someone took the time to put it in this structure, there must be a really easy way of dealing with it. Maybe not, though.

written 8.2 years ago by Wheaton Little