I am trying to download some XML from PubMed. No problems there, Biopython is great. The problem is that I don't really know how to manipulate the output. I want to put most of the parsed XML into an SQL database, but I'm not familiar with the structure of the output. For some things I can access the parsed XML like a dictionary, but for others it isn't that straightforward.
from Bio import Entrez
import sqlite3 as lite
Entrez.email = "xxxxxxxxxxxxx@gmail.com"  # NCBI asks E-utilities users for a contact address
handle = Entrez.efetch(db='pubmed', id='22737229', retmode='xml')
record = Entrez.read(handle)
handle.close()
If I want to find the title I can do this:
title=record[0]['MedlineCitation']['Article']['ArticleTitle']
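Chained lookups like this raise `KeyError` as soon as one element is missing, and fields such as `Abstract` are not present in every PubMed record. A small helper can walk a path of keys and indices and fall back to a default instead. `get_path` below is a hypothetical helper, not part of Biopython; it relies only on the fact that Biopython's parsed elements subclass `dict` and `list`, so ordinary indexing works on them.

```python
from typing import Any, Iterable


def get_path(obj: Any, path: Iterable[Any], default: Any = None) -> Any:
    """Follow a sequence of dict keys / list indices into nested data.

    Returns `default` as soon as a step is missing, instead of raising.
    """
    current = obj
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            # KeyError: missing dict key; IndexError: list too short;
            # TypeError: tried to index something that isn't a container.
            return default
    return current
```

With the `record` from the question, `get_path(record, [0, 'MedlineCitation', 'Article', 'ArticleTitle'])` returns the title, and a path that doesn't exist returns `None` (or whatever `default` you pass).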
But the parsed object is an instance of a Biopython class, not a plain list or dict:
>>> type(record)
<class 'Bio.Entrez.Parser.ListElement'>
>>> r = record[0]
>>> type(r)
<class 'Bio.Entrez.Parser.DictionaryElement'>
>>> r.keys()
[u'MedlineCitation', u'PubmedData']
That makes me think there must be a much easier way of doing this than treating it as a dictionary. But when I try:
>>> r.MedlineCitation
Traceback (most recent call last):
File "<pyshell#67>", line 1, in <module>
r.MedlineCitation
AttributeError: 'DictionaryElement' object has no attribute 'MedlineCitation'
It doesn't work. I can obviously use it as a dictionary, but then I run into problems later.
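The attribute lookup fails because `DictionaryElement` is a subclass of `dict`, and `dict` exposes its contents only through item access (`r['MedlineCitation']`), not attributes. If dotted access is what you want, one option is a thin wrapper of your own. `AttrView` below is a hypothetical sketch (Python 3), not a Biopython API; it wraps nested dicts and lists lazily so chained attribute access works on any dict/list tree.

```python
from typing import Any


class AttrView:
    """Read-only wrapper exposing dict keys as attributes.

    Nested dicts and lists are wrapped on access, so chains such as
    view[0].MedlineCitation.Article.ArticleTitle keep working.
    """

    def __init__(self, data: Any) -> None:
        self._data = data

    @staticmethod
    def _wrap(value: Any) -> Any:
        # Containers get wrapped again; leaves (strings, numbers) pass through.
        if isinstance(value, (dict, list)):
            return AttrView(value)
        return value

    def __getattr__(self, name: str) -> Any:
        # Called only when normal lookup fails, i.e. for record keys.
        # Reading __dict__ directly avoids infinite recursion if _data is unset.
        data = self.__dict__.get('_data')
        if isinstance(data, dict) and name in data:
            return self._wrap(data[name])
        raise AttributeError(name)

    def __getitem__(self, key: Any) -> Any:
        # Item access still works, e.g. for list indices or awkward key names.
        return self._wrap(self._data[key])

    def unwrap(self) -> Any:
        """Return the underlying dict/list."""
        return self._data
```

Usage would look like `AttrView(record)[0].MedlineCitation.Article.ArticleTitle`. Keys that aren't valid Python identifiers, or that clash with `unwrap`, still need `[...]`, which is one reason plain dictionary access is often the simpler choice.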
The real problem is trying to get certain information from the record when using it like a dictionary:
>>> record[0]['MedlineCitation']['PMID']
StringElement('22737229', attributes={u'Version': u'1'})
This means that I can't just plop (that's a technical term ;) it into my SQL database; I need to convert it first:
>>> t=record[0]['MedlineCitation']['PMID']
>>> t
StringElement('22737229', attributes={u'Version': u'1'})
>>> int(t)
22737229
>>> str(t)
'22737229'
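`StringElement` is a `str` subclass that carries an extra `.attributes` dict (here `{'Version': '1'}`), which is why `int(t)` and `str(t)` work. Rather than converting each field by hand, one option is to convert a whole parsed record to built-in types in one pass. `to_plain` is a hypothetical helper sketched for Python 3; it drops the `.attributes`, so read anything you need (e.g. `t.attributes['Version']`) before converting. `FakeStringElement` is only a stand-in so the example runs without Biopython or network access.

```python
from typing import Any


def to_plain(value: Any) -> Any:
    """Recursively convert subclasses of dict, list, str and int to built-ins.

    Any extra state on the subclasses (such as an `.attributes` dict) is lost.
    """
    if isinstance(value, dict):
        return {str(k): to_plain(v) for k, v in value.items()}
    if isinstance(value, list):
        return [to_plain(v) for v in value]
    if isinstance(value, bool):
        return value  # bool subclasses int; keep True/False unchanged
    if isinstance(value, int):
        return int(value)  # int() of an int subclass yields a plain int
    if isinstance(value, str):
        return str(value)  # str() of a str subclass yields a plain str
    return value


class FakeStringElement(str):
    """Demo stand-in for Bio.Entrez.Parser.StringElement (not the real class)."""

    def __new__(cls, value: str, attributes: dict) -> "FakeStringElement":
        obj = super().__new__(cls, value)
        obj.attributes = attributes
        return obj
```

For example, `to_plain({'PMID': FakeStringElement('22737229', {'Version': '1'})})` gives a dict whose `'PMID'` value is an ordinary `str`, ready to pass to `sqlite3`.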
All in all, I'm glad of the depth of information that Entrez.read() provides, but I'm not sure how to use the information in the resulting class instance easily.
Cheers
Wheaton
I would use XSLT to insert PubMed XML into a database, not Python.
Sadly I have no knowledge of XSLT, but I've used Python and the sqlite3 module before.
I can show a quick example. What's the schema of your database?
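Without the real schema, here is a minimal sketch using only Python's `sqlite3` module, assuming a hypothetical `articles(pmid, title)` table. It pulls two fields out of each parsed article with the same key path used in the question and converts them with `int()`/`str()`, which the question shows work on `StringElement`. The table, column and function names are illustrative assumptions.

```python
import sqlite3
from typing import Any, Dict, List, Tuple

# Hypothetical schema for illustration; adapt the columns to your own table.
SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    pmid  INTEGER PRIMARY KEY,
    title TEXT
)
"""


def article_row(article: Dict[str, Any]) -> Tuple[int, str]:
    """Extract (pmid, title) from one parsed PubMed article dict."""
    citation = article['MedlineCitation']
    # int()/str() turn Biopython's StringElement values into built-in types.
    return int(citation['PMID']), str(citation['Article']['ArticleTitle'])


def store_articles(conn: sqlite3.Connection, articles: List[Dict[str, Any]]) -> None:
    """Create the table if needed and insert or update one row per article."""
    rows = [article_row(a) for a in articles]
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(SCHEMA)
        # "?" placeholders let sqlite3 handle quoting and escaping.
        conn.executemany(
            "INSERT OR REPLACE INTO articles (pmid, title) VALUES (?, ?)", rows
        )
```

With the question's code this would be called as `store_articles(lite.connect('pubmed.db'), record)`. Depending on the Biopython version, `Entrez.read()` may instead return a dictionary with the articles under `record['PubmedArticle']`, so checking `type(record)` first is worthwhile.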