Question

Parsing error using Entrez.read (BioPython)

Asked 8.8 years ago by bojingjia

After finding the gene IDs that correspond to a list of gene symbols, I am trying to grab the gene summary for each of those IDs. When I use Entrez.read to parse the summaries I fetched with Entrez.esummary, I get a strangely structured list/dictionary whose keys I can't make out. For example, below I try to print the values under OtherDesignations, and I get a KeyError. Can someone help me out?

import sys
from Bio import Entrez
import xlrd

Entrez.email = "john.doe@mail.com"

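# Read the gene symbols from the first column of the first sheet
# (xlrd 2.0+ reads only .xls files; .xlsx needs an older xlrd or openpyxl)
wb = xlrd.open_workbook('C:/Users/user/geneSymbolsTest.xlsx')
sh = wb.sheet_by_index(0)
colA = sh.col_values(0)
colA.pop(0)  # drop the first cell (assumed to be the column header)

symbol_list = []
for x in colA:
    symbol_list.append(str(x))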

id_list = []
summary = []
parsedSummary = []

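for x in symbol_list:
    # Search the gene database for this symbol, restricted to mouse
    sterm = x + '[sym] "Mus musculus"[orgn]'
    handle = Entrez.esearch(db="gene", retmode="xml", term=sterm)
    record = Entrez.read(handle)
    handle.close()

    # Take the first matching gene ID (IndexError if the search found nothing)
    IDArray = record["IdList"]
    toString = str(IDArray[0])
    summary = Entrez.esummary(db="gene", retmode="xml", id=toString)
    parsedSummary = Entrez.read(summary)
    summary.close()
    entry = parsedSummary[0]["OtherDesignation"]  # this line raises the KeyError
    print(entry)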
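The loop above assumes every symbol has at least one hit. If `IdList` comes back empty, `IDArray[0]` raises an IndexError before the summary is ever fetched. Here is a minimal sketch of a guard. The helper names `build_gene_term` and `first_gene_id` are hypothetical, and the example runs against plain dicts rather than live Entrez results.

```python
def build_gene_term(symbol, organism="Mus musculus"):
    """Build the same esearch query as the question: symbol in [sym], organism in [orgn]."""
    return '%s[sym] "%s"[orgn]' % (symbol, organism)


def first_gene_id(search_record):
    """Return the first ID from a parsed esearch record, or None if nothing matched."""
    ids = search_record["IdList"]
    return str(ids[0]) if ids else None


# Offline usage with stand-in records; real ones would come from
# Entrez.read(Entrez.esearch(db="gene", term=...)).
print(build_gene_term("Trp53"))
print(first_gene_id({"IdList": []}))
print(first_gene_id({"IdList": ["12345"]}))  # "12345" is a placeholder ID
```

Inside the loop, you would `continue` to the next symbol whenever `first_gene_id(record)` returns None, instead of letting the whole run stop on one unmatched symbol.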

Tags: biopython, entrez, gene



Hi, can you please give an example value of "x" so that we can test your code the same way you did?

Comment by glihm, 8.7 years ago

Ram · Answer 1 · 2015-08-16

Bojingjia,

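You made a small mistake in how you reach the first dictionary that contains OtherDesignations. The parsed result is nested under DocumentSummarySet and then DocumentSummary, and the key itself is OtherDesignations (plural):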

# DocumentSummarySet -> list of DocumentSummary records -> first record -> field
entry = parsedSummary["DocumentSummarySet"]["DocumentSummary"][0]["OtherDesignations"]
print(entry)
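To see why the original `parsedSummary[0]` fails, it helps to reproduce the nesting offline. This sketch assumes the shape given in the fix above. `Bio.Entrez` returns dict and list subclasses, so plain dicts and lists stand in for them here. The record contents and the helper name `get_other_designations` are hypothetical placeholders, not real NCBI data.

```python
# Hypothetical stand-in for Entrez.read(Entrez.esummary(db="gene", id=...)).
# Only the nesting mirrors the answer; the field values are placeholders.
parsed = {
    "DocumentSummarySet": {
        "DocumentSummary": [
            {"Name": "GeneA", "OtherDesignations": "placeholder designation"},
        ]
    }
}


def get_other_designations(parsed_summary, index=0):
    """Follow the nesting from the answer: set -> list of summaries -> one record -> field."""
    records = parsed_summary["DocumentSummarySet"]["DocumentSummary"]
    return records[index]["OtherDesignations"]


# parsedSummary[0] fails because the top level is a dict: 0 is looked up
# as a key, and there is no key 0, hence the KeyError.
try:
    parsed[0]
except KeyError as err:
    print("KeyError:", err)

print(get_other_designations(parsed))
```

Run as written, this prints `KeyError: 0` followed by `placeholder designation`.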

When the output is this complicated and you want to extract data from one specific field, identify each data structure that contains it, level by level. I agree with you, it is sometimes a bit tedious. It also helps to print each step, so you can see how the structure changes as you go one level deeper. That makes the containers much easier to identify.
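The "print each step" advice can be automated with a small helper that walks the nested containers and records each one's path and keys (or length). This is a minimal sketch, not part of Biopython. It assumes the parsed elements subclass `dict` and `list`, as `Bio.Entrez`'s parser classes do, and the name `describe` and the mock record are hypothetical.

```python
def describe(obj, path="parsed", max_depth=4, lines=None):
    """Walk nested dicts/lists and record each container's path and keys or length."""
    if lines is None:
        lines = []
    if max_depth < 0:
        return lines
    if isinstance(obj, dict):
        lines.append("%s: dict with keys %s" % (path, sorted(obj)))
        for key, value in obj.items():
            describe(value, "%s[%r]" % (path, key), max_depth - 1, lines)
    elif isinstance(obj, list):
        lines.append("%s: list of length %d" % (path, len(obj)))
        if obj:
            # Only the first element is inspected, assuming all items share a layout
            describe(obj[0], "%s[0]" % path, max_depth - 1, lines)
    # Strings and other leaves are not containers, so nothing is recorded for them
    return lines


# Hypothetical stand-in for a parsed gene esummary result
parsed = {
    "DocumentSummarySet": {
        "DocumentSummary": [
            {"Name": "GeneA", "OtherDesignations": "placeholder designation"},
        ]
    }
}

for line in describe(parsed):
    print(line)
```

Each printed line is a path you can paste straight into your code. For the mock above, the last line is `parsed['DocumentSummarySet']['DocumentSummary'][0]: dict with keys ['Name', 'OtherDesignations']`. On a real result you would pass `parsedSummary` instead of the mock.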