Annotating gene symbols with BioPython
8.8 years ago
bojingjia ▴ 10

I have a list of gene symbols rather than gene IDs, and I would like to use Entrez to annotate them (i.e., retrieve each gene's summary). I've seen example code that annotates genes by gene ID, and I figured it would be easy to search by gene symbol instead, but I'm having a lot of trouble debugging my code. Can anyone point me in the right direction?

Thanks in advance!

EDIT: I posted my code below. I think this is a really slow and clumsy way of achieving what I want. The annotations currently print out OK, and I'm still working on writing them to an Excel spreadsheet. I've come across the esummary function in Entrez, and I was wondering whether that might be a faster method?
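
For the spreadsheet step, one possible route is to write a CSV file, which Excel opens directly, using only the standard library. The sketch below is written for Python 3 and assumes each annotation record behaves like a dictionary with `Name`, `Description`, and `Summary` keys; those field names, the helper `write_annotations_csv`, and the placeholder records are assumptions for illustration, so check them against what `Entrez.read` actually returns.

```python
import csv

# Assumed field names; verify against the records Entrez.read() returns.
FIELDS = ["Name", "Description", "Summary"]


def write_annotations_csv(records, path, fields=FIELDS):
    """Write one row per annotation record; missing fields become 'NA'."""
    with open(path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(fields)  # header row
        for rec in records:
            writer.writerow([str(rec.get(f, "NA")) for f in fields])


# Hypothetical records standing in for real esummary output.
example = [
    {"Name": "GeneA", "Description": "placeholder description",
     "Summary": "placeholder summary"},
    {"Name": "GeneB", "Description": "placeholder description"},
]
write_annotations_csv(example, "annotations.csv")
```

Filling absent fields with `NA` keeps every row the same width, which is one way to handle genes that come back without a summary.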

annotation gene Entrez BioPython • 3.1k views

post your code..

import sys
 
from Bio import Entrez
import xlrd


# *Always* tell NCBI who you are
Entrez.email = "john.doe@mail.com"
 
def retrieve_annotation(id_list):
 
    """Annotates Entrez Gene IDs using Bio.Entrez, in particular epost (to
    submit the data to NCBI) and esummary to retrieve the information. 
    Returns a list of dictionaries with the annotations."""
 
    # Post the gene IDs (already looked up from the symbols) to the NCBI
    # history server so esummary can fetch them all in one request
    request = Entrez.epost("gene", id=",".join(id_list))
    try:
        result = Entrez.read(request)
    except RuntimeError as e:
        #FIXME: How to generate NAs instead of causing an error with invalid IDs?
        print("An error occurred while retrieving the annotations.")
        print("The error returned was %s" % e)
        sys.exit(-1)
 
    webEnv = result["WebEnv"]
    queryKey = result["QueryKey"]
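    data = Entrez.esummary(db="gene", webenv=webEnv, query_key=queryKey)
    annotations = Entrez.read(data)

    # Note: depending on the response format NCBI returns, the per-gene
    # records may be nested under
    # annotations["DocumentSummarySet"]["DocumentSummary"], in which case
    # len(annotations) is not the number of genes.
    print("Retrieved %d annotations for %d genes" % (len(annotations),
            len(id_list)))
    return annotations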

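# Read the gene symbols from the first column of an Excel spreadsheet
# Note: xlrd 2.0 and later read only .xls files; for .xlsx use an older
# xlrd release or another reader such as openpyxl
wb = xlrd.open_workbook('C:/Users/user/geneSymbolsTest.xlsx')
sh = wb.sheet_by_index(0)
colA = sh.col_values(0)
colA.pop(0)  # drop the header row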

# Convert entries to strings
symbol_list = [str(x) for x in colA]

#Search for Gene ID, then find annotation
id_list = []
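for x in symbol_list:
    sterm = x + '[sym] "Mus musculus"[orgn]'
    handle = Entrez.esearch(db="gene", retmode="xml", term=sterm)
    record = Entrez.read(handle)
    handle.close()
    IDArray = record["IdList"]
    if not IDArray:
        # No gene matched this symbol; skip it instead of failing on IDArray[0]
        print("No Entrez Gene ID found for symbol %s" % x)
        continue
    id_list.append(str(IDArray[0]))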

annotation = retrieve_annotation(id_list)
print(type(annotation))
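
The script above already uses esummary for the annotation step; the slow part is more likely the loop that sends one esearch request per symbol. One way to cut the number of requests, sketched here under assumptions, is to combine several symbols into a single search term joined with OR. The helpers `chunked` and `build_batch_term`, the batch size, and the placeholder symbols are all hypothetical. Note that a combined query does not say which ID belongs to which symbol, so the mapping would have to be recovered afterwards (for example from the name field in the esummary records), and esearch returns only 20 IDs by default, so `retmax` would probably need to be raised.

```python
def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_batch_term(symbols, organism="Mus musculus"):
    """Combine several gene symbols into one esearch term using OR."""
    ors = " OR ".join("%s[sym]" % s for s in symbols)
    return '(%s) AND "%s"[orgn]' % (ors, organism)


symbols = ["GeneA", "GeneB", "GeneC"]  # placeholder symbols
terms = [build_batch_term(batch) for batch in chunked(symbols, 2)]
for term in terms:
    # Each term would then be passed to Entrez.esearch(db="gene", term=term)
    print(term)
```

With a batch size of 2, the three placeholder symbols produce two search terms instead of three separate requests; larger batches reduce the request count further.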