I have a list of gene symbols rather than gene IDs, and I would like to use Entrez to annotate my genes (i.e., grab the gene summary for each one). I've seen example code that annotates genes by gene ID, and I figured it would be easy to search by gene symbol instead, but I'm having a lot of trouble debugging my code. Can anyone point me in the right direction?
Thanks in advance!
EDIT: I posted my code below. I think this is a really slow and dumb way of achieving what I want. The annotations currently print out OK, and I'm still working on writing them to an Excel spreadsheet. I've come across the esummary function in Entrez, and I was wondering whether that might be a faster method?
Post your code.
    import sys
    from Bio import Entrez
    import xlrd

    # *Always* tell NCBI who you are
    Entrez.email = "john.doe@mail.com"

    def retrieve_annotation(id_list):
        """Annotates Entrez Gene IDs using Bio.Entrez, in particular epost (to
        submit the data to NCBI) and esummary to retrieve the information.
        Returns a list of dictionaries with the annotations."""

        # This below tests for search by gene symbol
        request = Entrez.epost("gene",id=",".join(id_list))
        try:
            result = Entrez.read(request)
        except RuntimeError as e:
            #FIXME: How generate NAs instead of causing an error with invalid IDs?
            print "An error occurred while retrieving the annotations."
            print "The error returned was %s" % e
            sys.exit(-1)

        webEnv = result["WebEnv"]
        queryKey = result["QueryKey"]
        data = Entrez.esummary(db="gene", webenv=webEnv, query_key = queryKey)
        annotations = Entrez.read(data)

        print "Retrieved %d annotations for %d genes" % (len(annotations), len(id_list))

        return annotations

    #Read data from Excel Spreadsheet
    wb = xlrd.open_workbook('C:/Users/user/geneSymbolsTest.xlsx')
    sh = wb.sheet_by_index(0)
    colA = sh.col_values(0)
    colA.pop(0)

    #Convert entries to Strings
    symbol_list = []
    for x in colA:
        symbol_list.append(str(x))

    #Search for Gene ID, then find annotation
    id_list = []
    for x in symbol_list:
        sterm = x + '[sym] "Mus musculus"[orgn]'
        handle = Entrez.esearch(db="gene", retmode = "xml", term = sterm )
        record = Entrez.read(handle)
        IDArray = record["IdList"]
        toString = str(IDArray[0])
        id_list.append(toString)

    annotation = retrieve_annotation(id_list)
    print type(annotation)
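The loop in the code above sends one esearch request per symbol, which is probably where most of the time goes. One possible way to cut the number of requests is to OR several symbols into a single esearch term, keep the result on the history server with `usehistory="y"`, and then call esummary once per batch. Below is a rough Python 3 sketch of that idea, not a tested drop-in replacement. The helper names (`build_batch_term`, `chunked`, `map_summaries`, `fetch_docs`, `write_csv`), the batch size of 100, and the `DocumentSummarySet` / `DocumentSummary` / `Name` / `Summary` keys of the parsed esummary result are all assumptions on my part, so check them against what your Biopython version actually returns. Symbols with no match get "NA" instead of raising an error, and the output is a CSV file that Excel can open.

```python
import csv


def build_batch_term(symbols, organism="Mus musculus"):
    """Build one esearch term that covers several gene symbols at once."""
    clauses = " OR ".join("%s[sym]" % s for s in symbols)
    return '(%s) AND "%s"[orgn]' % (clauses, organism)


def chunked(items, size):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def map_summaries(symbols, docs):
    """Pair each requested symbol with its summary, or "NA" if none was found.

    `docs` is a list of dict-like records with "Name" and "Summary" keys
    (the assumed shape of the parsed gene esummary records).
    """
    by_name = {}
    for doc in docs:
        # Keep the first record seen for a given name (case-insensitive).
        by_name.setdefault(str(doc["Name"]).lower(), str(doc["Summary"]))
    return [(s, by_name.get(s.lower(), "NA")) for s in symbols]


def fetch_docs(symbols, email, batch_size=100):
    """Run one esearch and one esummary per batch (needs network access)."""
    # Imported here so the helpers above can be used without Biopython.
    from Bio import Entrez

    Entrez.email = email  # always tell NCBI who you are
    docs = []
    for batch in chunked(symbols, batch_size):
        search = Entrez.read(Entrez.esearch(
            db="gene", term=build_batch_term(batch), usehistory="y"))
        count = int(search["Count"])
        if count == 0:
            continue  # nothing matched in this batch
        handle = Entrez.esummary(
            db="gene", webenv=search["WebEnv"],
            query_key=search["QueryKey"], retmax=count)
        record = Entrez.read(handle)
        # Assumed layout of the parsed result; inspect `record` to confirm.
        docs.extend(record["DocumentSummarySet"]["DocumentSummary"])
    return docs


def write_csv(rows, path):
    """Write (symbol, summary) rows to a CSV file that Excel can open."""
    with open(path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["symbol", "summary"])
        writer.writerows(rows)
```

Usage would look roughly like `rows = map_summaries(symbol_list, fetch_docs(symbol_list, "john.doe@mail.com"))` followed by `write_csv(rows, "annotations.csv")`, where the output filename is just a placeholder. Matching on the returned `Name` is a simplification: if a symbol only matches through an alias, it will show up as "NA" here and would need a separate lookup.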