I have many gene names that I'm trying to map to Entrez IDs.
Right now I use the `Entrez.esearch` function in Biopython to query them one by one, but this takes a long time for 30,000 gene names and ideally I would like it to be faster. I assume it would be faster if I could query all 30,000 at once (or in a few large batches) instead of sending 30,000 separate queries.
This is my current implementation:

    from Bio import Entrez

    Entrez.email = "you@example.com"  # placeholder; NCBI asks for a contact address

    for line in f:
        fields = [item.strip('"') for item in line.strip().split()]
        gene = fields[0]  # assumes the gene name is the first column of each line

        # Search NCBI for existing gene ids
        gene_id = None
        handle = Entrez.esearch(db="gene", term="Homo sapiens[orgn] AND " + gene + "[Gene Name]")
        record = Entrez.read(handle)
        handle.close()
        try:
            gene_id = record["IdList"]
        except KeyError:
            pass

This works, but I would like a faster solution. Is there a better way to approach this?
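For what it's worth, the batching idea I have in mind would look roughly like the sketch below. It only builds the query strings (no network calls), OR-ing several gene names into one search term. The helper name `batched_terms` and the batch size of 100 are my own assumptions for illustration, not anything from Biopython, and I have not checked how long a term NCBI will accept.

```python
from typing import Iterator, List


def batched_terms(genes: List[str], batch_size: int = 100) -> Iterator[str]:
    """Yield one esearch term per batch, OR-ing the gene names together.

    Hypothetical helper: batch_size is a guess, not an NCBI-documented limit.
    """
    for start in range(0, len(genes), batch_size):
        chunk = genes[start:start + batch_size]
        # Combine the names of this batch into a single parenthesised OR clause
        names = " OR ".join(name + "[Gene Name]" for name in chunk)
        yield "Homo sapiens[orgn] AND (" + names + ")"
```

Each term would then be passed to `Entrez.esearch(db="gene", term=term, retmax=...)`, with `retmax` raised because esearch returns only 20 IDs by default. The catch, as far as I can tell, is that the combined `IdList` does not say which ID belongs to which gene name, so a follow-up lookup (for example with `Entrez.esummary`) would be needed to map the IDs back.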
Kind regards, Julian