Question: Retrieve gene names from gene symbols using Entrez?
0
3.2 years ago by
randalljellis70 wrote:

Hello,

Is it possible to use any of the Entrez tools to query a gene symbol and retrieve the gene name? As in, query "DRD1" and retrieve "dopamine receptor D1" as a result. If it can't be done with Entrez but can be some other way, I would gladly follow that too!

Thank you in advance, you're all BioStars :)

2
3.2 years ago by
Belgium
WouterDeCoster40k wrote:

As is very often the case, the answer is probably Ensembl's BioMart :)

2
3.2 years ago by
United States
genomax70k wrote:

It can probably be done using eUtils, or by parsing the correct gene name (there is more than one DRD1 gene) from this file.
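For the file-parsing route, a minimal sketch might look like the following. It assumes a tab-separated table in the style of NCBI's gene_info files, with a header row that names at least `tax_id`, `Symbol` and `description` columns. The function name `symbol_to_names` and the sample rows are made up for illustration, and the file mentioned above may be laid out differently.

```python
import csv
import io
from collections import defaultdict


def symbol_to_names(table, tax_id=None):
    """Map each lower-cased gene symbol to a list of (tax_id, description).

    `table` is any iterable of lines (an open file works). Assumes a header
    row containing tax_id, Symbol and description columns (an assumption
    about the file layout, not something the thread confirms).
    """
    reader = csv.reader(table, delimiter="\t", quoting=csv.QUOTE_NONE)
    # a leading "#" on the header line is stripped from the column names
    header = [name.lstrip("#") for name in next(reader)]
    col = {name: i for i, name in enumerate(header)}
    names = defaultdict(list)
    for row in reader:
        if not row:
            continue  # skip blank lines
        if tax_id is not None and row[col["tax_id"]] != str(tax_id):
            continue  # keep only the requested organism
        symbol = row[col["Symbol"]].lower()
        names[symbol].append((row[col["tax_id"]], row[col["description"]]))
    return names


# Illustrative rows only; a real file has many more columns and rows.
sample = io.StringIO(
    "#tax_id\tGeneID\tSymbol\tdescription\n"
    "9606\t1812\tDRD1\tdopamine receptor D1\n"
    "10090\t13488\tDrd1\tdopamine receptor D1\n"
)
lookup = symbol_to_names(sample)
print(lookup["drd1"])
```

Because the same symbol turns up in several organisms, the lookup keeps every match alongside its taxonomy ID; passing `tax_id=9606` (or another ID) narrows it to one species.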

2
3.2 years ago by
Houston/MD Anderson Cancer Center
Ming Tang2.5k wrote:

I have blog posts on this:
http://crazyhottommy.blogspot.com/2014/09/converting-gene-ids-using-bioconductor.html
http://crazyhottommy.blogspot.com/2014/09/mapping-gene-ids-with-mygene.html

2
3.2 years ago by
randalljellis70 wrote:

Thanks guys! Here's the program I ended up writing in case anyone wants to do the same thing!

from Bio import Entrez
import csv

REPLACED = 'This record was replaced with GeneID:'

def gene_alias(gene_file):
    """Write gene_aliases.csv with the Gene ID and aliases for each symbol in gene_file (one symbol per line)."""
    Entrez.email = "your email"  # NCBI asks for a contact address
    with open(gene_file) as infile:
        genes = [gene.rstrip('\n') for gene in infile]

    ids = []
    aliases = []

    def fetch_lines(gene_id):
        # the parsing further down depends on the line layout of the text efetch returns
        handle = Entrez.efetch(db="gene", id=gene_id, retmode="json")
        entry = handle.read()
        handle.close()
        if isinstance(entry, bytes):
            entry = entry.decode()
        return entry.splitlines()

    for gene in genes:
        # retrieve gene ID
        handle = Entrez.esearch(db="gene", term="Mus musculus[Orgn] AND " + gene + "[Gene]")
        record = Entrez.read(handle)
        handle.close()

        if len(record["IdList"]) > 0:
            ids.append(record["IdList"][0])

            # retrieve aliases, following "replaced with" notices to the current record
            entry_lines = fetch_lines(record["IdList"][0])
            replaced = [line for line in entry_lines if REPLACED in line]
            while replaced:
                new_id = replaced[0].split(REPLACED)[1].strip()
                entry_lines = fetch_lines(new_id)
                replaced = [line for line in entry_lines if REPLACED in line]

            firstline = entry_lines[1]  # [3:] skips a three-character prefix before the symbol
            if gene.lower() == firstline[3:].lower():
                thirdline = entry_lines[3]
                fourthline = entry_lines[4]
                if thirdline[0:13] == 'Other Aliases':
                    aliases.append(thirdline[15:])
                elif fourthline == 'This record was discontinued.':
                    aliases.append(fourthline)
                else:
                    aliases.append('no aliases')
            else:
                # the queried symbol is itself an alias; record the official symbol
                aliases.append(firstline[3:])

        else:
            ids.append(gene + ' is not in Gene')
            aliases.append(gene + ' is not in Gene')

    rows = zip(genes, ids, aliases)
    # text mode with newline='' is what the csv module expects on Python 3
    with open('gene_aliases.csv', 'w', newline='') as thefile:
        writer = csv.writer(thefile)
        writer.writerow(['Gene', 'ID', 'Aliases'])
        for row in rows:
            writer.writerow(row)
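
The script above collects aliases rather than the full gene name asked about in the question. A minimal sketch of the symbol-to-name lookup could use ESummary instead of parsing EFetch text. The helper names `description_from_summary` and `gene_name` are hypothetical, and the `DocumentSummarySet`, `Name` and `Description` keys are what I would expect Biopython to return for the gene database, so check them against a real response before relying on this.

```python
def description_from_summary(summary):
    """Pull (symbol, full name) out of a parsed ESummary result for db="gene".

    Assumes the parsed record has the shape
    summary["DocumentSummarySet"]["DocumentSummary"][0] with "Name" and
    "Description" entries.
    """
    doc = summary["DocumentSummarySet"]["DocumentSummary"][0]
    return doc["Name"], doc["Description"]


def gene_name(symbol, organism="Mus musculus", email="your email"):
    """Find a symbol with ESearch, then read its description via ESummary.

    Returns (official symbol, full name), or None if nothing matched.
    """
    # imported here so the parsing helper above can be used without Biopython
    from Bio import Entrez

    Entrez.email = email
    handle = Entrez.esearch(db="gene", term=f"{organism}[Orgn] AND {symbol}[Gene]")
    ids = Entrez.read(handle)["IdList"]
    handle.close()
    if not ids:
        return None

    # take the first hit, as the script above does
    handle = Entrez.esummary(db="gene", id=ids[0])
    summary = Entrez.read(handle)
    handle.close()
    return description_from_summary(summary)
```

Calling `gene_name("Drd1")` needs network access and a real email address. The parsing step is kept separate so it can be checked against a saved response.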