Question: Aliases for gene names from Entrez gene search
5.1 years ago by Harijay (Cambridge, MA):

I want to programmatically list all the aliases for a large number of gene names. I am using Entrez/NCBI and the "gene" database to do this.

For example:

When I search the NCBI gene database for XRCC1, I get the results

In the browser, NCBI gives you a nice table with the "Name", "Geneid", "Location" and, importantly, "Aliases". How do I get these aliases programmatically from a Biopython or Entrez eutils query?

Currently I can get a bunch of information from a function like:

from Bio import Entrez

Entrez.email = ""  # NCBI asks for a contact address; fill in your own

def main(input_term):
    # How do I set up this query to get Aliases?
    query = Entrez.esearch(db="gene", term=input_term, retmode="text")
    # Process this result and extract Aliases?
    result = query.read()
    print(result)

Thanks for your help


biopython entrez ncbi • 4.3k views
5.1 years ago by Sean Davis (National Institutes of Health, Bethesda, MD):

You might try using the relatively new Entrez Direct tools that can be run from the command-line.  Here is an example command for pulling the aliases for BRCA1[human].

esearch -db gene -query "(BRCA1[Preferred Symbol]) AND 9606[Taxonomy ID]" | \
  efetch -format xml | \
  xtract -pattern Entrezgene \
    -element Gene-ref_locus \
    -sep "|" -element Gene-ref_syn_E

This results in bar-separated aliases following the official symbol.
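
If the pipeline is called from a script, its output can be loaded into a Python mapping. Below is a minimal sketch, assuming each output line holds the official symbol, then a tab (which I believe is xtract's default column separator), then the bar-separated aliases. The function name `parse_xtract_aliases` and the sample line are made up for illustration and are not real NCBI output.

```python
def parse_xtract_aliases(output):
    """Map each official symbol to its list of aliases.

    Assumes one gene per line: symbol, a tab, then "|"-separated aliases.
    """
    aliases_by_symbol = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        symbol, _, alias_field = line.partition("\t")
        # A gene without aliases yields an empty list.
        aliases_by_symbol[symbol] = [a for a in alias_field.split("|") if a]
    return aliases_by_symbol


# Made-up sample line in the assumed format (not real NCBI output).
sample = "GENE1\tALIAS1|ALIAS2|ALIAS3\n"
print(parse_xtract_aliases(sample))
```

If your xtract call uses a different `-tab` or `-sep` setting, the two separators in the function would need to change to match.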


Thanks for the pointer to Entrez Direct. I used the workflow you outlined to fetch all the IDs and then iterate through them, looking for my gene name in each record's list of aliases, using code like:

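import json
from Bio import Entrez

Entrez.email = ""  # NCBI asks for a contact address; fill in your own

def main(gene_name):
    # First try an exact match on the preferred (official) symbol in human.
    most_likely_entry = Entrez.esearch(db="gene", term="{gene_name} [Preferred Symbol] AND 9606 [Taxonomy ID]".format(gene_name=gene_name), retmode="json")
    most_likely_entry_json = json.loads(most_likely_entry.read())
    my_ids = most_likely_entry_json['esearchresult']['idlist']
    if my_ids == []:
        # No preferred-symbol hit: search more broadly and check each record's aliases.
        all_ids_entries = Entrez.esearch(db="gene", term="{gene_name} AND 9606 [Taxonomy ID]".format(gene_name=gene_name), retmode="json")
        all_ids_json = json.loads(all_ids_entries.read())
        all_ids = all_ids_json['esearchresult']['idlist']
        # print(all_ids)
        for an_iden in all_ids:
            # The parsing below expects the plain-text report, so request text
            # (the original post passed retmode="json" here).
            record_with_aliases = Entrez.efetch(db="gene", id=an_iden, retmode="text")
            aliases = []
            for line in record_with_aliases:
                if line.startswith("Official Symbol:"):
                    # Strip whitespace so the symbol can be compared and returned cleanly.
                    aliases.append(line.split("and Name")[0].split(":")[1].strip())
                if line.startswith("Other Aliases:"):
                    for x in line.split(":")[1].split(","):
                        aliases.append(x.strip())
                    # print(aliases)
                    # Return the official symbol if the name, or the name without
                    # spaces or hyphens, is among the aliases.
                    if gene_name in aliases:
                        return aliases[0]
                    elif gene_name.replace(" ", "") in aliases:
                        return aliases[0]
                    elif gene_name.replace("-", "") in aliases:
                        return aliases[0]
        return "NOT FOUND"
    else:
        # We got the ID with the Preferred Symbol lookup
        print("Got ID:", my_ids)
        for an_iden in my_ids:
            record_with_aliases = Entrez.efetch(db="gene", id=an_iden, retmode="text")
            for line in record_with_aliases:
                if line.startswith("Other Aliases:"):
                    print(line.split(":")[1:])
            # Assumption: the value assigned here was lost from the post; the
            # searched name is returned because it matched a preferred symbol.
            entry = gene_name
            return entry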
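
The alias-matching step can also be separated from the network calls, which makes it easier to test offline. The following is a rough sketch, assuming the plain-text gene report contains "Official Symbol:" and "Other Aliases:" lines like the ones parsed above. The function names and the sample record are invented for illustration, and unlike the code above it strips spaces and hyphens from both the query and the candidate names before comparing.

```python
def parse_gene_report(text):
    """Return (official_symbol, aliases) from a plain-text gene report."""
    symbol = None
    aliases = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Official Symbol:"):
            # "Official Symbol: X and Name: ..." -> "X"
            symbol = line.split("and Name")[0].split(":", 1)[1].strip()
        elif line.startswith("Other Aliases:"):
            for alias in line.split(":", 1)[1].split(","):
                if alias.strip():
                    aliases.append(alias.strip())
    return symbol, aliases


def normalize(name):
    """Drop spaces and hyphens so that 'ABC-1' and 'ABC 1' compare equal."""
    return name.replace(" ", "").replace("-", "")


def official_symbol_for(gene_name, text):
    """Return the official symbol if gene_name matches it or any alias, else None."""
    symbol, aliases = parse_gene_report(text)
    if symbol is None:
        return None
    wanted = normalize(gene_name)
    for candidate in [symbol] + aliases:
        if normalize(candidate) == wanted:
            return symbol
    return None


# Made-up record in the assumed report layout (not real NCBI output).
SAMPLE = """1. GENE1
Official Symbol: GENE1 and Name: made-up example gene [Homo sapiens (human)]
Other Aliases: ALIAS-A, ALIAS B, ALIAS3
"""
print(official_symbol_for("ALIASA", SAMPLE))
```

Returning None rather than a "NOT FOUND" string is a design choice here, so the caller cannot mistake the sentinel for a gene symbol.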


Reply written 5.1 years ago by Harijay