Aliases for gene names from Entrez gene search
9.3 years ago
Harijay ▴ 60

I want to programmatically list all the aliases for a large number of gene names. I am using Entrez/NCBI and the "gene" database to do this.

For example:

When I search the NCBI gene database for XRCC1, I get these results:

http://www.ncbi.nlm.nih.gov/gene/?term=XRCC1

In the browser, NCBI gives you a nice table with the "Name", "Geneid", "Location" and, importantly, "Aliases". How do I get these aliases programmatically from a Biopython or Entrez eutils query?

Currently I can get a bunch of information from a function like:

from Bio import Entrez

def main(input_term):
    Entrez.email = "myemail@myworkplace.com"
    # How do I setup this query to get Aliases?
    # Entrez.read() parses XML, so request XML from esearch
    query = Entrez.esearch(db="gene", term=input_term, retmode="xml")
    # Process this result and extract Aliases?
    result = Entrez.read(query)
    print(result)

Thanks for your help

Entrez Biopython NCBI • 7.2k views
9.3 years ago

You might try the relatively new Entrez Direct tools, which run from the command line. Here is an example command that pulls the aliases for human BRCA1.

esearch -db gene -query "(BRCA1[Preferred Symbol]) AND 9606[Taxonomy ID]" | \
  efetch -format xml | \
  xtract -pattern Entrezgene \
    -element Gene-ref_locus \
    -sep "|" -element Gene-ref_syn_E

This results in bar-separated aliases following the official symbol.

BRCA1    IRIS|PSCP|BRCAI|BRCC1|FANCS|PNCA4|RNF53|BROVCA1|PPP1R53
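If you would rather stay in Python, the same two fields can be read out of the XML that efetch returns. This is only a rough sketch: `parse_aliases`, `fetch_aliases` and `SAMPLE_XML` are names I made up for illustration, and the sample is a hand-trimmed fragment containing only the elements the xtract command above refers to (a real record is far larger). The Biopython import sits inside `fetch_aliases` so the parsing part can be tried without a network connection.

```python
import xml.etree.ElementTree as ET

# Hand-trimmed illustration of the Entrezgene XML layout (assumption:
# only the elements used by the xtract command are shown).
SAMPLE_XML = """<Entrezgene-Set>
  <Entrezgene>
    <Entrezgene_gene>
      <Gene-ref>
        <Gene-ref_locus>BRCA1</Gene-ref_locus>
        <Gene-ref_syn>
          <Gene-ref_syn_E>IRIS</Gene-ref_syn_E>
          <Gene-ref_syn_E>PSCP</Gene-ref_syn_E>
        </Gene-ref_syn>
      </Gene-ref>
    </Entrezgene_gene>
  </Entrezgene>
</Entrezgene-Set>"""


def parse_aliases(xml_text):
    """Map each official symbol to its list of aliases."""
    root = ET.fromstring(xml_text)
    result = {}
    for entry in root.iter("Entrezgene"):
        gene_ref = entry.find("Entrezgene_gene/Gene-ref")
        if gene_ref is None:
            continue
        symbol = gene_ref.findtext("Gene-ref_locus")
        if symbol is None:
            continue
        # Genes without aliases simply get an empty list.
        result[symbol] = [
            e.text for e in gene_ref.findall("Gene-ref_syn/Gene-ref_syn_E")
        ]
    return result


def fetch_aliases(gene_ids, email):
    """Fetch gene records by ID and parse them (needs network access)."""
    from Bio import Entrez  # imported here so parse_aliases works offline

    Entrez.email = email
    handle = Entrez.efetch(db="gene", id=",".join(gene_ids), retmode="xml")
    try:
        return parse_aliases(handle.read())
    finally:
        handle.close()


print(parse_aliases(SAMPLE_XML))
```

The gene IDs for `fetch_aliases` would come from an esearch call like the one in the question, using the same query string as the command above.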

Thanks for the pointer to Entrez Direct. I used the workflow you outlined: first look up the preferred symbol, and if that finds nothing, fetch all matching IDs and iterate through them until the name I searched for appears in a record's list of aliases. The code looks like this:

import json
from Bio import Entrez
Entrez.email = "myemail@myinstitution.org"

def main(gene_name):
    # First try an exact match on the preferred (official) symbol in human
    most_likely_entry = Entrez.esearch(db="gene", term="{gene_name} [Preferred Symbol] AND 9606 [Taxonomy ID]".format(gene_name=gene_name), retmode="json")
    most_likely_entry_json = json.loads(most_likely_entry.read())
    my_ids = most_likely_entry_json['esearchresult']['idlist']
    if not my_ids:
        # No preferred symbol matched: search all fields, then check each hit's aliases
        all_ids_entries = Entrez.esearch(db="gene", term="{gene_name} AND 9606 [Taxonomy ID]".format(gene_name=gene_name), retmode="json")
        all_ids_json = json.loads(all_ids_entries.read())
        all_ids = all_ids_json['esearchresult']['idlist']
        for an_iden in all_ids:
            # The record is parsed line by line as plain text, so ask for text
            record_with_aliases = Entrez.efetch(db="gene", id=an_iden, retmode="text")
            aliases = []
            for line in record_with_aliases:
                if line.startswith("Official Symbol:"):
                    # strip() so the returned symbol has no surrounding spaces
                    aliases.append(line.split("and Name")[0].split(":")[1].strip())
                if line.startswith("Other Aliases:"):
                    aliases.extend(x.strip() for x in line.split(":", 1)[1].split(","))
                    # Also accept the name with spaces or hyphens removed
                    candidates = (gene_name, gene_name.replace(" ", ""), gene_name.replace("-", ""))
                    if any(c in aliases for c in candidates):
                        return aliases[0]
        return "NOT FOUND"
    else:
        # We got the ID with the Preferred Symbol lookup
        print("Got ID:", my_ids)
        for an_iden in my_ids:
            record_with_aliases = Entrez.efetch(db="gene", id=an_iden, retmode="text")
            # Read once: iterating the handle first would leave read() with nothing to return
            entry = record_with_aliases.read()
            for line in entry.splitlines():
                if line.startswith("Other Aliases:"):
                    print(line.split(":")[1:])
            return entry
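The parsing and matching steps in the code above can also be pulled out into small functions that run without any network access, which makes them easier to check. This is just a sketch under assumptions: `parse_gene_summary`, `matches_gene` and `SAMPLE_SUMMARY` are hypothetical names, and the sample text is hand-written to mimic the "Official Symbol: ... and Name: ..." and "Other Aliases: ..." lines that the code above looks for, not a real download.

```python
# Hand-written sample in the line format the code above expects
# (assumption: a real record has more lines and a real gene name).
SAMPLE_SUMMARY = """1. BRCA1
Official Symbol: BRCA1 and Name: example gene name
Other Aliases: IRIS, PSCP, RNF53
"""


def parse_gene_summary(text):
    """Return (official_symbol, aliases) from a plain-text gene summary."""
    symbol = None
    aliases = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Official Symbol:"):
            # Keep only the part between the first colon and " and Name".
            symbol = line.split(":", 1)[1].split(" and Name")[0].strip()
        elif line.startswith("Other Aliases:"):
            aliases = [
                a.strip() for a in line.split(":", 1)[1].split(",") if a.strip()
            ]
    return symbol, aliases


def matches_gene(query, symbol, aliases):
    """True if query, or query without spaces or hyphens, is a known name."""
    candidates = {query, query.replace(" ", ""), query.replace("-", "")}
    known = set(aliases)
    if symbol:
        known.add(symbol)
    return bool(candidates & known)


symbol, aliases = parse_gene_summary(SAMPLE_SUMMARY)
print(symbol, aliases, matches_gene("RNF-53", symbol, aliases))
```

Unlike the loop above, this version also matches when a record has an official symbol but no "Other Aliases:" line.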

edirect is a nice tool
