Question

Aliases for gene names from Entrez gene search

1

9.3 years ago

Harijay ▴ 60

I want to programmatically list all the aliases for a large number of gene names. I am using Entrez/NCBI and the "gene" database to do this.

For example:

When I search the NCBI gene database for XRCC1, I get these results:

http://www.ncbi.nlm.nih.gov/gene/?term=XRCC1

In the browser, NCBI gives you a nice table with "Name", "Geneid", "Location" and, importantly, "Aliases". How do I get these aliases programmatically from a Biopython or Entrez eutils query?

Currently I can get a bunch of information from a function like:

from Bio import Entrez

def main(input_term):
    Entrez.email = "myemail@myworkplace.com"
    # How do I set up this query to get Aliases?
    # Entrez.read() parses XML, so request XML from esearch.
    query = Entrez.esearch(db="gene",term=input_term,retmode="xml")
    # Process this result and extract Aliases?
    result = Entrez.read(query)
    print(result)

Thanks for your help

Entrez Biopython NCBI • 7.2k views

ADD COMMENT • link updated 2.1 years ago by Ram 43k • written 9.3 years ago by Harijay ▴ 60

Ram · Accepted Answer · 2015-01-10

7

9.3 years ago

Sean Davis 26k

You might try the relatively new Entrez Direct tools, which run from the command line. Here is an example command that pulls the aliases for human BRCA1.

esearch -db gene -query "(BRCA1[Preferred Symbol]) AND 9606[Taxonomy ID]" | \
  efetch -format xml | \
  xtract -pattern Entrezgene \
    -element Gene-ref_locus \
    -sep "|" -element Gene-ref_syn_E

This prints the official symbol, followed by its aliases separated by bars.

BRCA1    IRIS|PSCP|BRCAI|BRCC1|FANCS|PNCA4|RNF53|BROVCA1|PPP1R53
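
For anyone who would rather stay in Biopython, the same two fields can probably be read from the XML record. This is a minimal sketch, not a tested pipeline: it assumes `Entrez.read` turns the Entrezgene XML into nested dictionaries keyed by the element names used in the xtract command (`Gene-ref_locus`, `Gene-ref_syn`). The helper names `symbol_and_aliases` and `fetch_aliases` are made up for illustration.

```python
def symbol_and_aliases(record):
    """Return (official symbol, list of aliases) from one parsed Entrezgene record."""
    gene_ref = record["Entrezgene_gene"]["Gene-ref"]
    symbol = str(gene_ref.get("Gene-ref_locus", ""))
    # Assumption: Gene-ref_syn is simply missing when a gene has no aliases.
    aliases = [str(a) for a in gene_ref.get("Gene-ref_syn", [])]
    return symbol, aliases


def fetch_aliases(symbol, taxid="9606", email="myemail@myinstitution.org"):
    """Look up a preferred symbol in Entrez Gene and return {symbol: aliases}.

    Needs network access and Biopython.
    """
    # Imported here so the parsing helper above can be used without Biopython.
    from Bio import Entrez

    Entrez.email = email
    term = "{0}[Preferred Symbol] AND {1}[Taxonomy ID]".format(symbol, taxid)
    search = Entrez.read(Entrez.esearch(db="gene", term=term))
    ids = search["IdList"]
    if not ids:
        return {}
    # Fetch the full records as XML, the same format the efetch step above uses.
    records = Entrez.read(Entrez.efetch(db="gene", id=",".join(ids), retmode="xml"))
    return dict(symbol_and_aliases(r) for r in records)
```

Calling `fetch_aliases("BRCA1")` should then give a dictionary with the official symbol as key and the alias list as value, which can be joined with `"|".join(...)` to mimic the xtract output.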

ADD COMMENT • link updated 2.1 years ago by Ram 43k • written 9.3 years ago by Sean Davis 26k

0

Thanks for the pointer to Entrez Direct. I used the workflow you outlined: I first search on the preferred symbol and, if that finds nothing, fetch all matching IDs and iterate through them, looking for the gene name in each record's list of aliases. The code looks like this:

import json
from Bio import Entrez
Entrez.email = "myemail@myinstitution.org"

def main(gene_name):
    # First try an exact match on the official (preferred) symbol in human (taxid 9606).
    most_likely_entry = Entrez.esearch(db="gene",term="{gene_name} [Preferred Symbol] AND 9606 [Taxonomy ID]".format(gene_name=gene_name),retmode="json")
    most_likely_entry_json = json.loads(most_likely_entry.read())
    my_ids = most_likely_entry_json['esearchresult']['idlist']
    if not my_ids:
        # No preferred symbol matched: search all human gene records instead
        # and look for gene_name among each record's aliases.
        all_ids_entries = Entrez.esearch(db="gene",term="{gene_name} AND 9606 [Taxonomy ID]".format(gene_name=gene_name),retmode="json")
        all_ids_json = json.loads(all_ids_entries.read())
        all_ids = all_ids_json['esearchresult']['idlist']
        # print(all_ids)
        for an_iden in all_ids:
            # The loop below expects plain-text summary lines such as
            # "Official Symbol: ..." and "Other Aliases: ...".
            record_with_aliases = Entrez.efetch(db="gene",id=an_iden,retmode="json")
            aliases = []
            for line in record_with_aliases:
                if line.startswith("Official Symbol:"):
                    # Strip whitespace so the returned symbol is clean.
                    aliases.append(line.split("and Name")[0].split(":")[1].strip())
                if line.startswith("Other Aliases:"):
                    aliases.extend(x.strip() for x in line.split(":")[1].split(","))
                    # print(aliases)
                    # Try the name as given, then without spaces, then without hyphens.
                    variants = (gene_name, gene_name.replace(" ",""), gene_name.replace("-",""))
                    if any(name in aliases for name in variants):
                        return aliases[0]
        return "NOT FOUND"
    else:
        # We got the ID with the Preferred Symbol lookup
        print("Got ID:", my_ids)
        for an_iden in my_ids:
            record_with_aliases = Entrez.efetch(db="gene",id=an_iden,retmode="json")
            # Read the handle once; iterating over it first would leave read() empty.
            entry = record_with_aliases.read()
            for line in entry.splitlines():
                if line.startswith("Other Aliases:"):
                    print(line.split(":")[1:])
            return entry
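
The matching step in the code above can also be pulled out into a small function that is easy to check without network access. The sketch below is only an illustration: `official_symbol_for` is a hypothetical name, and the sample text is made up to mimic the "Official Symbol:" and "Other Aliases:" lines that the code above looks for in the efetch output.

```python
def official_symbol_for(gene_name, summary_text):
    """Return the official symbol if gene_name matches it or an alias, else None."""
    names = []
    for line in summary_text.splitlines():
        if line.startswith("Official Symbol:"):
            # Keep only the symbol that sits between the colon and "and Name".
            names.append(line.split("and Name")[0].split(":")[1].strip())
        elif line.startswith("Other Aliases:"):
            names.extend(a.strip() for a in line.split(":", 1)[1].split(","))
    # Try the name as given, then without spaces, then without hyphens.
    variants = {gene_name, gene_name.replace(" ", ""), gene_name.replace("-", "")}
    if names and variants & set(names):
        return names[0]
    return None


# Made-up sample text in the shape the parser expects (not real efetch output).
sample = (
    "Official Symbol: BRCA1 and Name: example gene name\n"
    "Other Aliases: IRIS, PSCP, RNF53\n"
)
print(official_symbol_for("RNF-53", sample))  # prints BRCA1
```

Returning `None` rather than the string "NOT FOUND" is a design choice in this sketch, so that a caller cannot mistake the failure value for a gene symbol.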

ADD REPLY • link updated 2.1 years ago by Ram 43k • written 9.3 years ago by Harijay ▴ 60

0

edirect is a nice tool

ADD REPLY • link 3.3 years ago by gsr9999 ▴ 300