How To Retrieve The Gene Name And Species Associated With An Ensembl Protein ID In R Or BioMart
13.4 years ago
Jp

Given an Ensembl protein ID, I would like to retrieve the associated gene name without knowing in advance which Ensembl database to query. Is this possible, and if so, how could I implement it in R, the Perl API, or a BioMart query?

For example, to find information about the ID ENSORLP00000023599, it is trivial to ask the Ensembl web interface to search all species and pull up the relevant information. But is there a way to do this programmatically (without making one query per species)?



The Ensembl website can do it, as you say, so there must be a way to do it programmatically. One could try to find out how the website does it. The result URL looks like a DAS query, and I would bet that the Perl API has a method to run such a query too, though I'm not sure how it works.


Michael, I thought so as well and tried to dig into it. The closest thing I found was the gene_autocomplete table in the ensembl_website_60 database, but it only maps gene names to organisms and does not cover proteins. My guess is that they have some kind of full-text search index behind those queries.

13.4 years ago

There is a way to do this with the Perl API (core). You can use the registry's get_species_and_object_type method, so you do not have to know the species. This works with any Ensembl stable ID (ENSORLP00000023599 would work).

Here's the code:

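use Bio::EnsEMBL::Registry;
my $registry = 'Bio::EnsEMBL::Registry';
# Connect anonymously to the public Ensembl database server
$registry->load_registry_from_db( -host => 'ensembldb.ensembl.org', -user => 'anonymous' );
my $stable_id = 'ENST00000326632';
my ( $species, $object_type, $db_type ) =
  $registry->get_species_and_object_type($stable_id);
my $adaptor =
  $registry->get_adaptor( $species, $db_type, $object_type );
my $object = $adaptor->fetch_by_stable_id($stable_id);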

After this, use the object to get to the associated gene and take its display label, which is the gene name.

Perl docs are here.

A tutorial, if you need it, is here.

Don't forget, any queries like this can go directly to us (

13.4 years ago

As far as I can see, no existing web service does exactly what you want. If you know the species of interest, you can easily retrieve the gene name from BioMart with a simple XML-formatted query:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query  virtualSchemaName = "default" formatter = "TSV" header = "0" uniqueRows = "0" count = "" datasetConfigVersion = "0.6" >

    <Dataset name = "olatipes_gene_ensembl" interface = "default" >
        <Filter name = "ensembl_peptide_id" value = "ENSORLP00000023599"/>
        <Attribute name = "external_gene_id" />
        <Attribute name = "ensembl_peptide_id" />

You can send this query to their web service with a simple GET request if you like:

<Query virtualSchemaName="default" formatter="TSV" header="0" uniqueRows="0" count="" datasetConfigVersion="0.6"><Dataset name="olatipes_gene_ensembl" interface="default"><Filter name="ensembl_peptide_id" value="ENSORLP00000023599"/><Attribute name="external_gene_id"/><Attribute name="ensembl_peptide_id"/></Dataset></Query>
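Building such a GET request amounts to URL-encoding the XML into a query-string parameter. The sketch below assumes the martservice endpoint http://www.ensembl.org/biomart/martservice and a parameter named query; the original link was lost, so treat both as assumptions and check them against the BioMart documentation. The helper name martservice_get_url is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the Ensembl BioMart martservice endpoint (the link in the
# original post was lost); replace it with the server you actually query.
MARTSERVICE_URL = "http://www.ensembl.org/biomart/martservice"


def martservice_get_url(query_xml: str, base_url: str = MARTSERVICE_URL) -> str:
    """Return a GET URL carrying the BioMart XML query in the 'query' parameter."""
    # urlencode percent-encodes <, >, quotes, '=' and '/', and turns spaces into '+'
    return base_url + "?" + urlencode({"query": query_xml})
```

Passing the resulting URL to urllib.request.urlopen should then return the TSV rows; that network step is deliberately left out of the sketch.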

One solution is thus to build a simple lookup table that uses the letters before the numbers (in this case ENSORLP) to decide which BioMart dataset to query. This is obviously a pain, since you would have to keep the table up to date with each new Ensembl release.
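A sketch of such a lookup table in Python, assuming unversioned stable IDs of the form ENS + optional species code + one feature letter (G, T, P or E) + digits. The table entries are illustrative only (human IDs carry no species code); a real table would need an entry for every species and maintenance with each release. The names PREFIX_TO_DATASET and dataset_for_id are hypothetical.

```python
import re

# Hypothetical, hand-maintained table mapping stable-ID prefixes to BioMart
# datasets. Only a few entries are shown; it must be extended and kept in
# sync with Ensembl releases, which is the drawback described above.
PREFIX_TO_DATASET = {
    "ENS": "hsapiens_gene_ensembl",      # human IDs carry no species code
    "ENSMUS": "mmusculus_gene_ensembl",  # mouse
    "ENSORL": "olatipes_gene_ensembl",   # medaka (Oryzias latipes)
}

# ENS + optional species code + feature letter (G, T, P or E) + digits
STABLE_ID = re.compile(r"^(ENS[A-Z]*)([GTPE])(\d+)$")


def dataset_for_id(stable_id: str) -> str:
    """Return the BioMart dataset for an Ensembl stable ID, based on its prefix."""
    m = STABLE_ID.match(stable_id)
    if m is None:
        raise ValueError(f"not an Ensembl stable ID: {stable_id!r}")
    prefix = m.group(1)  # e.g. "ENSORL" for ENSORLP00000023599
    try:
        return PREFIX_TO_DATASET[prefix]
    except KeyError:
        raise KeyError(f"no dataset known for prefix {prefix!r}") from None
```

The regex backtracks so that the feature letter is always the one directly before the digits, which keeps species codes such as MUS intact.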

The other way to go about it is to grudgingly accept that the web services cannot do what the web interface can, and have your script query the web interface instead. Yes, it is ugly, but it gets the job done.

For your example, you can run the web interface query by requesting this URL.

In the resulting HTML, look for a section like this:

<table class="search_results"> 
<tr><th colspan="2">By Feature type</th></tr> 
<tr><td><a href="/Multi/Search/Details?species=all;idx=;q=ENSORLP00000023599">Total</a></td><td><a href="/Multi/Search/Details?species=all;idx=;q=ENSORLP00000023599">1</a></td></tr> 
<td><a href="/Multi/Search/Details?_C=eJyLz2FIzWOIL8tjSElNSyzNKWGIL2Rw9Qv2D*IJMAADI2NTS0uF5PyigvyixJJU*ZKi1FQrpZD8Av3g*NKi5FT9VDMDJYb4jMwSt9KcHAZDAwYARHIZlw__&_c=%2b15927579347680844937" class="collapsible"><img src="/i/list_shut.gif" alt="&gt;" style="padding-right:4px" />Gene</a> 
<ul class="shut"> 
<li><a href="/Multi/Search/Details?_C=eJyLz2FIzWOIL8tjSElNSyzNKWGIL2Rw9Qv2D*IJMAADI2NTS0uF5PyigvyixJJU*ZKi1FQrpZD8Av3g*NKi5FT9VDMDJYb4jMwSt9KcHAZDAwYARHIZlw__&_c=%2b15927579347680844937&_c=%2b10649513383310789308">Oryzias latipes (1)</a></li></ul> 
<td style="width:5em"><a href="/Multi/Search/Details?_C=eJyLz2FIzWOIL8tjSElNSyzNKWGIL2Rw9Qv2D*IJMAADI2NTS0uF5PyigvyixJJU*ZKi1FQrpZD8Av3g*NKi5FT9VDMDJYb4jMwSt9KcHAZDAwYARHIZlw__&_c=%2b15927579347680844937">1</a> 

From this, you would extract the last URL mentioned: *IJMAADI2NTS0uF5PyigvyixJJU*ZKi1FQrpZD8Av3g*NKi5FT9VDMDJYb4jMwSt9KcHAZDAwYARHIZlw__&_c=%2b15927579347680844937
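The extraction step could be sketched with Python's html.parser, assuming the results sit inside <table class="search_results"> as in the fragment above. It only parses HTML you have already downloaded; the class and function names are hypothetical, and any change to Ensembl's markup will break it.

```python
from html.parser import HTMLParser


class SearchResultLinks(HTMLParser):
    """Collect <a href> values found inside <table class="search_results">."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # > 0 while inside the search_results table
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "table" and (self.depth or attrs.get("class") == "search_results"):
            self.depth += 1  # also counts tables nested inside it
        elif tag == "a" and self.depth and attrs.get("href"):
            self.hrefs.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "table" and self.depth:
            self.depth -= 1


def last_search_result_link(html: str) -> str:
    """Return the last link inside the search_results table (hypothetical helper)."""
    parser = SearchResultLinks()
    parser.feed(html)
    parser.close()
    if not parser.hrefs:
        raise ValueError("no links found in the search_results table")
    return parser.hrefs[-1]
```

Restricting the search to that table avoids picking up navigation or footer links elsewhere on the page.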

Retrieving that page yields some HTML in which you look for <p>Your query matched 1 entries in the search database</p>, after which you will find the link to the next page to retrieve: ;r=scaffold676:104884-110194;t=ENSORLT00000023600
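Similarly, a hypothetical helper could locate the first link that follows the "Your query matched" text. It assumes, as described above, that the match count and the result link appear in that order in the page; the names LinkAfterText and link_after_match_count are illustrative only.

```python
from html.parser import HTMLParser


class LinkAfterText(HTMLParser):
    """Record the first <a href> that follows a given piece of text."""

    def __init__(self, marker):
        super().__init__()
        self.marker = marker
        self.seen_marker = False
        self.href = None

    def handle_data(self, data):
        if self.marker in data:
            self.seen_marker = True

    def handle_starttag(self, tag, attrs):
        if tag == "a" and self.seen_marker and self.href is None:
            self.href = dict(attrs).get("href")  # stays None if no href


def link_after_match_count(html, marker="Your query matched"):
    """Return the first link after the 'Your query matched ...' text (hypothetical helper)."""
    parser = LinkAfterText(marker)
    parser.feed(html)
    parser.close()
    if parser.href is None:
        raise ValueError("no link found after the match count")
    return parser.href
```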

From that page, pull out the part that looks like <h2 class="caption">Gene: HRAS (ENSORLG00000018912)</h2>, which gives you the gene name for your identifier.
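For this last step a regular expression is probably enough, assuming the caption keeps exactly the form shown above (gene name, then the gene's stable ID in parentheses). This is only a sketch with a hypothetical function name; a real HTML parser would tolerate markup changes better.

```python
import re
from typing import Optional, Tuple

# Matches e.g. <h2 class="caption">Gene: HRAS (ENSORLG00000018912)</h2>
CAPTION = re.compile(
    r'<h2 class="caption">\s*Gene:\s*(\S+)\s*\((ENS[A-Z]*G\d+)\)\s*</h2>'
)


def gene_from_caption(html: str) -> Optional[Tuple[str, str]]:
    """Return (gene name, gene stable ID) from the gene page caption, or None."""
    m = CAPTION.search(html)
    return (m.group(1), m.group(2)) if m else None
```

Applied to the caption above, gene_from_caption returns ("HRAS", "ENSORLG00000018912").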

This solution is obviously a pain to implement, likely to break whenever Ensembl changes its web interface, and ugly as sin. Unlike the first solution, however, it should automatically cope with Ensembl adding more genomes to its databases.


I don't think you should advise that, as Ensembl monitors screen scraping and the like and enforces IP blocking.


They do? I had no idea. I have to say it is a very unfortunate combination if their web services do not let you do what you can do via their web interface while, at the same time, they block you for screen scraping. It might be time to contact them with a feature request.

