Question: Ensembl ID Mapping to Entrez ID
Joseph W Carl Jr (University of Maryland) wrote, 8.0 years ago:

The situation: I'm predicting miRNA targets in 3' UTRs using miRanda. The application that does the prediction uses Ensembl IDs. Once I have the predicted targets, I want to identify the represented pathways, but that application needs Entrez IDs.

The problem: the Ensembl IDs associated with 3' UTRs have multiple transcripts associated with them, and Entrez has multiple IDs associated with a single Ensembl ID. What do I use as input to my pathway application? I can't use multiple Entrez IDs, as that would over-represent the pathway. Can I just pick the first Entrez ID I encounter?
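To make the choices concrete, here is a minimal Python sketch comparing three possible policies on a toy mapping. The Ensembl and Entrez IDs are hypothetical placeholders, and the policy names are made up for illustration. Each policy returns a set, so every Entrez gene is counted at most once no matter how many Ensembl IDs or transcripts point to it.

```python
from typing import Dict, List, Set

# Hypothetical Ensembl gene -> Entrez gene mapping (placeholder IDs, not real data).
ENS_TO_ENTREZ: Dict[str, List[str]] = {
    "ENSG_A": ["101"],
    "ENSG_B": ["202", "203"],  # one Ensembl ID, two Entrez IDs
    "ENSG_C": ["101"],         # a second Ensembl ID mapping to the same Entrez ID
    "ENSG_D": [],              # no Entrez ID at all
}


def pick_first(mapping: Dict[str, List[str]]) -> Set[str]:
    """Policy 1: keep the first Entrez ID listed (depends on row order, so arbitrary)."""
    return {ids[0] for ids in mapping.values() if ids}


def unique_only(mapping: Dict[str, List[str]]) -> Set[str]:
    """Policy 2: keep only Ensembl IDs that map to exactly one Entrez ID."""
    return {ids[0] for ids in mapping.values() if len(set(ids)) == 1}


def keep_all(mapping: Dict[str, List[str]]) -> Set[str]:
    """Policy 3: keep every Entrez ID, but count each one only once."""
    return {entrez for ids in mapping.values() for entrez in ids}


for policy in (pick_first, unique_only, keep_all):
    print(policy.__name__, sorted(policy(ENS_TO_ENTREZ)))
```

The trade-offs differ: `pick_first` is not reproducible if the row order changes, `unique_only` drops ambiguous genes entirely, and `keep_all` may add Entrez genes that the miRNA does not actually target.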

Secondary question: some Ensembl IDs have no Entrez IDs at all. Why is that? I can just eliminate the miRNA, but then I lose some data and under-represent pathways in my analysis.
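Before dropping anything, it may help to measure how much is lost and keep the unmapped IDs for inspection. A minimal sketch, assuming the mapping results are held as a dict where an empty list means no Entrez ID was returned (the IDs are hypothetical placeholders):

```python
from typing import Dict, List, Tuple

# Hypothetical mapping results; an empty list means no Entrez ID was returned.
RESULTS: Dict[str, List[str]] = {
    "ENSG_A": ["101"],
    "ENSG_B": ["202", "203"],
    "ENSG_D": [],
    "ENSG_E": [],
}


def split_mapped(mapping: Dict[str, List[str]]) -> Tuple[List[str], List[str]]:
    """Return (mapped, unmapped) Ensembl IDs, each sorted for stable output."""
    mapped = sorted(ens for ens, ids in mapping.items() if ids)
    unmapped = sorted(ens for ens, ids in mapping.items() if not ids)
    return mapped, unmapped


mapped, unmapped = split_mapped(RESULTS)
print(f"{len(unmapped)}/{len(RESULTS)} Ensembl IDs have no Entrez ID: {unmapped}")
```

The unmapped list can then be checked by hand (for example, against the gene biotype in Ensembl) before deciding whether excluding those genes is acceptable.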

How should I handle the selection of Entrez IDs for each of the Ensembl IDs I start with, and what are the implications of the various choices I can make?

Tags: pathway, ensembl, entrez, mirna

What pathway analysis tool are you using?

(Comment by Chris Evelo, 8.0 years ago)

I used BioMart: I selected a filter that takes an ID list uploaded from a file of Ensembl IDs, and under the external reference attributes I chose the Entrez ID.

(Reply by Joseph W Carl Jr, 8.0 years ago)

The pathway analysis tool I used is the GOstats Bioconductor package.

(Reply by Joseph W Carl Jr, 8.0 years ago)

My thought (quick fix) would be to use the NCBI mapping of Ensembl Gene IDs to Entrez Gene IDs.
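NCBI distributes its Ensembl-to-Entrez mapping as the `gene2ensembl` file in the `gene/DATA` directory of its FTP site. A minimal Python sketch for loading it, assuming the layout is tab-separated with a `#` header line, taxonomy ID in column 0, Entrez GeneID in column 1 and Ensembl gene ID in column 2 (check the README that ships with the file). The sample rows are invented to match that assumed layout, not real NCBI data.

```python
import gzip
from collections import defaultdict
from typing import Dict, Iterable, Set


def load_gene2ensembl(lines: Iterable[str], tax_id: str = "9606") -> Dict[str, Set[str]]:
    """Build Ensembl gene ID -> set of Entrez GeneIDs for one taxon.

    Assumed layout: column 0 = tax_id, 1 = GeneID, 2 = Ensembl gene ID.
    Rows repeat per transcript/protein, so the set collapses duplicate pairs.
    """
    mapping: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[0] != tax_id:
            continue  # skip malformed rows and other species
        mapping[fields[2]].add(fields[1])
    return dict(mapping)


def load_file(path: str, tax_id: str = "9606") -> Dict[str, Set[str]]:
    """Read a gzipped gene2ensembl file from disk."""
    with gzip.open(path, "rt") as handle:
        return load_gene2ensembl(handle, tax_id)


# Invented rows in the assumed layout (placeholder IDs).
SAMPLE = [
    "#tax_id\tGeneID\tEnsembl_gene_identifier\tRNA_nucleotide_accession.version"
    "\tEnsembl_rna_identifier\tprotein_accession.version\tEnsembl_protein_identifier\n",
    "9606\t101\tENSG_A\tNM_x\tENST_1\tNP_x\tENSP_1\n",
    "9606\t101\tENSG_A\tNM_y\tENST_2\tNP_y\tENSP_2\n",
    "9606\t202\tENSG_B\tNM_z\tENST_3\tNP_z\tENSP_3\n",
    "10090\t303\tENSMUSG_C\tNM_w\tENSMUST_4\tNP_w\tENSMUSP_4\n",
]
```

With these rows, `load_gene2ensembl(SAMPLE)` keeps only the human (9606) rows and collapses the two transcript rows for `ENSG_A` into a single Entrez ID.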

(Comment by Travis, 8.0 years ago)

Casey Bergman (Athens, GA, USA) wrote, 8.0 years ago:

I recently ran into this problem too while querying the Ensembl MySQL databases directly. It appears that for some species there are no Entrez<->Ensembl ID mappings at all, and for species where mappings exist, they can be attached to different object types, either translations or transcripts:

For human, Entrez<->Ensembl ID mappings exist at the level of translations only:

mysql> use homo_sapiens_core_62_37g
mysql> select external_db_id, db_name from external_db where db_name like "EntrezGene";
+----------------+------------+
| external_db_id | db_name    |
+----------------+------------+
|           1300 | EntrezGene | 
+----------------+------------+
mysql> select distinct(ensembl_object_type) from object_xref, xref where xref.xref_id=object_xref.xref_id and external_db_id=1300;
+---------------------+
| ensembl_object_type |
+---------------------+
| Translation         | 
+---------------------+

For cow, Entrez<->Ensembl ID mappings exist at the level of transcripts only:

mysql> use bos_taurus_core_62_4k
mysql> select external_db_id, db_name from external_db where db_name like "EntrezGene";
+----------------+------------+
| external_db_id | db_name    |
+----------------+------------+
|           1300 | EntrezGene | 
+----------------+------------+
mysql> select distinct(ensembl_object_type) from object_xref, xref where xref.xref_id=object_xref.xref_id and external_db_id=1300;
+---------------------+
| ensembl_object_type |
+---------------------+
| Transcript          | 
+---------------------+

For orangutan, no Entrez<->Ensembl ID mappings exist in Ensembl:

mysql> use pongo_pygmaeus_core_61_1g
mysql> select external_db_id, db_name from external_db where db_name like "EntrezGene";
+----------------+------------+
| external_db_id | db_name    |
+----------------+------------+
|           1300 | EntrezGene | 
+----------------+------------+
mysql> select distinct(ensembl_object_type) from object_xref, xref where xref.xref_id=object_xref.xref_id and external_db_id=1300;
Empty set (0.64 sec)
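The same check can be written as a single query that joins `external_db` by name instead of first looking up the numeric `external_db_id`. The sketch below runs it with Python's built-in `sqlite3` against a tiny, simplified stand-in for the three Ensembl core tables (only the columns the query needs; the rows are invented). Against a real Ensembl MySQL database, the SQL text should carry over unchanged through a MySQL client.

```python
import sqlite3
from typing import List

# Simplified stand-in for external_db, xref and object_xref; invented rows.
SCHEMA = """
CREATE TABLE external_db (external_db_id INTEGER PRIMARY KEY, db_name TEXT);
CREATE TABLE xref (xref_id INTEGER PRIMARY KEY, external_db_id INTEGER,
                   dbprimary_acc TEXT);
CREATE TABLE object_xref (object_xref_id INTEGER PRIMARY KEY, ensembl_id INTEGER,
                          ensembl_object_type TEXT, xref_id INTEGER);
INSERT INTO external_db VALUES (1300, 'EntrezGene'), (1100, 'HGNC');
INSERT INTO xref VALUES (1, 1300, '101'), (2, 1300, '202'), (3, 1100, 'ABC1');
INSERT INTO object_xref VALUES (1, 11, 'Translation', 1),
                               (2, 12, 'Translation', 2),
                               (3, 13, 'Gene', 3);
"""


def object_types_for(conn: sqlite3.Connection, db_name: str = "EntrezGene") -> List[str]:
    """List the Ensembl object types that carry xrefs from the named external DB."""
    rows = conn.execute(
        """
        SELECT DISTINCT ox.ensembl_object_type
        FROM object_xref AS ox
        JOIN xref AS x ON x.xref_id = ox.xref_id
        JOIN external_db AS e ON e.external_db_id = x.external_db_id
        WHERE e.db_name = ?
        """,
        (db_name,),
    ).fetchall()
    return sorted(row[0] for row in rows)


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
print(object_types_for(conn))          # ['Translation']
print(object_types_for(conn, "HGNC"))  # ['Gene']
```

An empty list corresponds to the "Empty set" result for orangutan above.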

In my experience, if the mappings exist in the core Ensembl DBs then they will also be present in the Ensembl BioMart, which is a good place to get Entrez<->Ensembl ID mappings.
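BioMart can also be queried programmatically by sending an XML query to its `martservice` endpoint. A minimal sketch that only builds the request URL, assuming the human dataset name `hsapiens_gene_ensembl`, the filter `ensembl_gene_id`, and the Entrez attribute `entrezgene_id` (older releases used `entrezgene`). These names are assumptions to verify against the attribute list of the release you use.

```python
import urllib.parse
import xml.etree.ElementTree as ET
from typing import List

MARTSERVICE = "http://www.ensembl.org/biomart/martservice"


def biomart_query_url(ensembl_ids: List[str],
                      dataset: str = "hsapiens_gene_ensembl",
                      entrez_attr: str = "entrezgene_id") -> str:
    """Build a martservice URL returning Ensembl gene ID / Entrez ID pairs as TSV."""
    query = ET.Element("Query", virtualSchemaName="default", formatter="TSV",
                       header="0", uniqueRows="1")
    ds = ET.SubElement(query, "Dataset", name=dataset, interface="default")
    # Filter: restrict the output to the given Ensembl gene IDs (comma-separated).
    ET.SubElement(ds, "Filter", name="ensembl_gene_id", value=",".join(ensembl_ids))
    # Attributes: the columns to return, in order.
    ET.SubElement(ds, "Attribute", name="ensembl_gene_id")
    ET.SubElement(ds, "Attribute", name=entrez_attr)
    xml = ET.tostring(query, encoding="unicode")
    return MARTSERVICE + "?" + urllib.parse.urlencode({"query": xml})


print(biomart_query_url(["ENSG00000141510"]))
```

Fetching the URL (for example with `urllib.request.urlopen`) should return one tab-separated row per Ensembl/Entrez pair, so one-to-many mappings show up as repeated Ensembl IDs.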

See also this BioStar post for other gene ID mapping solutions: http://biostar.stackexchange.com/questions/22/gene-id-conversion-tool

Chris Evelo (Maastricht, The Netherlands) wrote, 8.0 years ago:

For direct mapping from Ensembl IDs to Entrez IDs, many mapping services are available. You could look into BridgeDb, which supports Ensembl-based mapping out of the box but is really a software framework (usable from Java or as a web service) that can access many mapping services.

The multiple Ensembl transcripts will indeed probably map to multiple Entrez IDs. However, your pathway analysis program will most likely not know about the different transcripts and will only have the full gene product. If that is the case, it will probably pick up just a single gene, so you will not get an overestimate after all. You need to check this, of course; some pathway analysis tools might not map individual transcripts at all.
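If the pathway tool only sees whole genes, one option is to collapse transcript-level target hits to genes before doing any Entrez mapping, so several targeted 3' UTRs of one gene count once. A minimal sketch, assuming a transcript-to-gene table is available (for example from BioMart); the IDs are hypothetical placeholders.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical transcript -> gene table (placeholder IDs).
TX_TO_GENE: Dict[str, str] = {
    "ENST_1": "ENSG_A",
    "ENST_2": "ENSG_A",  # a second transcript (3' UTR) of the same gene
    "ENST_3": "ENSG_B",
}


def collapse_to_genes(hits: Iterable[str],
                      tx_to_gene: Dict[str, str]) -> Tuple[Set[str], Set[str]]:
    """Return (genes hit, transcript IDs with no known gene)."""
    genes: Set[str] = set()
    unknown: Set[str] = set()
    for tx in hits:
        gene = tx_to_gene.get(tx)
        if gene is None:
            unknown.add(tx)  # keep track instead of silently dropping
        else:
            genes.add(gene)
    return genes, unknown


genes, unknown = collapse_to_genes(["ENST_1", "ENST_2", "ENST_3", "ENST_9"], TX_TO_GENE)
print(sorted(genes), sorted(unknown))
```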

I am not sure why some of your Ensembl transcripts don't have an Entrez ID at all. How did you find out that they don't? Is that Ensembl's mapping? I think you would have to check whether these mappings are missing for a good reason or not. They might, for instance, be Ensembl pseudogenes.

We are currently evaluating the behaviour of PathVisio (our own pathway analysis tool) and BridgeDb with respect to splice variants and miRNA targeting in general. If you would like to try these tools, please also register on the mailing list and tell us about your experiences. We might be able to help with some of the problems.

(Modified 5.4 years ago by Egon Willighagen)

Travis (USA) wrote, 8.0 years ago:

How are you pulling the IDs? Is it via the Ensembl API? I have encountered similar problems with Ensembl genes not mapping to HGNC or Entrez identifiers when I would expect them to. Some of these are genuine errors, but often it is because Ensembl has classified the gene as a pseudogene, and they say their primary focus is not the mapping of pseudogene IDs. As for the multiple Entrez IDs, I have found a lot of them to be 'undesirable' and would not just use them interchangeably. That's why I ask how you are retrieving the IDs: it will help me advise you further. It might be worth retrieving the Ensembl-to-Entrez ID maps from NCBI and seeing how they compare to the ones from Ensembl. This has helped me in the past.
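Comparing the two sources can be done with a simple classification of each Ensembl ID. A minimal sketch, assuming both mappings have already been loaded as Ensembl ID -> set of Entrez IDs; the toy data and category names are hypothetical.

```python
from typing import Dict, Set


def compare_mappings(ensembl_map: Dict[str, Set[str]],
                     ncbi_map: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Classify each Ensembl ID by whether the two sources agree on its Entrez IDs."""
    report: Dict[str, Set[str]] = {
        "agree": set(), "disagree": set(), "ensembl_only": set(), "ncbi_only": set(),
    }
    for ens in set(ensembl_map) | set(ncbi_map):
        a, b = ensembl_map.get(ens), ncbi_map.get(ens)
        if a and b:
            report["agree" if a == b else "disagree"].add(ens)
        elif a:
            report["ensembl_only"].add(ens)
        elif b:
            report["ncbi_only"].add(ens)
    return report


# Hypothetical toy mappings.
ENSEMBL = {"ENSG_A": {"101"}, "ENSG_B": {"202", "203"}, "ENSG_C": {"303"}}
NCBI = {"ENSG_A": {"101"}, "ENSG_B": {"202"}, "ENSG_D": {"404"}}
print(compare_mappings(ENSEMBL, NCBI))
```

IDs in the `disagree` bucket are the ones where choosing between the sources actually matters; the `*_only` buckets show what each source adds.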
