Question

Ensembl Id Mapping To Entrez Id

14.2 years ago

Joseph W Carl Jr ▴ 30

The Situation: I'm predicting miRNA targets in 3' UTRs using miRanda. The application that does the prediction uses Ensembl IDs. Once I have the predicted targets, I want to identify the pathways they represent, but the pathway application needs Entrez IDs.

The Problem: The Ensembl IDs associated with 3' UTRs have multiple transcripts associated with them, and Entrez has multiple IDs associated with a single Ensembl ID. What do I use as input to my pathway application? I can't use multiple Entrez IDs, as that would over-represent the pathway. Can I just pick the first Entrez ID I encounter?

Secondary Question: Some Ensembl IDs have no Entrez IDs at all. Why is that? I can just eliminate the miRNA, but that means I lose some data and under-represent pathways in my analysis.

How should I handle the selection of Entrez IDs for each of the Ensembl IDs I start with? What are the implications of the various choices I can make?

ensembl entrez pathway mirna • 10k views

updated 14.2 years ago by Casey Bergman 18k • written 14.2 years ago by Joseph W Carl Jr ▴ 30

What pathway analysis tool are you using?

14.2 years ago by Chris Evelo 10k

I used BioMart. I selected the ID list filter to upload a file of Ensembl IDs, and under the external reference attributes I chose the Entrez ID.

14.2 years ago by Joseph W Carl Jr ▴ 30

The pathway analysis tool I'm using is the GOstats Bioconductor package.

14.2 years ago by Joseph W Carl Jr ▴ 30

My thought (quick fix) would be to use the NCBI mapping of Ensembl Gene IDs to Entrez Gene IDs.

14.2 years ago by Travis ★ 2.9k

Ram · Answer 1 · 2011-05-11

I recently came across this problem too while querying the Ensembl MySQL databases directly. It appears that for some species there are no Entrez<->Ensembl ID mappings, and for species where mappings exist, they can be attached to different object types, either translations or transcripts:

For human, Entrez<->Ensembl ID mappings exist at the level of translations only:

mysql> use homo_sapiens_core_62_37g
mysql> select external_db_id, db_name from external_db where db_name like "EntrezGene";
+----------------+------------+
| external_db_id | db_name    |
+----------------+------------+
|           1300 | EntrezGene | 
+----------------+------------+
mysql> select distinct(ensembl_object_type) from object_xref, xref where xref.xref_id=object_xref.xref_id and external_db_id=1300;
+---------------------+
| ensembl_object_type |
+---------------------+
| Translation         | 
+---------------------+

For cow, Entrez<->Ensembl ID mappings exist at the level of transcripts only:

mysql> use bos_taurus_core_62_4k
mysql> select external_db_id, db_name from external_db where db_name like "EntrezGene";
+----------------+------------+
| external_db_id | db_name    |
+----------------+------------+
|           1300 | EntrezGene | 
+----------------+------------+
mysql> select distinct(ensembl_object_type) from object_xref, xref where xref.xref_id=object_xref.xref_id and external_db_id=1300;
+---------------------+
| ensembl_object_type |
+---------------------+
| Transcript          | 
+---------------------+

For orangutan, Entrez<->Ensembl ID mappings do not exist for genes in Ensembl:

mysql> use pongo_pygmaeus_core_61_1g
mysql> select external_db_id, db_name from external_db where db_name like "EntrezGene";
+----------------+------------+
| external_db_id | db_name    |
+----------------+------------+
|           1300 | EntrezGene | 
+----------------+------------+
mysql> select distinct(ensembl_object_type) from object_xref, xref where xref.xref_id=object_xref.xref_id and external_db_id=1300;
Empty set (0.64 sec)

In my experience, if the mappings exist in the core Ensembl DBs then they will also be present in the Ensembl BioMart, which is a good place to get Entrez<->Ensembl ID mappings.
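If you go the BioMart route, a small Python sketch along these lines can turn the tab-separated export into a gene-level lookup and show which Ensembl genes came back without any Entrez ID. The column names (`gene_id`, `entrez_id`), the function name `read_mapping`, and the IDs in the example are placeholders I made up; a real export will have its own header names, so pass those in instead.

```python
import csv
import io
from collections import defaultdict


def read_mapping(handle, gene_col, entrez_col):
    """Read a tab-separated mapping table into {ensembl_gene: set(entrez_ids)}.

    Genes whose Entrez field is blank are kept with an empty set, so they
    can be reported instead of silently disappearing.
    """
    mapping = defaultdict(set)
    for row in csv.DictReader(handle, delimiter="\t"):
        gene = row[gene_col].strip()
        entrez = (row[entrez_col] or "").strip()
        ids = mapping[gene]  # touch the key so unmapped genes are recorded
        if entrez:
            ids.add(entrez)
    return dict(mapping)


# Made-up export: repeated rows for one gene, a gene with no Entrez ID,
# and a gene with two Entrez IDs.
example = (
    "gene_id\tentrez_id\n"
    "ENSG_A\t101\n"
    "ENSG_A\t101\n"
    "ENSG_B\t\n"
    "ENSG_C\t202\n"
    "ENSG_C\t203\n"
)
mapping = read_mapping(io.StringIO(example), "gene_id", "entrez_id")
unmapped = sorted(g for g, ids in mapping.items() if not ids)
multi = sorted(g for g, ids in mapping.items() if len(ids) > 1)
print("unmapped:", unmapped)
print("multiple Entrez IDs:", multi)
```

Keeping the unmapped and multi-mapped genes as explicit lists makes it easier to inspect them by hand before deciding what to drop.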

See also this BioStar post for other gene ID mapping solutions: Gene Id Conversion Tool

Egon Willighagen · Answer 2 · 2011-05-10

As far as direct mapping from Ensembl IDs to Entrez IDs goes, there are many mapping services you could use. You could look into BridgeDb, which supports Ensembl-based mapping out of the box, but is really a software framework (available in Java or as a web service) that can access many mapping services.

The multiple Ensembl transcripts will indeed probably map to multiple Entrez transcripts. But your pathway analysis program will most likely not know about the different transcripts and will only have the full gene product. If that is the case, it will probably pick up just a single gene, so you will not have an overestimation after all. You do need to check this, of course: some pathway analysis tools might not map individual transcripts at all.
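If you would rather not rely on the pathway tool to do this collapsing, you can do it yourself before handing over the list. The sketch below is only an illustration in Python: the function name `unique_entrez_ids`, the transcript-to-Entrez dictionary, and all IDs are hypothetical, and it assumes you already have some transcript-level lookup in memory.

```python
def unique_entrez_ids(transcript_hits, transcript_to_entrez):
    """Return the sorted, de-duplicated Entrez Gene IDs behind a list of hits.

    Transcripts with no entry in the lookup contribute nothing.
    """
    seen = set()
    for transcript in transcript_hits:
        seen.update(transcript_to_entrez.get(transcript, ()))
    return sorted(seen)


# Hypothetical predicted targets: two transcripts of the same gene,
# one transcript of another gene, and one transcript with no mapping.
hits = ["ENST_A1", "ENST_A2", "ENST_B1", "ENST_C1"]
transcript_to_entrez = {
    "ENST_A1": {"101"},
    "ENST_A2": {"101"},
    "ENST_B1": {"202"},
}

# Naive flattening counts gene 101 once per transcript.
naive = [e for t in hits for e in transcript_to_entrez.get(t, ())]
# Collapsing first counts every gene exactly once.
gene_list = unique_entrez_ids(hits, transcript_to_entrez)
print(len(naive), "entries before collapsing,", len(gene_list), "after")
```

The collapsed list is what you would pass on as the gene set, so that a gene with many targeted transcripts does not weigh more than a gene with one.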

I am not sure why some of your Ensembl transcripts don't have an Entrez ID at all. How did you find out that they don't? Is that from Ensembl's own mapping? I think you would have to check whether these mappings are missing for a good reason or not. They might, for instance, be Ensembl pseudogenes.

We are currently busy evaluating the behaviour of PathVisio (our own pathway analysis tool) and BridgeDb in relation to splice variants and miRNA targeting in general. If you would like to try these tools, please also register on the email list and tell us about your experiences. We might be able to help with some of the problems.

score 2 · Answer 3 · 2011-05-11

How are you pulling the IDs? Is it via the Ensembl API? I have encountered similar problems with Ensembl genes not mapping to HGNC or Entrez identifiers when I would expect them to. There are some genuine errors in there, but often it is because Ensembl has categorised the gene as a pseudogene, and they say that mapping pseudogene IDs is not their primary focus. As for the multiple Entrez IDs, I have found many of them to be 'undesirable' and would not use them interchangeably. That's why I ask how you are retrieving the IDs: it will help me advise you further. It might be worth retrieving the Ensembl-to-Entrez ID maps from the NCBI and seeing how they compare to the ones from Ensembl. This has helped me in the past.
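To make the comparison concrete, here is a rough Python sketch. It assumes both mappings are already loaded as dictionaries from Ensembl gene ID to a set of Entrez IDs; the function names, the selection rule in `pick_entrez`, and every ID shown are my own illustration, not something either resource prescribes.

```python
def compare_mappings(ensembl_map, ncbi_map):
    """Compare two {ensembl_gene: set(entrez_ids)} mappings gene by gene."""
    report = {}
    for gene in sorted(set(ensembl_map) | set(ncbi_map)):
        a = ensembl_map.get(gene, set())
        b = ncbi_map.get(gene, set())
        report[gene] = {
            "agreed": a & b,        # Entrez IDs both sources give
            "ensembl_only": a - b,  # only in the Ensembl-derived mapping
            "ncbi_only": b - a,     # only in the NCBI-derived mapping
        }
    return report


def pick_entrez(report):
    """Pick one Entrez ID per gene only where the evidence is unambiguous.

    Rule used here (an assumption): take the single ID both sources agree
    on; failing any agreement, take the ID if exactly one candidate exists.
    Everything else is left out for manual review.
    """
    chosen = {}
    for gene, r in report.items():
        if len(r["agreed"]) == 1:
            chosen[gene] = next(iter(r["agreed"]))
        elif not r["agreed"]:
            candidates = r["ensembl_only"] | r["ncbi_only"]
            if len(candidates) == 1:
                chosen[gene] = next(iter(candidates))
    return chosen


# Made-up mappings for illustration.
ensembl_map = {"ENSG_A": {"101"}, "ENSG_B": {"202", "203"}, "ENSG_C": set()}
ncbi_map = {
    "ENSG_A": {"101"},
    "ENSG_B": {"202"},
    "ENSG_C": {"304"},
    "ENSG_D": {"405", "406"},
}
report = compare_mappings(ensembl_map, ncbi_map)
chosen = pick_entrez(report)
needs_review = sorted(set(report) - set(chosen))
print("chosen:", chosen)
print("needs review:", needs_review)
```

This gives a defensible alternative to picking the first Entrez ID encountered: genes where the two sources disagree beyond repair end up in a short review list rather than being resolved arbitrarily.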