Question: Most efficient strategy to convert from Ensembl protein IDs (ENSP) to Entrez Gene Symbols?
sam237337 wrote, 2.0 years ago:

I have a list of Ensembl protein IDs (ENSP) that I need to convert to Entrez-formatted gene symbols. So far I haven't found a straightforward way to convert between these two formats, because none of the platforms I've looked at supports it directly. This is my current tentative strategy:

Step 1: Convert ENSP protein IDs to HGNC gene symbols via the R package EnsDb.Hsapiens.v86

Step 2: Convert HGNC gene symbols to UniProtKB accessions via the UniProt Protein Conversion tool ( ). For some reason, UniProtKB is the only target format offered when converting from HGNC symbols.

Step 3: Convert UniProtKB protein IDs to Entrez Gene ID numbers via the UniProt Protein Conversion tool ( ); this tool offers conversion to Entrez Gene ID numbers, but not to Entrez gene symbols...

Step 4: Convert Entrez Gene ID Numbers to Entrez Gene Symbols via the R package, with reference to this thread: Gene symbol convert to Entrez ID

Strategies reviewed:

I reviewed the biomaRt platform, but I don't see any relevant ID conversion tools (e.g., going to --> Tools --> ID Conversion only leads to a general notice that the community portal is unavailable).

Referencing this related thread: Make List Of All Human Gene Ids (Ens, Hgnc, Entrez) To Ease Conversion Of Ids

...The International Protein Index (IPI) platform provides an ipi.HUMAN.xrefs file:

...with the initial content:

Protein cross-references file for IPI human release 3.87

SP A0A183 IPI00807623 ENSP00000411070; VALIDATED:NP_001122072; HIT000394684; ABJ55982; 31824,LCE6A; 448835,LCE6A; UPI0000D83229 Hs.62927; CCDS44227.1; GI:190610047; OTTHUMP00000210240;

However, the columns in this file aren't labeled, so I don't know what each column contains or whether Entrez-format gene symbols are present. The README file does not provide this information.

With reference to this thread: Gene Id Conversion Tool

...I tried to use DAVID: ... but it doesn't seem to recognize ENSP-formatted inputs, since testing example inputs produces error messages.

bioDBnet ( ) doesn't permit ENSP conversion to Entrez format.

The Hyperlink Management System ( ) has tools related to Ensembl protein IDs, but I don't see any tools for converting them to Entrez format.

I'd appreciate any suggestions for simplifying the four-step strategy described above. Thanks in advance for your input.


Why not just download the mapping from BioMart? That would be a single step and vastly simpler.

Comment by Devon Ryan, 2.0 years ago.
Emily_Ensembl wrote, 2.0 years ago:

It's very easy using BioMart. You just filter by your list of ENSPs and request the NCBI gene IDs as output (note: these used to be called Entrez Gene IDs; they're now called NCBI gene IDs).
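As a rough sketch of the same filter-and-output query done programmatically, the Python below builds a BioMart query XML for a list of ENSPs and POSTs it to the martservice REST endpoint. The endpoint URL, the dataset name `hsapiens_gene_ensembl`, and the filter/attribute names (`ensembl_peptide_id`, `entrezgene_id`, `external_gene_name`) are assumptions based on current Ensembl BioMart naming (older releases used `entrezgene` for the gene ID attribute); check them against the mart's attribute list before relying on this.

```python
"""Sketch: query Ensembl BioMart for ENSP -> NCBI gene ID / gene name."""
from urllib.parse import urlencode
from urllib.request import urlopen
from xml.sax.saxutils import escape

# Assumed public endpoint; archive/mirror hosts expose the same path.
MARTSERVICE_URL = "https://www.ensembl.org/biomart/martservice"


def build_biomart_query(ensp_ids):
    """Return BioMart query XML filtering on ENSP IDs (assumed filter/attribute names)."""
    ids = ",".join(escape(i) for i in ensp_ids)  # BioMart takes a comma-separated list
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<!DOCTYPE Query>"
        '<Query virtualSchemaName="default" formatter="TSV" header="1" uniqueRows="1">'
        '<Dataset name="hsapiens_gene_ensembl" interface="default">'
        f'<Filter name="ensembl_peptide_id" value="{ids}"/>'
        '<Attribute name="ensembl_peptide_id"/>'
        '<Attribute name="entrezgene_id"/>'
        '<Attribute name="external_gene_name"/>'
        "</Dataset>"
        "</Query>"
    )


def fetch_mapping(ensp_ids):
    """POST the query and return the TSV response as a list of rows (needs network)."""
    data = urlencode({"query": build_biomart_query(ensp_ids)}).encode("utf-8")
    with urlopen(MARTSERVICE_URL, data=data) as resp:
        text = resp.read().decode("utf-8")
    return [line.split("\t") for line in text.splitlines()]
```

With `header="1"`, the first row returned by `fetch_mapping` should be the column names; an ENSP without an NCBI gene cross-reference would be expected to come back with an empty gene ID field rather than be dropped.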


Thanks for the clarification, Emily. I'll try BioMart again; my first attempt with it wasn't successful.

I believe forum etiquette frowns on replying to each response with a separate thank-you, so let this serve as my thanks to everyone who responded. I'll test these strategies soon and post an update once I know what works.

Reply by sam237337, 2.0 years ago.
Santosh Anand wrote, 2.0 years ago:

Your best bet is to convert ENSP -> Entrez ID using StringDB ( ). You can download all the associations from here.

From EntrezID, it is simple to convert to GeneName or any other ID.
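Once you have Entrez GeneIDs, one way to get symbols is NCBI's gene_info file, which is tab-separated with `tax_id`, `GeneID` and `Symbol` as its first three columns and a `#`-prefixed header. The sketch below assumes that layout; the local file name you pass in (for example a downloaded `Homo_sapiens.gene_info.gz`) is an assumption, not something from this thread.

```python
import gzip


def load_symbols(lines, tax_id="9606"):
    """Return {GeneID: Symbol} from tab-separated NCBI gene_info rows.

    Assumes columns 1-3 are tax_id, GeneID, Symbol and header lines start with '#'.
    """
    symbols = {}
    for line in lines:
        if line.startswith("#"):
            continue  # skip the header row
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and fields[0] == tax_id:
            symbols[fields[1]] = fields[2]  # GeneID -> official symbol
    return symbols


def load_symbols_from_file(path, tax_id="9606"):
    """Read a gzipped gene_info file from a local path of your choosing."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return load_symbols(handle, tax_id)
```

Given a list of GeneIDs, `[symbols.get(g) for g in gene_ids]` returns `None` for any ID not present in the file, which makes unmapped IDs easy to spot.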

vkkodali wrote, 2.0 years ago:

You can use the file gene2ensembl.gz from this NCBI FTP path. Keep the rows with 9606 (the human tax_id) in the first column; the second column is the Entrez GeneID, and the last column holds the ENSP accession where one exists.
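A minimal Python sketch of that filtering, assuming the tab-separated gene2ensembl layout with `tax_id` first, `GeneID` second and the Ensembl protein accession last, and that missing values are written as `-`. It strips the version suffix (e.g. `.2`) from ENSP accessions so unversioned IDs in your list still match, and collects GeneIDs in sets in case an ENSP appears with more than one GeneID.

```python
import gzip


def ensp_to_geneids(lines, tax_id="9606"):
    """Return {unversioned ENSP: set of GeneIDs} from gene2ensembl rows."""
    mapping = {}
    for line in lines:
        if line.startswith("#"):
            continue  # header row
        fields = line.rstrip("\n").split("\t")
        if fields[0] != tax_id:
            continue  # keep only the requested species (9606 = human)
        ensp = fields[-1]
        if ensp == "-":
            continue  # row has no Ensembl protein accession
        base = ensp.split(".")[0]  # drop version: ENSP00000000001.2 -> ENSP00000000001
        mapping.setdefault(base, set()).add(fields[1])
    return mapping


def load_gene2ensembl(path, tax_id="9606"):
    """Parse a locally downloaded gene2ensembl.gz (the path is up to you)."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return ensp_to_geneids(handle, tax_id)
```

For example, build `lookup = load_gene2ensembl("gene2ensembl.gz")` once, then call `lookup.get(ensp, set())` for each ID in your list; pairing the result with a GeneID-to-symbol table gives ENSP to gene symbol in two dictionary lookups.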
