Question: Most efficient strategy to convert from Ensembl protein IDs (ENSP) to Entrez Gene Symbols?
28 days ago, sam237337 wrote:

I have a list of Ensembl protein IDs (ENSP) that I need to convert to Entrez-formatted gene symbols. So far, I haven't identified a straightforward method to convert between these two formats, as I'm not seeing a platform that will permit this. This is my current tentative strategy:

Step 1: Convert ENSP protein IDs to HGNC gene symbols via the R package EnsDb.Hsapiens.v86

Step 2: Convert HGNC gene symbols to UniProtKB format via the UniProt Protein Conversion tool. For some reason, UniProtKB is the only format that is available when converting from HGNC format.

Step 3: Convert UniProtKB protein IDs to Entrez Gene ID numbers via the UniProt Protein Conversion tool; this platform offers conversion to Entrez Gene ID numbers, but not Entrez gene symbols...

Step 4: Convert Entrez Gene ID Numbers to Entrez Gene Symbols via the R package, with reference to this thread: Gene symbol convert to Entrez ID

Strategies reviewed:

I reviewed the biomaRt platform, but am not seeing relevant ID conversion tools (e.g. navigating to Tools --> ID Conversion takes me to a general notice that the community portal is unavailable).

Referencing this related thread: Make List Of All Human Gene Ids (Ens, Hgnc, Entrez) To Ease Conversion Of Ids

...The International Protein Index (IPI) platform provides an ipi.HUMAN.xrefs file:

...with the initial content:

Protein cross-references file for IPI human release 3.87

SP A0A183 IPI00807623 ENSP00000411070; VALIDATED:NP_001122072; HIT000394684; ABJ55982; 31824,LCE6A; 448835,LCE6A; UPI0000D83229 Hs.62927; CCDS44227.1; GI:190610047; OTTHUMP00000210240;

However, the columns in this file aren't labeled, and I don't know the format of each column, or whether Entrez-format gene symbols are present. The ReadMe file does not provide this information.

With reference to this thread: Gene Id Conversion Tool

...I tried to use DAVID: ... but it doesn't seem to recognize ENSP-formatted inputs, as testing examples generates error messages.

bioDBnet doesn't permit ENSP conversion to Entrez format.

The Hyperlink Management System has tools related to Ensembl protein IDs, but I don't see tools for converting to Entrez format.

If there is any way to simplify the intended 4-step processing strategy described above, I will appreciate any suggestions. Thanks in advance for your input.


Why not just download the mapping from Biomart? That'd be a single step and vastly simpler.

Comment written 28 days ago by Devon Ryan
27 days ago, Emily_Ensembl wrote:

It's very easy using BioMart. You just filter by the list of ENSPs and get the NCBI gene IDs as output (note, they used to be called Entrez Gene IDs, they're now called NCBI gene IDs).
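For anyone who would rather script this than use the web interface, here is a minimal sketch of the same filter-and-output idea against BioMart's XML query service. Treat the specifics as assumptions to check against the current BioMart configuration: the endpoint URL, the dataset name `hsapiens_gene_ensembl`, the filter name `ensembl_peptide_id`, and the attribute names `entrezgene_id` and `external_gene_name`. The helper names are made up for illustration.

```python
import urllib.parse
import urllib.request
from xml.sax.saxutils import quoteattr

# Assumed endpoint for the Ensembl BioMart service; verify before relying on it.
BIOMART_URL = "https://www.ensembl.org/biomart/martservice"

# Assumed attribute names: protein ID, NCBI (Entrez) gene ID, gene symbol.
ATTRIBUTES = ["ensembl_peptide_id", "entrezgene_id", "external_gene_name"]


def build_biomart_query(ensp_ids, dataset="hsapiens_gene_ensembl"):
    """Build a BioMart XML query that filters on a list of ENSP IDs."""
    id_list = quoteattr(",".join(ensp_ids))  # adds the surrounding quotes
    attr_xml = "".join(f'<Attribute name="{name}"/>' for name in ATTRIBUTES)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<!DOCTYPE Query>"
        '<Query virtualSchemaName="default" formatter="TSV" '
        'header="0" uniqueRows="1">'
        f'<Dataset name="{dataset}" interface="default">'
        f'<Filter name="ensembl_peptide_id" value={id_list}/>'
        f"{attr_xml}"
        "</Dataset></Query>"
    )


def parse_tsv(text):
    """Split a tab-separated response into rows, skipping blank lines."""
    return [line.split("\t") for line in text.splitlines() if line.strip()]


def fetch_mapping(ensp_ids):
    """Send the query (requires network access) and return parsed rows."""
    payload = urllib.parse.urlencode(
        {"query": build_biomart_query(ensp_ids)}
    ).encode()
    with urllib.request.urlopen(BIOMART_URL, data=payload) as response:
        return parse_tsv(response.read().decode())
```

Calling `fetch_mapping(["ENSP00000411070"])` should return rows of protein ID, NCBI gene ID, and gene symbol if the assumed names are right. For long ID lists it is probably safer to send the IDs in batches.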


Thanks for the clarification, Emily. I plan to try BioMart again, since my initial attempt with that platform wasn't successful.

I think that it is frowned upon in forum etiquette to reply to each individual response with a thank-you message, so I will let this response serve as my thank-you to all who responded; I will be testing these various strategies in the near future, and will post an update once I determine what works.

Reply written 27 days ago by sam237337
28 days ago, Santosh Anand wrote:

Your best bet is to convert ENSP -> Entrez ID using StringDB. You can download all of the associations from StringDB.

From EntrezID, it is simple to convert to GeneName or any other ID.
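One way to do that last hop, sketched under assumptions: the ID-to-symbol table here is NCBI's tab-separated `gene_info` file, which this thread does not mention, with tax_id, GeneID, and Symbol as its first three columns. The function names are hypothetical, and the ENSP -> GeneID dictionary is assumed to come from whichever earlier step you used.

```python
def gene_id_to_symbol(lines):
    """Map GeneID -> Symbol from gene_info-style lines.

    Assumes tab-separated columns starting with tax_id, GeneID, Symbol,
    and header lines that start with '#'.
    """
    mapping = {}
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        mapping[fields[1]] = fields[2]
    return mapping


def annotate_with_symbols(ensp_to_gene, gene_to_symbol):
    """Chain ENSP -> GeneID -> Symbol; unmapped proteins get None."""
    return {
        ensp: gene_to_symbol.get(gene_id)
        for ensp, gene_id in ensp_to_gene.items()
    }
```

Any open text handle works as `lines`, for example `gzip.open(path, "rt")` on a downloaded copy. Keeping `None` for unmapped proteins makes it easy to see which IDs dropped out at this step.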

27 days ago, vkkodali (United States) wrote:

You can use the file gene2ensembl.gz from this NCBI FTP path. Keep the rows with 9606 in the first column (the tax_id for human); the second column is the Entrez GeneID, and the last column is the ENSP accession (where applicable).
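A minimal sketch of that filtering, using only the column positions described above (first = tax_id, second = GeneID, last = ENSP). It makes two assumptions beyond that: missing values are written as `-`, and the ENSP accession may carry a version suffix such as `.2`, which is stripped so that unversioned IDs match. The function names are made up for illustration.

```python
import gzip


def strip_version(accession):
    """Drop a trailing '.N' version suffix, if present."""
    return accession.split(".", 1)[0]


def ensp_to_gene_id(lines, tax_id="9606"):
    """Map ENSP accession -> Entrez GeneID from gene2ensembl-style lines."""
    mapping = {}
    for line in lines:
        if line.startswith("#"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[0] != tax_id:
            continue
        ensp = fields[-1]
        if ensp == "-":  # assumed placeholder for "no protein accession"
            continue
        mapping[strip_version(ensp)] = fields[1]
    return mapping


def load_gene2ensembl(path, tax_id="9606"):
    """Read a local gzipped copy of the file and build the mapping."""
    with gzip.open(path, "rt") as handle:
        return ensp_to_gene_id(handle, tax_id)
```

With a downloaded copy, `load_gene2ensembl("gene2ensembl.gz")` returns a dictionary you can index with each ENSP ID from your list. IDs missing from the dictionary have no GeneID in this file.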
