Cross-referencing IDs from the Swiss-Prot/UniProtKB human reference proteome to Ensembl genes
23 months ago
mhfk2901 ▴ 20

Hi,

I decided to ask this question after looking through this forum for answers. I am trying to get Ensembl gene IDs for the corresponding UniProt entries (human reference proteome), but the UniProt mapping service failed to map certain entries (meaning there is no Ensembl ID for some of them). I tried to map manually using the available flat files and used several other cross-referencing tools (such as BioMart), with mixed success. I would like to know if there is an explanation for this. My guess is that UniProt entries with no Ensembl gene ID fail to align 100% to the corresponding Ensembl sequence, but I know that is not correct. I would appreciate it if anyone can provide an answer!

uniprot ensembl mapping • 1.7k views

Please don't hesitate to contact the UniProt helpdesk with your list of unmapped identifiers, and we can investigate.


That is actually what I should've done first! Thank you for your suggestion. I will keep this thread open for others to chime in with their thoughts!


Based on prior postings, @Elisabeth is with UniProt support.


Thanks, we have received your list at the helpdesk, and the ticket has been assigned to a curator.


@Elisabeth I actually have a follow-up question. Looking further at those identifiers, I noticed that they are classified as 'Unplaced' under the Proteome category. Does this mean that these proteins are unmapped to the current reference genome?

23 months ago

Here is the answer from UniProt:

You cannot find Ensembl gene IDs through our mapping services for the list of proteins you provided, as there is no corresponding gene and protein in Ensembl. There are two possible reasons for that. On one hand, there are still some regions of the human genome that are not properly resolved (see PMID: 35357919) in the current genome reference assembly used by Ensembl (GRCh38.p13), so Ensembl is still missing some protein-coding genes. On the other hand, UniProt has curated proteins based on mRNA, proteomics, and literature evidence, but the existence of these proteins remains dubious. There might be no corresponding gene at all, and we will probably deprecate these entries in the future.

Together with Ensembl, we are working on resolving these discrepancies, but some still remain, as your list shows.

There are other possibilities that can explain the absence of cross-references to Ensembl in human entries, the main ones being the following: (1) Ensembl has a predicted protein that is not identical to the one manually curated in the reviewed (Swiss-Prot) section of UniProtKB. (2) Ensembl has no predicted protein for the corresponding gene, as it does not consider it a protein-coding gene. (3) There is an Ensembl gene (ENSG) but no Ensembl peptide (ENSP).

In the absence of a mapping to Ensembl, and thereby of a mapping to the reference genome, the entries are temporarily moved to the 'unplaced' component of the proteome. When a mapping to Ensembl is added, these entries will be moved to the correct chromosome component in a following release.
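
For anyone who wants to triage their own list before contacting the helpdesk, here is a minimal sketch of how the cases above could be separated. It is not UniProt's method. It assumes a three-column, tab-separated mapping file (accession, ID type, ID) in the style of the UniProt idmapping flat files, and it assumes that the ID type labels for Ensembl genes and peptides are `Ensembl` and `Ensembl_PRO`; please check both against the file you actually download. The function name `classify_accessions` and the accessions and IDs in the example are hypothetical placeholders, not real entries.

```python
from collections import defaultdict

# Assumed ID-type labels for Ensembl gene and peptide cross-references.
# Verify these against the mapping file you are using.
GENE_TYPE = "Ensembl"
PROTEIN_TYPE = "Ensembl_PRO"


def classify_accessions(accessions, mapping_lines):
    """Group accessions by which Ensembl cross-references they carry.

    mapping_lines is an iterable of 'accession<TAB>id_type<TAB>id' strings,
    for example an open file handle.
    """
    xrefs = defaultdict(lambda: defaultdict(set))
    for line in mapping_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip blank or malformed rows
        acc, id_type, value = parts
        xrefs[acc][id_type].add(value)

    report = {"mapped": [], "gene_without_peptide": [], "no_ensembl_gene": []}
    for acc in accessions:
        types = xrefs.get(acc, {})
        has_gene = bool(types.get(GENE_TYPE))
        has_peptide = bool(types.get(PROTEIN_TYPE))
        if has_gene and has_peptide:
            report["mapped"].append(acc)
        elif has_gene:
            # An ENSG is cross-referenced but no ENSP.
            report["gene_without_peptide"].append(acc)
        else:
            # No Ensembl gene at all; also covers accessions absent from the file.
            report["no_ensembl_gene"].append(acc)
    return report


# Placeholder rows for illustration only; these are not real identifiers.
rows = [
    "ACC_A\tEnsembl\tENSG_PLACEHOLDER_1",
    "ACC_A\tEnsembl_PRO\tENSP_PLACEHOLDER_1",
    "ACC_B\tEnsembl\tENSG_PLACEHOLDER_2",
    "ACC_C\tGene_Name\tGENE_C",
]
report = classify_accessions(["ACC_A", "ACC_B", "ACC_C", "ACC_D"], rows)
for category, accs in report.items():
    print(category, accs)
```

The accessions that land in `no_ensembl_gene` are the ones worth sending to the helpdesk. The sketch cannot tell whether the cause is an unresolved genome region, a dubious protein, or a sequence mismatch; that still needs a curator.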
