Question: Converting Ensembl Gene IDs to UniProt IDs
deddu wrote (6 months ago):

Hi, I'm a computer scientist with no background in biology. I'm working with protein-protein interaction data and don't understand how to do one particular ID conversion.

I usually use the UniProt ID conversion page to convert Ensembl gene IDs into UniProt entries or IDs. However, for several of them, one Ensembl ID returns multiple rows, i.e. multiple UniProt entries. My question is: how can I safely choose a single entry? Should I filter by something like reviewed vs. unreviewed entries?

The same happens when I try to convert gene names to UniProt IDs. I understand that a gene can make more than one protein but, to the best of my knowledge, shouldn't that case be rare?

Thanks in advance; I'm struggling with this.


genomax replied (6 months ago):

"I understand that a gene can make more than one protein but, to the best of my knowledge, that case should be rare"

No, that is not the case. Here is a list of alternative splicing databases. Here is one random human gene example. Each of those transcripts can potentially form a different protein isoform.
Kevin Blighe wrote (6 months ago):

To answer the specific question and build upon genomax's comment, you can map these IDs with biomaRt in R:


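library(biomaRt)
mart <- useMart('ENSEMBL_MART_ENSEMBL')
mart <- useDataset('hsapiens_gene_ensembl', mart)

# no filters: fetch the mapping for every human gene
annotLookup <- getBM(
  mart = mart,
  attributes = c('ensembl_gene_id', 'gene_biotype',
    'external_gene_name', 'uniprot_gn_id'),
  uniqueRows = TRUE)

head(subset(annotLookup, uniprot_gn_id != ''), 20)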
   ensembl_gene_id   gene_biotype external_gene_name uniprot_gn_id
6  ENSG00000198888 protein_coding             MT-ND1        P03886
7  ENSG00000198888 protein_coding             MT-ND1        U5Z754
11 ENSG00000198763 protein_coding             MT-ND2        P03891
12 ENSG00000198763 protein_coding             MT-ND2        Q7GXY9
18 ENSG00000198804 protein_coding             MT-CO1        P00395
19 ENSG00000198804 protein_coding             MT-CO1        U5YWV7
22 ENSG00000198712 protein_coding             MT-CO2        P00403
23 ENSG00000198712 protein_coding             MT-CO2        U5Z487
25 ENSG00000228253 protein_coding            MT-ATP8        P03928
26 ENSG00000228253 protein_coding            MT-ATP8        U5YV54
27 ENSG00000198899 protein_coding            MT-ATP6        P00846
28 ENSG00000198899 protein_coding            MT-ATP6        Q0ZFE3
29 ENSG00000198938 protein_coding             MT-CO3        P00414
30 ENSG00000198938 protein_coding             MT-CO3        Q7GIM7
32 ENSG00000198840 protein_coding             MT-ND3        P03897
33 ENSG00000198840 protein_coding             MT-ND3        Q7GXZ5
35 ENSG00000212907 protein_coding            MT-ND4L        P03901
36 ENSG00000212907 protein_coding            MT-ND4L        Q7GXZ4
37 ENSG00000198886 protein_coding             MT-ND4        P03905
38 ENSG00000198886 protein_coding             MT-ND4        H9EC08

Here, annotLookup contains the mapping for all genes, so you can use it as a 'master' lookup table.
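Since the question is about one-to-many mappings, a useful first check on that table is how many distinct UniProt accessions each gene receives. Below is a minimal base-R sketch; `count_uniprot_per_gene()` is a hypothetical helper (not part of biomaRt), and it assumes the table has the `ensembl_gene_id` and `uniprot_gn_id` columns shown above, with missing accessions returned as empty strings.

```r
# Hypothetical helper: count distinct UniProt accessions per Ensembl gene.
# Assumes columns 'ensembl_gene_id' and 'uniprot_gn_id' as in the output above.
count_uniprot_per_gene <- function(lookup) {
  ids <- lookup$uniprot_gn_id
  # drop rows with no UniProt accession, then duplicate gene/accession pairs
  mapped <- unique(lookup[!is.na(ids) & ids != "",
                          c("ensembl_gene_id", "uniprot_gn_id")])
  counts <- table(mapped$ensembl_gene_id)
  data.frame(
    ensembl_gene_id = names(counts),
    n_uniprot       = as.integer(counts),
    stringsAsFactors = FALSE
  )
}

# With the genome-wide table (requires the biomaRt query above):
# counts <- count_uniprot_per_gene(annotLookup)
# table(counts$n_uniprot)          # distribution of accessions per gene
# subset(counts, n_uniprot > 1)    # the one-to-many cases
```

Counting first tells you how large the ambiguous subset actually is before you decide on a rule for resolving it.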

You can also look up specific Ensembl gene IDs, like this:

annotLookup <- getBM(
  mart = mart,
  attributes = c('ensembl_gene_id', 'gene_biotype',
    'external_gene_name', 'uniprot_gn_id'),
  filters = 'ensembl_gene_id',
  values = c('ENSG00000132768', 'ENSG00000118507'),
  uniqueRows = TRUE)
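On the "reviewed or unreviewed" part of the question: in the output above, each mitochondrial gene maps to two accessions, which as far as I can tell are one reviewed UniProtKB/Swiss-Prot entry (e.g. P03886) and one unreviewed TrEMBL entry (e.g. U5Z754). One common strategy is to prefer the reviewed entry when there is one. The sketch below assumes you already have a character vector of reviewed accessions, for example from the BioMart attribute `uniprotswissprot` (if your Ensembl release provides it) or from a reviewed-only UniProt download; `pick_one_uniprot()` is a hypothetical helper, not a biomaRt function.

```r
# Hypothetical helper (not part of biomaRt): reduce a one-to-many
# Ensembl gene -> UniProt mapping to one accession per gene.
# Assumption: 'reviewed' is a character vector of Swiss-Prot accessions
# obtained separately.
pick_one_uniprot <- function(lookup, reviewed) {
  ids <- lookup$uniprot_gn_id
  mapped <- unique(lookup[!is.na(ids) & ids != "",
                          c("ensembl_gene_id", "uniprot_gn_id")])
  # candidate accessions, one list element per Ensembl gene
  per_gene <- split(mapped$uniprot_gn_id, mapped$ensembl_gene_id)
  chosen <- vapply(per_gene, function(g) {
    rev_ids <- sort(g[g %in% reviewed])
    # prefer a reviewed accession; otherwise take the alphabetically
    # first one, which is reproducible but not biologically meaningful
    if (length(rev_ids) > 0) rev_ids[1] else sort(g)[1]
  }, character(1))
  n_reviewed <- vapply(per_gene, function(g) sum(g %in% reviewed), integer(1))
  data.frame(
    ensembl_gene_id = names(per_gene),
    uniprot_id      = unname(chosen),
    reviewed        = chosen %in% reviewed,
    n_candidates    = unname(lengths(per_gene)),
    n_reviewed      = unname(n_reviewed),
    stringsAsFactors = FALSE
  )
}

# Two genes copied from the output above; the P0... accessions are
# treated as reviewed for this example
lookup <- data.frame(
  ensembl_gene_id = c("ENSG00000198888", "ENSG00000198888",
                      "ENSG00000198763", "ENSG00000198763"),
  uniprot_gn_id   = c("P03886", "U5Z754", "P03891", "Q7GXY9"),
  stringsAsFactors = FALSE
)
pick_one_uniprot(lookup, reviewed = c("P03886", "P03891"))
```

Genes with `n_reviewed > 1` still deserve a manual look, since the rule then falls back to an arbitrary (alphabetical) choice. Also, as genomax points out, distinct isoforms can matter for protein-protein interactions, so collapsing to a single ID per gene throws that information away.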

