Unable to get all Uniprot IDs corresponding to Ensembl ID
2.2 years ago
Apprentice ▴ 160

I want to convert Ensembl gene IDs to UniProt IDs. I used the biomaRt package in R and ran the following commands.

> library(biomaRt)

> ensembl <- useEnsembl(biomart = "genes", dataset = "hsapiens_gene_ensembl")

> getBM(mart = ensembl, attributes = c("ensembl_gene_id", "uniprot_gn_id"), filters = "ensembl_gene_id", values = "ENSG00000183955", uniqueRows = TRUE)

As a result, the following was output.

>   ensembl_gene_id uniprot_gn_id
> 1 ENSG00000183955        C9JKQ0
> 2 ENSG00000183955        F8WC45

However, I think this output is incomplete. ENSG00000183955 corresponds to the gene KMT5A, and according to the UniProt page below, A0A0C4DFR3 and Q9NQR1 should be returned in addition to C9JKQ0 and F8WC45. https://www.uniprot.org/uniprot/Q9NQR1

How can I get all Uniprot IDs in biomaRt? I would appreciate it if you could tell me.

Uniprot biomaRt • 1.2k views

You can also use UniProt's ID mapping tool here: https://www.uniprot.org/uploadlists/

The full mapping data are also available as files here, if you want to grep the information directly: https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/idmapping/
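If you go the file route, the same lookup can be done in R instead of grep. This is only a minimal sketch: it assumes the three-column, tab-separated layout of an idmapping.dat file (UniProt accession, ID type, external ID), that Ensembl gene IDs sit under the ID type "Ensembl", and that they may carry a version suffix such as ".14". The helper name uniprot_for_ensembl and the toy rows are made up for illustration; the accessions are the ones mentioned in this thread, while the version suffix and the last row are invented.

```r
# Return the UniProt accessions mapped to one Ensembl gene ID.
# `idmap` is a data frame with columns accession, id_type, external_id,
# i.e. the three columns of an idmapping.dat file (assumed layout).
uniprot_for_ensembl <- function(idmap, gene_id) {
  # Drop a trailing ".N" version suffix, if present, before comparing.
  bare_id <- sub("\\.[0-9]+$", "", idmap$external_id)
  hits <- idmap$id_type == "Ensembl" & bare_id == gene_id
  unique(idmap$accession[hits])
}

# Toy rows in the same layout (illustrative only, not real file contents).
toy <- read.delim(
  text = paste(
    "Q9NQR1\tGene_Name\tKMT5A",
    "Q9NQR1\tEnsembl\tENSG00000183955.14",
    "C9JKQ0\tEnsembl\tENSG00000183955.14",
    "F8WC45\tEnsembl\tENSG00000183955.14",
    "A0A0C4DFR3\tEnsembl\tENSG00000183955.14",
    "X00001\tEnsembl\tENSG00000000001.1",  # invented row for another gene
    sep = "\n"
  ),
  header = FALSE,
  col.names = c("accession", "id_type", "external_id"),
  stringsAsFactors = FALSE
)

uniprot_for_ensembl(toy, "ENSG00000183955")
```

For the real data you would replace `toy` with something like `read.delim(gzfile("HUMAN_9606_idmapping.dat.gz"), header = FALSE, quote = "", col.names = c("accession", "id_type", "external_id"))`, assuming you have downloaded the human file from the by_organism subdirectory.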

2.2 years ago
manaswwm ▴ 490

If I am not mistaken, your biomaRt query returns the information shown on the following page: http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000183955;r=12:123384132-123409353. To get all the UniProt IDs listed on that page, I would change the queried attributes to something like the following:

getBM(mart=ensembl,attributes=c("ensembl_gene_id","uniprotsptrembl", "uniprotswissprot"),filters="ensembl_gene_id",values="ENSG00000183955",uniqueRows=TRUE)

Here, uniprotsptrembl holds the "unreviewed" (TrEMBL) protein IDs and uniprotswissprot holds the "reviewed" (Swiss-Prot) ones. Using this I get:

  ensembl_gene_id uniprotsptrembl uniprotswissprot
1 ENSG00000183955                           Q9NQR1
2 ENSG00000183955          F8WC45                 
3 ENSG00000183955          C9JKQ0  
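Since the accessions end up spread over two columns with blanks in between, it can help to pool them into one vector per gene. A small sketch, assuming the result has the column names shown above and that missing values come back as empty strings or NA; the helper name collect_uniprot is hypothetical, and the data frame is rebuilt by hand from the table above so the example runs without a network call:

```r
# The result table from above, typed in by hand.
res <- data.frame(
  ensembl_gene_id  = rep("ENSG00000183955", 3),
  uniprotsptrembl  = c("", "F8WC45", "C9JKQ0"),
  uniprotswissprot = c("Q9NQR1", "", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: pool both accession columns per gene,
# dropping blank and NA cells and removing duplicates.
collect_uniprot <- function(res) {
  ids   <- c(res$uniprotswissprot, res$uniprotsptrembl)
  genes <- rep(res$ensembl_gene_id, times = 2)
  keep  <- !is.na(ids) & nzchar(ids)
  lapply(split(ids[keep], genes[keep]), unique)
}

collect_uniprot(res)
```

The result is a named list with one character vector per Ensembl gene ID, which also works when several genes are passed to getBM at once.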

As you can see, A0A0C4DFR3 is not listed on the Ensembl gene page, but it is listed on UniProt: https://www.uniprot.org/uniprot/?query=ENSG00000183955&sort=score. I imagine a workaround would be to query UniProt's REST API directly instead of going through biomaRt; see https://www.ebi.ac.uk/proteins/api/doc/#!/proteins/getByCrossReference for more information.
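A rough sketch of what such a REST call could look like from R, using the httr and jsonlite packages. I have not verified this against the live service: it assumes the cross-reference endpoint takes the form /proteins/{dbtype}:{dbid}, that "Ensembl" is an accepted dbtype, that a gene-level ID is accepted as the dbid, and that the JSON response is an array of entries each carrying an "accession" field. The function names xref_url and fetch_accessions are made up. Please check the API documentation linked above before relying on it.

```r
# Build the cross-reference URL (assumed pattern: /proteins/{dbtype}:{dbid}).
xref_url <- function(db_id, db_type = "Ensembl") {
  paste0("https://www.ebi.ac.uk/proteins/api/proteins/", db_type, ":", db_id)
}

# Fetch the matching entries and return their accessions.
# Requires the httr and jsonlite packages and network access.
fetch_accessions <- function(db_id, db_type = "Ensembl") {
  resp <- httr::GET(xref_url(db_id, db_type), httr::accept_json())
  httr::stop_for_status(resp)  # raise an R error on HTTP 4xx/5xx
  entries <- jsonlite::fromJSON(
    httr::content(resp, as = "text", encoding = "UTF-8"),
    simplifyVector = FALSE
  )
  # Assumes each entry is an object with an "accession" field.
  vapply(entries, function(e) e$accession, character(1))
}

# Example call (not run here because it needs network access):
# fetch_accessions("ENSG00000183955")
```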


Thank you for your advice. As I answered above, I was able to get the information I wanted using the UniProt ID mapping tool, but I will also try the UniProt REST API that you suggested.
