Converting gene symbols to protein (uniprot) ids gives multiple matches per gene symbol. Why?
3 days ago
peter.berry5 ▴ 20

I used the bitr function from the clusterProfiler package to convert gene symbols from a DE experiment to UniProt protein ids. For some unique gene symbols, there are multiple UniProt ids.

Surely each gene id should map to a single protein and each protein has a unique id. So is my code correct and does it matter that there are multiple UniProt ids for a single gene?

My code is

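library(clusterProfiler)  # provides bitr()
library(org.Hs.eg.db)     # human annotation database passed as OrgDb
library(dplyr)            # provides distinct()
Genes <- c("AACS", "ACAA2", "ACADM", "ACLY", "ACOT8")
Protein_IDs <- bitr(Genes, fromType="SYMBOL", toType="UNIPROT", OrgDb="org.Hs.eg.db") # returns 15 rows
test <- distinct(Protein_IDs, UNIPROT, .keep_all = TRUE) # one row per UniProt id; still returns 15 rows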

SYMBOL UNIPROT
AACS    Q86V21
AACS    A0A024RBV2
ACAA2   B3KNP8
ACAA2   P42765
ACADM   A0A0S2Z366
ACADM   P11310
ACADM   B7Z9I1
ACADM   Q5HYG7
ACADM   Q5T4U5
ACADM   B4DJE7
ACLY    A0A024R1T9
ACLY    P53396
ACLY    Q4LE36
ACLY    A0A024R1Y2
ACOT8   O14734
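
To make the one-to-many mapping explicit, the output above can be tabulated per gene symbol. This is only a minimal base R sketch: the data frame is re-typed by hand from the output above rather than produced by bitr, and the names ids_per_symbol and multi_mapped are made up for illustration.

```r
# Mapping re-typed from the bitr() output shown above (not re-queried)
Protein_IDs <- data.frame(
  SYMBOL = c("AACS", "AACS", "ACAA2", "ACAA2",
             rep("ACADM", 6), rep("ACLY", 4), "ACOT8"),
  UNIPROT = c("Q86V21", "A0A024RBV2", "B3KNP8", "P42765",
              "A0A0S2Z366", "P11310", "B7Z9I1", "Q5HYG7", "Q5T4U5", "B4DJE7",
              "A0A024R1T9", "P53396", "Q4LE36", "A0A024R1Y2", "O14734"),
  stringsAsFactors = FALSE
)

# Number of UniProt accessions returned for each gene symbol
ids_per_symbol <- table(Protein_IDs$SYMBOL)
print(ids_per_symbol)

# Gene symbols that received more than one accession
multi_mapped <- names(ids_per_symbol)[ids_per_symbol > 1]
print(multi_mapped)
```

Only ACOT8 maps to a single accession here; the other four symbols each have two or more.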
Uniprot r • 137 views

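The mapping behind bitr seems to be really lenient: it appears to include unreviewed UniProt entries alongside the reviewed ones, which is why a single gene symbol can come back with several accessions. You may have better luck using biomaRt.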
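
As a rough sketch of the biomaRt route, the helper below asks Ensembl for reviewed (Swiss-Prot) accessions only. The function name symbol_to_swissprot is hypothetical, and the dataset, filter, and attribute names (hsapiens_gene_ensembl, hgnc_symbol, uniprotswissprot) are assumptions about the current Ensembl mart; check them with listAttributes() and listFilters() before relying on this. It needs the biomaRt package and a network connection, and even reviewed entries are not guaranteed to be one per gene.

```r
# Hypothetical helper: map HGNC symbols to reviewed (Swiss-Prot) accessions.
# Attribute and filter names are assumptions; verify with listAttributes(mart).
symbol_to_swissprot <- function(symbols) {
  if (!requireNamespace("biomaRt", quietly = TRUE)) {
    stop("The biomaRt package is required for this sketch.")
  }
  # Connect to the Ensembl human gene dataset (requires network access)
  mart <- biomaRt::useEnsembl(biomart = "ensembl",
                              dataset = "hsapiens_gene_ensembl")
  res <- biomaRt::getBM(
    attributes = c("hgnc_symbol", "uniprotswissprot"),
    filters    = "hgnc_symbol",
    values     = symbols,
    mart       = mart
  )
  # Drop rows where no Swiss-Prot accession was returned
  keep <- !is.na(res$uniprotswissprot) & nzchar(res$uniprotswissprot)
  unique(res[keep, , drop = FALSE])
}

# Example call (not run here because it queries Ensembl):
# symbol_to_swissprot(c("AACS", "ACAA2", "ACADM", "ACLY", "ACOT8"))
```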


The HUGO entry for ACADM indeed lists only one UniProt accession.

You can download an official list of human gene symbols and their corresponding UniProt IDs from the HUGO site using a custom download. Select the fields you want in the output.
