Gene names to Protein (cell surface markers) mapping
2.5 years ago
firestar ★ 1.6k

I am essentially looking for a gene-to-protein mapping/lookup table. I have gene names (in BioMart, this is the external_gene_name attribute, to be specific). They look something like this:

"PLPPR3"   "TSPY10"   "ELANE"    "DENND11"  "TSPY4"   "CCL15"    "PSMB3"    "TAS2R46"  "ZBTB9"    "KIR2DL5A" "RWDD2B"

I want to know the proteins produced by these genes. The protein IDs should look like this (cell surface markers for example):

"CD45RA" "CD27"   "CD16"   "GPR56"  "CD56"   "CD57"  "CD94"   "CD158" 

I am not sure what the official terminology for these IDs is. UniProt? Swiss-Prot? Something else? Does anyone know where to find mappings from these gene names to protein IDs? If they are in BioMart, perhaps someone knows the name of the field? Thanks!

Update: This is the R code that I use to fetch the data:

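```
library(biomaRt)
# Connect to Ensembl BioMart and select the human gene dataset
mart <- useMart("ENSEMBL_MART_ENSEMBL")
mart <- useDataset(mart = mart, dataset = "hsapiens_gene_ensembl")
# Fetch gene name and protein ID for all protein-coding genes
pdata <- getBM(mart = mart, attributes = c("external_gene_name", "protein_id"),
               filters = "biotype", values = list("protein_coding"), useCache = FALSE)
```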

   external_gene_name protein_id
1                               
2                       ABK41909
3                       BAF82881
4                       BAG36999
5                       EAX09448
6            TMPRSS15   AAC50138
7            TMPRSS15   CAB65555
8            TMPRSS15   CAB90389
9            TMPRSS15   CAB90392
10           TMPRSS15   AAI11750
11           TMPRSS15           
12            SMIM34B           
13             GATD3B           
14             GATD3B   BAA20888
...

The protein_id attribute returns strange-looking protein IDs that are clearly not cell surface marker names. It should probably be replaced with a different attribute, but I am not sure which one to use.
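Whichever attribute ends up being used, the output above already shows two things that need handling: blank cells and several protein IDs per gene. The following is a minimal sketch of one way to tidy such a result in base R. The toy data frame is copied from a few rows of the output above, and it assumes that the blanks are empty strings rather than NA; `per_gene` is just an illustrative name.

```r
# Toy data copied from a few rows of the output above;
# in practice this would be the `pdata` object returned by getBM().
pdata <- data.frame(
  external_gene_name = c("", "TMPRSS15", "TMPRSS15", "TMPRSS15",
                         "SMIM34B", "GATD3B", "GATD3B"),
  protein_id         = c("ABK41909", "AAC50138", "CAB65555", "",
                         "", "", "BAA20888"),
  stringsAsFactors = FALSE
)

# Assumption: blank cells are empty strings, so convert them to NA
pdata[pdata == ""] <- NA

# Keep only rows where both the gene name and the protein ID are present
complete <- pdata[complete.cases(pdata), ]

# Collapse to one row per gene, joining all of its protein IDs with ";"
per_gene <- aggregate(
  protein_id ~ external_gene_name,
  data = complete,
  FUN = function(x) paste(unique(x), collapse = ";")
)

print(per_gene)
```

Collapsing to one row per gene is only one option; keeping the long format (one row per gene/protein pair) may be more convenient for joins.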

cell transcriptomics surface marker proteomics • 1.4k views
2.4 years ago
firestar ★ 1.6k

The closest I found is the UniProt Swiss-Prot table for human:

   Gene names                    Entry name        
 1 CD63 MLA1 TSPAN30             CD63_HUMAN 
 2 CDV3 H41                      CDV3_HUMAN 
 3 CD79A IGA MB1                 CD79A_HUMAN
 4 CD8B CD8B1                    CD8B_HUMAN 
 5 CDKL2                         CDKL2_HUMAN
 6 CDRT4                         CDRT4_HUMAN
 7 CDX2 CDX3                     CDX2_HUMAN 
 8 CDK11A CDC2L2 CDC2L3 PITSLREB CD11A_HUMAN
 9 CD14                          CD14_HUMAN 
10 CD22 SIGLEC2                  CD22_HUMAN 
# … with 167 more rows
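
Since the "Gene names" column packs several space-separated names into one cell, a lookup needs one row per name. Here is a rough sketch of that reshaping in base R, using four rows copied from the table above. The column names `gene_names` and `entry_name` and the object `long` are illustrative choices, not the names UniProt uses in its download.

```r
# Four rows copied from the table above (column names are illustrative)
uniprot <- data.frame(
  gene_names = c("CD63 MLA1 TSPAN30", "CD79A IGA MB1", "CD14", "CD22 SIGLEC2"),
  entry_name = c("CD63_HUMAN", "CD79A_HUMAN", "CD14_HUMAN", "CD22_HUMAN"),
  stringsAsFactors = FALSE
)

# Split each space-separated "Gene names" string into individual names
name_list <- strsplit(uniprot$gene_names, " +")

# Build a long table with one row per (gene name, entry name) pair
long <- data.frame(
  gene_name  = unlist(name_list),
  entry_name = rep(uniprot$entry_name, lengths(name_list)),
  stringsAsFactors = FALSE
)

# Look up the entries for any name, primary or alternative
print(long[long$gene_name %in% c("SIGLEC2", "MB1"), ])
```

With the table in this shape, any of the listed names can be matched against a gene list with `merge()` or `match()`.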
2.5 years ago
Emily 23k

These may be listed under Gene Synonym.


Hmmm.. Gene Synonym (external_synonym) looks like gene IDs to me. I am looking for protein IDs.


Protein and gene names are mostly the same. I have found some of the examples on your list under gene synonyms. Gene synonyms are essentially any names that the gene/protein has ever been known by in the literature; they do not come from any specific database. The CD names you have are terms people use in the literature rather than official names in any database, which makes them gene synonyms.
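
If the CD names are treated as gene synonyms, mapping markers to genes becomes a plain table lookup. The sketch below shows this in base R with a hand-made synonym table. The column names follow the BioMart attributes external_gene_name and external_synonym, the three pairs are toy data, and "CD999" is a made-up marker included only to show what an unmatched name returns.

```r
# Hand-made synonym table (toy data); in practice this might come from
# a query for the external_gene_name and external_synonym attributes.
synonyms <- data.frame(
  external_gene_name = c("PTPRC", "ITGAM", "CSF1R"),
  external_synonym   = c("CD45", "CD11B", "CD115"),
  stringsAsFactors = FALSE
)

# Marker names as they might appear in a panel, in mixed case.
# "CD999" is a hypothetical name with no entry in the table.
markers <- c("Cd45", "Cd11b", "Cd115", "CD999")

# Compare upper-cased names so that "Cd11b" matches "CD11B";
# match() returns NA for markers without a known synonym.
idx <- match(toupper(markers), toupper(synonyms$external_synonym))

marker_to_gene <- data.frame(
  marker = markers,
  gene   = synonyms$external_gene_name[idx],
  stringsAsFactors = FALSE
)

print(marker_to_gene)
```

Note that `match()` returns only the first hit, so a synonym shared by several genes would need `merge()` instead to keep every match.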


Wow! This is shocking! So the most commonly used "protein IDs" (cell surface markers, CDXXX) are just in people's minds and not official! If there is no database, I am not sure how to resolve two of my issues:

  • Not all gene names are the same as protein names; some are completely different. A few examples:
```
marker   gene
Cd45     PTPRC
Cd11b    ITGAM
Cd115    CSF1R
```

Update: As you said, they are old synonyms. I also found them listed on UniProt.

  • There must be a one-to-many relationship. One gene produces many proteins, so a gene name must match to multiple protein names. Or are all proteins from a gene given the same name?

Update: The closest I got was to download the table from UniProt Swiss-Prot (human).

   Gene names                    Entry name        
 1 CD63 MLA1 TSPAN30             CD63_HUMAN 
 2 CDV3 H41                      CDV3_HUMAN 
 3 CD79A IGA MB1                 CD79A_HUMAN
 4 CD8B CD8B1                    CD8B_HUMAN 
 5 CDKL2                         CDKL2_HUMAN
 6 CDRT4                         CDRT4_HUMAN
 7 CDX2 CDX3                     CDX2_HUMAN 
 8 CDK11A CDC2L2 CDC2L3 PITSLREB CD11A_HUMAN
 9 CD14                          CD14_HUMAN 
10 CD22 SIGLEC2                  CD22_HUMAN 
# … with 167 more rows