Question: Converting Affymetrix Probes To Gene Ids
7.7 years ago by
Josh100 wrote:

I downloaded the CGP cell line project expression data and would like to convert the affy probes to official gene symbols. It's the HG U133A v2 platform and the dataset has a total of around 22000 probes. What's the best way to do this? I tried using IDconverter, but it froze after around 100 genes. When I used DAVID to convert to official gene symbol, the results only had about 9800 genes. Using DAVID to convert to entrez returned about 24000 ids, as for some probes, multiple entrez gene ids were returned. How should I deal with these duplicated entrez ids, or is there a better way to do the conversion altogether? Thanks!

affymetrix conversion entrez • 55k views
ADD COMMENTlink modified 3.8 years ago by Poorya Parvizi60 • written 7.7 years ago by Josh100

I tried to figure this out today. The code provided by Diwan is for rat; which annotation package you need depends on the organism of your samples (human, rat, mouse, etc.) and on your R and Bioconductor versions. I am using R 3.3.2 and Bioconductor 3.4. The following code works for me, but I do not get results for all probe IDs (keytype = "PROBEID"); only a few genes are returned.

However, Affymetrix ID information is also available in the Thermo Fisher database.

## Converting PROBEIDs to gene names and symbols
## The annotation package depends on the organism and chip (human/rat/mouse) and on the R and Bioconductor versions
library("hgu95av2.db")    ## human HG-U95Av2 chip; load the package that matches your platform
select(hgu95av2.db, c("1007_s_at","1053_at"), c("SYMBOL","ENTREZID", "GENENAME")) ## just a trial example
PROBES <- as.character(GSE22483$ID_REF)    ## probe IDs from the ID_REF column of the dataset
## The keys are probe IDs, so keytype must be "PROBEID" (not "UNIGENE")
OUT <- select(hgu95av2.db, keys = PROBES, columns = c("SYMBOL", "ENTREZID", "GENENAME"), keytype = "PROBEID")
ADD REPLYlink modified 3.8 years ago by GenoMax96k • written 3.8 years ago by angajalaanusha10

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

ADD REPLYlink written 3.8 years ago by GenoMax96k
  1. You state first that you want official gene symbols (presumably HUGO?), but then talk about Entrez IDs.
  2. The brief answer to your question is "BioMart". Please search this site for that term; there are many answers to questions virtually identical to this one.
ADD REPLYlink written 7.7 years ago by Neilfws49k

Analogous mapping questions have been asked continuously (here and elsewhere) for at least a decade, because no one (?) ever made a decent 3' UTR probe set that would have had much cleaner gene mappings, including paralogue resolution.

ADD REPLYlink modified 3.7 years ago • written 7.7 years ago by cdsouthan1.8k

4 years on so this will never happen?

ADD REPLYlink written 3.7 years ago by cdsouthan1.8k
6.1 years ago by
United States
Diwan610 wrote:

In R, for example, to convert the Affymetrix IDs "1368587_at" and "1385248_a_at" (rat2302 chip) to their gene IDs, I would use the following:

library("rat2302.db")    # here use your chip hgu133a.db

select(rat2302.db, c("1368587_at","1385248_a_at"), c("SYMBOL","ENTREZID", "GENENAME"))

For all probes, create a vector of probes and then use select:

PROBES<- as.character(FCMATRIX$probe)

OUT <- select(rat2302.db, PROBES, c("SYMBOL", "ENTREZID", "GENENAME"))

# Install your chip .db package from bioc
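
The output of select() has one row per probe-gene pair, so it can contain more rows than there are probes, which would explain getting about 24000 Entrez IDs back from about 22000 probes. A minimal base R sketch for counting unmapped and multi-mapped probes is shown here. The data frame `out` is an invented stand-in with the same column names that select() returns; the probe and gene values are not real.

```r
# Toy stand-in for the data frame returned by select() (values are invented).
out <- data.frame(
  PROBEID  = c("p1_at", "p2_at", "p2_at", "p3_at", "p4_at"),
  ENTREZID = c("101",   "202",   "203",   NA,      "101"),
  stringsAsFactors = FALSE
)

# Number of distinct, non-missing genes for each probe.
genes_per_probe <- tapply(out$ENTREZID, out$PROBEID,
                          function(x) length(unique(x[!is.na(x)])))

unmapped_probes <- names(genes_per_probe)[genes_per_probe == 0]  # no gene at all
multi_probes    <- names(genes_per_probe)[genes_per_probe > 1]   # more than one gene

# Number of distinct probes for each gene, ignoring unmapped rows.
mapped <- out[!is.na(out$ENTREZID), ]
probes_per_gene <- tapply(mapped$PROBEID, mapped$ENTREZID,
                          function(x) length(unique(x)))
multi_genes <- names(probes_per_gene)[probes_per_gene > 1]       # several probes
```

Running the same three counts on the real OUT object gives a quick picture of how much of the chip is ambiguous before deciding how to handle it.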






ADD COMMENTlink modified 6.1 years ago • written 6.1 years ago by Diwan610

For anyone swaying between this and biomaRt - I've worked with biomaRt in the past and though very useful and programmatically accessible, practically the database goes down often and you frequently find yourself waiting around between queries. Downloading a database to select against like this is preferable.

ADD REPLYlink written 5.9 years ago by Louis140

hi Diwan

I installed the annotate package and ran your script, but I got an error:

Error in select(rat2302.db, c("1368587_at", "1385248_a_at"), c("SYMBOL",  : 
  unused argument (c("SYMBOL", "ENTREZID", "GENENAME"))

I'm new to R. Could you please explain what the problem is?

ADD REPLYlink modified 5.6 years ago • written 5.6 years ago by Shamim Sarhadi220
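
The "unused argument" message suggests that R found a select() function that does not accept a third argument, so it may not be the one from AnnotationDbi. This is only a guess; the thread does not confirm the cause. One way to rule it out is to call the function by its full name, as in this sketch. The wrapper name `annotate_probes` is hypothetical and not part of any package.

```r
# Hypothetical wrapper (not from the thread): calls AnnotationDbi's select()
# by its full name, so a select() from another attached package cannot be
# picked up by mistake. AnnotationDbi must be installed when this is called.
annotate_probes <- function(db, probes,
                            columns = c("SYMBOL", "ENTREZID", "GENENAME")) {
  AnnotationDbi::select(db, keys = probes, columns = columns,
                        keytype = "PROBEID")
}

# Example call, assuming rat2302.db is installed and loaded:
# annotate_probes(rat2302.db, c("1368587_at", "1385248_a_at"))
```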
7.7 years ago by
Sean Davis26k
National Institutes of Health, Bethesda, MD
Sean Davis26k wrote:

If you are an R user, consider:

Details on the use can be seen in the AnnotationDbi vignettes.

Alternatively, consider the biomaRt package and see the biomaRt user guide:

ADD COMMENTlink written 7.7 years ago by Sean Davis26k
3.8 years ago by
The University of Edinburgh
Poorya Parvizi60 wrote:

You can use BioMart:

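library(biomaRt)
ensembl = useMart(biomart= "ensembl",dataset="hsapiens_gene_ensembl")
# attribute for the HG-U133 Plus 2 array; listAttributes(ensembl) shows the names for other arrays
affy_ensembl= c("affy_hg_u133_plus_2", "ensembl_gene_id")
# no filter is set, so this retrieves the mapping for every gene
getBM(attributes= affy_ensembl, mart= ensembl, uniqueRows=TRUE)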

The problem with converting probe IDs to Entrez or Ensembl gene IDs is that one probe ID can map to more than one Ensembl gene ID, and vice versa.

The solution is:

    1. Remove any probe ID that maps to more than one Ensembl gene ID.
    2. Take the mean or max of multiple probe IDs that map to the same Ensembl or Entrez ID.
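
The two steps above can be sketched in base R on a matrix that has already been normalized. The matrix `expr` and the table `map` are invented toy data, and averaging is used here as one of the two options mentioned (mean rather than max).

```r
# Toy normalized expression matrix: rows are probes, columns are samples.
expr <- matrix(c(1, 2,
                 3, 5,
                 7, 9,
                 4, 4),
               ncol = 2, byrow = TRUE,
               dimnames = list(c("p1_at", "p2_at", "p3_at", "p4_at"),
                               c("s1", "s2")))

# Toy probe-to-gene table: p3_at maps to two genes, geneA has two probes.
map <- data.frame(
  probe = c("p1_at", "p2_at", "p3_at", "p3_at", "p4_at"),
  gene  = c("geneA", "geneA", "geneB", "geneC", "geneD"),
  stringsAsFactors = FALSE
)

# Step 1: drop probes that map to more than one gene.
n_genes <- tapply(map$gene, map$probe, function(g) length(unique(g)))
keep    <- names(n_genes)[n_genes == 1]
map1    <- unique(map[map$probe %in% keep & map$probe %in% rownames(expr), ])

# Step 2: average the remaining probes that share a gene.
sums      <- rowsum(expr[map1$probe, , drop = FALSE], group = map1$gene)
counts    <- as.vector(table(map1$gene)[rownames(sums)])
gene_expr <- sums / counts   # rows are genes, columns are samples
```

In this toy case p3_at is removed because it is ambiguous, and geneA becomes the average of p1_at and p2_at. With real data, `map` would come from select() or getBM().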

Another solution is to use Brainarray's custom CDFs (I prefer this one).

library(affy)   # provides ReadAffy()
download.file("", "/home/hgu133plus2hsensgcdf")   # URL of the Brainarray package file (not given here)
install.packages("/home/hgu133plus2hsensgcdf",repos = NULL)

RawData=ReadAffy(verbose=TRUE, celfile.path=celfilepath, cdfname= "hgu133plus2hsensgcdf", filenames=celfilenames)
ADD COMMENTlink modified 3.8 years ago • written 3.8 years ago by Poorya Parvizi60

How would you do this if you had already gotten the normalized gene expression?

ADD REPLYlink written 3.3 years ago by fr150

Take a look: A: How do I convert Affymetrix ID names to gene names

ADD REPLYlink written 2.3 years ago by Kevin Blighe71k
6.1 years ago by
macmath140 wrote:

Another easy way to annotate Affymetrix probes with gene IDs is to use this link.

Upload your probe list and it will return all the information you need.

It also helps with finding cross-platform orthologs among probes.

ADD COMMENTlink written 6.1 years ago by macmath140
4.2 years ago by
jananir180320 wrote:
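library(Biobase)      # ExpressionSet(), featureNames()
library(hgu133a.db)   # annotation for the HG-U133A chip; also attaches AnnotationDbi for mapIds()

eset <- ExpressionSet(assayData=dat)   # dat: expression matrix with probe IDs as row names

ID     <- featureNames(eset)

# mapIds() returns one value per key; by default the first match is kept
out <- mapIds(hgu133a.db, keys=as.character(ID), column="SYMBOL", keytype="PROBEID")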
ADD COMMENTlink modified 4.2 years ago • written 4.2 years ago by jananir180320
4.2 years ago by
jananir180320 wrote:

Different methods of getting GENE information from PROBEID

ADD COMMENTlink written 4.2 years ago by jananir180320