Question: Converting Affymetrix Probes To Gene Ids
5.0 years ago by
Josh80 wrote:

I downloaded the CGP cell line project expression data and would like to convert the Affymetrix probe IDs to official gene symbols. The platform is HG U133A v2, and the dataset has around 22,000 probes in total. What's the best way to do this? I tried IDconverter, but it froze after around 100 genes. When I used DAVID to convert to official gene symbols, the results contained only about 9,800 genes. Using DAVID to convert to Entrez IDs returned about 24,000 IDs, because multiple Entrez gene IDs were returned for some probes. How should I deal with these duplicated Entrez IDs, or is there a better way to do the conversion altogether? Thanks!

affymetrix conversion entrez • 37k views
modified 14 months ago by Poorya Parvizi30 • written 5.0 years ago by Josh80
  1. You state first that you want official gene symbols (presumably HUGO?), but then talk about Entrez IDs. 2. The brief answer to your question is "BioMart". Please search this site for that term, there are many answers to questions virtually identical to this one.
written 5.0 years ago by Neilfws48k

Analogous mapping questions have been asked continuously (here and elsewhere) for at least a decade because no one (?) ever made a decent 3' UTR probe set, which would have had much cleaner gene mappings, including paralogue resolution.

modified 13 months ago • written 5.0 years ago by cdsouthan1.8k

Four years on, so will this never happen?

written 13 months ago by cdsouthan1.8k

I just tried to figure this out today. The code provided by Diwan is for rat. Which annotation package you need depends on the organism of your samples (human, rat, mouse, etc.) and also on your R and Bioconductor versions. I am using R 3.3.2 and Bioconductor 3.4. The following code works for me, but I do not get results for all probe IDs (keytype = "PROBEID"); only a few genes are returned.

However, the Affymetrix ID information is available in the Thermo Fisher database.

## Convert probe IDs to gene symbols, Entrez IDs, and gene names
## The annotation package depends on the organism and chip, and on your R and Bioconductor versions
library("hgu95av2.db")    ## human HG-U95Av2 chip
## Quick test with two probe IDs
select(hgu95av2.db, keys = c("1007_s_at", "1053_at"), columns = c("SYMBOL", "ENTREZID", "GENENAME"), keytype = "PROBEID")
PROBES <- as.character(GSE22483$ID_REF)
## The keys are probe IDs, so keytype must be "PROBEID" rather than "UNIGENE"
OUT <- select(hgu95av2.db, keys = PROBES, columns = c("SYMBOL", "ENTREZID", "GENENAME"), keytype = "PROBEID")
modified 14 months ago by genomax52k • written 14 months ago by angajalaanusha0

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

written 14 months ago by genomax52k
3.4 years ago by
United States
Diwan510 wrote:

In R, for example, to convert the Affy IDs "1368587_at" and "1385248_a_at" (rat2302 chip) to their gene IDs, I would use the following:

library("rat2302.db")    # use the package for your own chip here, e.g. hgu133a.db

select(rat2302.db, keys = c("1368587_at", "1385248_a_at"), columns = c("SYMBOL", "ENTREZID", "GENENAME"))

For all probes, create a vector of probe IDs and then use select:

PROBES <- as.character(FCMATRIX$probe)

OUT <- select(rat2302.db, keys = PROBES, columns = c("SYMBOL", "ENTREZID", "GENENAME"))

# Install the .db annotation package for your chip from Bioconductor
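
Because one probe can be annotated to several genes, select returns one row per probe/gene pair, so OUT can have more rows than PROBES. Probes without an annotation typically come back with NA in the requested columns. A minimal base-R sketch for inspecting both cases is shown here; the data frame is a hand-made stand-in for real select output, and its probe IDs and symbols are hypothetical:

```r
# Stand-in for the data frame returned by select(); all values are made up.
OUT <- data.frame(
  PROBEID = c("a_at", "b_at", "b_at", "c_at"),
  SYMBOL  = c("GeneA", "GeneB1", "GeneB2", NA),
  stringsAsFactors = FALSE
)

# Count annotated genes per probe (rows with NA symbols are ignored).
genes_per_probe <- table(OUT$PROBEID[!is.na(OUT$SYMBOL)])

# Probes annotated to more than one gene.
multi <- names(genes_per_probe)[genes_per_probe > 1]

# Probes with no annotation at all.
unmapped <- unique(OUT$PROBEID[is.na(OUT$SYMBOL)])
```

Looking at multi and unmapped before merging the annotation into an expression table makes it easier to decide how to treat ambiguous probes.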






modified 3.4 years ago • written 3.4 years ago by Diwan510

For anyone wavering between this and biomaRt: I've worked with biomaRt in the past, and although it is very useful and programmatically accessible, in practice the database goes down often and you frequently find yourself waiting around between queries. Downloading a database to select against, as shown here, is preferable.

written 3.2 years ago by Louis120

Hi Diwan,

After installing the annotate package and... I ran your script, but I got an error:

Error in select(rat2302.db, c("1368587_at", "1385248_a_at"), c("SYMBOL",  : 
  unused argument (c("SYMBOL", "ENTREZID", "GENENAME"))

I'm new to R. Could you please explain what the problem is?

modified 2.9 years ago • written 2.9 years ago by Shamim Sarhadi200
5.0 years ago by
Sean Davis24k
National Institutes of Health, Bethesda, MD
Sean Davis24k wrote:

If you are an R user, consider:

Details on the use can be seen in the AnnotationDbi vignettes.

Alternatively, consider the biomaRt package and see the biomaRt user guide:

written 5.0 years ago by Sean Davis24k
3.4 years ago by
macmath130 wrote:

Another easy way to map Affymetrix probes to gene IDs is to use this link.

Upload your probe list and it will give you all the necessary information.

It also helps with finding cross-platform orthologs among probes.

written 3.4 years ago by macmath130
18 months ago by
jananir180320 wrote:
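library(Biobase)      # ExpressionSet, featureNames
library(hgu133a.db)   # HG-U133A annotation; also attaches AnnotationDbi (mapIds)
# dat is a numeric expression matrix with probe IDs as row names
eset <- ExpressionSet(assayData = dat)

ID <- featureNames(eset)

# Named character vector with one symbol per probe ID (NA if unmapped)
out <- mapIds(hgu133a.db, keys = as.character(ID), column = "SYMBOL", keytype = "PROBEID")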
modified 18 months ago • written 18 months ago by jananir180320
18 months ago by
jananir180320 wrote:

Different methods of getting GENE information from PROBEID

written 18 months ago by jananir180320
14 months ago by
Middle East Technical University
Poorya Parvizi30 wrote:

You can use BioMart:

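library(biomaRt)
ensembl <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
# Attribute names are chip-specific; listAttributes(ensembl) shows what is available
affy_ensembl <- c("affy_hg_u133_plus_2", "ensembl_gene_id")
# No filter is set, so the mapping for all probes is returned
probe2gene <- getBM(attributes = affy_ensembl, mart = ensembl, uniqueRows = TRUE)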

The problem when converting probe IDs to Entrez or Ensembl gene IDs is that one probe ID can map to more than one gene ID, and vice versa.

The solution is:

    1. Discard any probe ID that maps to more than one Ensembl gene ID.
    2. Take the mean or max of multiple probe IDs that map to the same Ensembl or Entrez ID.
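
The two steps above can be sketched in base R, assuming you already have a probe-to-gene table (for example from getBM) and a numeric expression matrix with probe IDs as row names. The object names map, expr, and gene_expr and all values below are hypothetical, and the mean is used here where the max would work equally well:

```r
# Toy probe-to-gene table (made-up values); one row per probe/gene pair.
map <- data.frame(
  probe = c("p1", "p2", "p3", "p3", "p4"),
  gene  = c("G1", "G1", "G2", "G3", "G4"),
  stringsAsFactors = FALSE
)

# Toy expression matrix: probes in rows, samples in columns.
expr <- matrix(
  c(1, 3, 5, 7,   # sample s1
    2, 4, 6, 8),  # sample s2
  nrow = 4,
  dimnames = list(c("p1", "p2", "p3", "p4"), c("s1", "s2"))
)

# Step 1: drop unannotated rows and probes that map to more than one gene.
map <- unique(map[!is.na(map$gene), ])
genes_per_probe <- table(map$probe)
ambiguous <- names(genes_per_probe)[genes_per_probe > 1]
map <- map[!(map$probe %in% ambiguous), ]

# Keep only probes that are present in the expression matrix.
map <- map[map$probe %in% rownames(expr), ]
expr_kept <- expr[map$probe, , drop = FALSE]

# Step 2: average all probes that map to the same gene.
sums <- rowsum(expr_kept, group = map$gene)            # per-gene column sums
counts <- as.vector(table(map$gene)[rownames(sums)])   # probes per gene
gene_expr <- sums / counts                             # per-gene means
```

This works on data that has already been normalized, since it only needs the probe-level matrix and the mapping table.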

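Another solution is to use Brainarray's custom CDFs (I prefer this one):

library(affy)    # provides ReadAffy
# First argument: URL of the Brainarray custom CDF package (left blank here)
download.file("", "/home/hgu133plus2hsensgcdf")
install.packages("/home/hgu133plus2hsensgcdf", repos = NULL)

# celfilepath and celfilenames must point to your own CEL files
RawData <- ReadAffy(verbose = TRUE, celfile.path = celfilepath, cdfname = "hgu133plus2hsensgcdf", filenames = celfilenames)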
modified 14 months ago • written 14 months ago by Poorya Parvizi30

How would you do this if you had already gotten the normalized gene expression?

written 8 months ago by rf90