Question

Converting Affymetrix Probes To Gene Ids

14

11.1 years ago

Josh ▴ 140

I downloaded the CGP cell line project expression data and would like to convert the Affymetrix probe IDs to official gene symbols. The platform is HG U133A v2 and the dataset has around 22000 probes in total. What's the best way to do this? I tried IDconverter, but it froze after around 100 genes. When I used DAVID to convert to official gene symbols, the results contained only about 9800 genes. Using DAVID to convert to Entrez IDs returned about 24000 IDs, because some probes map to multiple Entrez gene IDs. How should I deal with these duplicated Entrez IDs, or is there a better way to do the conversion altogether? Thanks!

affymetrix conversion entrez • 71k views

ADD COMMENT • link updated 21 days ago by F • 0 • written 11.1 years ago by Josh ▴ 140

3

I worked through this today. The code provided by Diwan is for rat. Which annotation package you need depends on the organism your samples come from (human, rat, mouse, etc.), and the details also depend on your R and Bioconductor versions. I am using R 3.3.2 and Bioconductor 3.4. The following code works for me, but I do not get results for all probe IDs (keytype = "PROBEID"); only a few genes are returned.

Affymetrix probe annotation files are also available from Thermo Fisher: https://www.thermofisher.com/us/en/home/life-science/microarray-analysis/microarray-data-analysis/genechip-array-annotation-files.html

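## Convert probe IDs to gene symbols, Entrez IDs and gene names.
## The annotation package depends on the organism and chip; the install command depends on the Bioconductor version.
source("https://bioconductor.org/biocLite.R")   ## Bioconductor <= 3.7; on 3.8 or later use BiocManager::install()
biocLite("hgu95av2.db")
library("AnnotationDbi")
library("hgu95av2.db")    ## human HG-U95Av2 chip
keytypes(hgu95av2.db)     ## list the valid keytype values
## Quick test with two probe IDs
AnnotationDbi::select(hgu95av2.db, keys=c("1007_s_at","1053_at"), columns=c("SYMBOL","ENTREZID","GENENAME"), keytype="PROBEID")
## All probes in the data set. The keys are probe IDs, so keytype must be "PROBEID"; with "UNIGENE" few or no keys match.
PROBES <- as.character(GSE22483$ID_REF)
OUT <- AnnotationDbi::select(hgu95av2.db, keys=PROBES, columns=c("SYMBOL","ENTREZID","GENENAME"), keytype="PROBEID")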
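
To see how many probes really lack an annotation, and how many map to more than one gene, the table returned by select() can be summarised. select() returns NA for probes without a mapping and one row per match for probes with several. The sketch below uses a small made-up data frame (the probe names and IDs are hypothetical) in place of real select() output, so it runs without any annotation package:

```r
# Hypothetical stand-in for a data frame returned by select();
# NA in ENTREZID marks a probe without a mapping.
out <- data.frame(
  PROBEID  = c("p1", "p2", "p2", "p3"),
  ENTREZID = c("101", "202", "303", NA),
  stringsAsFactors = FALSE
)

probes <- unique(out$PROBEID)

# Keep only rows with a gene ID, then count the IDs per probe.
# Using a factor with all probes as levels keeps unmapped probes at zero.
mapped        <- out[!is.na(out$ENTREZID), ]
ids_per_probe <- table(factor(mapped$PROBEID, levels = probes))

n_unmapped   <- sum(ids_per_probe == 0)   # probes with no gene ID
n_multi      <- sum(ids_per_probe > 1)    # probes with several gene IDs
multi_probes <- names(ids_per_probe)[as.vector(ids_per_probe > 1)]
```

With a real result, replace `out` by the data frame from select() and inspect `multi_probes` before deciding how to handle them.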

ADD REPLY • link updated 7.2 years ago by GenoMax 144k • written 7.2 years ago by angajalaanusha ▴ 30

0

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

ADD REPLY • link 7.2 years ago by GenoMax 144k

0

You said that "it depends on R version and Bioconductor version". Could you please explain what changes between versions, and how to check or control the versions so that I can choose a method and get reliable, repeatable results?

ADD REPLY • link 21 days ago by F • 0

0

You state first that you want official gene symbols (presumably HUGO?), but then talk about Entrez IDs.
The brief answer to your question is "BioMart". Please search this site for that term, there are many answers to questions virtually identical to this one.

ADD REPLY • link updated 2.3 years ago by Ram 44k • written 11.1 years ago by Neilfws 49k

1

Analogous mapping questions have been asked continuously (here and elsewhere) for at least a decade, because no one (?) ever made a decent 3' UTR probe set that would have had much cleaner gene mappings, including paralogue resolution.

ADD REPLY • link updated 2.3 years ago by Ram 44k • written 11.1 years ago by cdsouthan ★ 1.9k

2

Four years on, so this will never happen?

ADD REPLY • link 7.1 years ago by cdsouthan ★ 1.9k

Ram · Answer 1 · 2015-02-13

11

9.5 years ago

Diwan ▴ 650

In R, to convert, for example, the Affymetrix IDs 1368587_at and 1385248_a_at (rat2302 chip) to their gene IDs, I would use the following:

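library("annotate")      # also loads AnnotationDbi, which provides select()
library("rat2302.db")    # replace with the package for your chip, e.g. hgu133a.db

# The AnnotationDbi:: prefix avoids clashes when another loaded package also defines select()
AnnotationDbi::select(rat2302.db, keys=c("1368587_at","1385248_a_at"), columns=c("SYMBOL","ENTREZID","GENENAME"), keytype="PROBEID")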

For all probes, create a vector of probes and then use select:

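PROBES <- as.character(FCMATRIX$probe)
OUT <- AnnotationDbi::select(rat2302.db, keys=PROBES, columns=c("SYMBOL","ENTREZID","GENENAME"), keytype="PROBEID")

# Install the .db package for your chip from Bioconductor before running the code above
source("https://bioconductor.org/biocLite.R")   # Bioconductor <= 3.7; on 3.8 or later use BiocManager::install("hgu133a.db")

biocLite("hgu133a.db")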

HTH

Diwan

ADD COMMENT • link updated 2.3 years ago by Ram 44k • written 9.5 years ago by Diwan ▴ 650

1

For anyone swaying between this and biomaRt - I've worked with biomaRt in the past and though very useful and programmatically accessible, practically the database goes down often and you frequently find yourself waiting around between queries. Downloading a database to select against like this is preferable.

ADD REPLY • link 9.3 years ago by Louis ▴ 160

0

Hi Diwan

After installing the annotate package, I ran your script but got an error:

Error in select(rat2302.db, c("1368587_at", "1385248_a_at"), c("SYMBOL",  :
  unused argument (c("SYMBOL", "ENTREZID", "GENENAME"))

I'm new to R. Could you please explain what the problem is?

ADD REPLY • link updated 22 months ago by Ram 44k • written 8.9 years ago by Shamim Sarhadi ▴ 220

Ram · Answer 2 · 2013-07-07

3

11.1 years ago

Sean Davis 27k

If you are an R user, consider: http://www.bioconductor.org/packages/release/data/annotation/html/hgu133a2.db.html

Details on the use can be seen in the AnnotationDbi vignettes.

Alternatively, consider the biomaRt package and see the biomaRt user guide

ADD COMMENT • link updated 22 months ago by Ram 44k • written 11.1 years ago by Sean Davis 27k

Ram · Answer 3 · 2017-05-18

You can use BioMart:

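library("biomaRt")
ensembl <- useMart(biomart="ensembl", dataset="hsapiens_gene_ensembl")
# Attribute for the HG-U133 Plus 2 chip; run listAttributes(ensembl) to find the name for your chip
affy_ensembl <- c("affy_hg_u133_plus_2", "ensembl_gene_id")
# No filter is set, so all probe-to-gene pairs are returned
getBM(attributes=affy_ensembl, mart=ensembl, uniqueRows=TRUE)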

The problem with converting probe IDs to Entrez or Ensembl gene IDs is that one probe ID can map to more than one Ensembl gene ID, and vice versa.

One solution is:

Discard any probe ID that maps to more than one Ensembl gene ID
Take the mean or max of the probe IDs that map to the same Ensembl or Entrez ID
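
The two steps above can be sketched in base R. This is only an illustration on made-up data: the probe names, gene names and values are hypothetical, and `map` stands in for whatever probe-to-gene table you obtained (for example from getBM). It uses the mean; taking the max instead would need a different aggregation.

```r
# Hypothetical probe-to-gene table; probe p4 maps to two genes.
map <- data.frame(
  probe = c("p1", "p2", "p3", "p4", "p4"),
  gene  = c("G1", "G1", "G2", "G3", "G4"),
  stringsAsFactors = FALSE
)

# Hypothetical expression matrix: probes in rows, samples in columns.
expr <- matrix(
  c(1, 3, 5, 7,    # sample s1
    2, 4, 6, 8),   # sample s2
  nrow = 4,
  dimnames = list(c("p1", "p2", "p3", "p4"), c("s1", "s2"))
)

# Step 1: discard probes that map to more than one gene.
genes_per_probe <- tapply(map$gene, map$probe, function(g) length(unique(g)))
unique_probes   <- names(genes_per_probe)[as.vector(genes_per_probe == 1)]
map1 <- unique(map[map$probe %in% unique_probes, ])
map1 <- map1[map1$probe %in% rownames(expr), ]

# Step 2: average the remaining probes of each gene.
# rowsum() adds up the rows per gene; dividing by the probe count gives the mean.
gene_sums   <- rowsum(expr[map1$probe, , drop = FALSE], group = map1$gene)
probe_count <- as.vector(table(map1$gene)[rownames(gene_sums)])
gene_expr   <- gene_sums / probe_count
```

Here `gene_expr` has one row per gene: G1 is the average of p1 and p2, G2 comes from p3 alone, and p4 is dropped because it is ambiguous.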

Another solution is to use Brainarray's custom CDFs. (I prefer this one.)

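# Download the custom CDF package as a tarball and install it from source
cdf_tarball <- file.path(tempdir(), "hgu133plus2hsensgcdf_21.0.0.tar.gz")
download.file("http://mbni.org/customcdf/21.0.0/ensg.download/hgu133plus2hsensgcdf_21.0.0.tar.gz", cdf_tarball)
install.packages(cdf_tarball, repos = NULL, type = "source")
library(hgu133plus2hsensgcdf)

library(affy)
# celfilepath (directory) and celfilenames (CEL file names) must be defined beforehand
RawData <- ReadAffy(verbose=TRUE, celfile.path=celfilepath, cdfname="hgu133plus2hsensgcdf", filenames=celfilenames)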

score 1 · Answer 4 · 2015-02-13

1

9.5 years ago

macmath ▴ 170

Another easy way to map Affymetrix probes to gene IDs is to use this link

Upload your probe list and it will return the information you need

It also helps with finding cross-platform orthologs among probes

ADD COMMENT • link 9.5 years ago by macmath ▴ 170

score 1 · Answer 5 · 2017-01-10

1

7.5 years ago

jananir1803 ▴ 20

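library(Biobase)       # ExpressionSet(), featureNames()
library(hgu133a.db)    # also loads AnnotationDbi, which provides mapIds()

eset <- ExpressionSet(assayData=dat)   # dat: expression matrix with probe IDs as row names

ID <- featureNames(eset)

# mapIds() returns one symbol per probe (the first match by default)
out <- mapIds(hgu133a.db, keys=as.character(ID), column="SYMBOL", keytype="PROBEID")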

ADD COMMENT • link 7.5 years ago by jananir1803 ▴ 20

score 1 · Answer 6 · 2017-01-10

1

7.5 years ago

jananir1803 ▴ 20

Different methods of getting GENE information from PROBEID

ADD COMMENT • link 7.5 years ago by jananir1803 ▴ 20