Retrieving Probe To Gene Ids For Affymetrix Chips In Bioconductor
11.2 years ago
Brian Tsai ▴ 100

Hi,

I am filtering out unreliable probesets using MAS5 present/absent calls, and now I want to map the remaining probesets to genes. I know there are updated mappings, for example in the custom CDFs provided by Dai et al. (http://brainarray.mbni.med.umich.edu/brainarray/Database/CustomCDF/genomic_curated_CDF.asp). How do I pull the probeset-to-gene mappings out of a custom CDF? I would like to avoid using, e.g., hgu95av2.db, which as far as I understand stores Affymetrix's original probeset-to-gene mappings.

Thank you

bioconductor affymetrix • 11k views
11.2 years ago

This is the kind of code I would use to normalize Affy U133A CEL files with the custom CDFs (from Dai et al.) and map the resulting probesets to Entrez gene IDs and symbols. You should be able to adapt it for U95Av2.

library(affy)
library(gcrma)
library(hgu133ahsentrezgcdf) #cdfname="HGU133A_HS_ENTREZG"
library(hgu133ahsentrezgprobe)
library(hgu133ahsentrezg.db)

#Set working directory for output
setwd("~/output_dir")

#Set CDF to use
cdf="HGU133A_HS_ENTREZG"

#Read in the raw data from specified dir of CEL files
raw.data.ALL=ReadAffy(verbose=TRUE, celfile.path="/path/to/cel/files", cdfname=cdf)

#perform GCRMA normalization
data.gcrma.norm.ALL=gcrma(raw.data.ALL)

#Get the important stuff out of the data - the expression estimates for each array
gcrma.ALL=exprs(data.gcrma.norm.ALL)

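#Remove Affy control probesets (named AFFX-*) by name rather than by row position
#(the original gcrma.ALL[1:12065,] only holds for one version of this custom CDF)
gcrma.ALL=gcrma.ALL[!grepl("^AFFX", rownames(gcrma.ALL)),]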

#Format values to 5 significant digits (note: this converts the matrix to character)
gcrma.ALL=format(gcrma.ALL, digits=5)

#Map probes to gene symbols
#To see all mappings for Entrez gene db associated with customCDF
ls("package:hgu133ahsentrezg.db") #customCDF

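#Extract probe ids, gene symbols, and Entrez ids
#ifnotfound=NA returns NA instead of stopping with an error when a probe id is missing from the map
probes.ALL=row.names(gcrma.ALL)
symbol.ALL = unlist(mget(probes.ALL, hgu133ahsentrezgSYMBOL, ifnotfound=NA))
ID.ALL = unlist(mget(probes.ALL, hgu133ahsentrezgENTREZID, ifnotfound=NA))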

#Combine gene annotations with raw data
gcrma.ALL=cbind(probes.ALL,ID.ALL,symbol.ALL,gcrma.ALL)

#Write GCRMA-normalized, mapped data to file
write.table(gcrma.ALL, file = "ALL_gcrma.txt", quote = FALSE, sep = "\t", row.names = FALSE, col.names = TRUE)
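
The original question also asked about filtering on MAS5 present/absent calls before mapping. A minimal sketch of that step in base R follows, assuming a character matrix of "P"/"M"/"A" calls shaped like the output of exprs(mas5calls(...)) from the affy package. The toy matrix, the probe and gene names, the 0.5 threshold, and the `symbol.map` environment (a stand-in for a map such as hgu133ahsentrezgSYMBOL) are all made up for illustration.

```r
# Toy detection calls: one row per probeset, one column per array.
# In a real analysis this would come from exprs(mas5calls(raw.data)).
calls <- matrix(
  c("P", "P", "A",
    "A", "A", "A",
    "P", "M", "P",
    "P", "P", "P"),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("probeA_at", "probeB_at", "probeC_at", "AFFX-toy_at"),
                  c("s1", "s2", "s3"))
)

# Hypothetical threshold: keep probesets called present ("P")
# in at least half of the arrays.
min.present.fraction <- 0.5
present.fraction <- rowMeans(calls == "P")
keep <- present.fraction >= min.present.fraction

# Drop control probesets by name rather than by row position.
keep <- keep & !grepl("^AFFX", rownames(calls))
kept.probes <- rownames(calls)[keep]

# Stand-in for an annotation map; probeC_at is deliberately left unmapped.
symbol.map <- list2env(list(probeA_at = "GENE_A", probeB_at = "GENE_B"))

# ifnotfound = NA yields NA for unmapped probes instead of an error,
# so the result stays aligned with kept.probes.
symbols <- unlist(mget(kept.probes, envir = symbol.map, ifnotfound = NA))

annotation <- data.frame(probe = kept.probes,
                         symbol = unname(symbols),
                         stringsAsFactors = FALSE)
print(annotation)
```

The logical vector `keep` can then be used to subset the rows of the expression matrix, provided the calls and the expression values were computed with the same CDF so that the row names match.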

I'm getting an error when using this with Affymetrix hgu133plus2 chips. When I write the normalized, mapped data to file, I can see the gene symbols, probe IDs, and Entrez IDs, but not the expression values. The expression values column displays as <S4 object of class "ExpressionSet">. Can you please tell me where the mistake is?

Thanks
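
A likely cause, judging only from the symptom described, is that the ExpressionSet returned by gcrma() was bound to the annotation columns directly, without first extracting the numeric matrix with exprs(). A minimal sketch of the intended order of operations follows, assuming the Biobase package is installed. The toy values, probe names, symbols, and temporary output file are invented for illustration and stand in for the real gcrma() output.

```r
library(Biobase)  # provides ExpressionSet() and exprs()

# Toy stand-in for the ExpressionSet returned by gcrma() or rma().
toy.values <- matrix(c(7.1, 8.3, 5.2, 6.4), nrow = 2,
                     dimnames = list(c("probeA_at", "probeB_at"),
                                     c("s1", "s2")))
eset <- ExpressionSet(assayData = toy.values)

# Extract the numeric expression matrix first; the ExpressionSet itself
# is an S4 container and should not be bound to the annotation columns.
expr.mat <- exprs(eset)

# Hypothetical annotation vector, aligned with the rows of expr.mat.
symbols <- c("GENE_A", "GENE_B")

# Combine annotation and expression values in a data frame so the
# expression columns stay numeric.
out <- data.frame(probe = rownames(expr.mat),
                  symbol = symbols,
                  round(expr.mat, 5),
                  check.names = FALSE)

# Write to a temporary file for this illustration.
out.file <- tempfile(fileext = ".txt")
write.table(out, file = out.file, quote = FALSE, sep = "\t",
            row.names = FALSE, col.names = TRUE)
```

With the code from the answer above, the equivalent check would be to confirm that the object passed to cbind() is the matrix returned by exprs() and not the gcrma() result itself.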
