Tutorial: some simple code for microarray normalization in R, for those who are as new to R as I am
4.9 years ago by
A3.9k wrote:


This is certainly not perfect, but I hope it helps.

I am working with Arabidopsis thaliana.

For RMA normalization:


library(affy)

# To read all CEL files in the working directory:
Data <- ReadAffy()
eset <- rma(Data)
norm.data <- exprs(eset)

# The norm.data R object contains the normalized expression for every probeset in the ATH1 microarrays used in this example.
# To convert the probeset IDs to Arabidopsis gene identifiers, download the file
# ftp://ftp.arabidopsis.org/home/tair/Microarrays/Affymetrix/affy_ATH1_array_elements-2010-12-20.txt
# from the TAIR database and place it in the folder with the microarray data.
# To avoid ambiguous probeset associations (i.e. probesets that match multiple genes),
# only probesets that match exactly one gene in the Arabidopsis genome are used.
affy_names <- read.delim("affy_ATH1_array_elements-2010-12-20.txt", header=TRUE)

# Select the columns that contain the probeset ID and the corresponding AGI number.
# The column positions depend on the format of the array elements file;
# change these numbers if you are using a different format:
probe_agi <- as.matrix(affy_names[,c(1,5)])

# To associate each probeset with the corresponding AGI locus:
normalized.names <- merge(probe_agi, norm.data, by.x=1, by.y=0)[,-1]

# To remove probesets that do not match the Arabidopsis genome:
normalized.arabidopsis <- normalized.names[grep("AT", normalized.names[,1]),]

# To remove ambiguous probes (this assumes the TAIR file separates multiple loci with ";"):
normalized.arabidopsis.unambiguous <- normalized.arabidopsis[grep(pattern=";", normalized.arabidopsis[,1], invert=TRUE),]

# In some cases, multiple probes match the same gene because of updates to the genome annotation.
# To remove duplicated genes from the matrix:
normalized.agi.final <- normalized.arabidopsis.unambiguous[!duplicated(normalized.arabidopsis.unambiguous[,1]),]

# To assign the AGI number as the row name:
rownames(normalized.agi.final) <- normalized.agi.final[,1]
normalized.agi.final <- normalized.agi.final[,-1]

# The resulting gene expression dataset has unique row identifiers (the AGI loci),
# with the expression values from each experiment in a separate column.
# To export this data matrix from R to a tab-delimited file, use the following command.
# The file is written to your R working directory (check it with getwd(), change it with setwd()):
write.table(normalized.agi.final, "RMA.txt", sep="\t", col.names=NA, quote=FALSE)
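
To see what the ID-mapping steps above do without downloading anything, here is a minimal sketch on a made-up toy dataset. The probeset IDs, AGI codes, expression values, and the short variable names (`merged`, `arabidopsis`, `unambiguous`, `final`) are invented for illustration, and the sketch assumes that ambiguous probesets list several loci separated by ";".

```r
# Toy annotation table: probeset ID -> AGI locus (all values invented).
probe_agi <- data.frame(
  probe = c("p1", "p2", "p3", "p4", "p5"),
  agi   = c("AT1G01010", "AT1G01020;AT1G01030", "no_match",
            "AT1G01010", "AT2G01008"),
  stringsAsFactors = FALSE
)

# Toy normalized expression matrix with probeset IDs as row names.
norm.data <- matrix(
  c(7.1, 7.3,
    5.0, 5.2,
    3.3, 3.1,
    8.8, 8.6,
    6.4, 6.5),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("p1", "p2", "p3", "p4", "p5"), c("sample1", "sample2"))
)

# Join on probeset ID (column 1 of the annotation, row names of the matrix),
# then drop the probeset ID column so the AGI locus comes first.
merged <- merge(probe_agi, norm.data, by.x = 1, by.y = 0)[, -1]

# Keep rows whose locus looks like an AGI code.
arabidopsis <- merged[grep("AT", merged[, 1]), ]

# Drop ambiguous probesets (several loci separated by ";").
unambiguous <- arabidopsis[grep(";", arabidopsis[, 1], invert = TRUE), ]

# Keep only the first probeset for each gene.
final <- unambiguous[!duplicated(unambiguous[, 1]), ]

# Use the AGI locus as the row name and drop the locus column.
rownames(final) <- final[, 1]
final <- final[, -1]

print(final)
```

In this toy case, p2 is dropped as ambiguous, p3 because it has no AGI match, and p4 because it duplicates the gene already covered by p1.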

For VSN and gcrma normalization, only the following part changes; the rest is the same.

For VSN normalization:

library(affy)
library(vsn)

eset <- expresso(Data, normalize.method="vsn", bg.correct=FALSE, pmcorrect.method="pmonly", summary.method="medianpolish")
norm.data <- exprs(eset)

For gcrma:

library(affy)
library(gcrma)

eset <- gcrma(Data)
norm.data <- exprs(eset)

For `Illumina HumanHT-12 V3.0 expression beadchip`:




library(GEOquery)

# Download the GEO dataset
G <- getGEO("GSE3053", GSEMatrix=TRUE)

# Get the ExpressionSet object
eset <- G[[1]]

# Normalization may not be necessary, as GEO datasets should already be pre-processed.
# eset.n <- normalize.ExpressionSet.quantiles(eset)

# Extract the expression matrix to check whether the dataset appears to be normalized.
e <- exprs(eset)
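
The check itself is not shown above. One common way, sketched here as an assumption rather than as the author's method, is to compare the per-sample distributions: after normalization the columns of `e` should have similar quantiles. The matrix below is simulated so that the example runs on its own; with real data you would use `e <- exprs(eset)` instead. The `quantile_normalize` function is a hypothetical, simplified base-R illustration of quantile normalization (ties are not averaged) and is not the implementation behind `normalize.ExpressionSet.quantiles`.

```r
set.seed(1)

# Simulated log2-scale expression matrix: 1000 probes x 3 samples,
# with the third sample shifted to mimic an unnormalized array.
e <- cbind(
  s1 = rnorm(1000, mean = 8, sd = 2),
  s2 = rnorm(1000, mean = 8, sd = 2),
  s3 = rnorm(1000, mean = 10, sd = 3)
)

# Per-sample quantiles: clearly different columns suggest the data
# are not normalized. boxplot(e) shows the same thing graphically.
before <- apply(e, 2, quantile, probs = c(0.25, 0.5, 0.75))
print(round(before, 2))

# Simplified quantile normalization: give every sample the same distribution
# by replacing each value with the mean of the values that share its rank.
quantile_normalize <- function(x) {
  ranks <- apply(x, 2, rank, ties.method = "first")
  reference <- rowMeans(apply(x, 2, sort))
  normalized <- apply(ranks, 2, function(r) reference[r])
  dimnames(normalized) <- dimnames(x)
  normalized
}

e.n <- quantile_normalize(e)

# After normalization the per-sample quantiles agree across columns.
after <- apply(e.n, 2, quantile, probs = c(0.25, 0.5, 0.75))
print(round(after, 2))
```

If the quantiles of a GEO dataset already look alike across samples, as in the second table printed here, an extra normalization step is probably unnecessary.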


And finally a link for Agilent microarray normalization


modified 3.8 years ago • written 4.9 years ago by A3.9k

Also useful for beginners like me. Thanks for sharing.

written 4.5 years ago by JackieMe30

Hello, thank you for sharing! But you should post this as a tutorial, not a question, and you could format your code like this to make it easier to read:

library (affy)
modified 4.5 years ago • written 4.5 years ago by Carlo Yague5.5k

Thank you, friends. I remember what a hard time I had when I was learning normalization :)

written 4.5 years ago by A3.9k

@F: I reformatted your post to make it more readable. Take a look and confirm things look ok. Post type was also changed to tutorial to correctly reflect the content.

written 3.8 years ago by GenoMax95k

Thank you so much, it really looks much more readable. Today I was googling for Illumina HumanHT-12 V3.0 expression beadchip data normalization, and I thought I would share what I learned for students who are beginners in R.

written 3.8 years ago by A3.9k

I did some further modifications, some assignment arrows were converted to &lt;- and quotation marks were changed too.

written 3.8 years ago by WouterDeCoster45k