Question

How to process RMA data from a gene microarray?

4.9 years ago

Laura_zz • 0

I downloaded RMA-processed, single-channel data from ArrayExpress. I don't know much about gene chips, so I read up on them and tried to run the script provided with the limma package myself, but I don't know whether it is right.

One step in processing the RMA data is to run exprs(RMA data), but I can't run it, so I used log2(RMA data + 1) in place of exprs(). Is that correct?
Should the RMA data first be transformed with log2(RMA data + 1) and then passed to lmFit in the limma package to calculate differential expression?

RNA-Seq R • 2.3k views

ADD COMMENT • link updated 4.9 years ago by Kevin Blighe 87k • written 4.9 years ago by Laura_zz • 0

Your data is likely already normalised and log2-transformed. Can you please show all data processing commands that you have used?

exprs() has nothing to do with the RMA process itself. exprs() is just a function that accesses a 'slot' (variable) in an ExpressionSet R object - this 'slot' contains the expression data.
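To make the role of exprs() concrete, here is a minimal sketch that builds a tiny ExpressionSet by hand and then reads the expression matrix back out. The gene names, sample names, and values are made up for illustration and assume the numbers are already on the log2 scale.

```r
library(Biobase)

# Toy expression matrix: 3 genes x 2 samples (made-up values)
mat <- matrix(c(7.1, 8.3, 6.9, 7.4, 8.0, 7.2), nrow = 3,
              dimnames = list(paste0("gene", 1:3), c("sampleA", "sampleB")))

# Wrap the matrix in an ExpressionSet object
eset <- ExpressionSet(assayData = mat)

# exprs() only retrieves the stored matrix; it does not transform the values
exprs(eset)
```

A plain matrix read in with read.csv() is not an ExpressionSet, which is the likely reason exprs() cannot be run on it directly.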

RMA normalisation involves (in this order):

1. background correction
2. quantile normalisation
3. log2 transformation
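As a rough illustration of the last two steps, the sketch below applies quantile normalisation and a log2 transformation to a toy intensity matrix in base R. The function name quantile_normalise and the numbers are hypothetical. Background correction is left out because RMA's model-based correction is more involved, as is the probe-set summarisation that full RMA implementations also perform. For real data, an established implementation is the safer choice.

```r
# Toy intensity matrix: 5 probes x 3 arrays (made-up numbers)
x <- matrix(c(120, 300,  80, 1500, 640,
              200, 410,  95, 2100, 700,
               90, 250,  60, 1100, 500),
            nrow = 5,
            dimnames = list(paste0("probe", 1:5), paste0("array", 1:3)))

# Hypothetical helper: force every array to share the same distribution
quantile_normalise <- function(m) {
  ranks  <- apply(m, 2, rank, ties.method = "first")  # rank within each array
  sorted <- apply(m, 2, sort)                         # sort each array
  ref    <- rowMeans(sorted)                          # mean of each quantile
  out    <- apply(ranks, 2, function(r) ref[r])       # map ranks to the means
  dimnames(out) <- dimnames(m)
  out
}

qn       <- quantile_normalise(x)  # step 2: quantile normalisation
log_expr <- log2(qn)               # step 3: log2 transformation
```

After normalisation, every column of qn contains the same set of values, only in a different order.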

ADD REPLY • link 4.9 years ago by Kevin Blighe 87k

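library(limma)

# Read the expression table: column 1 holds the probe IDs, columns 2-22 the samples
A <- read.csv(file = "RMAdata.csv", header = TRUE, as.is = TRUE)
A2 <- as.matrix(A[, 2:22])
rownames(A2) <- A[, 1]

# log2-transform the values (only appropriate if they are not already on the log2 scale)
A3 <- log2(A2 + 1)

# Read the design table: column 1 holds the sample names, columns 2-11 the design columns
kk <- read.csv(file = "design.csv", header = TRUE, as.is = TRUE)
kk1 <- as.matrix(kk[, 2:11])
rownames(kk1) <- kk[, 1]
mode(kk1) <- "numeric"  # lmFit needs a numeric design matrix

# Fit the linear model and test the contrast LeafD0.5h vs LeafD0h
fit <- lmFit(A3, kk1)
cont.matrix <- makeContrasts(LeafD0.5h - LeafD0h, levels = kk1)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2)

# Export the results table with Benjamini-Hochberg adjusted p-values
LeafD0.5hVSLeafD0h <- topTable(fit2, adjust.method = "BH", number = 30000)
write.csv(LeafD0.5hVSLeafD0h, file = "LeafD0.5hVSLeafD0h.csv")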

In these commands, I use log2() in place of exprs(). The RMA data came from ArrayExpress. If I use this data directly, the fold changes in differential expression are thousands to tens of thousands of times, so I'm unsure whether the data can be used directly for differential expression analysis.

The protocol description for this experiment states that the raw data (.probe file) was subjected to RMA (Robust Multi-Array Analysis; Irizarry et al. Biostatistics 4(2):249), quantile normalization (Bolstad et al. Bioinformatics 19(2):185), and background correction as implemented in the NimbleScan software package, version 2.4.27 (Roche NimbleGen, Inc.).

Experiment description link: https://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-38130/?keywords=&organism=Oryza+sativa&exptype%5B%5D=%22rna+assay%22&exptype%5B%5D=&array=&page=1&pagesize=500&tdsourcetag=s_pctim_aiomsg

ADD REPLY • link 4.9 years ago by Laura_zz • 0

In these commands, I use log2() in place of exprs().

Hey, I am not sure what you mean by this ^

log2() and exprs() do different things - one cannot replace the other. Here is what the description in the manual page of exprs() says:

Description:
     These generic functions access the expression and error
     measurements of assay data stored in an object derived from the
     ‘eSet-class’.

It would be easier to use GEO2R to obtain this data - the EBI has no great automated way to obtain published expression datasets - NCBI's GEO does.

1. Go to the main accession page (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE38130)
2. Click on the blue button ANALYZE WITH GEO2R
3. Click on the R script tab

There, you will find code to automatically obtain the data:

################################################################
#   Boxplot for selected GEO samples
library(Biobase)
library(GEOquery)

# load series and platform data from GEO

gset <- getGEO("GSE38130", GSEMatrix =TRUE, getGPL=FALSE)
if (length(gset) > 1) idx <- grep("GPL15594", attr(gset, "names")) else idx <- 1
gset <- gset[[idx]]

It seems that, for this project, the data is not log-transformed, so you need to log-transform it like this:

exprs(gset) <- log2(exprs(gset))

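# plot title (not defined earlier in this snippet, so it is set here)
title <- paste0("GSE38130", " / ", annotation(gset))
boxplot(exprs(gset), boxwex=0.7, notch=TRUE, main=title, outline=FALSE, las=2)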

hist(exprs(gset))
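If it is unclear whether a matrix is already on the log2 scale, its value range gives a rough hint. The sketch below is a simplified heuristic, similar in spirit to the check that GEO2R-generated scripts apply. The function name looks_unlogged and the thresholds (100 and 50) are assumptions, and the data are simulated, so treat the result as a prompt to inspect the boxplot and histogram rather than as a definitive answer.

```r
# Hypothetical helper: TRUE if the values look like raw (un-logged) intensities
looks_unlogged <- function(x) {
  q <- quantile(as.numeric(x), c(0, 0.25, 0.99, 1), na.rm = TRUE, names = FALSE)
  # log2 microarray data rarely exceeds ~20, so very large values or a very
  # wide range suggest that the data has not been log-transformed
  (q[3] > 100) || (q[4] - q[1] > 50 && q[2] > 0)
}

set.seed(42)
raw    <- matrix(2^rnorm(200, mean = 9, sd = 2), nrow = 50)  # simulated raw scale
logged <- log2(raw)                                          # same data, log2 scale

looks_unlogged(raw)     # flags the raw-scale matrix
looks_unlogged(logged)  # does not flag the log2-scale matrix
```

With a real dataset, the same function could be called as looks_unlogged(exprs(gset)) before deciding whether to apply log2().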

Technically speaking, the authors' description of their data processing steps is incorrect. RMA normalisation IS background correction, quantile normalisation, and log [base 2] transformation. So, technically, they have not performed RMA (they only performed 2 steps of it).
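This also bears on the very large "fold changes" mentioned earlier in the thread. limma reports a difference of group means, which is a log2 fold change only when the input is on the log2 scale. A minimal base R sketch with made-up intensities for one gene (the variable names are hypothetical):

```r
# Made-up raw intensities for one gene in two groups
ctrl <- c(1000, 1200, 1100)
trt  <- c(4000, 4400, 4200)

# On the raw scale, the difference of means is in intensity units,
# so it can easily run into the thousands
raw_diff <- mean(trt) - mean(ctrl)   # 3100

# On the log2 scale, the difference of means is a log2 fold change
log_fc <- mean(log2(trt)) - mean(log2(ctrl))

# Back-transform to an ordinary fold change
fold <- 2^log_fc
```

Here the raw-scale difference is 3100, while the actual fold change is a little under 4, which is why a model fitted to un-logged data can appear to show changes of thousands.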

ADD REPLY • link 4.9 years ago by Kevin Blighe 87k