Question: Microarray analysis in limma
gravatar for Leite
2.5 years ago by
São Paulo - Brazil - Unifesp
Leite980 wrote:

Hello everyone!

I'm trying to reanalyze data from this test GSE48080 but using the same lfc = 1.7, I find not even close to what was found in the original paper, I simply find 10 or 0 almost always and in original paper 300+ genes.

I have already tried several methods and the data found are the same, I have tried to change the shape of my target and still find DEGS always below this lfc = 1.7, what am I doing wrong? Is there something I'm forgetting to do?

Please I really need reanalyze this data.

Article method

Labeling of probes with cyanine 3 dye (Cy3), hybridization and washing procedures followed, strictly, the manufacturer's protocol (One Color Quick Amp Labeling Kit, Agilent Technologies). The arrays were scanned using a GenePix 4000B Microarray Scanner (Molecular Devices) using default parameters for Agilent 44K microarrays. Initial data analysis was performed using the Agilent Feature Extraction software (version 9.5). This software places the microarray grids, determines feature intensities and flags outlier pixels. The quality of the microarray data was assessed using the positive controls and RNA spike-ins. The gProcessedSignal (i.e. end result of standard Agilent Feature Extraction normalization and background correction procedures) from each array was loaded into the Partek Genomics Suite (v6.6), normalized between arrays using quantile normalization, and log transformed for further analyses. For subsequent statistical analysis we used the ANOVA implementation of Partek. The ANOVA model was defined by the experimental design and included variations due to volunteer group (sepsis, control), day of sample collection (D0, D7) and survival status (survivor, non-survivor).

Code I'm using:



targets <- readTargets("targets.txt")

num.arrays = length(targets$FileName)

# Only reads in the G values, fake the R values as G as well
maData = read.maimages(targets,names=targets$Cy3, columns = list(G = "gMeanSignal", Gb = "gBGUsed", R = 
"gProcessedSignal", Rb = "gBGMedianSignal"), annotation= c("Row", "Col", "FeatureNum", "ProbeUID", "ControlType", 
"ProbeName", "GeneName", "SystematicName"))

# Do not background subtract
maData.nobg = backgroundCorrect(maData, method="none")

#G values only. Use Quantile Normalization for gProcessedSignal
MAq = normalizeBetweenArrays(maData.nobg, method="Rquantile")

TS <- paste(targets$Patient, targets$Day, sep=".")
TS <- factor(TS, levels=c("SepsisAlive.D0","SepsisAlive.D7","SepsisDead.D0","SepsisDead.D7","Control.Control"))
design <- model.matrix(~0+TS)
colnames(design) <- levels(TS)
fit <- lmFit(MAq, design)

cont.matrix <- makeContrasts(SepsisDead.D0-SepsisDead.D7, levels=design)
fit2 <-, cont.matrix)
fit2 <- eBayes(fit2)

nrow(topTable(fit2, coef=1, number=99999, lfc=1.7))
probeset.list <- topTable(fit2, coef=1, number=99999, lfc=1.7)
write.table(probeset.list, "results.txt", sep="\t", quote=FALSE)

Best regards,


limma R • 2.8k views
ADD COMMENTlink written 2.5 years ago by Leite980

Well, you're analyzing it on a completely different platform than them, so I wouldn't expect the exact same parameters to work. limma is not the Partek Suite, and their methods for finding DEGs are likely not exactly the same.

What I'd recommend is removing your lfc restriction and seeing what the lfc looks like for their DEGs in your analysis. This will allow you to determine if you need to tweak the lfc for your reanalysis if those genes must absolutely be included.

ADD REPLYlink written 2.5 years ago by jared.andrews075.5k

Dear by jared.andrews07,

I agree with you that two different platforms will generate different results, however it is a very big difference in 0 ~ 10 genes that I am finding for 300 found in the paper.

I'll think about what you said.



ADD REPLYlink written 2.5 years ago by Leite980

Hello comrades!

The statistical methodologies that you and the others employed are very different. The original authors used ANOVA, whilst you fitted a least squares linear regression line (via limma) to the data and derived statistics from this. ANOVA has different assumptions than regression. Thus, the results are likely going to be very different.

I disagree with the choice of ANOVA by the original authors. I also would not recommended trying to build a complex model with limma due to the low sample numbers.


ADD REPLYlink written 2.5 years ago by Kevin Blighe60k

Hey Kevin,

Thank you so much!


ADD REPLYlink written 2.5 years ago by Leite980
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1595 users visited in the last hour