Question

Sam Vs Lmfit(Limma)

1

Entering edit mode

11.4 years ago

Julien Textoris ▴ 430

Dear BioStar Users,

I still don't understand how limma works. When I read the manual, it seems cristal clear, but again and again, with my data, i get lost.

I have a microarray dataset (on agilent platform) from a cancer study. 76 Clinical samples have been hybridized, and are distributed in 3 groups : A = Healthy, B = pre-cancerous, C = Cancer

 table(gr)
gr
 A  B  C 
17 34 25

I usually use SAM algorithm, because i understand what it does : It takes one gene after the other, compare the expression in two or n groups, then correct for multitesting (if I may summarize it like that).

With SAM approach, I get :

sam.gr = sam(eset,cl=gr,B=1000,rand=123,gene.names=names$SYMBOL)
printsam.gr,seq(0.5,6,0.5))
SAM Analysis for the Multi-Class Case with 3 Classes 

   Delta    p0     False Called      FDR
1    0.5 0.204 17079.988  28092 0.124061
2    1.0 0.204  9259.003  24936 0.075765
3    1.5 0.204  5101.050  22387 0.046494
4    2.0 0.204  2826.544  20110 0.028680
5    2.5 0.204  1591.944  18236 0.017813
6    3.0 0.204   905.447  16618 0.011118
7    3.5 0.204   521.075  15252 0.006971
8    4.0 0.204   303.668  14119 0.004389
9    4.5 0.204   179.219  13156 0.002780
10   5.0 0.204   107.174  12314 0.001776
11   5.5 0.204    64.225  11519 0.001138
12   6.0 0.204    38.668  10778 0.000732

So I'll take a delta = 3.5 :

ssam.gr = summarysam.gr,3.5)

Then I compute the fold changes that I want to look at :

FC.CvsA = apply(exprs(eset)[ ,which(gr == "C")],1,median,na.rm=T) - 
  apply(exprs(eset)[ ,which(gr == "A")],1,median,na.rm=T)

FC.BvsA = apply(exprs(eset)[ ,which(gr == "B")],1,median,na.rm=T) - 
  apply(exprs(eset)[ ,which(gr == "A")],1,median,na.rm=T)

FC.CvsB = apply(exprs(eset)[ ,which(gr == "C")],1,median,na.rm=T) - 
  apply(exprs(eset)[ ,which(gr == "B")],1,median,na.rm=T)

Then I mix both to get a list of differentially expressed genes :

i = ssam.gr@row.sig.genes
ind = ssam.gr@row.sig.genes[ which( abs(FC.CvsA[i]) > 1 | abs(FC.BvsA[i]) > 1 | abs(FC.CvsB[i]) > 1 ) ]
length(ind)
[1] 5877

Great, about 6000 genes differentially expressed.

If I try to do the same with limma, i would wrote (the mistake is probably here, but where and why ?) :

design = model.matrix(~gr)
colnames(design)
[1] "(Intercept)" "grB"         "grC"        

fit = lmFit(eset,design)

fit2 = eBayes(fit)

res = decideTests(fit2,p.value=0.001,lfc=log2(2))

ind = which( apply(res,1,function(x) {length(which(x != 0))>0}) == T)
length(ind)
[1] 31845

You're thinking I did not read the manual carefully, so I also tried :

design = model.matrix(~0+gr)
colnames(design) = c("A","B","C")
fit = lmFit(eset,design)
cont.matrix = makeContrasts(
  CvsA = C-A,
  BvsA = B-A,
  CvsB = C-B,
  levels=design)
fit2 = contrasts.fit(fit,cont.matrix)
fit2 = eBayes(fit2)

res = decideTests(fit2,p.value=0.001,lfc=log2(2))

ind = which( apply(res,1,function(x) {length(which(x != 0))>0}) == T)
length(ind)
[1] 5110

Great !! BUT WHY ? I'm sure it is obvious, but i don't understand the difference between the two approach [~gr] and [~0+gr and contrasts]

Thanks for your time,

Julien

sam limma microarray • 5.5k views

ADD COMMENT • link updated 10.0 years ago by Biostar 20 • written 11.4 years ago by Julien Textoris ▴ 430

score 1 · Answer 1 · 2012-12-21

In your first limma example above, you are comparing B and C to the baseline A. But I believe that the decideTests is also looking at the Intercept (which is why you see so many genes).

By using ~0 + gr, and then the contrasts, you're forcing the comparison among the contrasts that you specify so that is what you want.