Sam Vs Lmfit(Limma)
1
1
Entering edit mode
11.4 years ago

Dear BioStar Users,

I still don't understand how limma works. When I read the manual, it seems cristal clear, but again and again, with my data, i get lost.

I have a microarray dataset (on agilent platform) from a cancer study. 76 Clinical samples have been hybridized, and are distributed in 3 groups : A = Healthy, B = pre-cancerous, C = Cancer

 table(gr)
gr
 A  B  C 
17 34 25

I usually use SAM algorithm, because i understand what it does : It takes one gene after the other, compare the expression in two or n groups, then correct for multitesting (if I may summarize it like that).

With SAM approach, I get :

sam.gr = sam(eset,cl=gr,B=1000,rand=123,gene.names=names$SYMBOL)
printsam.gr,seq(0.5,6,0.5))
SAM Analysis for the Multi-Class Case with 3 Classes 

   Delta    p0     False Called      FDR
1    0.5 0.204 17079.988  28092 0.124061
2    1.0 0.204  9259.003  24936 0.075765
3    1.5 0.204  5101.050  22387 0.046494
4    2.0 0.204  2826.544  20110 0.028680
5    2.5 0.204  1591.944  18236 0.017813
6    3.0 0.204   905.447  16618 0.011118
7    3.5 0.204   521.075  15252 0.006971
8    4.0 0.204   303.668  14119 0.004389
9    4.5 0.204   179.219  13156 0.002780
10   5.0 0.204   107.174  12314 0.001776
11   5.5 0.204    64.225  11519 0.001138
12   6.0 0.204    38.668  10778 0.000732

So I'll take a delta = 3.5 :

ssam.gr = summarysam.gr,3.5)

Then I compute the fold changes that I want to look at :

FC.CvsA = apply(exprs(eset)[ ,which(gr == "C")],1,median,na.rm=T) - 
  apply(exprs(eset)[ ,which(gr == "A")],1,median,na.rm=T)

FC.BvsA = apply(exprs(eset)[ ,which(gr == "B")],1,median,na.rm=T) - 
  apply(exprs(eset)[ ,which(gr == "A")],1,median,na.rm=T)

FC.CvsB = apply(exprs(eset)[ ,which(gr == "C")],1,median,na.rm=T) - 
  apply(exprs(eset)[ ,which(gr == "B")],1,median,na.rm=T)

Then I mix both to get a list of differentially expressed genes :

i = ssam.gr@row.sig.genes
ind = ssam.gr@row.sig.genes[ which( abs(FC.CvsA[i]) > 1 | abs(FC.BvsA[i]) > 1 | abs(FC.CvsB[i]) > 1 ) ]
length(ind)
[1] 5877

Great, about 6000 genes differentially expressed.

If I try to do the same with limma, i would wrote (the mistake is probably here, but where and why ?) :

design = model.matrix(~gr)
colnames(design)
[1] "(Intercept)" "grB"         "grC"        

fit = lmFit(eset,design)

fit2 = eBayes(fit)

res = decideTests(fit2,p.value=0.001,lfc=log2(2))

ind = which( apply(res,1,function(x) {length(which(x != 0))>0}) == T)
length(ind)
[1] 31845

You're thinking I did not read the manual carefully, so I also tried :

design = model.matrix(~0+gr)
colnames(design) = c("A","B","C")
fit = lmFit(eset,design)
cont.matrix = makeContrasts(
  CvsA = C-A,
  BvsA = B-A,
  CvsB = C-B,
  levels=design)
fit2 = contrasts.fit(fit,cont.matrix)
fit2 = eBayes(fit2)

res = decideTests(fit2,p.value=0.001,lfc=log2(2))

ind = which( apply(res,1,function(x) {length(which(x != 0))>0}) == T)
length(ind)
[1] 5110

Great !! BUT WHY ? I'm sure it is obvious, but i don't understand the difference between the two approach [~gr] and [~0+gr and contrasts]

Thanks for your time,

Julien

sam limma microarray • 5.5k views
ADD COMMENT
1
Entering edit mode
11.4 years ago
brentp 24k

In your first limma example above, you are comparing B and C to the baseline A. But I believe that the decideTests is also looking at the Intercept (which is why you see so many genes).

By using ~0 + gr, and then the contrasts, you're forcing the comparison among the contrasts that you specify so that is what you want.

ADD COMMENT

Login before adding your answer.

Traffic: 1800 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6