**20**wrote:

Hello All,

I'm using the Limma package to perform some microarray analysis. In the past I've typically compared 2 groups in contrasts such as "GroupA-GroupB". Occasionally, I've even done comparisons where I've normalized before comparing 2 groups such as "(GroupAExp-GroupAControl)-(GroupBExp-GroupBControl)". In my current analysis I have 4 levels for a particular factor and I'd like to compare 2 of the levels to 2 of the other levels. So, my targets file is as follows.

Sample |
Condition |

S1 |
A |

S2 |
A |

S3 |
A |

S4 |
B |

S5 |
B |

S6 |
B |

S7 |
C |

S8 |
C |

S9 |
C |

S10 |
D |

S11 |
D |

S12 |
D |

And basically my contrast is "(A+B)-(C+D)". Which is to say I'd like to find genes that are significantly higher(or lower) in the pooled A and B group compared to the C and D group. Unfortunately, I'm not quite sure I understand the log fold changes that are output. I don't quite understand how Limma is calculating these. I could always just create another column/factor in which I put A & B together as one factor and C & D as another factor but in truth these 4 levels are distinct. So my questions to the group are

1) Is it better to make the contrast as I have or would it be more appropriate to create a new factor and pool A+b into 1 level and C+D into another level.

2) If the contrast I've chosen is correct how does the logFC get calculated, I know that its simply not taking the all the values in A+B and subtracting all the values in C+D as that doesn't yield the same result as what I see in the output.

Any thoughts would be much appreciated.

Cheers,

Pichai

Thanks Brent, yeah I actually think I have no choice but to create a new column. The contrast is essentially taking the mean of the first level adding it to the mean of the second level and then subtracting the mean of the 3rd and 4th levels which is inflating the LogFC of course. I assumed somewhat naively that it would just mean all values in A and B together and subtract the mean of all values in C and D.

`library(limma);`

`#Create Data`

s1 <- abs(rnorm(10))*5; # Group A

s2 <- abs(rnorm(10))*5; # Group A

s3 <- abs(rnorm(10))*4; # Group B

s4 <- abs(rnorm(10))*4; # Group B

s5 <- abs(rnorm(10))*3; # Group C

s6 <- abs(rnorm(10))*3; # Group C

s7 <- abs(rnorm(10))*2; # Group D

s8 <- abs(rnorm(10))*2; # Group D

`dataExp <- data.frame(s1, s2, s3, s4, s5, s6, s7, s8);`

rownames(dataExp) <- paste("g", c(1:10), sep="");

colnames(dataExp) <- paste("s", c(1:8), sep="");

`#Targets File`

fTarget <- factor(c(rep("A",2), rep("B",2), rep("C",2), rep("D",2)));

design <- model.matrix(~0+fTarget);

colnames(design) <- gsub("fTarget", "", colnames(design));

`#Run Limma`

fit <- lmFit(dataExp, design);

myConts <- "(A+B)-(C+D)";

contrast.matrix <- makeContrasts(contrasts=paste(myConts), levels=design)

fit2 <- contrasts.fit(fit, contrast.matrix)

fit2 <- eBayes(fit2)

topTable(fit2)

`ID logFC AveExpr t P.Value adj.P.Val B`

1 g1 6.4588390 4.1901582 2.8160003 0.009826156 0.09826156 -2.771584

3 g3 3.7026979 2.6308296 1.4732837 0.154284956 0.53257352 -4.466917

4 g4 4.0849422 3.8189196 1.4530554 0.159772057 0.53257352 -4.486924

5 g5 -2.7718791 1.9597527 -1.1644998 0.256207255 0.55965665 -4.747696

2 g2 2.9266876 2.9289445 1.1068964 0.279828327 0.55965665 -4.793931

9 g9 1.9081786 3.6318367 0.7606780 0.454609288 0.63881922 -5.027231

8 g8 1.7709799 1.6842968 0.7325484 0.471264278 0.63881922 -5.042671

7 g7 -1.5356162 1.1441038 -0.6676071 0.511055378 0.63881922 -5.076215

10 g10 -0.6502641 1.7959415 -0.2567874 0.799635371 0.82904464 -5.218137

6 g6 -0.4985768 0.9833397 -0.2184091 0.829044644 0.82904464 -5.225028

`> #Calculate logFC manually`

`> calcFC <- data.frame(rowMeans(dataExp[1:4])-rowMeans(dataExp[5:8]));`

> calcFC

rowMeans.dataExp.1.4.....rowMeans.dataExp.5.8..

g1 3.2294195

g2 1.4633438

g3 1.8513490

g4 2.0424711

g5 -1.3859396

g6 -0.2492884

g7 -0.7678081

g8 0.8854899

g9 0.9540893

g10 -0.3251320

>

> #These do not match up, i.e. logFC doesn't equate with the calculated logFC becuase the contast isn't simply grouping them together

>

> #But when we calculate in this way (which obviously matches the expression(

> calcFC2 <- data.frame(rowMeans(dataExp[1:2])+ rowMeans(dataExp[3:4])-rowMeans(dataExp[5:6])-rowMeans(dataExp[7:8]));

> calcFC2

rowMeans.dataExp.1.2.....rowMeans.dataExp.3.4.....rowMeans.dataExp.5.6.....

g1 6.4588390

g2 2.9266876

g3 3.7026979

g4 4.0849422

g5 -2.7718791

g6 -0.4985768

g7 -1.5356162

g8 1.7709799

g9 1.9081786

g10 -0.6502641

20re 1) for simplicity, you could make a new column as you suggest:

df$condAB = df$Condition %in% c("A", "B")

2) you can find lots of information on how limma calculates logFC, e.g. :https://stat.ethz.ch/pipermail/bioconductor/2012-January/042950.html

22kIf you want A and B to be pooled then you won't be doing a contrast, pretty much by definition.

88kAll I wanted to do was contrast samples that are members of groups A or B with samples that are members of groups C and D. This equates to creating a new factor with 2 levels e.g. E and F , E) all samples in A and B, F) all samples in C and D. so our contrast may be "E-F"

What I did was "(A+B)-(C+D)" which I believe is still a contrast. You're contrasting the sum of A and B with that of C and D which I'm sure has some use case somewhere (just not for me right now). Is there another term for this expression, if its not a contrast?

20I think your confusion arises from thinking that the contrast (here

`(A+B)-(C+D)`

) applies to the raw values rather than to the fit model. By the time you perform the contrast, you've already fit the 4 factor model and are using the coefficients and their deviations. If you pool things, you're not doing a contrast and you're also decreasing your power since you're not compensating for a known effect (thus, the variance is increased). Perhaps this ends up making sense for the underlying biological question, perhaps not.88k