Question: Limma Contrast Question
1
pichai.raman20 wrote:

Hello All,

I'm using the Limma package to perform some microarray analysis. In the past I've typically compared 2 groups in contrasts such as "GroupA-GroupB". Occasionally, I've even done comparisons where I've normalized before comparing 2 groups such as "(GroupAExp-GroupAControl)-(GroupBExp-GroupBControl)". In my current analysis I have 4 levels for a particular factor and I'd like to compare 2 of the levels to 2 of the other levels. So, my targets file is as follows.

 Sample Condition S1 A S2 A S3 A S4 B S5 B S6 B S7 C S8 C S9 C S10 D S11 D S12 D

And basically my contrast is "(A+B)-(C+D)". Which is to say I'd like to find genes that are significantly higher(or lower) in the pooled A and B group compared to the C and D group. Unfortunately, I'm not quite sure I understand the log fold changes that are output. I don't quite understand how Limma is calculating these. I could always just create another column/factor in which I put A & B together as one factor and C & D as another factor but in truth these 4 levels are distinct. So my questions to the group are

1) Is it better to make the contrast as I have or would it be more appropriate to create a new factor and pool A+b into 1 level and C+D into another level.

2) If the contrast I've chosen is correct how does the logFC get calculated, I know that its simply not taking the all the values in A+B and subtracting all the values in C+D as that doesn't yield the same result as what I see in the output.

Any thoughts would be much appreciated.

Cheers,

Pichai

limma contrasts • 2.9k views
modified 5.3 years ago • written 5.3 years ago by pichai.raman20
1

Thanks Brent, yeah I actually think I have no choice but to create a new column. The contrast is essentially taking the mean of the first level adding it to the mean of the second level and then subtracting the mean of the 3rd and 4th levels which is inflating the LogFC of course. I assumed somewhat naively that it would just mean all values in A and B together and subtract the mean of all values in C and D.

`library(limma);`

```#Create Data s1 <- abs(rnorm(10))*5; # Group A s2 <- abs(rnorm(10))*5; # Group A s3 <- abs(rnorm(10))*4; # Group B s4 <- abs(rnorm(10))*4; # Group B s5 <- abs(rnorm(10))*3; # Group C s6 <- abs(rnorm(10))*3; # Group C s7 <- abs(rnorm(10))*2; # Group D s8 <- abs(rnorm(10))*2; # Group D```

```dataExp <- data.frame(s1, s2, s3, s4, s5, s6, s7, s8); rownames(dataExp) <- paste("g", c(1:10), sep=""); colnames(dataExp) <- paste("s", c(1:8), sep="");```

```#Targets File fTarget <- factor(c(rep("A",2), rep("B",2), rep("C",2), rep("D",2))); design <- model.matrix(~0+fTarget); colnames(design) <- gsub("fTarget", "", colnames(design));```

```#Run Limma fit <- lmFit(dataExp, design); myConts <- "(A+B)-(C+D)"; contrast.matrix <- makeContrasts(contrasts=paste(myConts), levels=design) fit2 <- contrasts.fit(fit, contrast.matrix) fit2 <- eBayes(fit2) topTable(fit2)```

```    ID      logFC   AveExpr          t     P.Value  adj.P.Val         B 1   g1  6.4588390 4.1901582  2.8160003 0.009826156 0.09826156 -2.771584 3   g3  3.7026979 2.6308296  1.4732837 0.154284956 0.53257352 -4.466917 4   g4  4.0849422 3.8189196  1.4530554 0.159772057 0.53257352 -4.486924 5   g5 -2.7718791 1.9597527 -1.1644998 0.256207255 0.55965665 -4.747696 2   g2  2.9266876 2.9289445  1.1068964 0.279828327 0.55965665 -4.793931 9   g9  1.9081786 3.6318367  0.7606780 0.454609288 0.63881922 -5.027231 8   g8  1.7709799 1.6842968  0.7325484 0.471264278 0.63881922 -5.042671 7   g7 -1.5356162 1.1441038 -0.6676071 0.511055378 0.63881922 -5.076215 10 g10 -0.6502641 1.7959415 -0.2567874 0.799635371 0.82904464 -5.218137 6   g6 -0.4985768 0.9833397 -0.2184091 0.829044644 0.82904464 -5.225028```

`> #Calculate logFC manually`

```> calcFC <- data.frame(rowMeans(dataExp[1:4])-rowMeans(dataExp[5:8])); > calcFC     rowMeans.dataExp.1.4.....rowMeans.dataExp.5.8.. g1                                        3.2294195 g2                                        1.4633438 g3                                        1.8513490 g4                                        2.0424711 g5                                       -1.3859396 g6                                       -0.2492884 g7                                       -0.7678081 g8                                        0.8854899 g9                                        0.9540893 g10                                      -0.3251320 >  > #These do not match up, i.e. logFC doesn't equate with the calculated logFC becuase the contast isn't simply grouping them together  >  > #But when we calculate in this way (which obviously matches the expression( > calcFC2 <- data.frame(rowMeans(dataExp[1:2])+ rowMeans(dataExp[3:4])-rowMeans(dataExp[5:6])-rowMeans(dataExp[7:8])); > calcFC2     rowMeans.dataExp.1.2.....rowMeans.dataExp.3.4.....rowMeans.dataExp.5.6..... g1                                                                    6.4588390 g2                                                                    2.9266876 g3                                                                    3.7026979 g4                                                                    4.0849422 g5                                                                   -2.7718791 g6                                                                   -0.4985768 g7                                                                   -1.5356162 g8                                                                    1.7709799 g9                                                                    1.9081786 g10                                                                  -0.6502641```

re 1) for simplicity, you could make a new column as you suggest:

df\$condAB = df\$Condition %in% c("A", "B")

2) you can find lots of information on how limma calculates logFC, e.g. :https://stat.ethz.ch/pipermail/bioconductor/2012-January/042950.html

If you want A and B to be pooled then you won't be doing a contrast, pretty much by definition.

All I wanted to do was contrast samples that are members of groups A or B with samples that are members of groups C and D. This equates to creating a new factor with 2 levels e.g. E and F , E) all samples in A and B, F) all samples in C and D. so our contrast may be "E-F"

What I did was "(A+B)-(C+D)" which I believe is still a contrast. You're contrasting the sum of A and B with that of C and D which I'm sure has some use case somewhere (just not for me right now). Is there another term for this expression, if its not a contrast?

1

I think your confusion arises from thinking that the contrast (here `(A+B)-(C+D)`) applies to the raw values rather than to the fit model. By the time you perform the contrast, you've already fit the 4 factor model and are using the coefficients and their deviations. If you pool things, you're not doing a contrast and you're also decreasing your power since you're not compensating for a known effect (thus, the variance is increased). Perhaps this ends up making sense for the underlying biological question, perhaps not.