Hello Biostars, I am analyzing 450k data in limma and I cannot figure out how to set up my matrix to get the most out of it. I am interested in a control versus disease state, which is simple enough. I also have multiple cell types from a single person, which is also simple. My question is how to set up a matrix when comparing control and disease states when I have multiple cell types from each person. I know that I could just break it up and compare each cell type in control and disease state, but is there a better way to do this? To be more specific, I have cells a1 a2 a3 a4 from individual 1 and a1 a2 a3 a4 from individual 2, individual 1 is healthy and individual2 is in a disease state. I have 3 and 2 individuals in control and disease state. Thanks for any advice!
Let's suppose your various phenotypic factors are stored in a
library(magrittr) # %>% piping treatments <- data.frame( pheno = rep(c('Ctrl', 'Ctrl', 'Ctrl', 'Disease', 'Disease'), each = 4), cell = rep(paste('A', 1:4, sep = ''), times = 5), pt = rep(paste('ID', 1:5, sep = ''), each = 4) ) treatments %>% head # pheno cell pt # 1 Ctrl A1 ID1 # 2 Ctrl A2 ID1 # ... # 6 Ctrl A2 ID2 .....
I find it easier to separate out the cellular / disease parts from the patient-matching part when building designs for this type of study (I've done them quite a lot for leukaemia datasets with various cell subpopulations over the last 18 months). So we define
# All combns: disease state and cell subpop: # - ie, effect of disease is not assumed to be constant in each different cell type pheno.design <- model.matrix(~ cell * pheno, data = treatments) # No Intercept term in patient.design, # just an indicator column for each person patient.design <- model.matrix(~ -1 + pt, data = treatments) > pheno.design %>% head(3) # (Intercept) cellA2 cellA3 cellA4 phenoDisease cellA2:phenoDisease #1 1 0 0 0 0 0 #2 1 1 0 0 0 0 #3 1 0 1 0 0 0 # cellA3:phenoDisease cellA4:phenoDisease #1 0 0 #2 0 0 #3 0 0 ... > patient.design %>% head # ptID1 ptID2 ptID3 ptID4 ptID5 #1 1 0 0 0 0 #2 1 0 0 0 0 # ... #6 0 1 0 0 0 ...
Now, you can use the following as your design matrix in limma, if you want to then struggle with setting up your contrast matrix:
binary.design <- cbind(pheno.design, patient.design[, c(2,3,5)]) colnames(binary.design) <- c("Intercept", make.names(colnames(binary.design))[-1])
Why will your contrasts matrix be ugly if you use this design? Let's suppose you're interested in comparing disease vs control in cell type A2. Algebraically, the contrast you want is:
(fitted val for cell type A2, Dis) - (fitted val for cell type A2, Ctrl)
The design takes [cell type A1, control status, patient ID1] as baseline (the value that is fitted when only the intercept term is included). If you include the coefficient for 'phenoDisease' as well, you will fit the disease-baseline sample [cell type A1, disease status, patient ID4].
To get the fitted value for [cell type A2, disease status] we compute the mean of the fitted values for the three different control individuals:
[Fitted val: A2, Control, ID1] = Intercept + cellA2; [A2, Ctrl, ID2] = Intercept + cellA2 + ptID2 [A2, Ctrl, ID3] = Intercept + cellA2 + ptID3 => [A2, Ctrl] = Intercept + cellA2 + (ptID2 + ptID3)/3
Similarly, for cell type A2 in the disease samples:
=> [A2, Disease] = Intercept + cellA2 + phenoDisease + cellA2.phenoDisease + ptID5 / 2
So the contrast you want to define is
[Contrast: Dis vs Ctrl in A2] = [A2, Disease] - [A2, Control] = phenoDisease + cellA2.phenoDisease + (ptID5 / 2) - (ptID2 + ptID3) / 3
You don't see these kinds of ugly contrasts for binary model matrices in studies where there's inter-treatment matching (say you had three patients, 4 cell subpops from each and treated the 4 cellpops +/- drug giving 24 samples; the ptID coefficients would cancel out in the definition of your contrast).
You might find it neater to encode the patient matrix as orthogonal differences between the patients in each experimental arm. This removes the ptID terms from the final contrast matrix (they are put inside the model matrix, rather than in the contrast matrix; and is much easier to explain to the bench)
pt.delta.matrix <- patient.design %*% matrix( c(c(1, -1/2, -1/2, 0, 0), # diff ID1 vs mean of ID2/3 c(0, 1, -1, 0, 0), # diff ID2 vs ID3 c(0, 0, 0, 1, -1)), # diff ID4 vs ID5 ncol = 3, dimnames = list(c(), c("pt.delta1.23", "pt.delta2.3", "pt.delta4.5")) ) > pt.delta.matrix # pt.delta1.23 pt.delta2.3 pt.delta4.5 #1 1.0 0 0 #2 1.0 0 0 # ... #6 -0.5 1 0 # ... delta.design <- cbind(pheno.design, pt.delta.matrix) colnames(delta.design) <- c("Intercept", make.names(colnames(delta.design))[-1])
The contrast I tried to explain above then comes out as:
[Contrast: Dis vs Ctrl in A2] = [A2, Disease] - [A2, Control] = phenoDisease + cellA2.phenoDisease
Apologies for the long-winded explanation.