I want to perform differential analysis while blocking for different effects depending on the comparison/question, but I am unsure how to build the correct design matrix. My metadata contains subjects, 2 timepoints, 2 cell types, and 2 cell subtypes. This essentially means there are 4 cell populations in total.
subject  timepoint      cell.type  cell.subtype
A        pretreatment   CD4        memory
A        pretreatment   CD4        naive
A        pretreatment   CD8        memory
A        pretreatment   CD8        naive
A        posttreatment  CD4        memory
A        posttreatment  CD4        naive
A        posttreatment  CD8        memory
A        posttreatment  CD8        naive
B        pretreatment   CD4        memory
B        pretreatment   CD4        naive
B        pretreatment   CD8        memory
B        pretreatment   CD8        naive
B        posttreatment  CD4        memory
B        posttreatment  CD4        naive
B        posttreatment  CD8        memory
B        posttreatment  CD8        naive
C        pretreatment   CD4        memory
C        pretreatment   CD4        naive
C        pretreatment   CD8        memory
C        pretreatment   CD8        naive
C        posttreatment  CD4        memory
C        posttreatment  CD4        naive
C        posttreatment  CD8        memory
C        posttreatment  CD8        naive
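For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample table in base R. The reference levels (pretreatment, CD4, naive) are an assumption inferred from the column names reported further down; they are not stated explicitly anywhere in the post.

```r
# Rebuild the sample table: 3 subjects x 2 timepoints x 2 cell types x 2 subtypes.
# expand.grid varies its first argument fastest, which reproduces the row order above.
metadata <- expand.grid(
  cell.subtype = c("memory", "naive"),
  cell.type    = c("CD4", "CD8"),
  timepoint    = c("pretreatment", "posttreatment"),
  subject      = c("A", "B", "C"),
  stringsAsFactors = FALSE
)
metadata <- metadata[, c("subject", "timepoint", "cell.type", "cell.subtype")]

# Assumption: the first level of each factor is the reference (baseline) level.
metadata$subject      <- factor(metadata$subject)
metadata$timepoint    <- factor(metadata$timepoint, levels = c("pretreatment", "posttreatment"))
metadata$cell.type    <- factor(metadata$cell.type, levels = c("CD4", "CD8"))
metadata$cell.subtype <- factor(metadata$cell.subtype, levels = c("naive", "memory"))

print(head(metadata, 8))
```

Changing the `levels` order changes which level is absorbed into the baseline, and therefore which column names `model.matrix` produces.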
I want to perform comparisons across many different groups. I always want to control for subject-level differences, but depending on the comparison, I'll also want to control for cell types and/or subtypes. However, I also need to consider that the naive and memory subtypes differ between CD4 and CD8 cells, which I think of as interaction effects.
An example of some comparisons I'd like to perform:
1) What global effects does treatment have on these cells? Controlling for differences across subject, cell-type, and cell-subtype, as well as the interactions between cell-type and cell-subtype. In other words, I want to look at global effects of treatment on T-cells, controlling for the different cell types/subtypes and patient-effects.
2) What effects does treatment have on CD4 memory cells? Controlling for subjects.
3) What are the differences between naive and memory cells? Controlling for differences across subjects, and cell-types. But also considering cell-type and sub-type interactions. So I just want the global differences.
My current design to consider these different comparisons looks like this:
model.matrix( ~0 + cell.type*cell.subtype*timepoint + subject, metadata)
And produces this column design:
timepointpretreatment, timepointposttreatment, cell.typeCD8, cell.subtypememory, subjectB, subjectC, timepointposttreatment:cell.typeCD8, timepointposttreatment:cell.subtypememory, cell.typeCD8:cell.subtypememory, timepointposttreatment:cell.typeCD8:cell.subtypememory
I don't know if this design is correct, and it's unclear to me how I can build my contrasts or coefficients, since many of the factor levels are absorbed into the baseline (intercept) rather than having their own columns. This also makes me think my design is incorrect.
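One way to check a design like this is to print its column names and confirm it has full column rank. The sketch below does this in base R on a reconstructed version of the sample table; the factor level order is an assumption. Note that when the intercept is dropped with `~0`, only the first factor in the formula gets one column per level, so the order of terms in the formula determines which names appear.

```r
# Reconstructed sample table (assumption: first listed level is the reference).
# With the default stringsAsFactors = TRUE, expand.grid keeps levels in the order given.
metadata <- expand.grid(
  cell.subtype = c("naive", "memory"),
  cell.type    = c("CD4", "CD8"),
  timepoint    = c("pretreatment", "posttreatment"),
  subject      = c("A", "B", "C")
)

# The design exactly as written in the question.
design <- model.matrix(~0 + cell.type*cell.subtype*timepoint + subject, metadata)

# Which coefficients does the model actually contain?
print(colnames(design))

# A design is estimable only if its rank equals its number of columns.
full_rank <- qr(design)$rank == ncol(design)
print(full_rank)
```

Every coefficient other than the per-level columns of the first factor is a difference relative to the baseline levels, which is why contrasts written as "level minus level" cannot be formed directly from these columns.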
Alternatively, I could create a different design for each question. I would like to see the full contrast I'm specifying...
1) model.matrix(~0 + cell.type*cell.subtype*timepoint + subject, metadata)
   contrast = timepointposttreatment - timepointpretreatment
2) model.matrix(~0 + cell.subtype:cell.type:timepoint + subject, metadata)
   contrast = memory:CD4:posttreatment - memory:CD4:pretreatment
3) model.matrix(~0 + cell.subtype*cell.type*timepoint + subject, metadata)
   contrast = cell.subtypememory - cell.subtypenaive
Is there a single design matrix to consider all these comparisons? And if so, what would the contrasts look like? I've read through the edgeR documentation and forums, and understand the examples there but it's unclear how to apply it to this more complex design.
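A single-design option that may be worth checking is to merge cell type, subtype, and timepoint into one combined factor and block on subject, so that every contrast can be written out in full. The sketch below is base R only and rests on several assumptions: the `group` factor, the `make_contrast` helper, and the contrast names are hypothetical, and "global" is read as the average over the four cell populations, which is only one possible interpretation.

```r
# Reconstructed sample table (row order does not matter for the design).
metadata <- expand.grid(
  cell.subtype = c("naive", "memory"),
  cell.type    = c("CD4", "CD8"),
  timepoint    = c("pretreatment", "posttreatment"),
  subject      = c("A", "B", "C")
)

# Hypothetical combined factor: one level per cell.type x cell.subtype x timepoint.
metadata$group <- factor(paste(metadata$cell.type, metadata$cell.subtype,
                               metadata$timepoint, sep = "."))

# One column per group plus subject blocking terms.
design <- model.matrix(~0 + group + subject, metadata)
colnames(design) <- sub("^group", "", colnames(design))

# Hypothetical helper: average of the `plus` columns minus average of the `minus` columns.
make_contrast <- function(plus, minus) {
  v <- setNames(numeric(ncol(design)), colnames(design))
  v[plus]  <-  1 / length(plus)
  v[minus] <- -1 / length(minus)
  v
}

cells <- c("CD4.naive", "CD4.memory", "CD8.naive", "CD8.memory")
times <- c(".pretreatment", ".posttreatment")

contrasts <- cbind(
  # 1) treatment effect averaged over all four cell populations
  treatment.global = make_contrast(paste0(cells, ".posttreatment"),
                                   paste0(cells, ".pretreatment")),
  # 2) treatment effect in CD4 memory cells only
  treatment.CD4mem = make_contrast("CD4.memory.posttreatment",
                                   "CD4.memory.pretreatment"),
  # 3) memory vs naive, averaged over cell type and timepoint
  memory.vs.naive  = make_contrast(
    paste0(rep(c("CD4.memory", "CD8.memory"), each = 2), times),
    paste0(rep(c("CD4.naive", "CD8.naive"), each = 2), times))
)

print(contrasts)
```

In edgeR, a column of this matrix could then be passed as the `contrast` argument of `glmQLFTest`, and limma's `makeContrasts` can build an equivalent matrix from named expressions. Whether an averaged contrast matches the intended meaning of "global" for questions 1 and 3 is a judgment call.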