I am using edgeR for the first time and am not confident about how to set up the design matrix for an experiment/dataset with more than one variable.
Let's say I have variables a, b, c, and d. There are 2 possibilities for each of the 4 variables (therefore 2^4 = 16 distinct groups in total). I would want to investigate the following:
- The effect of each variable independently (e.g. a1 vs. a2, b1 vs. b2; this would result in 4 separate comparisons).
- The effect of all of the variables together (e.g. comparing across all 16 of the possible groups).
Based on page 14 of the edgeR guide, it seems that the tilde (~) is used to specify something that may contribute to a difference in count values but is not the primary variable one wants to investigate. Other posts, however, suggest that the tilde in R is used to separate the dependent variable from the independent variables.
What is the role of the tilde in the model.matrix function/design in each of the two situations described above? For further illustration:
- I want to see the effect of a and want to 'remove' the potential mediators b, c, and d. Would I use the following? This is more or less an educated guess.
design <- model.matrix(~ b + ~c + ~d + a)
- I want to see the effects of a, b, c, and d altogether. Would I use the following? (Again, this is little more than a guess, on the assumption that putting no tilde on any of the variables indicates they should all be evaluated.)
design <- model.matrix(a + b + c + d)
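
For comparison, here is a minimal base-R sketch of how a one-sided formula behaves in model.matrix. In R the tilde separates the response (left side) from the predictors (right side); model.matrix only needs the right side, so the formula starts with a single tilde and the variables are joined with +. The `samples` data frame, its level names, and the two replicates per group are hypothetical and only there for illustration; whether an additive or a per-group design fits a given experiment is a separate question.

```r
# Hypothetical sample table: every combination of four two-level factors,
# with 2 replicates per combination (16 groups x 2 = 32 samples).
samples <- expand.grid(
  a = factor(c("a1", "a2")),
  b = factor(c("b1", "b2")),
  c = factor(c("c1", "c2")),
  d = factor(c("d1", "d2"))
)
samples <- samples[rep(seq_len(nrow(samples)), each = 2), ]

# Additive design: one tilde at the start, terms joined with '+'.
# The coefficient for 'a' is the a2-vs-a1 effect adjusted for b, c, and d.
design_additive <- model.matrix(~ b + c + d + a, data = samples)
colnames(design_additive)

# One column per group: combine the four factors into a single 16-level
# factor and drop the intercept with '0 +', so groups can be compared
# with contrasts afterwards.
samples$group <- interaction(samples$a, samples$b, samples$c, samples$d, drop = TRUE)
design_group <- model.matrix(~ 0 + group, data = samples)
dim(design_group)
```

In this sketch the additive design has five columns (an intercept plus one coefficient per variable), while the group design has one column for each of the 16 combinations.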