I’m analyzing RNA-seq data but I’m unsure about the best approach, so I’d like to ask for advice. In my study, a drug was applied to human skin, and subjects were divided into an improvement group and a non-improvement group. For each group, RNA-seq data were obtained before and after drug application. Using this data, I want to clarify which genes and pathways change during improvement.
When constructing the design matrix in edgeR, should I use a model that considers the interaction between Time (before vs. after application) and Group (improvement vs. non-improvement), i.e., Time*Group? In theory, I think Time*Group would be correct, but I have rarely seen papers that actually consider interactions. If you have experience with this, I’d appreciate your input.
Well, most papers do not describe in their methods section how the design matrix was actually built under the hood, and most do not show code either. I think a normal
~time+group+time:group
setup, or alternatively a full factorial
~0 + time_group
(where time_group is a single factor combining the time and group levels; this is equivalent but more flexible for making contrasts) is the most standard thing to do in most analysis setups. If you omit the interaction, you assume that the time effect is the same in both groups. I almost always go with the full factorial model in such a case.
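As a rough sketch of the two parametrizations above: the sample layout, the level names, and the toy values in y_toy are all made up for illustration, and the subject pairing is ignored here. It only uses base R to show that the interaction coefficient of the first model equals a "difference of differences" contrast in the second.

```r
# Hypothetical layout: 2 groups x 2 time points, 2 samples per cell.
time  <- factor(rep(c("before", "after"), times = 4),
                levels = c("before", "after"))
group <- factor(rep(c("nonimproved", "improved"), each = 4),
                levels = c("nonimproved", "improved"))

# Parametrization 1: main effects plus interaction.
design_int <- model.matrix(~ time + group + time:group)

# Parametrization 2: one coefficient per time-by-group cell.
time_group  <- factor(paste(time, group, sep = "_"))
design_cell <- model.matrix(~ 0 + time_group)
colnames(design_cell) <- levels(time_group)

# Interaction as a contrast of cell means:
# (after - before in improved) - (after - before in nonimproved)
interaction_contrast <- c(after_improved = 1, before_improved = -1,
                          after_nonimproved = -1, before_nonimproved = 1)
interaction_contrast <- interaction_contrast[colnames(design_cell)]

# Toy check with ordinary least squares on made-up values (not RNA-seq data).
set.seed(1)
y_toy  <- rnorm(8)
b_int  <- coef(lm(y_toy ~ time + group + time:group))
b_cell <- coef(lm(y_toy ~ 0 + time_group))
same_estimate <- all.equal(unname(b_int["timeafter:groupimproved"]),
                           sum(interaction_contrast * b_cell))
print(same_estimate)
```

With a fitted edgeR model, the same contrast vector could be passed as the contrast argument of glmQLFTest; limma's makeContrasts is another way to build it from the column names.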
As subjects seem to be measured before and after, a subject effect should be considered.
I suggest reading "3.5 Comparisons both between and within subjects" in the edgeR User's Guide.
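A minimal sketch of how that kind of design could look for this study, assuming 3 subjects per group (a made-up, balanced layout) and hypothetical level names. Subjects are re-numbered within each group so that the subject effect is nested in group, and the time effect is estimated separately for each group.

```r
# Hypothetical layout: 3 subjects per group, each sampled before and after.
group   <- factor(rep(c("nonimproved", "improved"), each = 6),
                  levels = c("nonimproved", "improved"))
subject <- gl(3, 2, length = 12)  # subject index, re-numbered within each group
time    <- factor(rep(c("before", "after"), times = 6),
                  levels = c("before", "after"))

# Group effect, subject nested within group, and a group-specific time effect.
design <- model.matrix(~ group + group:subject + group:time)
print(colnames(design))

# Contrast: time effect in the improved group minus time effect in the
# non-improved group (i.e. the Time-by-Group interaction).
contrast <- setNames(numeric(ncol(design)), colnames(design))
contrast["groupimproved:timeafter"]    <-  1
contrast["groupnonimproved:timeafter"] <- -1

# With a DGEList `y` (not created here), the edgeR steps would be roughly:
#   y   <- estimateDisp(y, design)
#   fit <- glmQLFit(y, design)
#   res <- glmQLFTest(fit, contrast = contrast)                   # interaction
#   imp <- glmQLFTest(fit, coef = "groupimproved:timeafter")      # change in improvers
```

If the groups have different numbers of subjects, the re-numbering trick produces all-zero columns that would need to be dropped from the design before fitting.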
For interaction only, you can refer to https://f1000research.com/articles/5-1438