Question

RNAseq edgeR: is it necessary to consider the interaction?

8 weeks ago

tesler ▴ 10

I’m analyzing RNA-seq data but I’m unsure about the best approach, so I’d like to ask for advice. In my study, a drug was applied to human skin, and subjects were divided into an improvement group and a non-improvement group. For each group, RNA-seq data were obtained before and after drug application. Using this data, I want to clarify which genes and pathways change during improvement.

When constructing the design matrix in edgeR, should I use a model that considers the interaction between Time (before vs. after application) and Group (improvement vs. non-improvement), i.e., Time*Group? In theory, I think Time*Group would be correct, but I have rarely seen papers that actually consider interactions. If you have experience with this, I’d appreciate your input.

edgeR interaction RNAseq • 6.3k views

updated 8 weeks ago by SamGG ▴ 150 • written 8 weeks ago by tesler ▴ 10

but I have rarely seen papers that actually consider interactions.

Well, most papers do not have a methods section detailed enough to describe how the design matrix was actually built, nor do most papers show code. I think a normal ~time+group+time:group setup, or alternatively a full factorial ~0 + time_group (which is equivalent but more flexible for making contrasts), is the most standard choice in most analysis setups.

If you omit the interaction then you assume that the time effect is stable across groups. I almost always go with the full factorial model in such a case.
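To make the equivalence of the two parameterizations concrete, here is a minimal base-R sketch. The sample sheet, the factor level names, and the toy response are all made up for illustration (nothing here comes from the actual data), and an ordinary least-squares fit stands in for the edgeR GLM only to show that the interaction coefficient and the difference-of-differences contrast estimate the same quantity.

```r
# Hypothetical sample sheet: 2 groups x 2 time points, 3 samples per cell.
# All names are illustrative assumptions, not taken from the original post.
samples <- expand.grid(
  rep   = 1:3,
  time  = c("before", "after"),
  group = c("nonimproved", "improved"),
  stringsAsFactors = FALSE
)
samples$time  <- factor(samples$time,  levels = c("before", "after"))
samples$group <- factor(samples$group, levels = c("nonimproved", "improved"))

# Parameterization 1: main effects plus interaction.
design_int <- model.matrix(~ time + group + time:group, data = samples)

# Parameterization 2: one coefficient per time-by-group cell.
samples$time_group <- factor(paste(samples$time, samples$group, sep = "_"))
design_cell <- model.matrix(~ 0 + time_group, data = samples)
colnames(design_cell) <- levels(samples$time_group)

# The interaction written as a difference of differences between cells:
# (after - before in improved) - (after - before in nonimproved).
interaction_contrast <- c(
  after_improved     =  1,
  before_improved    = -1,
  after_nonimproved  = -1,
  before_nonimproved =  1
)[colnames(design_cell)]

# Toy response, used only to compare the two fits numerically.
set.seed(1)
y <- rnorm(nrow(samples))
beta_int  <- lm.fit(design_int,  y)$coefficients
beta_cell <- lm.fit(design_cell, y)$coefficients

# These two numbers should agree up to floating-point error.
interaction_from_int  <- unname(beta_int["timeafter:groupimproved"])
interaction_from_cell <- sum(interaction_contrast * beta_cell)
```

In the cell-means form, the same fit also lets you pull out other comparisons (for example after vs. before within the improved group only) by changing the contrast vector, which is the flexibility mentioned above.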

8 weeks ago by ATpoint 89k

As subjects seem to be measured before and after, a subject effect should be considered.

I suggest reading section 3.5, "Comparisons both between and within subjects", in the edgeR documentation.

For the interaction alone, you can refer to https://f1000research.com/articles/5-1438
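As a rough sketch of what a design with a subject effect could look like, following the nested pattern of that edgeR section (group + group:subject + group:time), here is a base-R example. The layout (3 subjects per group, each sampled before and after) and all variable and level names are assumptions made for illustration.

```r
# Hypothetical layout: 6 subjects, 3 per group, each sampled before and after.
samples <- data.frame(
  subject = factor(rep(1:6, each = 2)),
  group   = factor(rep(c("nonimproved", "improved"), each = 6),
                   levels = c("nonimproved", "improved")),
  time    = factor(rep(c("before", "after"), times = 6),
                   levels = c("before", "after"))
)

# Re-number subjects within each group (1, 2, 3) so that subject can be
# nested inside group without creating non-estimable columns.
samples$subj_in_group <- factor(
  ave(as.integer(samples$subject), samples$group,
      FUN = function(x) match(x, unique(x)))
)

# Group effect, subject effects nested in group, and a separate
# after-vs-before effect for each group.
design <- model.matrix(~ group + group:subj_in_group + group:time,
                       data = samples)

# Interaction: the time effect in the improved group minus the time effect
# in the non-improved group.
interaction_contrast <- setNames(numeric(ncol(design)), colnames(design))
interaction_contrast["groupimproved:timeafter"]    <-  1
interaction_contrast["groupnonimproved:timeafter"] <- -1
```

With edgeR, a design like this would typically go to estimateDisp() and glmQLFit(), and the contrast vector to glmQLFTest(fit, contrast = interaction_contrast). With unequal numbers of subjects per group, some of the nested subject columns would be all zero and would need to be dropped before fitting.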

8 weeks ago by SamGG ▴ 150

score 0 · Answer 1 · 2025-09-08

a drug was applied to human skin, and subjects were divided into an improvement group and a non-improvement group

In this sort of design, one of the questions of interest is usually whether, for each gene, the drug affects expression in the same way in the "improved" and in the "non-improved" group (or rather "how much" than "whether", since there is almost certainly some difference, however small and hard to distinguish from zero). This seems to be your case, and fitting the interaction term is arguably the most appropriate approach.

However, my impression, and to some extent my experience, is that people fit a model to the "improved" group only and estimate "ctrl vs drug", then do the same for the non-improved group, and finally take the difference between the two differences to estimate the interaction. Alternatively, they intersect the sets of DE genes from the two models to find genes that respond to the drug in one group but not the other. This strategy seems easier, but it makes sub-optimal use of the data and may be more complicated to handle (e.g. which cutoffs do you choose, and how do you assess the uncertainty of the differences or intersections?).