Hi all,

I have some bulk RNA-Seq data obtained from exposing cells from a cell line to 5 different treatments (A, B, C, D and E), along with a medium-only control. This experiment was repeated 5 times (on different passages of the cells, with 1 experiment per week). I am interested in generating lists of DEGs for each treatment vs control (i.e. A vs Ctrl, B vs Ctrl, etc.).

I have generated these lists of DEGs using several of the experimental designs described in the edgeR User's Guide:

- Grouped - where design = model.matrix(~0+Treatment)
- Paired - where design = model.matrix(~Replicate+Treatment)
- Paired (with blocking) - where design also = model.matrix(~Replicate+Treatment)

Where Treatment is either Ctrl, A, B, C, D, or E and Replicate is a number 1-5.
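
For concreteness, here is a minimal sketch of how the two design matrices above could be built. The sample sheet is hypothetical (30 samples, ordered replicate by replicate), and the sketch assumes both variables are coded as factors with Ctrl as the reference level. If Replicate were left as a plain number, model.matrix would treat it as a single numeric covariate rather than as 5 blocks.

```r
# Hypothetical sample sheet: 6 conditions x 5 replicates = 30 samples.
# The ordering of samples here is an assumption made for illustration.
Treatment <- factor(rep(c("Ctrl", "A", "B", "C", "D", "E"), times = 5),
                    levels = c("Ctrl", "A", "B", "C", "D", "E"))
Replicate <- factor(rep(1:5, each = 6))  # factor, not numeric

# Grouped: one column per treatment level, no replicate term
design_grouped <- model.matrix(~0 + Treatment)

# Paired: replicate enters as an additive term; Ctrl is the baseline,
# so each Treatment column is that treatment vs Ctrl
design_paired <- model.matrix(~Replicate + Treatment)

colnames(design_grouped)
colnames(design_paired)
```

In the paired design, the columns named TreatmentA to TreatmentE are the ones that correspond to the treatment vs control comparisons.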

I find that the paired design is much more stringent than the paired design with blocking (~half the number of DEGs). Could somebody please suggest which is most suitable for my use case?

Thanks in advance!

I don't know what you mean by "blocking". edgeR has no blocking mechanism other than putting the Replicate variable into the design matrix, so what you mean by option 3 is unclear. How does option 3 differ from 2?

Apologies - for option 3 I followed section 3.4.2 of the edgeR user guide; this allowed me to select the pairwise comparisons of interest using the coef argument.

For option 2, I followed section 3.4.1 of the user guide, inputting only a single pairwise comparison at a time (for example, importing only the data for treatment A and control).

Sections 3.4.1 and 3.4.2 of the User's Guide are identical in your case, where your blocking variable has 5 levels. There is no difference between the approach taken in the two sections.

Are you saying that you have actually made 5 different subsets of the data, and in option 2 you are running edgeR on 5 different subsets, each with only one treatment in the dataset? We don't recommend subsetting datasets.
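
As a rough sketch of the alternative to subsetting, the model can be fitted once on all 30 samples and each treatment coefficient tested in turn. The helper name treatment_vs_ctrl, the sample layout, and the simulated counts are all assumptions made for illustration, not part of the original analysis; the quasi-likelihood pipeline is one of the workflows edgeR offers and is used here only as an example.

```r
# Sample layout assumed from the thread: 6 conditions x 5 replicates
Treatment <- factor(rep(c("Ctrl", "A", "B", "C", "D", "E"), times = 5),
                    levels = c("Ctrl", "A", "B", "C", "D", "E"))
Replicate <- factor(rep(1:5, each = 6))
design <- model.matrix(~Replicate + Treatment)

# Each of these coefficients is one treatment vs Ctrl
treatment_coefs <- grep("^Treatment", colnames(design), value = TRUE)

# Hypothetical helper: fit once on the full dataset, then test each coefficient
treatment_vs_ctrl <- function(counts, design, coefs) {
  y <- edgeR::DGEList(counts = counts)
  keep <- edgeR::filterByExpr(y, design)
  y <- y[keep, , keep.lib.sizes = FALSE]
  y <- edgeR::calcNormFactors(y)
  y <- edgeR::estimateDisp(y, design)
  fit <- edgeR::glmQLFit(y, design)
  lapply(setNames(coefs, coefs), function(cf) {
    edgeR::topTags(edgeR::glmQLFTest(fit, coef = cf), n = Inf)
  })
}

# Usage with simulated counts (made-up numbers, only to show the call)
if (requireNamespace("edgeR", quietly = TRUE)) {
  set.seed(1)
  counts <- matrix(rnbinom(1000 * 30, mu = 50, size = 10), nrow = 1000,
                   dimnames = list(paste0("gene", 1:1000), NULL))
  results <- treatment_vs_ctrl(counts, design, treatment_coefs)
  names(results)
}
```

Because every comparison comes from the same fit, the dispersion estimates for all five comparisons are based on all of the samples rather than on 10 at a time.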

Yes - I made 5 subsets. That would explain the difference in results. Thanks for your help, Gordon!
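
One plausible contributor to the drop in DEGs is the loss of residual degrees of freedom when the data are subset. The sketch below only counts degrees of freedom for the paired design on a hypothetical sample sheet; it does not run edgeR, and the sample layout is assumed from the description above.

```r
# Hypothetical sample sheet: 6 conditions x 5 replicates = 30 samples
samples <- data.frame(
  Treatment = factor(rep(c("Ctrl", "A", "B", "C", "D", "E"), times = 5),
                     levels = c("Ctrl", "A", "B", "C", "D", "E")),
  Replicate = factor(rep(1:5, each = 6))
)

# Full dataset: 30 samples, 10 coefficients
design_full <- model.matrix(~Replicate + Treatment, data = samples)
df_full <- nrow(design_full) - qr(design_full)$rank

# Subset to A and Ctrl only: 10 samples, 6 coefficients
sub_samples <- droplevels(samples[samples$Treatment %in% c("Ctrl", "A"), ])
design_sub <- model.matrix(~Replicate + Treatment, data = sub_samples)
df_sub <- nrow(design_sub) - qr(design_sub)$rank

c(full = df_full, subset = df_sub)  # 20 vs 4 residual degrees of freedom
```

With only 4 residual degrees of freedom per subset, the variability estimates are much less precise than with 20, which tends to reduce power.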