Selecting the correct experimental design for edgeR differential expression analysis
4 months ago

Hi all,

I have some bulk RNA-Seq data obtained from exposing cells from a cell line to 5 different treatments (A, B, C, D and E), along with a medium-only control. This experiment was repeated 5 times (on different passages of the cells, with 1 experiment per week). I am interested in generating lists of DEGs for each treatment vs control (i.e. A vs Ctrl, B vs Ctrl, etc.).

I have generated these lists of DEGs using several of the experimental designs described in the edgeR User's Guide:

1. Grouped - where design = model.matrix(~0+Treatment)
2. Paired - where design = model.matrix(~Replicate+Treatment)
3. Paired (with blocking) - where design also = model.matrix(~Replicate+Treatment)

Where Treatment is either Ctrl, A, B, C, D, or E and Replicate is a number 1-5.
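
To make the designs above concrete, here is a minimal base R sketch of the two design matrices for this layout. The `targets` sample table and its row order are assumptions made up for illustration (6 treatments x 5 replicates = 30 samples); only the two `model.matrix()` formulas come from the post.

```r
# Hypothetical sample table: 6 treatments in each of 5 replicate experiments
treatment_levels <- c("Ctrl", "A", "B", "C", "D", "E")
targets <- data.frame(
  Treatment = factor(rep(treatment_levels, times = 5), levels = treatment_levels),
  Replicate = factor(rep(1:5, each = 6))
)

# Option 1 (grouped): one column per treatment, no adjustment for replicate
design_grouped <- model.matrix(~ 0 + Treatment, data = targets)

# Options 2 and 3 (paired/blocked): replicate effects plus treatment effects
# relative to the first factor level (Ctrl)
design_paired <- model.matrix(~ Replicate + Treatment, data = targets)

print(colnames(design_grouped))
print(colnames(design_paired))
```

Because Ctrl is the first level of Treatment, the TreatmentA to TreatmentE columns of the second matrix each represent a treatment vs control comparison.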

I find that the paired design is much more stringent than the paired design with blocking (~half the number of DEGs). Could somebody please suggest which is most suitable for my use case?


I don't know what you mean by "blocking". edgeR has no blocking mechanism other than putting the Replicate variable into the design matrix, so what you mean by option 3 is unclear. How does option 3 differ from 2?

Apologies - for option 3, I followed Section 3.4.2 of the edgeR User's Guide; this allowed me to select the pairwise comparisons of interest using the coef argument.

For option 2, I followed Section 3.4.1 of the User's Guide, inputting only a single pairwise comparison at a time (for example, importing data for treatment A and control only).


Sections 3.4.1 and 3.4.2 of the User's Guide are identical in your case, where your blocking variable has 5 levels. There is no difference between the approach taken in the two sections.

Are you saying that you have actually made 5 different subsets of the data, and in option 2 you are running edgeR on 5 different subsets, each with only one treatment in the dataset? We don't recommend subsetting datasets.


Yes - I made 5 subsets. That would explain the difference in results. Thanks for your help, Gordon!

4 months ago
Gordon Smyth ★ 4.4k

edgeR is designed to analyse complete datasets as a whole, using all the information available. For your experimental design, the recommended approach is model.matrix(~Replicate+Treatment) as in Section 3.4.2 of the edgeR User's Guide (your option 3). You could also use model.matrix(~0+Treatment+Replicate), which is statistically equivalent but allows you to form contrasts between the treatments in the same way as in your option 1.
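
As a rough illustration of why the two parameterizations are the same model, the base R sketch below checks that both design matrices span the same column space and builds an A vs Ctrl contrast by hand for the second one. The `targets` table is a made-up stand-in for the real sample sheet, and the hand-built contrast vector is just one way to do it (limma's makeContrasts is a common alternative).

```r
# Hypothetical sample table: 6 treatments in each of 5 replicate experiments
treatment_levels <- c("Ctrl", "A", "B", "C", "D", "E")
targets <- data.frame(
  Treatment = factor(rep(treatment_levels, times = 5), levels = treatment_levels),
  Replicate = factor(rep(1:5, each = 6))
)

design_coef     <- model.matrix(~ Replicate + Treatment, data = targets)
design_contrast <- model.matrix(~ 0 + Treatment + Replicate, data = targets)

# If stacking the two matrices side by side does not increase the rank,
# they span the same column space, i.e. they fit the same model
rank_coef     <- qr(design_coef)$rank
rank_contrast <- qr(design_contrast)$rank
rank_both     <- qr(cbind(design_coef, design_contrast))$rank
print(c(rank_coef, rank_contrast, rank_both))

# A vs Ctrl as a contrast on the ~0+Treatment+Replicate parameterization
a_vs_ctrl <- setNames(numeric(ncol(design_contrast)), colnames(design_contrast))
a_vs_ctrl["TreatmentA"]    <- 1
a_vs_ctrl["TreatmentCtrl"] <- -1
print(a_vs_ctrl)
```

With the first design the same comparison would be selected with coef="TreatmentA"; with the second, the contrast vector would be passed to the contrast argument of the edgeR test function.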

Option 1 will generally give less power than option 3 because it fails to adjust for baseline differences between the replicate experiments.

If I am understanding correctly, your option 2 is making subsets of the data with only Ctrl and one treatment in each subset. Option 2 will have less power than option 3 in most cases because it has fewer samples available from which to estimate the variability of the data. We don't recommend subsetting datasets. Analysing all the data together gives a more powerful and less fragmented analysis. It also gives more analysis possibilities, such as the possibility of comparing the treatments to each other (e.g., A to B).
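
A sketch of what analysing everything together might look like, including an A vs B comparison, is given below. The counts are simulated purely for illustration (no real differential expression is built in), the `targets` table is hypothetical, and the edgeR steps are wrapped in a requireNamespace() check so the block still runs where edgeR is not installed. Gene filtering and other preprocessing are omitted, so treat this as an outline rather than a complete pipeline.

```r
set.seed(1)

# Hypothetical sample table: 6 treatments in each of 5 replicate experiments
treatment_levels <- c("Ctrl", "A", "B", "C", "D", "E")
targets <- data.frame(
  Treatment = factor(rep(treatment_levels, times = 5), levels = treatment_levels),
  Replicate = factor(rep(1:5, each = 6))
)

# Simulated count matrix (assumption: stands in for the real data)
n_genes <- 200
counts <- matrix(
  rnbinom(n_genes * nrow(targets), mu = 100, size = 10),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("S", seq_len(nrow(targets))))
)

# One design matrix for the whole dataset, adjusting for replicate
design <- model.matrix(~ 0 + Treatment + Replicate, data = targets)

# Treatment A vs treatment B, only possible when all samples are in one fit
a_vs_b <- setNames(numeric(ncol(design)), colnames(design))
a_vs_b["TreatmentA"] <- 1
a_vs_b["TreatmentB"] <- -1

if (requireNamespace("edgeR", quietly = TRUE)) {
  y   <- edgeR::DGEList(counts = counts)
  y   <- edgeR::calcNormFactors(y)
  y   <- edgeR::estimateDisp(y, design)   # dispersion estimated from all 30 samples
  fit <- edgeR::glmQLFit(y, design)
  res <- edgeR::glmQLFTest(fit, contrast = a_vs_b)
  print(edgeR::topTags(res, n = 5))
}
```

The same fitted object can be reused for each treatment vs Ctrl contrast, so the dispersion estimates are shared across all comparisons instead of being re-estimated on each subset.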

Thanks a lot! I will go ahead with option 3, and avoid subsetting data for DEG analysis in the future.