Question

Selecting the correct experimental design for edgeR differential expression analysis

2.3 years ago

PeakGosling • 0

Hi all,

I have some bulk RNA-Seq data obtained by exposing cells from a cell line to 5 different treatments (A, B, C, D and E), along with a medium-only control. The experiment was repeated 5 times (on different passages of the cells, with 1 experiment per week). I am interested in generating lists of DEGs for each treatment vs control (i.e. A vs Ctrl, B vs Ctrl, etc.).

I have generated these lists of DEGs using several of the experimental designs described in the edgeR User's Guide:

Grouped - where design = model.matrix(~0+Treatment)
Paired - where design = model.matrix(~Replicate+Treatment)
Paired (with blocking) - where design also = model.matrix(~Replicate+Treatment)

Where Treatment is either Ctrl, A, B, C, D, or E and Replicate is a number 1-5.
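For reference, the two design matrices in the list above can be sketched in base R as follows. This is only an illustration: the sample ordering (6 conditions within each of 5 replicate experiments, 30 samples in total) is an assumption, and the object names design_grouped and design_blocked are made up for the example.

```r
# Hypothetical sample sheet: 6 conditions x 5 replicate experiments = 30 samples.
# The ordering of samples is an assumption made for illustration.
conditions <- c("Ctrl", "A", "B", "C", "D", "E")
Treatment <- factor(rep(conditions, times = 5), levels = conditions)
Replicate <- factor(rep(1:5, each = 6))

# Grouped design: one column per treatment, no adjustment for replicate.
design_grouped <- model.matrix(~ 0 + Treatment)

# Paired/blocked design: replicate enters as an additive blocking factor.
# Ctrl is the first factor level, so each TreatmentX column is X vs Ctrl.
design_blocked <- model.matrix(~ Replicate + Treatment)

dim(design_grouped)
dim(design_blocked)
colnames(design_blocked)
```

With this layout the grouped design has 6 columns and the blocked design has 10 (intercept, 4 replicate terms, 5 treatment terms).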

I find that the paired design is much more stringent than the paired design with blocking (~half the number of DEGs). Could somebody please suggest which is most suitable for my use case?

Thanks in advance!

Blocking RNA-Seq Paired edgeR Design Experimental • 1.8k views

ADD COMMENT • link 2.3 years ago by PeakGosling • 0

I don't know what you mean by "blocking". edgeR has no blocking mechanism other than putting the Replicate variable into the design matrix, so what you mean by option 3 is unclear. How does option 3 differ from 2?

ADD REPLY • link 2.3 years ago by Gordon Smyth ★ 7.1k

Apologies - for option 3 I followed section 3.4.2 of the edgeR User's Guide; this allowed me to select the pairwise comparisons of interest using the coef argument.

For option 2 I followed section 3.4.1 of the User's Guide, inputting only a single pairwise comparison at a time (i.e. I imported data for treatment A and control only, for example).

ADD REPLY • link 2.3 years ago by PeakGosling • 0

Sections 3.4.1 and 3.4.2 of the User's Guide are identical in your case, where your blocking variable has 5 levels. There is no difference between the approaches taken in the two sections.

Are you saying that you have actually made 5 different subsets of the data, and in option 2 you are running edgeR on 5 different subsets, each with only one treatment in the dataset? We don't recommend subsetting datasets.

ADD REPLY • link 2.3 years ago by Gordon Smyth ★ 7.1k

Yes - I made 5 subsets. That would explain the difference in results. Thanks for your help, Gordon!

ADD REPLY • link 2.3 years ago by PeakGosling • 0

score 1 · Answer 1 · 2022-01-12

edgeR is designed to analyse complete datasets as a whole, using all the information available. For your experimental design, the recommended approach is model.matrix(~Replicate+Treatment) as in Section 3.4.2 of the edgeR User's Guide (your option 3). You could also use model.matrix(~0+Treatment+Replicate), which is statistically the same but allows you to form contrasts between the treatments in the same way as in your option 1.
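A rough sketch of the two equivalent parameterizations, for anyone reading along. The sample layout, the simulated counts and all object names are assumptions made for illustration, not the poster's data. The edgeR part uses a quasi-likelihood pipeline as one possible choice and only runs if edgeR is installed.

```r
# Hypothetical layout: 6 conditions within each of 5 replicate experiments.
conditions <- c("Ctrl", "A", "B", "C", "D", "E")
Treatment <- factor(rep(conditions, times = 5), levels = conditions)
Replicate <- factor(rep(1:5, each = 6))

design_a <- model.matrix(~ Replicate + Treatment)      # test with coef=
design_b <- model.matrix(~ 0 + Treatment + Replicate)  # test with contrast=

# Both matrices span the same column space, so they describe the same model.
rank_a  <- qr(design_a)$rank
rank_ab <- qr(cbind(design_a, design_b))$rank

# A vs Ctrl as a contrast vector for the second parameterization.
con_A_vs_Ctrl <- setNames(numeric(ncol(design_b)), colnames(design_b))
con_A_vs_Ctrl["TreatmentA"] <- 1
con_A_vs_Ctrl["TreatmentCtrl"] <- -1

if (requireNamespace("edgeR", quietly = TRUE)) {
  # Simulated counts with no true differential expression (illustration only).
  set.seed(1)
  counts <- matrix(rnbinom(1000 * 30, mu = 100, size = 10), nrow = 1000,
                   dimnames = list(paste0("gene", 1:1000), paste0("s", 1:30)))
  y <- edgeR::DGEList(counts = counts)
  y <- edgeR::calcNormFactors(y)

  # Parameterization a: the TreatmentA coefficient is already A vs Ctrl.
  fit_a <- edgeR::glmQLFit(edgeR::estimateDisp(y, design_a), design_a)
  res_a <- edgeR::glmQLFTest(fit_a, coef = "TreatmentA")

  # Parameterization b: the same comparison expressed as a contrast.
  fit_b <- edgeR::glmQLFit(edgeR::estimateDisp(y, design_b), design_b)
  res_b <- edgeR::glmQLFTest(fit_b, contrast = con_A_vs_Ctrl)
}
```

Here rank_a and rank_ab are both 10, which is what "statistically the same" means in practice: stacking the two design matrices side by side adds no new directions.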

Option 1 will generally give less power than option 3 because it fails to adjust for baseline differences between the replicate experiments.

If I am understanding correctly, your option 2 makes subsets of the data with only Ctrl and one treatment in each subset. Option 2 will have less power than option 3 in most cases because it has fewer samples available from which to estimate the variability of the data. We don't recommend subsetting datasets. Analysing all the data together gives a more powerful and less fragmented analysis. It also gives more analysis possibilities, such as the possibility of comparing the treatments to each other (e.g., A to B).
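To make the "fewer samples" point concrete, here is a small base R sketch comparing the residual degrees of freedom of the full design with those of a Ctrl-plus-A subset. Residual degrees of freedom are used here only as a simple proxy for how much information is available to estimate variability. The sample layout and object names are assumptions made for illustration.

```r
# Hypothetical layout: 6 conditions within each of 5 replicate experiments.
conditions <- c("Ctrl", "A", "B", "C", "D", "E")
Treatment <- factor(rep(conditions, times = 5), levels = conditions)
Replicate <- factor(rep(1:5, each = 6))

# Full dataset: 30 samples, 10 coefficients.
full_design <- model.matrix(~ Replicate + Treatment)
df_full <- nrow(full_design) - qr(full_design)$rank

# Subset with only Ctrl and A: 10 samples, 6 coefficients.
keep <- Treatment %in% c("Ctrl", "A")
sub_data <- data.frame(Replicate = Replicate[keep],
                       Treatment = droplevels(Treatment[keep]))
sub_design <- model.matrix(~ Replicate + Treatment, data = sub_data)
df_sub <- nrow(sub_design) - qr(sub_design)$rank

c(full = df_full, subset = df_sub)

# The full design also allows treatment-vs-treatment comparisons, e.g. A vs B.
con_A_vs_B <- setNames(numeric(ncol(full_design)), colnames(full_design))
con_A_vs_B[c("TreatmentA", "TreatmentB")] <- c(1, -1)
```

Under these assumptions the full design leaves 20 residual degrees of freedom against 4 for the subset. The con_A_vs_B vector could be passed as the contrast argument of an edgeR test on the full fit, which is not possible once the data have been split.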