Selecting the correct experimental design for edgeR differential expression analysis
2.3 years ago

Hi all,

I have some bulk RNA-Seq data obtained by exposing cells from a cell line to 5 different treatments (A, B, C, D and E), along with a medium-only control. The experiment was repeated 5 times (on different passages of the cells, with 1 experiment per week). I am interested in generating lists of DEGs for each treatment vs control (i.e. A vs Ctrl, B vs Ctrl, etc.).

I have generated said lists of DEGs using various different experimental designs described in the edgeR Users Guide:

  1. Grouped - where design = model.matrix(~0+Treatment)
  2. Paired - where design = model.matrix(~Replicate+Treatment)
  3. Paired (with blocking) - where design also = model.matrix(~Replicate+Treatment)

Where Treatment is either Ctrl, A, B, C, D, or E and Replicate is a number 1-5.

I find that the paired design is much more stringent than the paired design with blocking (~half the number of DEGs). Could somebody please suggest which is most suitable for my use case?

Thanks in advance!

Blocking RNA-Seq Paired edgeR Design Experimental • 1.8k views

I don't know what you mean by "blocking". edgeR has no blocking mechanism other than putting the Replicate variable into the design matrix, so what you mean by option 3 is unclear. How does option 3 differ from 2?


Apologies - for option 3 I followed section 3.4.2 in the edgeR user guide here; this allowed me to select the pairwise comparisons of interest using the coef argument.

For option 2, I followed section 3.4.1 of the user guide, inputting only a single pairwise comparison at a time (that is, I import data for only treatment A and control, for example).


Sections 3.4.1 and 3.4.2 of the User's Guide are identical in your case, where your blocking variable has 5 levels. There is no difference between the approaches taken in the two sections.

Are you saying that you have actually made 5 different subsets of the data, and in option 2 you are running edgeR on 5 different subsets, each with only one treatment in the dataset? We don't recommend subsetting datasets.


Yes - I made 5 subsets. That would explain the difference in results. Thanks for your help, Gordon!

2.3 years ago
Gordon Smyth ★ 7.1k

edgeR is designed to analyse complete datasets as a whole, using all the information available. For your experimental design, the recommended approach is model.matrix(~Replicate+Treatment) as in Section 3.4.2 of the edgeR User's Guide (your option 3). You could also use model.matrix(~0+Treatment+Replicate), which is statistically the same but allows you to form contrasts between the treatments in the same way as in your option 1.
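As a rough illustration of why the two parameterizations are statistically the same, here is a minimal base R sketch. The `targets` sample sheet is hypothetical (it assumes one sample for each of the 6 treatments in each of the 5 replicate experiments), and the object names `design_int` and `design_grp` are made up for this example.

```r
# Hypothetical sample sheet: 6 treatments x 5 replicate experiments = 30 samples
targets <- expand.grid(
  Treatment = c("Ctrl", "A", "B", "C", "D", "E"),
  Replicate = 1:5
)
# Make Ctrl the reference level so treatment coefficients are "vs Ctrl"
targets$Treatment <- factor(targets$Treatment,
                            levels = c("Ctrl", "A", "B", "C", "D", "E"))
targets$Replicate <- factor(targets$Replicate)

# Parameterization with an intercept (option 3 in the question)
design_int <- model.matrix(~ Replicate + Treatment, data = targets)

# Parameterization without an intercept, one column per treatment
design_grp <- model.matrix(~ 0 + Treatment + Replicate, data = targets)

colnames(design_int)
colnames(design_grp)

# Both matrices have the same rank, and stacking them side by side does not
# increase the rank, so they span the same column space (same fitted model)
rank_int  <- qr(design_int)$rank
rank_grp  <- qr(design_grp)$rank
rank_both <- qr(cbind(design_int, design_grp))$rank
c(rank_int, rank_grp, rank_both)
```

In `design_int` the column `TreatmentA` is already the A vs Ctrl effect, whereas in `design_grp` the same comparison would be formed as the contrast `TreatmentA - TreatmentCtrl`.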

Option 1 will generally give less power than option 3 because it fails to adjust for baseline differences between the replicate experiments.

If I am understanding correctly, your option 2 is making subsets of the data with only Ctrl and one treatment in each subset. Option 2 will have less power than option 3 in most cases because it has fewer samples available from which to estimate the variability of the data. We don't recommend subsetting datasets. Analysing all the data together gives a more powerful and less fragmented analysis. It also gives more analysis possibilities, such as the possibility of comparing the treatments to each other (e.g., A to B).
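The following is only a sketch of how a single fit to the whole dataset can serve both kinds of comparison. The counts are simulated placeholders, not real data, and `targets`, `counts`, `con_A_vs_B` and the other object names are assumptions made for this example. The edgeR part runs only if the package is installed.

```r
# Hypothetical sample sheet: 6 treatments x 5 replicate experiments
targets <- expand.grid(
  Treatment = c("Ctrl", "A", "B", "C", "D", "E"),
  Replicate = 1:5
)
targets$Treatment <- factor(targets$Treatment,
                            levels = c("Ctrl", "A", "B", "C", "D", "E"))
targets$Replicate <- factor(targets$Replicate)
design <- model.matrix(~ Replicate + Treatment, data = targets)

# Contrast vector for A vs B: +1 on TreatmentA, -1 on TreatmentB, 0 elsewhere
con_A_vs_B <- setNames(numeric(ncol(design)), colnames(design))
con_A_vs_B[c("TreatmentA", "TreatmentB")] <- c(1, -1)

if (requireNamespace("edgeR", quietly = TRUE)) {
  # Simulated placeholder counts (1000 genes x 30 samples), for illustration only
  set.seed(1)
  counts <- matrix(rnbinom(1000 * nrow(targets), mu = 50, size = 10),
                   nrow = 1000)
  rownames(counts) <- paste0("gene", seq_len(nrow(counts)))
  colnames(counts) <- paste(targets$Treatment, targets$Replicate, sep = "_")

  y <- edgeR::DGEList(counts = counts)
  keep <- edgeR::filterByExpr(y, design = design)
  y <- y[keep, , keep.lib.sizes = FALSE]
  y <- edgeR::calcNormFactors(y)
  y <- edgeR::estimateDisp(y, design)

  # One fit on all 30 samples, reused for every comparison
  fit <- edgeR::glmQLFit(y, design)

  # Treatment vs control: test the corresponding coefficient
  res_A_vs_Ctrl <- edgeR::glmQLFTest(fit, coef = "TreatmentA")

  # Treatment vs treatment: test a contrast of two coefficients
  res_A_vs_B <- edgeR::glmQLFTest(fit, contrast = con_A_vs_B)

  print(edgeR::topTags(res_A_vs_Ctrl, n = 5))
  print(edgeR::topTags(res_A_vs_B, n = 5))
}
```

Because the dispersion is estimated once from all samples, every comparison here draws on the same residual degrees of freedom, which is not the case when each treatment is analysed in its own subset.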


Thanks a lot! I will go ahead with option 3, and avoid subsetting data for DEG analysis in the future.
