Question: DESeq2 design matrix with 3 factors
7 months ago, kand3e wrote:

Hi Everyone,

I'm a complete novice to DEG analysis and linear models, and I have some questions about setting up the design matrix. I have read some posts in this forum with similar experimental designs, but they don't really have the answers I'm looking for. My experiment was designed as follows:

1) Two genotype groups (Genotype: WT vs. KO)

2) Two treatment conditions for each genotype group (Condition: Ctrl vs. Trt)

3) Equal numbers of both sexes in each genotype group under each treatment condition (Sex: F vs. M)

When we first designed this experiment, sex was not a factor we considered; the main purpose was just to see whether the expression profiles of the two genotypes differ at steady state (Ctrl) and after stimulation (Trt). We included equal numbers of both sexes in each group just in case of sex bias. However, when we ran a PCA, we actually saw some differences between the sexes in each genotype, and this difference increased further after treatment.

Now, the questions we would like to answer are:

1) If we just want to see how genotype and treatment interact (E.g.: Ctrl WT vs Trt WT || Ctrl KO vs Trt KO || Ctrl WT vs Ctrl KO || Trt WT vs Trt KO), should I use a design=~Genotype+Condition+Genotype:Condition and follow the comparison setups here

or, now knowing that expression varies with sex, use a design=~Sex+Genotype+Condition+Genotype:Condition (to account for the sex differences) and still follow the same comparison setups as indicated in the link above?

2) If we also want to see how gene expression differs between the sexes within a genotype group and between the two genotype groups under each treatment condition (e.g. F vs M in Ctrl WT || F vs M in Ctrl KO || F vs M in Trt WT || F vs M in Trt KO || F Ctrl WT vs F Ctrl KO || M Ctrl WT vs M Ctrl KO || F Trt WT vs F Trt KO || M Trt WT vs M Trt KO), how should I set up the design matrix? I have very limited knowledge of how interaction terms work and I'm not sure what I should do to get all those comparisons. I would really appreciate it if someone could provide some advice.

Also, I've read in some other posts that for a complex design such as this, it may be better to label each sample using all three factors (e.g. F_Ctrl_WT, F_Trt_WT etc.) and just use the "contrast" argument to pick out the groups I'm interested in comparing. Will this work? How is this different from using the "~A+B+C+A:C+B:C" type of setup?

Thanks so much for your help!

7 months ago, ATpoint wrote:

For 1) I would indeed use a factorial model, as this makes it as easy as ~0+factor followed by making all the contrasts you want. To keep things simple, wouldn't it be desirable to make a new model for 2), again full factorial, so that you can compare e.g. F_Ctrl_WT vs M_Ctrl_WT? Unless there is a reason to use complex interaction models, I personally try to avoid them at all costs, simply because I am not proficient enough in statistics to set them up properly. See the DESeq2 manual; it talks about interactions and identical factorial designs.
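A minimal sketch of the grouped approach described above. The counts are simulated with DESeq2's `makeExampleDESeqDataSet` as a stand-in for real data, and the sample sheet (two replicates per Sex/Condition/Genotype cell) is hypothetical. The sketch uses `~ group`, the form shown in the DESeq2 vignette, rather than `~ 0 + group`; with a `contrast` of the form c(factor, numerator, denominator), both are expected to give the same pairwise comparisons.

```r
library(DESeq2)

# Simulated counts stand in for the real count matrix (assumption).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 16)

# Hypothetical sample sheet: 2 replicates per Sex x Condition x Genotype cell.
meta <- expand.grid(rep = 1:2,
                    Sex = c("F", "M"),
                    Condition = c("Ctrl", "Trt"),
                    Genotype = c("WT", "KO"))

# Merge the three factors into one label per sample, e.g. "F_Ctrl_WT".
dds$group <- factor(paste(meta$Sex, meta$Condition, meta$Genotype, sep = "_"))
design(dds) <- ~ group
dds <- DESeq(dds)

# Each pairwise comparison is one contrast: c(factor, numerator, denominator).
res_sex_in_ctrl_wt <- results(dds, contrast = c("group", "F_Ctrl_WT", "M_Ctrl_WT"))
res_ko_in_f_trt    <- results(dds, contrast = c("group", "F_Trt_KO", "F_Trt_WT"))
head(res_sex_in_ctrl_wt)
```

The log2 fold changes are reported as numerator over denominator, so the first result is F relative to M within Ctrl WT.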


Thanks ATpoint! So for question 1), should I add "Sex" to the design to account for variation in gene expression between the sexes, or just leave it as ~Genotype+Condition+Genotype:Condition? I actually tried running both designs as a test, and I identify a few more DEGs with Sex included than without it. Do you know why that might be?

written 7 months ago by kand3e

Try to determine, first, if Sex is a confounding factor. Leave it out of the formula and then generate a PCA bi-plot. If you notice any stratification based on Sex, then maybe include it in the design formula. These types of things are 'executive' decisions that you as an analyst will have to make repeatedly in your career.
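The PCA check could look roughly like this. It is only a sketch: the counts are simulated with `makeExampleDESeqDataSet` and the sample sheet is hypothetical, so no real separation by Sex should be expected here. With real data, a clear split of F and M along one of the first components would suggest adding Sex to the design.

```r
library(DESeq2)

# Simulated counts and a hypothetical sample sheet (assumptions).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 5000, m = 16)
meta <- expand.grid(rep = 1:2,
                    Sex = c("F", "M"),
                    Condition = c("Ctrl", "Trt"),
                    Genotype = c("WT", "KO"))
dds$Sex <- meta$Sex
dds$Condition <- meta$Condition
dds$Genotype <- meta$Genotype

# blind = TRUE ignores the design formula during the transformation,
# so the PCA does not depend on which terms are in the design.
vsd <- vst(dds, blind = TRUE)
pca <- plotPCA(vsd, intgroup = c("Sex", "Genotype", "Condition"),
               returnData = TRUE)

# Colour by Sex and use the point shape for Condition.
plot(pca$PC1, pca$PC2,
     col = ifelse(pca$Sex == "F", "red", "blue"),
     pch = ifelse(pca$Condition == "Ctrl", 1, 19),
     xlab = "PC1", ylab = "PC2")
legend("topright", legend = c("F", "M"), col = c("red", "blue"), pch = 19)
```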

By leaving it in your formula, you are essentially then 'controlling for' the effect of Sex when deriving test statistics for Condition / Genotype. However, you would not want to control for something if it's not necessary.
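As a sketch of what 'controlling for' Sex could look like for question 1, the design below adds Sex as an additive term next to the Genotype:Condition interaction. The counts are simulated and the sample sheet is hypothetical; the way the four comparisons are extracted follows the interaction examples in `?DESeq2::results`, with WT and Ctrl set as reference levels.

```r
library(DESeq2)

# Simulated counts and a hypothetical sample sheet (assumptions).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 16)
meta <- expand.grid(rep = 1:2,
                    Sex = c("F", "M"),
                    Condition = c("Ctrl", "Trt"),
                    Genotype = c("WT", "KO"))

# Set reference levels explicitly: F, WT and Ctrl.
dds$Sex <- factor(meta$Sex, levels = c("F", "M"))
dds$Genotype <- factor(meta$Genotype, levels = c("WT", "KO"))
dds$Condition <- factor(meta$Condition, levels = c("Ctrl", "Trt"))

design(dds) <- ~ Sex + Genotype + Condition + Genotype:Condition
dds <- DESeq(dds)

# Lists the fitted coefficients; the names used below follow DESeq2's
# <factor>_<level>_vs_<reference> convention.
resultsNames(dds)

# Trt vs Ctrl in WT (the reference genotype).
trt_in_wt <- results(dds, contrast = c("Condition", "Trt", "Ctrl"))
# Trt vs Ctrl in KO: main effect plus the interaction term.
trt_in_ko <- results(dds, contrast = list(c("Condition_Trt_vs_Ctrl",
                                            "GenotypeKO.ConditionTrt")))
# KO vs WT in Ctrl (the reference condition).
ko_in_ctrl <- results(dds, contrast = c("Genotype", "KO", "WT"))
# KO vs WT in Trt: main effect plus the interaction term.
ko_in_trt <- results(dds, contrast = list(c("Genotype_KO_vs_WT",
                                            "GenotypeKO.ConditionTrt")))
```

All four comparisons are adjusted for the average difference between the sexes, because Sex enters the model as an additive term.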

written 7 months ago by Kevin Blighe

Hi Kevin, Thanks so much for your suggestion.

We ran the PCA: the samples are separated by both genotype (smaller separation) and treatment (larger separation) on PC1, and on PC2 we see a very distinct separation of male and female. Therefore, I guess I should control for the effect of Sex in the formula. What I can't really understand is why, after I add Sex to the design, the list of DEGs I get is even longer than if I leave it out. I thought it would be the other way around :S

Anyway, would you have any recommendations on how I should approach question 2? Those are the specific pairwise comparisons we are interested in, so it would be great if you could give me some insight into how to set up the design matrix for those comparisons. Thanks a lot for your help!

written 7 months ago by kand3e

I am not the best person to explain in detail why you would find more statistically significant differentially expressed genes after including Sex in the design formula. However, try to think of it this way: if you do not include Sex, and so do not control for its effect, the true condition and genotype effects of some genes are masked by sex-related differences, and they only become apparent once you control for sex. I am trying to think of an example but somewhat struggling... The best thing would be to produce box-plots of these 'new' genes and see how their profiles differ according to all parameters in the design formula.
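One way to see this masking effect is a toy simulation of a single gene. This is only an illustration with made-up numbers: it uses an ordinary linear model on log-scale expression (`lm` in base R), not the negative binomial model that DESeq2 fits, and the effect sizes are assumptions chosen so that the sex difference is much larger than the condition effect.

```r
set.seed(42)

# Hypothetical balanced layout: 4 replicates per Sex x Condition cell.
samples <- expand.grid(rep = 1:4,
                       Sex = c("F", "M"),
                       Condition = c("Ctrl", "Trt"))

# One simulated gene: large sex effect (3), modest condition effect (0.5),
# small residual noise (sd = 0.3). All values are made up.
log_expr <- 5 +
  3 * (samples$Sex == "M") +
  0.5 * (samples$Condition == "Trt") +
  rnorm(nrow(samples), sd = 0.3)

# Without Sex, the sex difference ends up in the residual variance.
fit_without <- lm(log_expr ~ Condition, data = samples)
# With Sex, that variance is explained and the condition test gets sharper.
fit_with <- lm(log_expr ~ Sex + Condition, data = samples)

p_without_sex <- coef(summary(fit_without))["ConditionTrt", "Pr(>|t|)"]
p_with_sex <- coef(summary(fit_with))["ConditionTrt", "Pr(>|t|)"]
c(without_sex = p_without_sex, with_sex = p_with_sex)
```

Because the layout is balanced, the estimated condition effect is the same in both fits; only its standard error shrinks once Sex absorbs the between-sex variation, which is why the p-value drops and more genes can pass the significance threshold.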

I think that, for the second part, you may need an interaction term. However, you already have an interaction term, from what I understand. In that case, you may have to create a new 'merged' parameter, like, GenotypeSex, and use that in an interaction with Condition.
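The merged-parameter idea can be compared with the fully grouped labelling (F_Ctrl_WT etc.) using base R's `model.matrix`. This is a sketch on a hypothetical sample sheet with two replicates per cell: it checks that both designs have eight columns and span the same column space, which suggests they are two parametrizations of the same eight group means and differ mainly in how the contrasts are written.

```r
# Hypothetical sample sheet: 2 replicates per Sex x Condition x Genotype cell.
samples <- expand.grid(rep = 1:2,
                       Sex = c("F", "M"),
                       Condition = c("Ctrl", "Trt"),
                       Genotype = c("WT", "KO"))

# Merged parameter, used in an interaction with Condition.
samples$GenotypeSex <- factor(paste(samples$Genotype, samples$Sex, sep = "_"))
# Fully grouped label, one level per cell.
samples$group <- factor(paste(samples$Sex, samples$Condition,
                              samples$Genotype, sep = "_"))

X_inter <- model.matrix(~ GenotypeSex * Condition, data = samples)
X_group <- model.matrix(~ 0 + group, data = samples)

colnames(X_inter)
colnames(X_group)

# If the combined matrix has the same rank as each design on its own,
# the two designs span the same column space.
rank_inter <- qr(X_inter)$rank
rank_group <- qr(X_group)$rank
rank_both <- qr(cbind(X_inter, X_group))$rank
c(rank_inter, rank_group, rank_both)
```

With the grouped design each coefficient is one cell mean, so every pairwise comparison is a simple difference of two coefficients; with the interaction design the same comparison may need a sum of several coefficients.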

There are very good examples at the end of the help page for DESeq2::results. Have you looked there? Just type ?DESeq2::results in the R console and scroll down to the end of the help page.

written 7 months ago by Kevin Blighe