How to interpret different DESeq2 results produced by different designs
4.2 years ago
tianshenbio

I have a dataset with two factors (Stage and Form). Stage has four levels and Form has two:

          Stage  Form
DS1_Wr60  Wr60   DS
DS2_Wr60  Wr60   DS
DS3_Wr60  Wr60   DS
DS4_Wr60  Wr60   DS
WS1_Wr60  Wr60   WS
WS2_Wr60  Wr60   WS
WS3_Wr60  Wr60   WS
WS4_Wr60  Wr60   WS
DS1_PP50  PP50   DS
DS2_PP50  PP50   DS
DS3_PP50  PP50   DS
DS4_PP50  PP50   DS
WS1_PP50  PP50   WS
WS2_PP50  PP50   WS
WS3_PP50  PP50   WS
WS4_PP50  PP50   WS
DS1_P15    P15   DS
DS2_P15    P15   DS
DS3_P15    P15   DS
DS4_P15    P15   DS
WS1_P15    P15   WS
WS2_P15    P15   WS
WS3_P15    P15   WS
WS4_P15    P15   WS
DS1_P50    P50   DS
DS2_P50    P50   DS
DS3_P50    P50   DS
DS4_P50    P50   DS
WS1_P50    P50   WS
WS2_P50    P50   WS
WS3_P50    P50   WS
WS4_P50    P50   WS
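For anyone who wants to reproduce this layout in R, here is a minimal base-R sketch that rebuilds the sample table above and checks that it is balanced (every Stage/Form combination has four replicates). The object name `coldata` is only an illustrative choice. The factor levels are set so that P15 and DS are the reference levels, which appears to match the `_vs_P15` and `_vs_DS` names in the `resultsNames()` output.

```r
# Rebuild the sample table: 4 stages x 2 forms x 4 replicates = 32 samples.
# expand.grid varies the first column fastest, so the row order matches the table.
coldata <- expand.grid(rep = 1:4, Form = c("DS", "WS"),
                       Stage = c("Wr60", "PP50", "P15", "P50"),
                       stringsAsFactors = FALSE)
rownames(coldata) <- paste0(coldata$Form, coldata$rep, "_", coldata$Stage)
coldata$rep <- NULL

# Explicit levels: P15 and DS become the reference (first) levels
coldata$Stage <- factor(coldata$Stage, levels = c("P15", "P50", "PP50", "Wr60"))
coldata$Form  <- factor(coldata$Form, levels = c("DS", "WS"))

# Every Stage/Form cell should contain 4 samples: the design is balanced
print(table(coldata$Stage, coldata$Form))
```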

I tried to get DE genes between different stages using two different designs:

1. design = ~ Stage
2. design = ~ Stage+Form


Design 1:
> resultsNames(dds_out)
[1] "Intercept"         "Stage_P50_vs_P15"  "Stage_PP50_vs_P15" "Stage_Wr60_vs_P15"

Design 2:
> resultsNames(dds_out)
[1] "Intercept"         "Stage_P50_vs_P15"  "Stage_PP50_vs_P15" "Stage_Wr60_vs_P15"
[5] "Form_WS_vs_DS"

I noticed that the results for the same comparison (for example, "Stage_P50_vs_P15") differ between the two designs. How do designs 1 and 2 work, and why do their results differ? Which design should I use if I want the DE genes between stages (considering the effect of "Stage" only)?
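One way to see what the two designs do is to look at the model matrices they generate. A minimal base-R sketch, assuming the reference levels are P15 and DS as the `resultsNames()` output suggests: each column is one coefficient of the model. R labels them `StageP50`, `FormWS`, and so on, while DESeq2 reports the corresponding coefficients under names like `Stage_P50_vs_P15`.

```r
# Rebuild the balanced sample table (same layout as above)
coldata <- expand.grid(rep = 1:4, Form = c("DS", "WS"),
                       Stage = c("Wr60", "PP50", "P15", "P50"),
                       stringsAsFactors = FALSE)
coldata$Stage <- factor(coldata$Stage, levels = c("P15", "P50", "PP50", "Wr60"))
coldata$Form  <- factor(coldata$Form, levels = c("DS", "WS"))

# Design 1: intercept (P15 baseline) plus one column per non-reference stage
mm1 <- model.matrix(~ Stage, data = coldata)
# Design 2: the same stage columns plus one column for WS vs DS
mm2 <- model.matrix(~ Stage + Form, data = coldata)

print(colnames(mm1))
print(colnames(mm2))
```

The stage columns are identical in both matrices; design 2 only adds a Form column, so any difference in the stage results comes from how that extra column changes the fit.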


You posted about this before. To keep this thread focused: what question do you want to answer? Both designs are valid, but which one to use depends on the question. If you are interested only in Stage, then use design 1. Please describe exactly what this experiment is and what you want to answer.


Hi, thank you for your reply. "Form" indicates which of two temperatures (WS or DS) the organism was reared at, and "Stage" indicates four developmental stages of the organism (Wr60, PP50, P15, and P50). There are four biological replicates for each combination of the factors. I want to find which genes are differentially expressed between two stages. Since both designs compare stages, I wonder how design 1 differs from design 2.


Your second model assesses the effect of stage in a manner that is 'controlled' for the different baseline expression levels you'd get due to the different forms.
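A small simulation can illustrate what "controlled for Form" means. This sketch uses an ordinary linear model (`lm`) on made-up log-scale values for a single gene, not DESeq2's negative binomial GLM, so treat it as an analogy. The effect sizes (1 for P50 vs P15, 2 for WS vs DS) and the noise level are hypothetical. Because this design is balanced, both models give the same P50-vs-P15 estimate, but design 2 has a smaller standard error because the Form effect is no longer left in the residuals.

```r
set.seed(1)

# Balanced 4 x 2 x 4 sample table, P15 and DS as reference levels
coldata <- expand.grid(rep = 1:4, Form = c("DS", "WS"),
                       Stage = c("Wr60", "PP50", "P15", "P50"),
                       stringsAsFactors = FALSE)
coldata$Stage <- factor(coldata$Stage, levels = c("P15", "P50", "PP50", "Wr60"))
coldata$Form  <- factor(coldata$Form, levels = c("DS", "WS"))

# Hypothetical log-scale expression of one gene:
# P50 is 1 unit above P15, WS is 2 units above DS, plus noise
y <- 1 * (coldata$Stage == "P50") + 2 * (coldata$Form == "WS") +
     rnorm(nrow(coldata), sd = 0.5)

fit1 <- lm(y ~ Stage, data = coldata)          # analogue of design 1
fit2 <- lm(y ~ Stage + Form, data = coldata)   # analogue of design 2

# Same P50-vs-P15 estimate, but a smaller standard error (and p-value) in fit2
est <- rbind(design1 = coef(summary(fit1))["StageP50", ],
             design2 = coef(summary(fit2))["StageP50", ])
print(est)
```

In DESeq2 itself the log2 fold changes may also shift slightly between the two designs, since the per-gene dispersion estimates change with the design, but the dominant difference is expected to be in the standard errors and hence the Wald p-values.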


As per russhh, for your formula ~ Stage + Form, what is happening is that DESeq2 is 'adjusting' the statistical inferences for your Stage variable based on Form. That is, in this model, Form is treated as a covariate.

This is how we 'adjust' for variables in regression modelling. Say I wanted to test my gene's expression for association with arthritis while adjusting for smoking status and menopausal status; my model would be:

 ~ Arthritis + Smoking + Menopause

The p-value for Arthritis will then be adjusted for the estimated effects of Smoking and Menopause.
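The arthritis example can be sketched the same way with base R. The data below are entirely hypothetical: expression depends only on smoking, and smoking is deliberately made more common among arthritis cases. Unlike the balanced Stage/Form design, the covariate here is correlated with the variable of interest, so adjusting changes the arthritis estimate itself, not only its p-value.

```r
set.seed(42)
n <- 200

# Hypothetical cohort: half arthritis cases, half controls
arthritis <- factor(rep(c("no", "yes"), each = n / 2))
# Smoking is deliberately more common among cases (80% vs 20%)
smoking <- factor(ifelse(runif(n) < ifelse(arthritis == "yes", 0.8, 0.2),
                         "yes", "no"), levels = c("no", "yes"))
menopause <- factor(sample(c("pre", "post"), n, replace = TRUE))

# Expression depends on smoking only; arthritis has no direct effect
expr <- 2 * (smoking == "yes") + rnorm(n, sd = 0.5)

unadjusted <- lm(expr ~ arthritis)
adjusted   <- lm(expr ~ arthritis + smoking + menopause)

# Compare the arthritis coefficient with and without adjustment
print(rbind(unadjusted = coef(summary(unadjusted))["arthritisyes", ],
            adjusted   = coef(summary(adjusted))["arthritisyes", ]))
```

Without adjustment, arthritis appears strongly associated with expression purely through smoking; once smoking is in the model, that spurious association largely disappears.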


Thank you for your reply, Kevin Blighe. In your example you mentioned that the p-values for Arthritis are 'adjusted' for the effects of smoking and menopause. Do you mean that the effects of smoking and menopause are 'eliminated/reduced', so that the DE result reveals the effect of arthritis only? Is this the same as removing a batch effect? In my case, I want to examine the effect of Stage only, but Form definitely also affects gene expression, so it should be treated as a covariate. Thus design 2 (~ Stage + Form) would be more appropriate for my purpose, since it eliminates the effect of Form. Am I correct?


thus design 2 (~Stage+Form) would be more appropriate for my purpose since it eliminates the effects of Form, am I correct?

Yes, that is correct.

When we adjust for batch by including it in the design formula, it is indeed exactly the same as, for example, including BMI or smoking status in the design formula. However, this does not adjust the actual expression data for these covariates; it only 'adjusts' the statistical inferences we make about the expression data in the context of the design formula (ultimately, it is the estimated effects, their standard errors, and hence the p-values that are modified). If we want to actually modify the expression data and eliminate the effects of batch or anything else, then we need to apply other methods.
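By contrast, methods that change the data themselves work by subtracting the estimated covariate effect from each sample. A minimal base-R sketch of that idea on simulated log-scale values (hypothetical effect sizes, as in the earlier simulation): fit `~ Stage + Form`, then remove only the Form term. This is roughly the idea behind `limma::removeBatchEffect()`, which is commonly applied to transformed data (for example from DESeq2's `vst()`) for plotting; corrected values like these are not counts and should not be fed back into DESeq2.

```r
set.seed(7)

# Balanced 4 x 2 x 4 sample table, P15 and DS as reference levels
coldata <- expand.grid(rep = 1:4, Form = c("DS", "WS"),
                       Stage = c("Wr60", "PP50", "P15", "P50"),
                       stringsAsFactors = FALSE)
coldata$Stage <- factor(coldata$Stage, levels = c("P15", "P50", "PP50", "Wr60"))
coldata$Form  <- factor(coldata$Form, levels = c("DS", "WS"))

# Hypothetical log-scale values for one gene
y <- 1 * (coldata$Stage == "P50") + 2 * (coldata$Form == "WS") +
     rnorm(nrow(coldata), sd = 0.5)

# Fit design 2, then subtract only the estimated Form (WS vs DS) term
fit <- lm(y ~ Stage + Form, data = coldata)
y_corrected <- y - coef(fit)[["FormWS"]] * (coldata$Form == "WS")

# Helper: difference in group means between two levels of a factor
mean_diff <- function(v, f, a, b) mean(v[f == a]) - mean(v[f == b])

# The WS-vs-DS difference is removed; the P50-vs-P15 difference is untouched
print(c(form_before  = mean_diff(y,           coldata$Form,  "WS",  "DS"),
        form_after   = mean_diff(y_corrected, coldata$Form,  "WS",  "DS"),
        stage_before = mean_diff(y,           coldata$Stage, "P50", "P15"),
        stage_after  = mean_diff(y_corrected, coldata$Stage, "P50", "P15")))
```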


This is clear, thank you so much!

