Question: DESeq2 contrasts multiple treatments two genoptypes
gravatar for wd
3 months ago by
wd0 wrote:


I have an experiment with three treatments (E,U and control (C)) and 2 genotypes (1 and 2). I have 3-4 reps for each conditions and each genotype.

I know how to do the comparison (E-C vs U-C) for each genotype separately (with "design = ~ treament" and treatment column containing E, C or U). And then I look at the overlap/no overlap between both DEG lists (DEGs E1vsU1 and DEGs E2vsU2).

However, I think there is a better (statiscal) way would to analyze E vs U changes depending on genotype. In other words, I would like to do the following comparison:

(E1-C1/U1-C1) vs (E2-C2/U2-C2)

which should yield me those genes in the EvsU comparison for genotype 1 that are differentially expressed compared to EvsU comparison for genotype 2.

Thank you in advance.

ADD COMMENTlink modified 3 months ago • written 3 months ago by wd0
gravatar for Kevin Blighe
3 months ago by
Kevin Blighe60k
Kevin Blighe60k wrote:

Is genotype not in your design formula, too, and should that not be included as an additional interaction term with treatment?

I think that Example 2 from ?results should fit your needs in most cases, no?

 ## Example 2: two conditions, two genotypes, with an interaction term

 dds <- makeExampleDESeqDataSet(n=100,m=12)
 dds$genotype <- factor(rep(rep(c("I","II"),each=3),2))

 design(dds) <- ~ genotype + condition + genotype:condition
 dds <- DESeq(dds) 

 # Note: design with interactions terms by default have betaPrior=FALSE

 # the condition effect for genotype I (the main effect)
 results(dds, contrast=c("condition","B","A"))

 # the condition effect for genotype II
 # this is, by definition, the main effect *plus* the interaction term
 # (the extra condition effect in genotype II compared to genotype I).
 results(dds, list( c("condition_B_vs_A","genotypeII.conditionB") ))

 # the interaction term, answering: is the condition effect *different* across genotypes?
 results(dds, name="genotypeII.conditionB")

To get this right, though, you need to have your factor reference levels set correctly.

The other way to get what you want is to create a new variable, called Group, that encodes both the treatment and genotype, and then re-run DESeq2 with that.

These types of questions are probably the most asked here and on Biocondctor forum.


In all honesty, though, I prefer to keep these things as simple as possible, unless you have very large sample n. If 'simple' involves generating multiple results tables and comparing them 'manually', then so be it.


ADD COMMENTlink modified 3 months ago • written 3 months ago by Kevin Blighe60k

Dear Kevin

Thank you for your advice.

I also studied the presented example 2 in ?results, but there are only 2 conditions in that example (while I have three)? So, it is not clear to me, how I should correctly integrate the controls (C) for each genotype (1 or 2).

FYI, my sampleTable looks the following:

  sampleName    treatment   genotype
    X1  E   1
    X2  E   1
    X3  E   1
    X4  E   1
    Y1  U   1
    Y2  U   1
    Y3  U   1
    Y4  U   1
    Z1  C   1
    Z2  C   1
    Z3  C   1
    Z4  C   1
    Q1  E   2
    Q2  E   2
    Q3  E   2
    Q4  E   2
    R1  U   2
    R2  U   2
    R3  U   2
    R4  U   2
    S1  C   2
    S2  C   2
    S3  C   2
    S4  C   2

Also, I combined treatment and genotype into a new variable called group, and then you can indeed easily compare E1vsU1 or E2vsU2, but how are the C samples then taken into account, and how can you statistically compare E1vsU1 with E2vsU2?

My apologies for my ignorance.



ADD REPLYlink written 3 months ago by wd0

I think that you can do it without the grouping variable, and instead via an interaction term. For example:

DataFrame with 8 rows and 5 columns
                 Name    Treat     geno
             <factor> <factor> <factor>
SRR1039508 GSM1275862        a        1
SRR1039509 GSM1275863        a        1
SRR1039512 GSM1275866        a        2
SRR1039513 GSM1275867        b        1
SRR1039516 GSM1275870        b        1
SRR1039517 GSM1275871        b        2 
SRR1039520 GSM1275874  control        1
SRR1039521 GSM1275875  control        2

control is set as the reference level - this is very important.

dds <- DESeqDataSet(airway, design = ~ Treat + geno + Treat:geno)
dds <- DESeq(dds, betaPrior=FALSE)
using pre-existing size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing

[1] "Intercept"          "Treat_a_vs_control" "Treat_b_vs_control"
[4] "geno_2_vs_1"        "Treata.geno2"       "Treatb.geno2"

Here, Treata.geno2 is essentially the genotypic effect in Treatment a vs control. So, to get the comparison that you want, we compare this to Treatb.geno2:

results(dds, contrast = list("Treatb.geno2", "Treata.geno2"))

Take a look at Example 3 from ?results (at the very end).

ADD REPLYlink modified 3 months ago • written 3 months ago by Kevin Blighe60k

Thank you very much Kevin!

I was able to integrate your solution, and results seem to match with biological expectations.. I do have one trivial question.

Within the same analysis (with all samples loaded into the dds), I was wondering how you should obtain the results for the comparison of geno2 vs geno1, but only for the "control" treatment? I tried

results <- results(dds,contrast=c("geno","2","1")).

But these results are different (twice the number of DEGs) compared to an analysis where I only use the control data (GSM1275875 and GSM1275874 in your example) into the DE dataset.

ADD REPLYlink written 12 weeks ago by wd0

If I am not mistaken, that should be represented by geno_2_vs_1; so, :

results(dds, name = 'geno_2_vs_1').

This is elaborated in the following paragraph of the vignette:

The key point to remember about designs with interaction terms is that, unlike for a design ~genotype + condition, where the condition effect represents the overall effect controlling for differences due to genotype, by adding genotype:condition, the main condition effect only represents the effect of condition for the reference level of genotype (I, or whichever level was defined by the user as the reference level). The interaction terms genotypeII.conditionB and genotypeIII.conditionB give the difference between the condition effect for a given genotype and the condition effect for the reference genotype.

When I am in doubt about these design formulae, I usually spot check some results via box-and-whisker plots just to be sure. Generating quick plots via base boxplot() is easy:

boxplot(gene ~ treatment,
  data = data.frame(treatment = colData(dds)$treatment, t(assay(rld))
ADD REPLYlink written 12 weeks ago by Kevin Blighe60k

Dear Kevin

Thank you for the suggestions

results(dds, name = 'geno_2_vs_1')

did give me the same results as

results <- results(dds,contrast=c("geno","2","1"))

Still twice the number of DEGs compared to the analysis only including control samples, but I assume the extra DEGs are due to the fact that more samples have been included in the analysis (more statiscal power, see e.g. also question here

Thank you for suggesting the vignette, but I'm having a hard time understanding these interaction terms. For example, I also would like to do the following contrasts:

treatment a vs b for genotype 1


treatment a vs b for genotype 2

Again I am able to do this in a separate analysis (only analyzing samples for genotype 1 or only analyzing samples for genotype 2), but not when all samples are analyzed together (samples for genotype 1 and 2 loaded in the DE dataset).

Thank you in advance.

ADD REPLYlink modified 12 weeks ago • written 12 weeks ago by wd0

Hey, it is 'better' to normalise the entire dataset together, but I think that it is okay to go back to use a different design formula such that you can conduct all comparisons that you need. By normalising all samples together, irrespective of the design formula, the data will always be normalised in the same way. If you remove samples, however, the size factors will change; thus, normalised values will also differ.

ADD REPLYlink written 12 weeks ago by Kevin Blighe60k

Ok thank you.

If you would know the formule for a vs b for genotype I or for a vs b for genotype II, it would be welcome. This way I can compare how big the difference is when comparing all samples together or a selection of samples only.

⁣Sent from TypeApp ​

ADD REPLYlink written 12 weeks ago by wd0

Well, for the genotype comparison in just controls (or another group), use the 'grouping' variable in your design formula:

~ group


results(dds, contrast = c('group', 'Control_a', 'Control_b'))
ADD REPLYlink modified 12 weeks ago • written 12 weeks ago by Kevin Blighe60k

Thank you, but I was referring to the ~ Treat + geno + Treat:geno design.

How can you compare the two treatments (a vsb) for one genotype using that design.

ADD REPLYlink written 12 weeks ago by wd0
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 864 users visited in the last hour