Question: Help with multi-factor Deseq analysis
0
6 months ago by
MAPK1.4k
United States
MAPK1.4k wrote:

Hi All, I need help to resolve this problem with deseq. I have two cultivars (`R` and `S`), two time points (`5d` and `30d`) and four treatments (`Scn`, `Aph`, `ScnAph`, `Ctr`). There are two sets of controls (`Ctr`) for each cultivar and I have a total of 48 samples. I need to compare:

1. Scn Vs Ctr within resistance within 5 day
2. Scn Vs Aph within susceptible within 5 day
3. Scn Vs Ctr within resistance within 30 day
4. Scn Vs Aph within susceptible within 30 day
5. R Vs S in whole dataset

Can someone please guide me through this. I have tried this model which looks like this, but I am not sure if this is the correct way of doing it. Thanks for your help in advance.

``````dds <- DESeqDataSetFromMatrix(countData=count.mat,colData=cond,design= ~cultivar+time+cultivar:time)
##you  relevel 5d so you'll do "cultivarR.time30d"
dds\$time <- relevel(dds\$time, ref = "5d")
##you relevel S so you'll do "cultivarR.time5d"
dds\$cultivar <- relevel(dds\$cultivar, ref = "S")
dds = DESeq(dds, test="LRT", reduced= ~cultivar + time)
#5d vs 30d within resistance
##in case where dds\$time <- relevel(dds\$time, ref = "S")
resultsNames(dds)
# > resultsNames(dds)
# [1] "Intercept"        "cultivar_R_vs_S"  "time_5d_vs_30d"   "cultivarR.time5d"
#5d vs 30d within resistance
tt<- results(dds, contrast=list(c("time_5d_vs_30d", "cultivarR.time5d")), test="Wald")
``````

My data:

my `count.mat` looks like this:

``````count.mat <- structure(list(sam5_4a_counts = c(26, 28, 1, 137, 3, 0), sam5_4b_counts = c(13,
6, 0, 95, 3, 0), sam5_4c_counts = c(1, 7, 0, 41, 2, 0), sam5_5a_counts = c(28,
15, 0, 84, 0, 0), sam5_5b_counts = c(12, 29, 1, 97, 2, 1), sam5_5c_counts = c(10,
11, 0, 77, 2, 0), sam5_6a_counts = c(42, 24, 0, 139, 4, 0), sam5_6b_counts = c(29,
28, 1, 166, 1, 1), sam5_6c_counts = c(29, 46, 0, 112, 5, 3),
sam5_7a_counts = c(7, 7, 1, 65, 0, 0), sam5_7b_counts = c(37,
10, 0, 108, 4, 0), sam5_7c_counts = c(7, 4, 0, 47, 0, 0),
sam5_1a_counts = c(44, 56, 4, 107, 2, 0), sam5_1b_counts = c(13,
11, 3, 44, 1, 0), sam5_1c_counts = c(39, 55, 1, 166, 1, 0
), sam5_2a_counts = c(4, 8, 1, 75, 1, 0), sam5_2b_counts = c(126,
160, 10, 414, 5, 1), sam5_2c_counts = c(28, 37, 1, 209, 3,
1), sam5_3a_counts = c(38, 70, 5, 138, 3, 1), sam5_3b_counts = c(132,
218, 6, 390, 14, 5), sam5_3c_counts = c(10, 2, 0, 39, 0,
1), sam5_8a_counts = c(19, 37, 1, 140, 1, 1), sam5_8b_counts = c(5,
12, 0, 63, 1, 0), sam5_8c_counts = c(27, 14, 1, 90, 0, 0),
sam30_4a_counts = c(29, 31, 0, 68, 2, 0), sam30_4b_counts = c(32,
24, 0, 70, 1, 0), sam30_4c_counts = c(13, 8, 0, 38, 2, 1),
sam30_5a_counts = c(22, 14, 0, 104, 2, 0), sam30_5b_counts = c(37,
49, 2, 88, 1, 0), sam30_5c_counts = c(37, 84, 1, 106, 0,
1), sam30_6a_counts = c(74, 58, 3, 110, 2, 1), sam30_6b_counts = c(68,
183, 3, 150, 2, 1), sam30_6c_counts = c(38, 86, 1, 161, 1,
0), sam30_7a_counts = c(21, 27, 0, 93, 2, 0), sam30_7b_counts = c(27,
20, 0, 89, 0, 1), sam30_7c_counts = c(24L, 23L, 0L, 91L,
1L, 0L), sam30_1a_counts = c(5, 7, 0, 16, 0, 0), sam30_1b_counts = c(38,
35, 3, 102, 0, 0), sam30_1c_counts = c(55, 26, 0, 136, 2,
0), sam30_2a_counts = c(20, 6, 0, 65, 2, 0), sam30_2b_counts = c(10,
5, 0, 43, 0, 0), sam30_2c_counts = c(86, 88, 7, 167, 6, 0
), sam30_3a_counts = c(39, 19, 1, 132, 3, 0), sam30_3b_counts = c(30,
31, 0, 113, 2, 0), sam30_3c_counts = c(59, 35, 0, 104, 3,
0), sam30_8a_counts = c(37, 22, 0, 133, 1, 0), sam30_8b_counts = c(28,
20, 0, 150, 2, 0), sam30_8c_counts = c(13, 20, 1, 104, 0,
0)), row.names = c("Glyma.01G000100", "Glyma.01G000200",
"Glyma.01G000300", "Glyma.01G000400", "Glyma.01G000500", "Glyma.01G000600"
), class = "data.frame")
``````

my condition dataframe `cond`.

``````cond <- structure(list(cultivar = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("R", "S"), class = "factor"),
time = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("30d", "5d"), class = "factor"),
treatment = structure(c(3L, 3L, 3L, 1L, 1L, 1L, 4L, 4L, 4L,
2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 4L, 4L, 4L, 2L, 2L, 2L,
3L, 3L, 3L, 1L, 1L, 1L, 4L, 4L, 4L, 2L, 2L, 2L, 3L, 3L, 3L,
1L, 1L, 1L, 4L, 4L, 4L, 2L, 2L, 2L), .Label = c("Aph", "Ctr",
"Scn", "ScnAph"), class = "factor")), row.names = c(NA, -48L
), class = "data.frame")
``````
rna-seq deseq2 • 462 views
modified 6 months ago • written 6 months ago by MAPK1.4k
1

I think that you should consider merging your `cultivar` and `treatment` variables:

``````cond\$CultivarTreatment <- paste(cond\$cultivar, cond\$treatment, sep="_")

cultivar time treatment CultivarTreatment
1        R   5d       Scn             R_Scn
2        R   5d       Scn             R_Scn
3        R   5d       Scn             R_Scn
4        R   5d       Aph             R_Aph
5        R   5d       Aph             R_Aph
6        R   5d       Aph             R_Aph
``````

....and then using:

``````dds <- DESeqDataSetFromMatrix(countData=count.mat, colData=cond, design= ~ CultivarTreatment + time + CultivarTreatment:time)

dds <- DESeq(dds, test="LRT", reduced= ~ CultivarTreatment + time)
``````

Best person to ask is, of course, Michael Love, who is more active on the Bioconductor forums. He'll respond pretty quickly there.

Thanks, Kevin. I will post this in bioconductor as well.

I think that the way I've stated will allow you to achieve numbers 1-4 in your list, but possibly not #5. Interested to see what Michael Love suggests.

If anyone else here has suggestions, please chip in.

@Kevin Blighe, So I got the following results after running your code. Now how do I get #1 to #4? Thanks again for your help.

``````> resultsNames(dds)
[1] "Intercept"                           "CultivarTreatment_R_Ctr_vs_R_Aph"    "CultivarTreatment_R_Scn_vs_R_Aph"    "CultivarTreatment_R_ScnAph_vs_R_Aph"
[5] "CultivarTreatment_S_Aph_vs_R_Aph"    "CultivarTreatment_S_Ctr_vs_R_Aph"    "CultivarTreatment_S_Scn_vs_R_Aph"    "CultivarTreatment_S_ScnAph_vs_R_Aph"
[9] "time_5d_vs_30d"                      "CultivarTreatmentR_Ctr.time5d"       "CultivarTreatmentR_Scn.time5d"       "CultivarTreatmentR_ScnAph.time5d"
[13] "CultivarTreatmentS_Aph.time5d"       "CultivarTreatmentS_Ctr.time5d"       "CultivarTreatmentS_Scn.time5d"       "CultivarTreatmentS_ScnAph.time5d"
``````

I see that you got similar advice on Bioconductor. Would it not make more sense to set `5d` as the reference level for `time`? It looks like you have `30d` as the reference level. If you switch that around, then it should be (I think):

``````# Scn Vs Ctr within resistance within 5 day
results(dds, name="CultivarTreatment_R_Scn_vs_R_Ctr", test="Wald") # due to the fact that 5d is reference level

# Scn Vs Aph within susceptible within 5 day
results(dds, name="CultivarTreatment_S_Scn_vs_S_Aph", test="Wald")

# Scn Vs Ctr within resistance within 30 day
results(dds, contrast=list(c("CultivarTreatmentR_Scn.time30d", "CultivarTreatmentR_Ctr.time30d")), test="Wald")

Scn Vs Aph within susceptible within 30 day
results(dds, contrast=list(c("CultivarTreatmentS_Scn.time30d", "CultivarTreatmentS_Aph.time30d")), test="Wald")
``````

I would just confirm this with Michael Love

1

Thank you so much, Kevin. That's right, releveling it to ref=5 day makes more sense.

1

I see that it was recommended to combine all 3 together with paste. I had considered that but was not entirely sure of the validity in the context of a DESeq2 design formula.

Yes, that's right. However, after merging all three columns (as `TimeTreatmentCultivar`), I tried and got the following error below. How do I reduce it properly? Thank you for your help. my code:

`````` cond\$TimeTreatmentCultivar<- as.factor(paste(cond\$cultivar,cond\$time,cond\$treatment, sep = "_"))
dds <- DESeqDataSetFromMatrix(countData=count.mat,colData=cond,design= ~TimeTreatmentCultivar)
dds = DESeq(dds, test="LRT", reduced= ~ TimeTreatmentCultivar)
``````

Error:

``````estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
Error in nbinomLRT(object, full = full, reduced = reduced, quiet = quiet,  :
less than one degree of freedom, perhaps full and reduced models are not in the correct order
``````

I thought that, in this situation, you could avoid the `reduced` step.

Also, given the complex experimental set-up that you have, I also thought that, perhaps, you could perform everything separately for R and S, although that introduces some extra bias into the results.

So should I just do `dds = DESeq(dds)` instead of `dds = DESeq(dds, test="LRT", reduced= ~ TimeTreatmentCultivar)`? I also tried`dds = DESeq(dds, test="LRT")`, but got this error below. Or should I just do `reduce = ~1`?

``````Error in DESeq(dds, test = "LRT") :
likelihood ratio test requires a 'reduced' design, see ?DESeq
``````
1

I think that you should just do `DESeq(dds)`. When you generate your results, you don't appear to use the LRT anywhere; instead, Wald is specified. As your aim is to just compare pairs of levels, you don't require the LRT functionality.