Entering edit mode
6.1 years ago
MAPK
★
2.1k
Hi All,
I need help to resolve this problem with deseq.
I have two cultivars (R
and S
), two time points (5d
and 30d
) and four treatments (Scn
, Aph
, ScnAph
, Ctr
). There are two sets of controls (Ctr
) for each cultivar and I have a total of 48 samples.
I need to compare:
- Scn Vs Ctr within resistance within 5 day
- Scn Vs Aph within susceptible within 5 day
- Scn Vs Ctr within resistance within 30 day
- Scn Vs Aph within susceptible within 30 day
- R Vs S in whole dataset
Can someone please guide me through this. I have tried this model which looks like this, but I am not sure if this is the correct way of doing it. Thanks for your help in advance.
dds <- DESeqDataSetFromMatrix(countData=count.mat,colData=cond,design= ~cultivar+time+cultivar:time)
##you relevel 5d so you'll do "cultivarR.time30d"
dds$time <- relevel(dds$time, ref = "5d")
##you relevel S so you'll do "cultivarR.time5d"
dds$cultivar <- relevel(dds$cultivar, ref = "S")
dds = DESeq(dds, test="LRT", reduced= ~cultivar + time)
#5d vs 30d within resistance
##in case where dds$time <- relevel(dds$time, ref = "S")
resultsNames(dds)
# > resultsNames(dds)
# [1] "Intercept" "cultivar_R_vs_S" "time_5d_vs_30d" "cultivarR.time5d"
#5d vs 30d within resistance
tt<- results(dds, contrast=list(c("time_5d_vs_30d", "cultivarR.time5d")), test="Wald")
My data:
my count.mat
looks like this:
count.mat <- structure(list(sam5_4a_counts = c(26, 28, 1, 137, 3, 0), sam5_4b_counts = c(13,
6, 0, 95, 3, 0), sam5_4c_counts = c(1, 7, 0, 41, 2, 0), sam5_5a_counts = c(28,
15, 0, 84, 0, 0), sam5_5b_counts = c(12, 29, 1, 97, 2, 1), sam5_5c_counts = c(10,
11, 0, 77, 2, 0), sam5_6a_counts = c(42, 24, 0, 139, 4, 0), sam5_6b_counts = c(29,
28, 1, 166, 1, 1), sam5_6c_counts = c(29, 46, 0, 112, 5, 3),
sam5_7a_counts = c(7, 7, 1, 65, 0, 0), sam5_7b_counts = c(37,
10, 0, 108, 4, 0), sam5_7c_counts = c(7, 4, 0, 47, 0, 0),
sam5_1a_counts = c(44, 56, 4, 107, 2, 0), sam5_1b_counts = c(13,
11, 3, 44, 1, 0), sam5_1c_counts = c(39, 55, 1, 166, 1, 0
), sam5_2a_counts = c(4, 8, 1, 75, 1, 0), sam5_2b_counts = c(126,
160, 10, 414, 5, 1), sam5_2c_counts = c(28, 37, 1, 209, 3,
1), sam5_3a_counts = c(38, 70, 5, 138, 3, 1), sam5_3b_counts = c(132,
218, 6, 390, 14, 5), sam5_3c_counts = c(10, 2, 0, 39, 0,
1), sam5_8a_counts = c(19, 37, 1, 140, 1, 1), sam5_8b_counts = c(5,
12, 0, 63, 1, 0), sam5_8c_counts = c(27, 14, 1, 90, 0, 0),
sam30_4a_counts = c(29, 31, 0, 68, 2, 0), sam30_4b_counts = c(32,
24, 0, 70, 1, 0), sam30_4c_counts = c(13, 8, 0, 38, 2, 1),
sam30_5a_counts = c(22, 14, 0, 104, 2, 0), sam30_5b_counts = c(37,
49, 2, 88, 1, 0), sam30_5c_counts = c(37, 84, 1, 106, 0,
1), sam30_6a_counts = c(74, 58, 3, 110, 2, 1), sam30_6b_counts = c(68,
183, 3, 150, 2, 1), sam30_6c_counts = c(38, 86, 1, 161, 1,
0), sam30_7a_counts = c(21, 27, 0, 93, 2, 0), sam30_7b_counts = c(27,
20, 0, 89, 0, 1), sam30_7c_counts = c(24L, 23L, 0L, 91L,
1L, 0L), sam30_1a_counts = c(5, 7, 0, 16, 0, 0), sam30_1b_counts = c(38,
35, 3, 102, 0, 0), sam30_1c_counts = c(55, 26, 0, 136, 2,
0), sam30_2a_counts = c(20, 6, 0, 65, 2, 0), sam30_2b_counts = c(10,
5, 0, 43, 0, 0), sam30_2c_counts = c(86, 88, 7, 167, 6, 0
), sam30_3a_counts = c(39, 19, 1, 132, 3, 0), sam30_3b_counts = c(30,
31, 0, 113, 2, 0), sam30_3c_counts = c(59, 35, 0, 104, 3,
0), sam30_8a_counts = c(37, 22, 0, 133, 1, 0), sam30_8b_counts = c(28,
20, 0, 150, 2, 0), sam30_8c_counts = c(13, 20, 1, 104, 0,
0)), row.names = c("Glyma.01G000100", "Glyma.01G000200",
"Glyma.01G000300", "Glyma.01G000400", "Glyma.01G000500", "Glyma.01G000600"
), class = "data.frame")
my condition dataframe cond
.
cond <- structure(list(cultivar = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("R", "S"), class = "factor"),
time = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("30d", "5d"), class = "factor"),
treatment = structure(c(3L, 3L, 3L, 1L, 1L, 1L, 4L, 4L, 4L,
2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 4L, 4L, 4L, 2L, 2L, 2L,
3L, 3L, 3L, 1L, 1L, 1L, 4L, 4L, 4L, 2L, 2L, 2L, 3L, 3L, 3L,
1L, 1L, 1L, 4L, 4L, 4L, 2L, 2L, 2L), .Label = c("Aph", "Ctr",
"Scn", "ScnAph"), class = "factor")), row.names = c(NA, -48L
), class = "data.frame")
I think that you should consider merging your
cultivar
andtreatment
variables:....and then using:
Best person to ask is, of course, Michael Love, who is more active on the Bioconductor forums. He'll respond pretty quickly there.
Thanks, Kevin. I will post this in bioconductor as well.
I think that the way I've stated will allow you to achieve numbers 1-4 in your list, but possibly not #5. Interested to see what Michael Love suggests.
If anyone else here has suggestions, please chip in.
@Kevin Blighe, So I got the following results after running your code. Now how do I get #1 to #4? Thanks again for your help.
I see that you got similar advice on Bioconductor. Would it not make more sense to set
5d
as the reference level fortime
? It looks like you have30d
as the reference level. If you switch that around, then it should be (I think):I would just confirm this with Michael Love
Thank you so much, Kevin. That's right, releveling it to ref=5 day makes more sense.
I see that it was recommended to combine all 3 together with paste. I had considered that but was not entirely sure of the validity in the context of a DESeq2 design formula.
Yes, that's right. However, after merging all three columns (as
TimeTreatmentCultivar
), I tried and got the following error below. How do I reduce it properly? Thank you for your help. my code:Error:
I thought that, in this situation, you could avoid the
reduced
step.Also, given the complex experimental set-up that you have, I also thought that, perhaps, you could perform everything separately for R and S, although that introduces some extra bias into the results.
So should I just do
dds = DESeq(dds)
instead ofdds = DESeq(dds, test="LRT", reduced= ~ TimeTreatmentCultivar)
? I also trieddds = DESeq(dds, test="LRT")
, but got this error below. Or should I just doreduce = ~1
?I think that you should just do
DESeq(dds)
. When you generate your results, you don't appear to use the LRT anywhere; instead, Wald is specified. As your aim is to just compare pairs of levels, you don't require the LRT functionality.