Entering edit mode
7.2 years ago
MAPK
★
2.1k
Hi All,
I need help to resolve this problem with deseq.
I have two cultivars (R and S), two time points (5d and 30d) and four treatments (Scn, Aph, ScnAph, Ctr). There are two sets of controls (Ctr) for each cultivar and I have a total of 48 samples.
I need to compare:
- Scn Vs Ctr within resistance within 5 day
- Scn Vs Aph within susceptible within 5 day
- Scn Vs Ctr within resistance within 30 day
- Scn Vs Aph within susceptible within 30 day
- R Vs S in whole dataset
Can someone please guide me through this. I have tried this model which looks like this, but I am not sure if this is the correct way of doing it. Thanks for your help in advance.
dds <- DESeqDataSetFromMatrix(countData=count.mat,colData=cond,design= ~cultivar+time+cultivar:time)
##you relevel 5d so you'll do "cultivarR.time30d"
dds$time <- relevel(dds$time, ref = "5d")
##you relevel S so you'll do "cultivarR.time5d"
dds$cultivar <- relevel(dds$cultivar, ref = "S")
dds = DESeq(dds, test="LRT", reduced= ~cultivar + time)
#5d vs 30d within resistance
##in case where dds$time <- relevel(dds$time, ref = "S")
resultsNames(dds)
# > resultsNames(dds)
# [1] "Intercept" "cultivar_R_vs_S" "time_5d_vs_30d" "cultivarR.time5d"
#5d vs 30d within resistance
tt<- results(dds, contrast=list(c("time_5d_vs_30d", "cultivarR.time5d")), test="Wald")
My data:
my count.mat looks like this:
count.mat <- structure(list(sam5_4a_counts = c(26, 28, 1, 137, 3, 0), sam5_4b_counts = c(13,
6, 0, 95, 3, 0), sam5_4c_counts = c(1, 7, 0, 41, 2, 0), sam5_5a_counts = c(28,
15, 0, 84, 0, 0), sam5_5b_counts = c(12, 29, 1, 97, 2, 1), sam5_5c_counts = c(10,
11, 0, 77, 2, 0), sam5_6a_counts = c(42, 24, 0, 139, 4, 0), sam5_6b_counts = c(29,
28, 1, 166, 1, 1), sam5_6c_counts = c(29, 46, 0, 112, 5, 3),
sam5_7a_counts = c(7, 7, 1, 65, 0, 0), sam5_7b_counts = c(37,
10, 0, 108, 4, 0), sam5_7c_counts = c(7, 4, 0, 47, 0, 0),
sam5_1a_counts = c(44, 56, 4, 107, 2, 0), sam5_1b_counts = c(13,
11, 3, 44, 1, 0), sam5_1c_counts = c(39, 55, 1, 166, 1, 0
), sam5_2a_counts = c(4, 8, 1, 75, 1, 0), sam5_2b_counts = c(126,
160, 10, 414, 5, 1), sam5_2c_counts = c(28, 37, 1, 209, 3,
1), sam5_3a_counts = c(38, 70, 5, 138, 3, 1), sam5_3b_counts = c(132,
218, 6, 390, 14, 5), sam5_3c_counts = c(10, 2, 0, 39, 0,
1), sam5_8a_counts = c(19, 37, 1, 140, 1, 1), sam5_8b_counts = c(5,
12, 0, 63, 1, 0), sam5_8c_counts = c(27, 14, 1, 90, 0, 0),
sam30_4a_counts = c(29, 31, 0, 68, 2, 0), sam30_4b_counts = c(32,
24, 0, 70, 1, 0), sam30_4c_counts = c(13, 8, 0, 38, 2, 1),
sam30_5a_counts = c(22, 14, 0, 104, 2, 0), sam30_5b_counts = c(37,
49, 2, 88, 1, 0), sam30_5c_counts = c(37, 84, 1, 106, 0,
1), sam30_6a_counts = c(74, 58, 3, 110, 2, 1), sam30_6b_counts = c(68,
183, 3, 150, 2, 1), sam30_6c_counts = c(38, 86, 1, 161, 1,
0), sam30_7a_counts = c(21, 27, 0, 93, 2, 0), sam30_7b_counts = c(27,
20, 0, 89, 0, 1), sam30_7c_counts = c(24L, 23L, 0L, 91L,
1L, 0L), sam30_1a_counts = c(5, 7, 0, 16, 0, 0), sam30_1b_counts = c(38,
35, 3, 102, 0, 0), sam30_1c_counts = c(55, 26, 0, 136, 2,
0), sam30_2a_counts = c(20, 6, 0, 65, 2, 0), sam30_2b_counts = c(10,
5, 0, 43, 0, 0), sam30_2c_counts = c(86, 88, 7, 167, 6, 0
), sam30_3a_counts = c(39, 19, 1, 132, 3, 0), sam30_3b_counts = c(30,
31, 0, 113, 2, 0), sam30_3c_counts = c(59, 35, 0, 104, 3,
0), sam30_8a_counts = c(37, 22, 0, 133, 1, 0), sam30_8b_counts = c(28,
20, 0, 150, 2, 0), sam30_8c_counts = c(13, 20, 1, 104, 0,
0)), row.names = c("Glyma.01G000100", "Glyma.01G000200",
"Glyma.01G000300", "Glyma.01G000400", "Glyma.01G000500", "Glyma.01G000600"
), class = "data.frame")
my condition dataframe cond.
cond <- structure(list(cultivar = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("R", "S"), class = "factor"),
time = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("30d", "5d"), class = "factor"),
treatment = structure(c(3L, 3L, 3L, 1L, 1L, 1L, 4L, 4L, 4L,
2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 4L, 4L, 4L, 2L, 2L, 2L,
3L, 3L, 3L, 1L, 1L, 1L, 4L, 4L, 4L, 2L, 2L, 2L, 3L, 3L, 3L,
1L, 1L, 1L, 4L, 4L, 4L, 2L, 2L, 2L), .Label = c("Aph", "Ctr",
"Scn", "ScnAph"), class = "factor")), row.names = c(NA, -48L
), class = "data.frame")
I think that you should consider merging your
cultivarandtreatmentvariables:....and then using:
Best person to ask is, of course, Michael Love, who is more active on the Bioconductor forums. He'll respond pretty quickly there.
Thanks, Kevin. I will post this in bioconductor as well.
I think that the way I've stated will allow you to achieve numbers 1-4 in your list, but possibly not #5. Interested to see what Michael Love suggests.
If anyone else here has suggestions, please chip in.
@Kevin Blighe, So I got the following results after running your code. Now how do I get #1 to #4? Thanks again for your help.
I see that you got similar advice on Bioconductor. Would it not make more sense to set
5das the reference level fortime? It looks like you have30das the reference level. If you switch that around, then it should be (I think):I would just confirm this with Michael Love
Thank you so much, Kevin. That's right, releveling it to ref=5 day makes more sense.
I see that it was recommended to combine all 3 together with paste. I had considered that but was not entirely sure of the validity in the context of a DESeq2 design formula.
Yes, that's right. However, after merging all three columns (as
TimeTreatmentCultivar), I tried and got the following error below. How do I reduce it properly? Thank you for your help. my code:Error:
I thought that, in this situation, you could avoid the
reducedstep.Also, given the complex experimental set-up that you have, I also thought that, perhaps, you could perform everything separately for R and S, although that introduces some extra bias into the results.
So should I just do
dds = DESeq(dds)instead ofdds = DESeq(dds, test="LRT", reduced= ~ TimeTreatmentCultivar)? I also trieddds = DESeq(dds, test="LRT"), but got this error below. Or should I just doreduce = ~1?I think that you should just do
DESeq(dds). When you generate your results, you don't appear to use the LRT anywhere; instead, Wald is specified. As your aim is to just compare pairs of levels, you don't require the LRT functionality.