Entering edit mode
8.7 years ago
prppmet
•
0
I have the following data: a). 3 strains (wild type & two mutants); b). 2 conditions: untreated & treated; c). 3 time points (min): 0, 15, 90. Each time point , except for 0 time point (base line control), has untreated & treated samples. There two replicates for each condition+time point.
strain treatment time replicate id
S1.1 SC5314-wt untr 0 r1 SC5314-wt_untr_0_r1
S2.6 SC5314-wt untr 0 r2 SC5314-wt_untr_0_r2
S1.2 SC5314-wt untr 15 r1 SC5314-wt_untr_15_r1
S2.7 SC5314-wt untr 15 r2 SC5314-wt_untr_15_r2
S1.3 SC5314-wt trt 15 r1 SC5314-wt_trt_15_r1
S2.8 SC5314-wt trt 15 r2 SC5314-wt_trt_15_r2
S1.4 SC5314-wt untr 90 r1 SC5314-wt_untr_90_r1
S2.9 SC5314-wt untr 90 r2 SC5314-wt_untr_90_r2
S1.5 SC5314-wt trt 90 r1 SC5314-wt_trt_90_r1
S2.10 SC5314-wt trt 90 r2 SC5314-wt_trt_90_r2
IR1.11 irs4-mut untr 0 r1 irs4-mut_untr_0_r1
IR2.16 irs4-mut untr 0 r2 irs4-mut_untr_0_r2
IR1.12 irs4-mut untr 15 r1 irs4-mut_untr_15_r1
IR2.17 irs4-mut untr 15 r2 irs4-mut_untr_15_r2
IR1.13 irs4-mut trt 15 r1 irs4-mut_trt_15_r1
IR2.18 irs4-mut trt 15 r2 irs4-mut_trt_15_r2
IR1.14 irs4-mut untr 90 r1 irs4-mut_untr_90_r1
IR2.19 irs4-mut untr 90 r2 irs4-mut_untr_90_r2
IR1.15 irs4-mut trt 90 r1 irs4-mut_trt_90_r1
IR2.20 irs4-mut trt 90 r2 irs4-mut_trt_90_r2
IN1.21 inp51-mut untr 0 r1 inp51-mut_untr_0_r1
IN2.26 inp51-mut untr 0 r2 inp51-mut_untr_0_r2
IN1.22 inp51-mut untr 15 r1 inp51-mut_untr_15_r1
IN2.27 inp51-mut untr 15 r2 inp51-mut_untr_15_r2
IN1.23 inp51-mut trt 15 r1 inp51-mut_trt_15_r1
IN2.28 inp51-mut trt 15 r2 inp51-mut_trt_15_r2
IN1.24 inp51-mut untr 90 r1 inp51-mut_untr_90_r1
IN2.29 inp51-mut untr 90 r2 inp51-mut_untr_90_r2
IN1.25 inp51-mut trt 90 r1 inp51-mut_trt_90_r1
IN2.30 inp51-mut trt 90 r2 inp51-mut_trt_90_r2
Did the following with DESeq2_1.6.3
:
> sampledata$time <- factor(sampledata$time,levels = c("0", "15", "90"))
> sampledata$sample <- factor (paste0(sampledata$strain,"_", sampledata$treatment))
> sampledata$sample
[1] SC5314-wt_untr SC5314-wt_untr SC5314-wt_untr SC5314-wt_untr SC5314-wt_trt SC5314-wt_trt SC5314-wt_untr SC5314-wt_untr SC5314-wt_trt SC5314-wt_trt irs4-mut_untr irs4-mut_untr irs4-mut_untr
[14] irs4-mut_untr irs4-mut_trt irs4-mut_trt irs4-mut_untr irs4-mut_untr irs4-mut_trt irs4-mut_trt inp51-mut_untr inp51-mut_untr inp51-mut_untr inp51-mut_untr inp51-mut_trt inp51-mut_trt
[27] inp51-mut_untr inp51-mut_untr inp51-mut_trt inp51-mut_trt
Levels: inp51-mut_trt inp51-mut_untr irs4-mut_trt irs4-mut_untr SC5314-wt_trt SC5314-wt_untr
> sampledata$time
[1] 0 0 15 15 15 15 90 90 90 90 0 0 15 15 15 15 90 90 90 90 0 0 15 15 15 15 90 90 90 90
Levels: 0 15 90
#failed with the design:
> ddsCATS <- DESeqDataSetFromMatrix(countData = countsdata, colData = sampledata,design = ~ sample + time + sample:time)
Error in DESeqDataSet(se, design = design, ignoreRank) :
the model matrix is not full rank, so the model cannot be fit as specified.
one or more variables or interaction terms in the design formula
are linear combinations of the others and must be removed
# this worked
> ddsCATS <- DESeqDataSetFromMatrix(countData = countsdata, colData = sampledata,design = ~ sample + time)
#define factor level references
> ddsCATS$sample <- relevel (ddsCATS$sample, "SC5314-wt_untr")
> ddsCATS$time <- relevel (ddsCATS$time, "0")
#LRT test
> ddsCATS <- DESeq(ddsCATS, test="LRT", reduced = ~ time)
> resddsCATS <- results(ddsCATS)
> resultsNames(ddsCATS)
[1] "Intercept" "sampleinp51.mut_untr" "sampleirs4.mut_trt" "sampleirs4.mut_untr" "sampleSC5314.wt_trt" "sampleSC5314.wt_untr" "time_15_vs_0" "time_90_vs_0"
> resddsCATS <- results(ddsCATS)
> resddsCATS
log2 fold change (MLE): time 90 vs 0
LRT p-value: '~ sample + time' vs '~ time'
DataFrame with 8051 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
C2_00990W_A 0.02556151 0.88717584 4.894719 0.3568103 0.9964362 NA
C2_00990W_B 0.02556151 0.88717584 4.894719 0.3568103 0.9964362 NA
C2_02640C_B 0.02976336 0.02591547 4.919825 0.4767291 0.9929506 NA
C3_04310C_A 0.03574418 -0.61652583 4.918296 0.5500559 0.9901756 NA
C3_06300W_B 0.03094594 0.13771721 4.919847 0.4435255 0.9940455 NA
... ... ... ... ... ... ...
C2_01000W_B 23247.92 -0.4226896 0.1487500 2.368988 0.7960833 0.9354503
C2_03270W_A 32929.48 -0.4830989 0.2028736 2.764650 0.7362141 0.9142012
C2_03270W_B 37671.48 -0.4933234 0.1991737 5.960478 0.3100836 0.6393959
C5_05050W_A 60685.44 -0.4559260 0.1654281 3.772222 0.5826539 0.8407806
C5_05050W_B 80734.06 -0.4962704 0.1552556 4.574513 0.4699761 0.7663622
Questions:
- I would like to compare the effect of each condition at each time point both within and between three strains. What should be the full and reduced design terms for LRT test?
- How do I setup contrasts to compare conditions at different time points both within and across the strains?
- Lastly, what happened to
sampleinp51.mut_trt
asresultsNames(ddsCATS)
does not list it?
Thanks
Thanks for the suggestions on the Wald test being the more appropriate choice for analysis of this data. I will follow your advice. I am still not clear on how
sampleinp51.mut_trt
was picked as the reference level for sample factors despite my explicit selection ofSC5314-wt_untr
as the reference [ddsCATS$sample <- relevel (ddsCATS$sample, "SC5314-wt_untr")
]. If I recall from DESeq2 manual, the last level of the factor is selected by default if no relevel is declared by the user. Here I selected two relevels separately for the two factors considered in the analysis, sample & time, in that order. It seems the last declaration in the sequence (ddsCATS$time <- relevel (ddsCATS$time, "0")
) is considered but not the sample reference. Do I understand this correctly? If not, how do I selectSC5314-wt_untr
as one of the two reference levels? On the other hand, if I combine all factors into a compound one as you suggested for Wald test, I only need to select one -SC5314-wt_untr_0
. I would still like to know why the two relevel selection did not work for me. I appreciate the response.I don't know why the relevel didn't work, you can always just relevel
sampledata$sample
before makingddsCATS
.The DESeqResults object has log2fold changes for only the 1st & last levels of the factor. For the rest, I can use contrast argument of the results function to generate that data. However I would like to know, since I have 15 different levels & want to generate a diagonal (15 X 15) log2-foldchange matrix, if there is an automated option to do just that within R. Any pointers will be appreciated.
There's not a single command that will do that (that I know of, at least), but you could script it in a loop.