Question: Multi-factor time series design model in DESeq2 analysis
0
gravatar for prppmet
3.7 years ago by
prppmet0
United States
prppmet0 wrote:

I have the following data: a). 3 strains (wild type & two mutants); b). 2 conditions: untreated & treated; c). 3 time points (min): 0, 15, 90. Each time point , except for 0 time point (base line control), has untreated & treated samples. There two replicates for each condition+time point.

    strain    treatment    time    replicate    id
S1.1    SC5314-wt    untr    0    r1    SC5314-wt_untr_0_r1
S2.6    SC5314-wt    untr    0    r2    SC5314-wt_untr_0_r2
S1.2    SC5314-wt    untr    15    r1    SC5314-wt_untr_15_r1
S2.7    SC5314-wt    untr    15    r2    SC5314-wt_untr_15_r2
S1.3    SC5314-wt    trt    15    r1    SC5314-wt_trt_15_r1
S2.8    SC5314-wt    trt    15    r2    SC5314-wt_trt_15_r2
S1.4    SC5314-wt    untr    90    r1    SC5314-wt_untr_90_r1
S2.9    SC5314-wt    untr    90    r2    SC5314-wt_untr_90_r2
S1.5    SC5314-wt    trt    90    r1    SC5314-wt_trt_90_r1
S2.10    SC5314-wt    trt    90    r2    SC5314-wt_trt_90_r2

IR1.11    irs4-mut    untr    0    r1    irs4-mut_untr_0_r1
IR2.16    irs4-mut    untr    0    r2    irs4-mut_untr_0_r2
IR1.12    irs4-mut    untr    15    r1    irs4-mut_untr_15_r1
IR2.17    irs4-mut    untr    15    r2    irs4-mut_untr_15_r2
IR1.13    irs4-mut    trt    15    r1    irs4-mut_trt_15_r1
IR2.18    irs4-mut    trt    15    r2    irs4-mut_trt_15_r2
IR1.14    irs4-mut    untr    90    r1    irs4-mut_untr_90_r1
IR2.19    irs4-mut    untr    90    r2    irs4-mut_untr_90_r2
IR1.15    irs4-mut    trt    90    r1    irs4-mut_trt_90_r1
IR2.20    irs4-mut    trt    90    r2    irs4-mut_trt_90_r2

IN1.21    inp51-mut    untr    0    r1    inp51-mut_untr_0_r1
IN2.26    inp51-mut    untr    0    r2    inp51-mut_untr_0_r2
IN1.22    inp51-mut    untr    15    r1    inp51-mut_untr_15_r1
IN2.27    inp51-mut    untr    15    r2    inp51-mut_untr_15_r2
IN1.23    inp51-mut    trt    15    r1    inp51-mut_trt_15_r1
IN2.28    inp51-mut    trt    15    r2    inp51-mut_trt_15_r2
IN1.24    inp51-mut    untr    90    r1    inp51-mut_untr_90_r1
IN2.29    inp51-mut    untr    90    r2    inp51-mut_untr_90_r2
IN1.25    inp51-mut    trt    90    r1    inp51-mut_trt_90_r1
IN2.30    inp51-mut    trt    90    r2    inp51-mut_trt_90_r2

Did the following: with DESeq2_1.6.3 

> sampledata$time <- factor(sampledata$time,levels = c("0", "15", "90"))
> sampledata$sample <- factor (paste0(sampledata$strain,"_", sampledata$treatment))
> sampledata$sample
 [1] SC5314-wt_untr SC5314-wt_untr SC5314-wt_untr SC5314-wt_untr SC5314-wt_trt  SC5314-wt_trt  SC5314-wt_untr SC5314-wt_untr SC5314-wt_trt  SC5314-wt_trt  irs4-mut_untr  irs4-mut_untr  irs4-mut_untr 
[14] irs4-mut_untr  irs4-mut_trt   irs4-mut_trt   irs4-mut_untr  irs4-mut_untr  irs4-mut_trt   irs4-mut_trt   inp51-mut_untr inp51-mut_untr inp51-mut_untr inp51-mut_untr inp51-mut_trt  inp51-mut_trt 
[27] inp51-mut_untr inp51-mut_untr inp51-mut_trt  inp51-mut_trt 
Levels: inp51-mut_trt inp51-mut_untr irs4-mut_trt irs4-mut_untr SC5314-wt_trt SC5314-wt_untr

> sampledata$time
 [1] 0  0  15 15 15 15 90 90 90 90 0  0  15 15 15 15 90 90 90 90 0  0  15 15 15 15 90 90 90 90
Levels: 0 15 90

#failed with the design:
> ddsCATS <- DESeqDataSetFromMatrix(countData = countsdata, colData = sampledata,design = ~ sample + time + sample:time)
Error in DESeqDataSet(se, design = design, ignoreRank) : 
  the model matrix is not full rank, so the model cannot be fit as specified.
  one or more variables or interaction terms in the design formula
  are linear combinations of the others and must be removed
 
# this worked
> ddsCATS <- DESeqDataSetFromMatrix(countData = countsdata, colData = sampledata,design = ~ sample + time)
#define factor level references

> ddsCATS$sample <- relevel (ddsCATS$sample, "SC5314-wt_untr")
> ddsCATS$time <- relevel (ddsCATS$time, "0")

#LRT test
> ddsCATS <- DESeq(ddsCATS, test="LRT", reduced = ~ time)
> resddsCATS <- results(ddsCATS)

> resultsNames(ddsCATS)
[1] "Intercept"            "sampleinp51.mut_untr" "sampleirs4.mut_trt"   "sampleirs4.mut_untr"  "sampleSC5314.wt_trt"  "sampleSC5314.wt_untr" "time_15_vs_0"         "time_90_vs_0"

> resddsCATS <- results(ddsCATS)
> resddsCATS
log2 fold change (MLE): time 90 vs 0 
LRT p-value: '~ sample + time' vs '~ time' 
DataFrame with 8051 rows and 6 columns
              baseMean log2FoldChange     lfcSE      stat    pvalue      padj
             <numeric>      <numeric> <numeric> <numeric> <numeric> <numeric>
C2_00990W_A 0.02556151     0.88717584  4.894719 0.3568103 0.9964362        NA
C2_00990W_B 0.02556151     0.88717584  4.894719 0.3568103 0.9964362        NA
C2_02640C_B 0.02976336     0.02591547  4.919825 0.4767291 0.9929506        NA
C3_04310C_A 0.03574418    -0.61652583  4.918296 0.5500559 0.9901756        NA
C3_06300W_B 0.03094594     0.13771721  4.919847 0.4435255 0.9940455        NA
...                ...            ...       ...       ...       ...       ...
C2_01000W_B   23247.92     -0.4226896 0.1487500  2.368988 0.7960833 0.9354503
C2_03270W_A   32929.48     -0.4830989 0.2028736  2.764650 0.7362141 0.9142012
C2_03270W_B   37671.48     -0.4933234 0.1991737  5.960478 0.3100836 0.6393959
C5_05050W_A   60685.44     -0.4559260 0.1654281  3.772222 0.5826539 0.8407806
C5_05050W_B   80734.06     -0.4962704 0.1552556  4.574513 0.4699761 0.7663622


Questions: 1). I would like to compare the effect of each condition at each time point both within and between three strains. what should be the full and reduced design terms for LRT test? 2). how do I setup contrasts to compare conditions at different time points both within and across the strains? 3). Lastly, what happened to sampleinp51.mut_trt as resultsNames(ddsCATS) does not list it?

Thanks.

rna-seq • 1.7k views
ADD COMMENTlink modified 3.7 years ago by Devon Ryan88k • written 3.7 years ago by prppmet0
0
gravatar for Devon Ryan
3.7 years ago by
Devon Ryan88k
Freiburg, Germany
Devon Ryan88k wrote:
  1. Don't use an LRT, use a wald test. Further, it's simplest to just reparamaterize things so you just have groups (e.g., SC5314-wt_untr_15 and SC5314-wt_trt_15) and just specify a different contrast for each of the comparisons you want to make. A GLM style design could still be used, but figuring out the contrasts becomes a pain then.
  2. If you use groups then this is significantly simpler. Only use a model like the one you did if you really want to test that model (e.g., see the effect of treatment across time).
  3. It's the base level of the comparison. By default, all of the coefficients will be versus it.
ADD COMMENTlink written 3.7 years ago by Devon Ryan88k

Thanks for the suggestions on the Wald test being the more appropriate choice for analysis of this data. I will follow your advice. I am still not clear on how sampleinp51.mut_trt was picked as the refernece level for sample factors despite my explicit selection of SC5314-wt_untr as the reference [ddsCATS$sample <- relevel (ddsCATS$sample, "SC5314-wt_untr")]. If I recall from DESeq2 manual, the last level of the factor is selected by default if no relevel is declared by the user. Here I selected two relevels separately for the two factors considered in the analysis, sample & time, in that order. It seems the last declaration in the sequence (ddsCATS$time <- relevel (ddsCATS$time, "0") is considered but not the sample reference. Do I understand this correctly? If not, how do I select SC5314-wt_untr as one of the two reference levels? On the other hand, if I combine all factors into a compound one as you suggested for Wald test, I only need to select one - SC5314-wt_untr_0. I would still like to know why the two relevel selection did not work for me. I appreciate the response.

ADD REPLYlink written 3.7 years ago by prppmet0

I don't know why the relevel didn't work, you can always just relevel sampledata$sample before making ddsCATS.

ADD REPLYlink written 3.7 years ago by Devon Ryan88k

The DESeqResults object has log2fold changes for only the 1st & last levels of the factor. For the rest, I can use contrast argument of the results function to generate that data. However I would like to know, since I have 15 different levels & want to generate a diagonal (15 X 15) log2-foldchange matrix, if there is an automated option to do just that within R. Any pointers will be appreciated.

ADD REPLYlink written 3.6 years ago by prppmet0

There's not a single command that will do that (that I know of, at least), but you could script it in a loop.

ADD REPLYlink written 3.6 years ago by Devon Ryan88k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1177 users visited in the last hour