Question: DESeq2 model design
4 months ago by
hamsasekar020 wrote:

I have data like this:

    Genotype  condition   time
    A         uninfected  0day
    A         uninfected  0 day
    A         uninfected  0 day
    A         mock        2day
    A         mock        2day
    A         mock        2 day
    A         infetced    2day
    A         infected    2 day
    A         infected    2 day
    A         mock        4day
    A         mock        4 day
    A         mock        4day
    A         infected    4day
    A         infected    4day
    A         infected    4 day
    B         uninfected  0 day
    B         uninfected  0 day
    B         uninfected  0 day
    B         mock        2day
    B         mock        2day
    B         mock        2day
    B         infected    2day
    B         infected    2day
    B         infected    2day
    B         mock        4day
    B         mock        4day
    B         mock        4day
    B         infected    4day
    B         infected    4day
    B         infected    4day

I need to fit a model that lets me compare the effect of treatment over time between two different genotypes. I used the design formula Genotype + condition + time + condition:time + Genotype:time, followed by an LRT with the reduced formula Genotype + condition + time. Because there is only one uninfected condition in each genotype, DESeq2 reports:
Model matrix is not full rank
1. Which model can I use for a time-series comparison with this type of data?
2. Can I add the uninfected samples again before day 4 (as they are the same)? Would that be statistically correct?

rna-seq • 337 views
ADD COMMENTlink modified 4 months ago by russhh4.6k • written 4 months ago by hamsasekar020

You have the same problem as discussed in "Error in differential analysis for samples with different time points". In short: no biological replicates.

ADD REPLYlink modified 4 months ago • written 4 months ago by h.mon27k

The uninfected samples aren't useful, you can remove them. Also, as you have no replicates you will either have to ignore the effect of Genotype, condition, or time. A better solution would be to sequence replicates (ideally 6 of each condition, but at the minimum 3 of each).

ADD REPLYlink written 4 months ago by Devon Ryan91k

Thank you for the reply. Sorry, I forgot to mention that I have 3 biological replicates for each group. According to my work plan, I want to compare uninfected with mock and with infected.

ADD REPLYlink modified 4 months ago • written 4 months ago by hamsasekar020

Why aren't the uninfected samples useful? They provide a T_0 baseline for the rest of the experiment

ADD REPLYlink written 4 months ago by russhh4.6k

Yes, we are also taking it as the baseline to compare with mock and infected.

ADD REPLYlink written 4 months ago by hamsasekar020

They have no corresponding samples with which to compare, so it's unclear if any change is actually due to time or just "doing something" (i.e., mock treatment).

ADD REPLYlink written 4 months ago by Devon Ryan91k

On the contrary, if they didn't have the T_0 sample, their mock-infection placebo could have profound time-dependent effects but they'd have no baseline sample against which to compare those effects and wouldn't be able to identify them. This is a well-designed experiment and I really don't understand the criticism.

ADD REPLYlink written 4 months ago by russhh4.6k

Their baseline for comparison is currently day 2, and that holds regardless of whether the uninfected samples are present or not. One can't meaningfully use day 0 as a baseline because the effect of time can't be distinguished from the lack of treatment. There was likely a good biological reason for this design, but since we don't know it, what I mentioned is the most we can say. This also corresponds to Carlo's answer, where he correctly notes that mock and uninfected can't be compared if we think there's a change simply due to time at day 2.

ADD REPLYlink written 4 months ago by Devon Ryan91k

"On the contrary, if they didn't have the T_0 sample, their mock-infection placebo could have profound time-dependent effects but they'd have no baseline sample against which to compare those effects and wouldn't be able to identify them."

This is exactly what we want to know. We have two genotypes that are known to differ in gene expression levels even in the uninfected state (without any mock treatment or infection), so we included the uninfected samples.

ADD REPLYlink modified 4 months ago • written 4 months ago by hamsasekar020
4 months ago by
Carlo Yague4.6k
Belgium
Carlo Yague4.6k wrote:

Hi,

First of all, I think there are typos in your conditions, as you sometimes write "infetced" instead of "infected". Your design is not full rank because all "uninfected" samples are on day 0. A possible solution (which may or may not be suitable, depending on your experimental settings) would be to merge the mock and uninfected levels:

Genotype  condition            time 
A         uninfected           0
A         uninfected           2
A         infected             2
A         uninfected           4 
A         infected             4
B         uninfected           0
B         uninfected           2
B         infected             2
B         uninfected           4
B         infected             4
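
The rank problem can be checked in base R before anything is passed to DESeq2. Here is a minimal sketch, assuming the sample layout from the question: the data frame `coldata`, the helper `check_rank` and the numeric day values are made-up names and values for illustration only. It compares the number of model-matrix columns with the matrix rank for the original coding and for the merged coding, with time treated either as a number or as a factor.

```r
# Hypothetical reconstruction of the sample layout from the question:
# 2 genotypes x 5 condition/time groups x 3 replicates = 30 samples.
coldata <- data.frame(
  Genotype  = factor(rep(c("A", "B"), each = 15)),
  condition = rep(rep(c("uninfected", "mock", "infected", "mock", "infected"),
                      each = 3), times = 2),
  time      = rep(rep(c(0, 2, 2, 4, 4), each = 3), times = 2),
  stringsAsFactors = FALSE
)

# Design formula used in the question.
design_formula <- ~ Genotype + condition + time + condition:time + Genotype:time

# Return the number of coefficients and the rank of the model matrix.
# The design is full rank only when the two numbers are equal.
check_rank <- function(data) {
  mm <- model.matrix(design_formula, data = data)
  c(columns = ncol(mm), rank = qr(mm)$rank)
}

# Original coding: three condition levels, time as a factor.
original <- coldata
original$condition <- factor(original$condition,
                             levels = c("uninfected", "mock", "infected"))
original$time <- factor(original$time)

# Merged coding: mock and uninfected share one level; time kept numeric,
# as in the table above.
merged_numeric <- coldata
merged_numeric$condition <- factor(
  ifelse(coldata$condition == "infected", "infected", "uninfected"),
  levels = c("uninfected", "infected")
)

# Same merged coding, but with time as a factor.
merged_factor <- merged_numeric
merged_factor$time <- factor(merged_factor$time)

results <- rbind(
  original            = check_rank(original),
  merged_numeric_time = check_rank(merged_numeric),
  merged_factor_time  = check_rank(merged_factor)
)
print(results)
```

A design is usable only when the rank equals the column count. Even after merging there are still no infected samples at day 0, so whether the condition:time interaction can be estimated depends on how time is coded; it is worth running this kind of check on the real sample table.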
ADD COMMENTlink written 4 months ago by Carlo Yague4.6k

Thank you for the quick reply. We actually need to compare uninfected with mock and infected, so merging would be a difficult option for us.

ADD REPLYlink written 4 months ago by hamsasekar020

OK, but as I said, in your design the "uninfected" samples and the day 0 samples are basically the same, which is why you get the error. As long as you include the interaction between "condition" and "time", the error will be there.

In my proposed workaround, even if you can't compare uninfected with mock directly, you can compare uninfected samples on day 0 with uninfected samples on day 2/day 4, which is basically the same thing.

ADD REPLYlink modified 4 months ago • written 4 months ago by Carlo Yague4.6k
4 months ago by
russhh4.6k
UK, U. Glasgow
russhh4.6k wrote:

I'd suggest you make a column for each of mock_time and infected_time, then fit ~ -1 + Genotype + mock_time + infected_time. This fits only a linear effect of time (but it can be extended to include a quadratic term if needed).

To do that, you just need to encode the design differently.

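treatments <- data.frame(
    genotype = please_name_your_data$Genotype,
    # Assuming the `time` column in the posted data frame is actually numeric,
    # rather than the strings that you've copied into your question.
    # The parentheses matter: `*` binds more tightly than `==` in R, so
    # `time * condition == "mock"` would multiply before comparing.
    mock_time = with(please_name_your_data, time * (condition == "mock")),
    infected_time = with(please_name_your_data, time * (condition == "infected"))
)

# Pass the data frame explicitly so the variables in the formula can be found.
design <- model.matrix(~ -1 + genotype + mock_time + infected_time, data = treatments)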

# etcetera

I'm not sure how to test the value of including the Genotype/Time interaction (personally, I wouldn't have considered it when coding the design).
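
To make the recoding concrete, here is a minimal, self-contained sketch on a made-up sample table that mirrors the layout in the question. The data frame `samples`, its numeric `time` column, the helper `is_full_rank` and the extra `genotypeB_time` column are all assumptions for illustration. The last one is only one possible way to encode a Genotype/time interaction so that a full and a reduced design can be set side by side.

```r
# Hypothetical sample table mirroring the question; `time` is numeric (days).
samples <- data.frame(
  Genotype  = factor(rep(c("A", "B"), each = 15)),
  condition = rep(rep(c("uninfected", "mock", "infected", "mock", "infected"),
                      each = 3), times = 2),
  time      = rep(rep(c(0, 2, 2, 4, 4), each = 3), times = 2),
  stringsAsFactors = FALSE
)

# Treatment-specific time columns: zero unless the sample got that treatment.
# Note the parentheses around each comparison.
treatments <- data.frame(
  genotype       = samples$Genotype,
  mock_time      = samples$time * (samples$condition == "mock"),
  infected_time  = samples$time * (samples$condition == "infected"),
  # Assumed encoding of a Genotype/time interaction: extra slope for genotype B.
  genotypeB_time = samples$time * (samples$Genotype == "B")
)

# Reduced design: the model suggested above.
reduced <- model.matrix(~ -1 + genotype + mock_time + infected_time,
                        data = treatments)

# Full design: the same model plus the genotype-specific time slope.
full <- model.matrix(~ -1 + genotype + mock_time + infected_time + genotypeB_time,
                     data = treatments)

# A design is full rank when its rank equals its number of columns.
is_full_rank <- function(mm) qr(mm)$rank == ncol(mm)

print(colnames(reduced))
print(c(reduced = is_full_rank(reduced), full = is_full_rank(full)))
```

Both matrices should come out full rank for this layout. A full/reduced pair like this is the kind of input a likelihood-ratio test compares; how it is handed to DESeq2 is not shown here, so check the documentation for the `full` and `reduced` arguments of `DESeq()` before relying on it.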

ADD COMMENTlink written 4 months ago by russhh4.6k