DESeq2 model design
5.5 years ago

I have data like this:

Genotype  condition   time
A         uninfected  0 day
A         uninfected  0 day
A         uninfected  0 day
A         mock        2 day
A         mock        2 day
A         mock        2 day
A         infetced    2 day
A         infected    2 day
A         infected    2 day
A         mock        4 day
A         mock        4 day
A         mock        4 day
A         infected    4 day
A         infected    4 day
A         infected    4 day
B         uninfected  0 day
B         uninfected  0 day
B         uninfected  0 day
B         mock        2 day
B         mock        2 day
B         mock        2 day
B         infected    2 day
B         infected    2 day
B         infected    2 day
B         mock        4 day
B         mock        4 day
B         mock        4 day
B         infected    4 day
B         infected    4 day
B         infected    4 day


I need a model that lets me compare the effect of treatment over time between the two genotypes. I used the design formula ~ Genotype + condition + time + condition:time + Genotype:time and then an LRT with the reduced formula ~ Genotype + condition + time. Because the uninfected condition occurs only once (at day 0) in each genotype, DESeq2 reports:
Model matrix is not full rank
1. Which model can I use for a time-series comparison with this type of data?
2. Can I add the uninfected samples again before day 4 (as they are the same)? Would that be statistically correct?
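
For readers who want to reproduce the error outside DESeq2, here is a minimal base-R sketch. It rebuilds a sample table with the same layout as above (3 replicates per group) and compares the number of columns in the model matrix with its rank. The object names (`coldata`, `full`, `design_rank`) and the factor level order are assumptions made for illustration, not something taken from the original analysis.

```r
# Sample layout from the question: 5 groups per genotype, 3 replicates each.
groups_condition <- c("uninfected", "mock", "infected", "mock", "infected")
groups_time      <- c("0", "2", "2", "4", "4")

coldata <- data.frame(
  Genotype  = factor(rep(c("A", "B"), each = 15)),
  condition = factor(rep(rep(groups_condition, each = 3), times = 2),
                     levels = c("uninfected", "mock", "infected")),
  time      = factor(rep(rep(groups_time, each = 3), times = 2),
                     levels = c("0", "2", "4"))
)

# Cross-tabulation: "uninfected" and day 0 pick out exactly the same samples,
# and several condition-by-time cells are empty.
print(table(coldata$condition, coldata$time))

# Design formula from the question.
full <- model.matrix(
  ~ Genotype + condition + time + condition:time + Genotype:time,
  data = coldata
)

# A design is full rank only if its rank equals its number of columns.
design_rank <- qr(full)$rank
cat("columns:", ncol(full), "rank:", design_rank, "\n")
```

If the rank comes out smaller than the number of columns, some coefficients are linear combinations of others and cannot be estimated separately, which is what the "not full rank" message is reporting.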

RNA-Seq • 2.4k views

You have the same problem as discussed in "Error in differential analysis for samples with different time points". In short: no biological replicates.


The uninfected samples aren't useful; you can remove them. Also, as you have no replicates, you will have to ignore the effect of either Genotype, condition, or time. A better solution would be to sequence replicates (ideally 6 per condition, but at minimum 3).


Thank you for the reply. Sorry, I forgot to mention that I have 3 biological replicates for each group. According to my work plan, I want to compare uninfected with mock and with infected.


Why aren't the uninfected samples useful? They provide a T_0 baseline for the rest of the experiment


Yes, we are also taking it as the baseline to compare with mock and infected.


They have no corresponding samples with which to compare, so it's unclear if any change is actually due to time or just "doing something" (i.e., mock treatment).


On the contrary, if they didn't have the T_0 sample, their mock-infection placebo could have profound time-dependent effects, but they'd have no baseline sample against which to compare those effects and wouldn't be able to identify them. This is a well-designed experiment and I really don't understand the criticism.


Their baseline for comparison is currently day 2, which it is regardless of whether the uninfected samples are present or not. One can't meaningfully use day 0 as a baseline because the effect of time can't be distinguished from the lack of treatment. There was likely a good biological reason for this, but as we're naive to that, what I mentioned is the most we can say. This also corresponds to Carlo's answer, where he correctly notes that mock and uninfected can't be compared if we think there's a change simply due to time at day 2.


"On the contrary, if they didn't have the T_0 sample, their mock-infection placebo could have profound time-dependent effects but they'd have no baseline sample against which to compare those effects and wouldn't be able to identify them." This is exactly what we want to know. We have two genotypes that are known to differ in gene expression levels even in the uninfected state (without any mock treatment or infection), which is why we included the uninfected samples.

5.5 years ago

Hi,

First of all, I think there are typos in your conditions, as you sometimes write "infetced" instead of "infected". Your design is not full rank because all "uninfected" samples are on day 0. A possible solution (which may or may not be appropriate, depending on your experimental setting) could be to merge the mock and uninfected levels:

Genotype  condition            time
A         uninfected           0
A         uninfected           2
A         infected             2
A         uninfected           4
A         infected             4
B         uninfected           0
B         uninfected           2
B         infected             2
B         uninfected           4
B         infected             4


Thank you for the quick reply. We actually need to compare uninfected with mock and infected, so merging would be a difficult option for us.


OK, but as I said, in your design the "uninfected" samples and the day-0 samples are exactly the same samples; that is why you get the error. As long as you include the interaction between "condition" and "time", the error will be there.

In my proposed workaround, even though you can't compare uninfected with mock directly, you can compare uninfected samples on day 0 with uninfected samples on day 2/day 4, which is basically the same thing.
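
A rough way to check this workaround is to build the merged sample table in base R and test two candidate formulas for full rank. This is only a sketch: the helper `is_full_rank()` and the object names are hypothetical, and the formulas are taken from the question, with and without the condition:time term.

```r
# Merged layout: "mock" relabelled as "uninfected", 3 replicates per group.
groups_condition <- c("uninfected", "uninfected", "infected",
                      "uninfected", "infected")
groups_time      <- c("0", "2", "2", "4", "4")

merged <- data.frame(
  Genotype  = factor(rep(c("A", "B"), each = 15)),
  condition = factor(rep(rep(groups_condition, each = 3), times = 2),
                     levels = c("uninfected", "infected")),
  time      = factor(rep(rep(groups_time, each = 3), times = 2),
                     levels = c("0", "2", "4"))
)

# Hypothetical helper: TRUE when the model matrix has independent columns.
is_full_rank <- function(design_formula, data) {
  mm <- model.matrix(design_formula, data = data)
  qr(mm)$rank == ncol(mm)
}

# Formula from the question, applied to the merged table.
full_rank_with_ct <- is_full_rank(
  ~ Genotype + condition + time + condition:time + Genotype:time, merged
)

# Same formula without the condition:time interaction.
full_rank_without_ct <- is_full_rank(
  ~ Genotype + condition + time + Genotype:time, merged
)

print(c(with_condition_time = full_rank_with_ct,
        without_condition_time = full_rank_without_ct))
```

Because there are still no infected samples on day 0, the condition:time term can remain a problem even after merging, so it is worth running a check like this before handing a formula to DESeq2. Note that a model without condition:time cannot ask whether the infection effect changes between day 2 and day 4.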

5.5 years ago
russhh 5.7k

I'd suggest you make a column for each of mock_time and infected_time, then fit ~ -1 + genotype + mock_time + infected_time. This fits only a linear effect of time (but it can be extended to include quadratic time if needed).

To do that, you just need to encode the design differently.

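treatments <- data.frame(
  # Assuming the time column in the posted data frame is actually numeric,
  # rather than the strings that you've copied into your question.
  # The parentheses matter: `*` binds more tightly than `==` in R.
  genotype = please_name_your_data$Genotype,
  mock_time = with(please_name_your_data, time * (condition == "mock")),
  infected_time = with(please_name_your_data, time * (condition == "infected"))
)

# Pass the data explicitly so the formula's variables are found in `treatments`.
design <- model.matrix(~ -1 + genotype + mock_time + infected_time, data = treatments)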

# etcetera


I'm not sure how to test the value of including the Genotype/Time interaction (personally, I wouldn't have considered it when coding the design).
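
One possible way to approach that question, offered only as a sketch: build the design above as the reduced model and an extended design with genotype-specific time slopes as the full model, then check that both are full rank and nested. The toy table `samples`, the numeric time coding, and the choice of `genotype:mock_time` and `genotype:infected_time` as the interaction terms are all assumptions for illustration.

```r
# Toy sample table with numeric time (days), 3 replicates per group.
groups_condition <- c("uninfected", "mock", "infected", "mock", "infected")
groups_time      <- c(0, 2, 2, 4, 4)

samples <- data.frame(
  genotype  = factor(rep(c("A", "B"), each = 15)),
  condition = rep(rep(groups_condition, each = 3), times = 2),
  time      = rep(rep(groups_time, each = 3), times = 2)
)

# Time elapsed under each treatment; zero for samples not in that treatment.
samples$mock_time     <- samples$time * (samples$condition == "mock")
samples$infected_time <- samples$time * (samples$condition == "infected")

# Reduced model: one baseline per genotype, shared slopes over time.
reduced <- model.matrix(~ -1 + genotype + mock_time + infected_time,
                        data = samples)

# Full model (assumed extension): slopes are allowed to differ by genotype.
full <- model.matrix(
  ~ -1 + genotype + mock_time + infected_time +
    genotype:mock_time + genotype:infected_time,
  data = samples
)

print(colnames(full))
cat("reduced full rank:", qr(reduced)$rank == ncol(reduced), "\n")
cat("full full rank:   ", qr(full)$rank == ncol(full), "\n")

# With a DESeqDataSet (hypothetical object `dds`), the two matrices could
# then be compared with a likelihood-ratio test, for example:
# dds <- DESeq(dds, test = "LRT", full = full, reduced = reduced)
```

As far as I know, `DESeq()` accepts user-built model matrices for `full` and `reduced` when `test = "LRT"`, but check the documentation for your installed version before relying on that.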