Question: Using DESeq2 for time-series analysis
gravatar for frymor2
27 days ago by
frymor210 wrote:

I was wondering if I understood the DESeq2 time-series vignette correctly. My data set contains 5 time-points without control samples as such

> metadata
    sample  time
TP1.1   TP1.1   1
TP1.2   TP1.2   1
TP2.1   TP2.1   2
TP2.2   TP2.2   2
TP3.1   TP3.1   3
TP3.2   TP3.2   3
TP4.1   TP4.1   4
TP4.2   TP4.2   4
TP5.1   TP5.1   5
TP5.2   TP5.2   5

So I basically have only the samples in duplicates for each time point and the time as a factor. I would like to identify genes with significant changes of expression over multiple time points.

For that I use the LRT test and the following model

dds <- DESeqDataSetFromMatrix(countData = counts,
           colData = metadata,
           design= ~sample + time)
dds <- DESeq(dds, test="LRT", reduced=~time)

Would this give me genes diff. regulated genes over all time-points?

If I want to compare all time-points in a pairwise manner - I should use the Wald test and the contrast or name parameters in the results() function?


lrt deseq2 time-series • 188 views
ADD COMMENTlink modified 27 days ago by swbarnes28.2k • written 27 days ago by frymor210
gravatar for Asaf
27 days ago by
Asaf8.3k wrote:

I wonder how this even works since sample is unique for each row so the model matrix ~sample + time is singular. If you are sampling from the same samples in different time points you should change sample to not include the time point. Also, make sure your time is a factor, if indeed that is what you want. You can also treat time as linear but then you'll have to make sure you're expecting a linear relationship between your time points values and the expression levels.

ADD COMMENTlink written 27 days ago by Asaf8.3k

Thanks for the comment. It is a misunderstanding. I meant to write that this is how I would like to analyze it, but wasn't sure if it would work at all, hence my question. I don't get you point though. Aren't the samples (names) always unique?

How should I than create the full or reduce model with this data set? I don't have control samples for each timepoint. I would appreciate the input.

ADD REPLYlink written 23 days ago by frymor210

You should have a dataset like this:

> metadata
    tissue  time
TP1.1   S1   1
TP1.2   S2   1
TP2.1   S1   2
TP2.2   S2   2
TP3.1   S1   3
TP3.2   S2   3
TP4.1   S1   4
TP4.2   S2   4
TP5.1   S1   5
TP5.2   S2   5

And your formula should be ~ tissue + time whereas the reduced should be ~ tissue. That is if the same tissue is sampled five time, if it's another tissue in each time point then ~ time vs ~0

ADD REPLYlink written 23 days ago by Asaf8.3k

thanks a lot for the reply. But I think there is a mistake in the last comment. when running dds <- DESeq(dds, test = "LRT", reduced = ~0) this I get the following error

fitting model and testing
Error in h(simpleError(msg, call)) : 
   error in evaluating the argument 'x' in selecting a method for function 't': 'a' (1 x 0) must be square

When trying to fir the model. When I change it to ~1 though it works. What would be the difference between ~1 and ~0?


ADD REPLYlink modified 14 days ago • written 14 days ago by frymor210

Yes, of course, it's ~1

ADD REPLYlink written 14 days ago by Asaf8.3k
gravatar for ashish
27 days ago by
ashish480 wrote:

You are right in using LRT to find DE genes, in the results ignore the log2FC column and use a stringent adjusted pvalue cutoff to get a reasonable number of genes for downstream analysis. However, as Asaf has said your design seems to be wrong, shouldn't you be comparing ~time with ~1. There is no need to use sample in the design.

Also, you should try spline based approaches which are meant for time series experiments. These are implemented in impulseDE2 and splineTimeR. Splines can be implemented in DESeq2 and limma as well, supplementary material of impulseDE2 paper has code for using splines in DESeq2.

ADD COMMENTlink written 27 days ago by ashish480
gravatar for swbarnes2
27 days ago by
United States
swbarnes28.2k wrote:

Drop that "sample" column. It's pointless.

If you want to look for genes that change linearly over the time points, just make "time" your design, and the name of your contrast too. (Make sure that time is not being imported as a factor)

ADD COMMENTlink written 27 days ago by swbarnes28.2k

Thanks, Do you mean like that

dds <- DESeqDataSetFromMatrix(countData = counts,
           colData = metadata,
           design= ~time)
 dds <- DESeq(dds, test="LRT", reduced=~1)
ADD REPLYlink written 23 days ago by frymor210

No, that is not how you determine genes which follow a linear trend. You just do

dds <- DESeq(dds, design = ~ time)

res <- results(dds, name = "time")

This does not return a fold change, but it returns the slope of the regression line

ADD REPLYlink modified 23 days ago • written 23 days ago by swbarnes28.2k

I'm not sure I understand it here. What does it means slope of the regression line?

ADD REPLYlink written 20 days ago by frymor210
gravatar for Shalu Jhanwar
27 days ago by
Shalu Jhanwar350
Shalu Jhanwar350 wrote:

For pairwise comparisons, contrast works well. You can define all the pairs in contrast for which you want to perform DEG.

ADD COMMENTlink written 27 days ago by Shalu Jhanwar350

and for the overall comparison?

ADD REPLYlink written 27 days ago by frymor210
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1659 users visited in the last hour