Question

Using DESeq2 for time-series analysis

0

Entering edit mode

3.8 years ago

frymor2 ▴ 10

I was wondering if I understood the DESeq2 time-series vignette correctly. My data set contains 5 time-points without control samples as such

> metadata
    sample  time
TP1.1   TP1.1   1
TP1.2   TP1.2   1
TP2.1   TP2.1   2
TP2.2   TP2.2   2
TP3.1   TP3.1   3
TP3.2   TP3.2   3
TP4.1   TP4.1   4
TP4.2   TP4.2   4
TP5.1   TP5.1   5
TP5.2   TP5.2   5

So I basically have only the samples in duplicates for each time point and the time as a factor. I would like to identify genes with significant changes of expression over multiple time points.

For that I use the LRT test and the following model

dds <- DESeqDataSetFromMatrix(countData = counts,
           colData = metadata,
           design= ~sample + time)
dds <- DESeq(dds, test="LRT", reduced=~time)

Would this give me genes diff. regulated genes over all time-points?

If I want to compare all time-points in a pairwise manner - I should use the Wald test and the contrast or name parameters in the results() function?

thanks

ADD COMMENT • link updated 3.8 years ago by swbarnes2 14k • written 3.8 years ago by frymor2 ▴ 10

score 3 · Answer 1 · 2020-07-16

3

Entering edit mode

3.8 years ago

Asaf 10k

I wonder how this even works since sample is unique for each row so the model matrix ~sample + time is singular. If you are sampling from the same samples in different time points you should change sample to not include the time point. Also, make sure your time is a factor, if indeed that is what you want. You can also treat time as linear but then you'll have to make sure you're expecting a linear relationship between your time points values and the expression levels.

ADD COMMENT • link 3.8 years ago by Asaf 10k

0

Entering edit mode

Thanks for the comment. It is a misunderstanding. I meant to write that this is how I would like to analyze it, but wasn't sure if it would work at all, hence my question. I don't get you point though. Aren't the samples (names) always unique?

How should I than create the full or reduce model with this data set? I don't have control samples for each timepoint. I would appreciate the input.

ADD REPLY • link 3.8 years ago by frymor2 ▴ 10

1

Entering edit mode

You should have a dataset like this:

> metadata
    tissue  time
TP1.1   S1   1
TP1.2   S2   1
TP2.1   S1   2
TP2.2   S2   2
TP3.1   S1   3
TP3.2   S2   3
TP4.1   S1   4
TP4.2   S2   4
TP5.1   S1   5
TP5.2   S2   5

And your formula should be ~ tissue + time whereas the reduced should be ~ tissue. That is if the same tissue is sampled five time, if it's another tissue in each time point then ~ time vs ~0

ADD REPLY • link 3.8 years ago by Asaf 10k

0

Entering edit mode

thanks a lot for the reply. But I think there is a mistake in the last comment. when running dds <- DESeq(dds, test = "LRT", reduced = ~0) this I get the following error

...
fitting model and testing
Error in h(simpleError(msg, call)) : 
   error in evaluating the argument 'x' in selecting a method for function 't': 'a' (1 x 0) must be square

When trying to fir the model. When I change it to ~1 though it works. What would be the difference between ~1 and ~0?

thanks

ADD REPLY • link 3.7 years ago by frymor2 ▴ 10

0

Entering edit mode

Yes, of course, it's ~1

ADD REPLY • link 3.7 years ago by Asaf 10k

score 2 · Answer 2 · 2020-07-16

You are right in using LRT to find DE genes, in the results ignore the log2FC column and use a stringent adjusted pvalue cutoff to get a reasonable number of genes for downstream analysis. However, as Asaf has said your design seems to be wrong, shouldn't you be comparing ~time with ~1. There is no need to use sample in the design.

Also, you should try spline based approaches which are meant for time series experiments. These are implemented in impulseDE2 and splineTimeR. Splines can be implemented in DESeq2 and limma as well, supplementary material of impulseDE2 paper has code for using splines in DESeq2.

score 2 · Answer 3 · 2020-07-16

2

Entering edit mode

3.8 years ago

swbarnes2 14k

Drop that "sample" column. It's pointless.

If you want to look for genes that change linearly over the time points, just make "time" your design, and the name of your contrast too. (Make sure that time is not being imported as a factor)

ADD COMMENT • link 3.8 years ago by swbarnes2 14k

0

Entering edit mode

Thanks, Do you mean like that

dds <- DESeqDataSetFromMatrix(countData = counts,
           colData = metadata,
           design= ~time)
 dds <- DESeq(dds, test="LRT", reduced=~1)

ADD REPLY • link 3.8 years ago by frymor2 ▴ 10

0

Entering edit mode

No, that is not how you determine genes which follow a linear trend. You just do

dds <- DESeq(dds, design = ~ time)

res <- results(dds, name = "time")

This does not return a fold change, but it returns the slope of the regression line

ADD REPLY • link 3.8 years ago by swbarnes2 14k

0

Entering edit mode

I'm not sure I understand it here. What does it means slope of the regression line?

ADD REPLY • link 3.8 years ago by frymor2 ▴ 10

score 0 · Answer 4 · 2020-07-16

0

Entering edit mode

3.8 years ago

Shalu Jhanwar ▴ 520

For pairwise comparisons, contrast works well. You can define all the pairs in contrast for which you want to perform DEG.