DESeq2 time-series analysis: Treat time variable as continuous or discrete?
7 weeks ago
dstratis4 ▴ 10

Hello,

I have two questions regarding an RNA-seq experiment with 10 time points and 20 replicates per time point.

So far I've run an LRT using the following models:

design = ~ timepoint

reduced = ~ 1

Throughout the experiment, RNA was extracted at days 3, 4, 15, 16, 44, 74, 75, 76, 88 and 104 (i.e. the intervals between RNA extraction points were not equal). However, when I ran the LRT I treated the time points as discrete (i.e. BDC1, BDC2, HDT1, HDT2, HDT30, HDT60, R1, R2, R21 and R30).

Question 1: Should I run the LRT treating time as a discrete or a continuous variable? Does this even make a difference?

Additionally, in the DESeq2 vignette FAQ (http://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#indfilttheory), the answer to the question "I ran a likelihood ratio test, but results() only gives me one comparison" says that the LFCs reported after an LRT show only a single comparison among the potentially multiple LFCs that were tested in the LRT.

Question 2: Are the resulting MA plots based on the single LFC shown in the results() table, or do they show every LFC tested in the LRT? Are MA plots relevant for visualizing the DE results of a time-series experiment?

Thanks in advance

RNA-seq DESeq2 R • 377 views
7 weeks ago

Question 1: Timepoint as a factor vs. as a continuous covariate

Basically, you can do either, and there are pros and cons to both approaches. If you treat it as a factor, then the question the LRT is asking is "Does variation in time account for variance in log read counts?", which is equivalent to doing an ANOVA-like test. If you treat time as a continuous covariate, you are basically asking the question "Is there a significant linear relationship between time and log read counts?".
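
As a rough illustration of the difference, here is a base R sketch (not DESeq2 itself) of the design matrices the two codings produce. The column names `timepoint` and `day` are assumptions for this example, and the days and labels from the question are assumed to correspond in the order given.

```r
# Days and labels from the question; the one-to-one mapping is an assumption.
days   <- c(3, 4, 15, 16, 44, 74, 75, 76, 88, 104)
labels <- c("BDC1", "BDC2", "HDT1", "HDT2", "HDT30",
            "HDT60", "R1", "R2", "R21", "R30")

# Hypothetical sample table: 20 replicates per time point.
coldata <- data.frame(
  day       = rep(days, each = 20),
  timepoint = factor(rep(labels, each = 20), levels = labels)
)

# Time as a factor: intercept plus one coefficient per non-reference level.
X_factor <- model.matrix(~ timepoint, coldata)

# Time as a numeric covariate: intercept plus a single slope.
X_linear <- model.matrix(~ day, coldata)

ncol(X_factor)  # 10
ncol(X_linear)  # 2
```

With the factor coding, an LRT against `~ 1` tests 9 coefficients at once; with the numeric coding it tests a single slope. In DESeq2 the corresponding call would be `DESeq(dds, test = "LRT", reduced = ~ 1)` with `design = ~ timepoint` or `design = ~ day` respectively.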

My instinct, though I don't know if anyone has demonstrated this specifically for RNA-seq analysis, is that treating time as a linear covariate is more powerful, but only if the relationship between time and log read counts is actually linear. It might be significantly less powerful if the relationship is non-linear, and it may entirely fail to detect some types of relationship (particularly those with increased or decreased expression at intermediate time points followed by a return to starting levels). Treating time as a factor, in contrast, allows the detection of any shape of profile, but may be less powerful in the monotonic case and will definitely be less powerful in the linear case. Using time as continuous also gives you a single, easy-to-interpret coefficient (not quite an LFC, but the same idea: the change in log counts per unit of time), which the ANOVA-like test doesn't.

It may be possible to find a transformation that makes your expression~time relationship linear. Alternatively, you can model time with a flexible curve such as a spline, which is the approach followed in the time course section of the edgeR manual (page 106) and the limma manual (page 49). Although this is described in the edgeR/limma manuals, I'm not aware of any reason it shouldn't work in DESeq2. Note that if you use a spline fit, then once again your coefficients don't have an individual meaning, but if you use a model with straightforward parameters (like a cubic polynomial or, say, a sigmoid), then there will be a way to interpret them (although perhaps not a straightforward one).
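
A minimal sketch of what a spline design could look like, again in base R only. The choice of a natural cubic spline with `df = 3` is an assumption made for illustration, not a recommendation for this dataset, and `day` is a hypothetical column name.

```r
library(splines)  # ships with base R

# Sampling days from the question, 20 replicates each (hypothetical sample table).
days    <- c(3, 4, 15, 16, 44, 74, 75, 76, 88, 104)
coldata <- data.frame(day = rep(days, each = 20))

# Full model: intercept plus a natural cubic spline basis with 3 degrees of freedom.
X_full <- model.matrix(~ ns(day, df = 3), coldata)

# Reduced model: intercept only, so the LRT asks whether expression changes over time at all.
X_reduced <- model.matrix(~ 1, coldata)

ncol(X_full)     # 4
ncol(X_reduced)  # 1
```

The three spline columns are tested jointly, so the test stays ANOVA-like but uses 3 degrees of freedom instead of the 9 needed for a 10-level factor. The `DESeq()` documentation states that `full` and `reduced` may be user-supplied model matrices when `test = "LRT"`, so matrices like these could in principle be passed that way; this has not been run on the data in the question.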

Question 2: LFCs and MA plots in LRT tests.

Basically, if you are doing an ANOVA-like test, you are testing whether the full model accounts for more variance than the reduced model, not whether a particular coefficient is significantly different from zero. Because there is no way to say which LFC is the important one in an ANOVA, it doesn't really make much sense to talk about a single LFC. In fact, it's perfectly possible for the LRT to be significant in a case where none of the coefficients is individually significantly different from zero. This also means that plots like MA plots and volcano plots don't make much sense for these sorts of tests.


@i.sudbery, thank you so much for this comprehensive answer. I have one last question: for time-series analysis, should I use the shrunken or the unshrunken DE results? I believe using the unshrunken results is correct, because shrinkage is applied to the LFCs while the LRT results are not based on LFCs. Is this correct?


My guess is that it might be possible to shrink if you treat time as continuous, but not if you treat it as a discrete factor.
