Question 1: Timepoint as a factor or as a continuous covariate

Question

DESeq2 time-series analysis: Treat time variable as continuous or discrete?

2.4 years ago

dstratis4 ▴ 10

Hello,

I have two questions regarding an RNA-seq experiment with 10 time points and 20 replicates per time point.

So far I've run an LRT using the following models:

design = ~ timepoint

reduced = ~ 1

Throughout the experiment, RNA was extracted at days 3, 4, 15, 16, 44, 74, 75, 76, 88 and 104 (i.e. the time between RNA extraction points was not constant). However, when I ran the LRT I treated the time points as discrete (i.e. BDC1, BDC2, HDT1, HDT2, HDT30, HDT60, R1, R2, R21 and R30).

Question 1: Do I run the LRT treating time as a discrete or continuous variable? Does this even make a difference?

Additionally, in the DESeq2 vignette FAQ (http://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#indfilttheory), the answer to the question "I ran a likelihood ratio test, but results() only gives me one comparison" says that the LFCs reported after an LRT show only a single comparison among the potentially multiple LFCs that were tested in the LRT.

Question 2: Are the resulting MA plots based on the single LFC shown in the results() table, or do they show every LFC tested by the LRT? Are MA plots relevant for visualizing the DE results of a time-series experiment?

Thanks in advance

RNA-seq DESeq2 R • 1.9k views

updated 2.4 years ago by i.sudbery 19k • written 2.4 years ago by dstratis4 ▴ 10

score 0 · Answer 1 · 2021-11-27

Question 1: Timepoint as a factor or as a continuous covariate

Basically, you can do either, and there are pros and cons to both approaches. If you treat time as a factor, the question the LRT asks is "Does variation in time account for variance in log read counts?", which is equivalent to doing an anova-like test. If you treat time as a continuous covariate, you are basically asking "Is there a significant linear correlation between time and log read counts?".
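To make the two options concrete, here is a minimal sketch of both designs in DESeq2. The counts are simulated with makeExampleDESeqDataSet() and only 4 replicates per time point are used to keep it quick, so the data, the replicate number and the column names (day, timepoint, day_scaled) are assumptions for illustration, not your actual setup.

```r
library(DESeq2)

set.seed(1)
days <- c(3, 4, 15, 16, 44, 74, 75, 76, 88, 104)

# Simulated counts stand in for the real data (assumption):
# 500 genes, 10 time points x 4 replicates.
dds <- makeExampleDESeqDataSet(n = 500, m = 40)
dds$day        <- rep(days, each = 4)
dds$timepoint  <- factor(dds$day)             # discrete: one level per time point
dds$day_scaled <- as.numeric(scale(dds$day))  # continuous, centred and scaled

# Time as a factor: anova-like test over 9 coefficients
dds_factor <- dds
design(dds_factor) <- ~ timepoint
dds_factor <- DESeq(dds_factor, test = "LRT", reduced = ~ 1)

# Time as a continuous covariate: a single slope on the log2 scale
dds_linear <- dds
design(dds_linear) <- ~ day_scaled
dds_linear <- DESeq(dds_linear, test = "LRT", reduced = ~ 1)

resultsNames(dds_factor)  # intercept plus one coefficient per non-reference time point
resultsNames(dds_linear)  # intercept plus the slope
```

Scaling the day variable is a convenience for the model fit; it changes the units of the slope (per standard deviation of day rather than per day) but not the LRT itself.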

My instinct, though I don't know if anyone has demonstrated this specifically for RNAseq analysis, is that treating time as a linear covariate is more powerful, but only if the relationship between time and log read counts is linear. It might be significantly less powerful if the relationship is non-linear, and may entirely fail to detect some types of relationship (particularly those with increased or decreased expression at intermediate time points followed by a return to starting levels). Treating time as a factor, in contrast, will allow the detection of any shape of profile, but may be less powerful in the monotonic case, and will definitely be less powerful in the linear case. Using time as continuous also gives you a single, easy to interpret coefficient (not quite an LFC, but the same idea), which the anova-like test doesn't.
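A toy illustration of the "up, then back to baseline" case, using plain lm() on simulated log expression for a single gene rather than DESeq2. The profile, effect size and noise level are made up, so treat it as a sketch of the argument and not as evidence about real RNA-seq power.

```r
set.seed(42)
days <- c(3, 4, 15, 16, 44, 74, 75, 76, 88, 104)
day  <- rep(days, each = 20)

# Hypothetical profile: log expression is raised by 1 unit between
# days 15 and 76, then returns to the starting level.
mu <- 5 + ifelse(day >= 15 & day <= 76, 1, 0)
y  <- rnorm(length(day), mean = mu, sd = 0.5)

null_fit   <- lm(y ~ 1)
linear_fit <- lm(y ~ day)          # time as a continuous covariate
factor_fit <- lm(y ~ factor(day))  # time as a factor

# F-tests of each model against the intercept-only model
p_linear <- anova(null_fit, linear_fit)[2, "Pr(>F)"]
p_factor <- anova(null_fit, factor_fit)[2, "Pr(>F)"]

c(linear = p_linear, factor = p_factor)
```

With this profile the fitted slope is close to zero, so the linear model has very little to work with, while the factor model picks up the shift easily.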

It may be possible to find a transformation that makes your expression~time relationship linear. Alternatively, you can fit a smooth curve such as a spline. This is the approach followed in the timecourse section of the edgeR manual (page 106) or the limma manual (page 49). Although this is described in the edgeR/limma manuals, I'm not aware of any reason this shouldn't work in DESeq2. Note that if you use a spline fit, then once again your coefficients don't have a meaning, but if you use a model with straightforward parameters (like a cubic polynomial, or, say, a sigmoid), then there will be a way to interpret them (although perhaps not a straightforward one).
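A rough sketch of what a spline fit could look like in DESeq2, in the spirit of the edgeR/limma examples. I haven't checked this against real data: the counts are simulated, df = 3 is an arbitrary choice, and the column names s1, s2, s3 are just labels I made up for the basis columns.

```r
library(DESeq2)
library(splines)

set.seed(1)
days <- c(3, 4, 15, 16, 44, 74, 75, 76, 88, 104)

# Simulated counts stand in for the real data (assumption)
dds <- makeExampleDESeqDataSet(n = 500, m = 40)
dds$day <- rep(days, each = 4)

# Natural cubic spline basis in time; df = 3 is an arbitrary choice
basis <- ns(dds$day, df = 3)
dds$s1 <- basis[, 1]
dds$s2 <- basis[, 2]
dds$s3 <- basis[, 3]

# LRT of "any smooth trend in time" against the intercept-only model
design(dds) <- ~ s1 + s2 + s3
dds <- DESeq(dds, test = "LRT", reduced = ~ 1)
res <- results(dds)
```

The LRT p-value answers "is there any smooth change over time?"; the individual s1, s2, s3 coefficients are not interpretable on their own.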

Question 2: LFCs and MA plots in LRT tests.

Basically, if you are doing an anova-like test, you are testing whether your full model accounts for more variance than your reduced model, not whether a particular coefficient is significantly different from zero. Because there is no way to say which LFC is the important one in an anova, it doesn't really make much sense to talk about that. In fact, it's perfectly possible for the LRT to be significant in a case where none of the coefficients are individually significantly different from zero. This also means that plots like MA plots and volcano plots don't make much sense for these sorts of tests.
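You can see this in the results() tables themselves. In the sketch below (simulated counts again, 4 replicates per time point as an assumption, with your time point labels), asking for two different coefficients after the same LRT changes the log2FoldChange column, but the p-values should be the same in both tables, because they come from the full vs reduced comparison.

```r
library(DESeq2)

set.seed(1)
labels <- c("BDC1", "BDC2", "HDT1", "HDT2", "HDT30",
            "HDT60", "R1", "R2", "R21", "R30")

# Simulated counts stand in for the real data (assumption)
dds <- makeExampleDESeqDataSet(n = 500, m = 40)
dds$timepoint <- factor(rep(labels, each = 4), levels = labels)
design(dds) <- ~ timepoint
dds <- DESeq(dds, test = "LRT", reduced = ~ 1)

coefs <- resultsNames(dds)  # intercept first, then each time point vs BDC1
res_first <- results(dds, name = coefs[2])
res_last  <- results(dds, name = coefs[10])

# The LRT p-values do not depend on which coefficient is reported
all.equal(res_first$pvalue, res_last$pvalue)

# An MA plot of either table shows only that one comparison
plotMA(res_last)
```

So an MA plot drawn from an LRT results table mixes the LFC of one arbitrary comparison with significance calls from the whole-model test, which is why it is hard to read much into it.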