Question: DESeq2 differential expression analysis for time course experiment: course of significant genes very similiar between tested conditions
1
gravatar for cmgraef
26 days ago by
cmgraef10
cmgraef10 wrote:

Hi there,

I'm currently doing an RNAseq analysis of a time course experiment with six timepoints, two genotype conditions and three replicates each. After STAR alignment and transcript quantification I used the DESeq2 package for count normalization and differential expression anlaysis just as decribed in the vignette with the following code:

dds <- DESeqDataSetFromMatrix(countData=count_matrix, colData=coldata, design= ~ genotype + timepoint + genotype:timepoint)
dds <- DESeq(dds, test="LRT", reduced = ~ timepoint+genotype)
res <- results(dds)

This all works well. However, if I look at the most signficant genes, I get time course plots such as this:

time course plot of significant gene

I don't doubt that there is a significant difference between these two conditions but I don't think they are biologically interesting as I seem to picking up noise here. Would you agree with this opinion? Is there an option to use DESeq2 to find genes with a non-parallel course or do I have to use a different package for that?

Thank you very much for your help! If you need any further information, please let me know.

Best, Moritz

ADD COMMENTlink modified 26 days ago • written 26 days ago by cmgraef10
2

Your problem here is that what you are asking DESeq to do is fit a linear regression on expression vs time for each genotype and then ask if the slope of that line is different betweeen genotypes. But a linear regression is clearly unsuitable here - gene expression is not linearly dependent on time, but has some more complex relationship.

I think there are more comlpex ways to solve this problem, but my first instict would be to make timepoint a factor rather than a numeric predictor.

You will unforunately loose a lot of power doing this - you'll be fitting a parameter for each timepoint for each genotype (the difference, for that genotype, from the first timepoint), rather than just one parameter for each genotype (the slope), but it should avoid the problem you identifiy.

The alternative is to transform the problem into one that is a linear regression, but I'm a bit out of my depth there.

ADD REPLYlink written 26 days ago by i.sudbery5.0k

DESeq doesn't do linear regression, it treats time as factor as far as I know, you will get differential expression if at any time point there will be a significant deviation from expected value.

ADD REPLYlink written 26 days ago by Asaf6.1k

Are you sure you can't provide continuouspredictors to DESeq, you definitely can to edgeR.

ADD REPLYlink written 26 days ago by i.sudbery5.0k

See for example: https://support.bioconductor.org/p/97262/, which is a helpful post all round for this question.

ADD REPLYlink modified 26 days ago • written 26 days ago by i.sudbery5.0k

You're right, my bad. Didn't know DESeq can do that. In that case, indeed the OP should use time as factor or use a non-linear approach

ADD REPLYlink written 26 days ago by Asaf6.1k

Thank you both for your quick answers! Sorry for not being specific enough, I actually did use timepoint as a factor here and still got these results. Any other ideas or experiences? I'm currently also using the maSigPro and splineTimeR package to do an analysis with timepoint as a continuous variable but first results also don't look too promising.

ADD REPLYlink written 25 days ago by cmgraef10
1

Yes, it doesn't seem right. Maybe asking in the parallel bioconductor forum would be a better option (and delete this post if no one has answered it)

ADD REPLYlink written 26 days ago by Asaf6.1k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1276 users visited in the last hour