Question: Time-course RNA-seq analysis in Deseq2
0
gravatar for Lisa
8 weeks ago by
Lisa310
Ireland
Lisa310 wrote:

Hi,

I am looking to do a time-course analysis of my RNA-seq human, whole blood datasets. I have 30 samples, across 4 time-points. I have no replicates.

T0 - Week 0, initial treatment T1 - Week 2, two weeks into treatment T2 - Week 12, twelve weeks into treatment T3 - Week 26, end of treatment; “Cure”

I have done the pairwise comparisons, comparing each time-point to T3, our cure state, using Deseq2. Now I want to track the changes of gene expression over the full treatment time.
I have read the time-course Vinaigrette on Deseq2 but I am not sure I am understanding it right.

I have assigned two variables, timepoint and SampleID to account for the different times and samples.

(timepoint <- factor(c(rep("0",30),rep("2",30)rep("12",30)rep("26",30))))

(coldata <- data.frame(row.names=colnames(countdata), timepoint, SampleID))

ddsMat <- DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design=~SampleID + timepoint)

dds <- DESeq(ddsMat)
res <- results(dds)

However I don’t know if this is enough. I can’t see where the time element comes into play. Any advice on doing a Time- course analysis?

rna-seq R • 247 views
ADD COMMENTlink modified 8 weeks ago by Vitis1.7k • written 8 weeks ago by Lisa310
1
gravatar for Kevin Blighe
8 weeks ago by
Kevin Blighe33k
Republic of Ireland
Kevin Blighe33k wrote:

Hey Lisa,

You have already controlled for time via the inclusion of timepoint in the design formula. You have also done your pairwise comparisons, presumably like this:

res1 <- results(dds, contrast=c("timepoint", "T3", "T0"), independentFiltering=TRUE, alpha=0.05, pAdjustMethod="BH", parallel=FALSE)
res1 <- lfcShrink(dds, contrast=c("timepoint", "T3", "T0"), res=res1)

res2 <- results(dds, contrast=c("timepoint", "T3", "T1"), independentFiltering=TRUE, alpha=0.05, pAdjustMethod="BH", parallel=FALSE)
res2 <- lfcShrink(dds, contrast=c("timepoint", "T3", "T1"), res=res2)

If you want to do ANOVA between the timepoints, do:

dds <- DESeq(ddsMat, test="LRT", reduced=~SampleID + timepoint)
res <- results(dds)

----------------------------------------------

The remaining part that you're asking is not strictly covered in DESeq2. I would first transform the normalised counts via rlog() (set blind=FALSE) and then you have a few choices:

  • simply plot the distribution of your genes across each time-point as a bar, box-and-whisker, or line plot. You can then perform further tests on these
  • for each gene of interest, fit a separate model (e.g. via localised regression), a model in which your timepoint factor is ordered based on time. You can then simply compare models and obtain p-values that way. This would address the question: 'Does GeneX's path across the timepoint differ to that of GeneY?"

Kevin

ADD COMMENTlink written 8 weeks ago by Kevin Blighe33k

Hi Kevin,

I actually did the pairwise comparisons separately, using a featureCounts matrix of just the two time-points in each comparison. Is it better practice to do them together?

When you suggest an ANOVA between time-points do you mean for me to test the variance between the groups to test for differences at the time-points?

I will try the Rlog() to plot the distribution and I will have to read into your suggestion for each gene of interest as I am not sure how I understand how to go about this.

Thanks so much for all your help and advice,

Lisa

ADD REPLYlink written 8 weeks ago by Lisa310

I actually did the pairwise comparisons separately, using a featureCounts matrix of just the two time-points in each comparison. Is it better practice to do them together?

Not sure what you mean here(?) - the best way would be to do the pairwise comparisons as I have shown in my sample code.

The ANOVA, then, would find genes that are different between all time-points. This may not exactly be relevant, though!

ADD REPLYlink written 8 weeks ago by Kevin Blighe33k
1

Yeah, I am redoing it the way you have suggested now.

Sorry if I was unclear. I meant that I subsetted the data into t0.vs.t3, t1.vs.t3, and t2.vs.t3, ran feature counts on them all individually and then did the deseq2 on each subset. I see now that this was unnecessary.

Thanks for clarifying about the ANOVA. I think it would be worth having a look anyway.

Thanks again.

ADD REPLYlink written 8 weeks ago by Lisa310

Ah, I see! Yes, doing it that way (sub-setting, et cetera) would introduce bias into the end-results. While saying this, you may see the same genes coming up as being statistically significant. Only those genes on the fringes of statistical significance may differ.

ADD REPLYlink written 8 weeks ago by Kevin Blighe33k
0
gravatar for Vitis
8 weeks ago by
Vitis1.7k
New York
Vitis1.7k wrote:

Time course expression analysis could be much more involved than straight differential expression analyses. Sometimes simple MDS/PCA could reveal biological meanings from the data. And there are other time-series analysis inspired methods such as EBSeq-HMM.

https://academic.oup.com/bioinformatics/article/31/16/2614/320882

Here is a review of such methods:

https://www.sciencedirect.com/science/article/pii/S2001037015000392#bb0330

ADD COMMENTlink written 8 weeks ago by Vitis1.7k

Thank you for the links to the literature, I will have a read and see what else I can use to answer my questions. At the moment, the differential gene expression seems the most appropriate choice, as the ultimate ideal goal is to potential find biomarkers indicative of “disease” and “cure”. However, I appreciate that there are multiple ways to approach this.

ADD REPLYlink written 8 weeks ago by Lisa310
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1445 users visited in the last hour