I am doing some RNA-seq analysis with DESeq2. Time course 0-48 h with one treatment of the cells. 0h is not treated and no replicates do exist.
I want to find out whether there is a linear response between time and gene expression (where time is considered to be a continuous covariate).
This is what I did so far:
>ddsfm <- DESeqDataSetFromMatrix(countData = countdata, colData = coldata, design = ~ time) > head(assay(ddsfm)) h0_1 h2_2 h4_3 h6_4 h8_5 h10_6 h12_7 h14_8 h24_9 h48_10 A1BG-AS1 17 12 19 47 28 4 12 36 31 114 A1CF 0 1 2 2 4 0 0 5 3 22 A2M 3 4 6 18 11 20 20 18 17 60 A2M-AS1 0 3 1 6 0 10 4 9 2 12 A2ML1 0 0 3 4 6 2 2 2 3 7 A2MP1 9 20 34 144 97 5 35 72 69 112 > coldata intercept time h0_1 1 h0 h2_2 1 h2 h4_3 1 h4 h6_4 1 h6 h8_5 1 h8 h10_6 1 h10 h12_7 1 h12 h14_8 1 h14 h24_9 1 h24 h48_10 1 h48 > dds=DESeq(ddsfm,test="LRT",reduced= ~ 1, fitType='local') > resultsNames(dds)  "Intercept" "time_h10_vs_h0" "time_h12_vs_h0" "time_h14_vs_h0"  "time_h2_vs_h0" "time_h24_vs_h0" "time_h4_vs_h0" "time_h48_vs_h0"  "time_h6_vs_h0" "time_h8_vs_h0"
As far as I understood, by calling
I get the genes that respond to time. Correct?
So I just want to know if this is fine so far. Any thoughts are greatly appreciated.
The results look like that.
> resOrdered <- res[order(res$pvalue),] > head(resOrdered) log2 fold change (MLE): Intercept LRT p-value: '~ time' vs '~ 1' DataFrame with 6 rows and 6 columns baseMean log2FoldChange lfcSE stat pvalue padj <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> LZTR1 46.28472 2.895392 2.466350 15.41700 0.08010024 0.9998715 FLJ37453 128.43150 5.558342 2.031662 15.32917 0.08228201 0.9998715 HMBS 47.71543 2.310480 2.494872 15.25735 0.08410516 0.9998715
Well it seems like that spanish guy from here comparing each time point with deseq2 and their results had a similar question.
Also since my p-values are not really low it seems like time has no influence on the gene's expression.
Keep in mind that your model isn't looking for a linear affect of time on gene expression. If that's really what you want (it's what you wrote), then you need
timeas a numeric covariate rather than a factor.
Thanks for your response.
I was basically following the guy here DESEq2 warning while using GLM fitting
Now I have got:
Also my results seem different now:
We might get something usefull out of that after all. I'll continue tomorrow. Thanks so far. (-:
Yeah, it's really just a matter of what question you want to ask. Your original model was more flexible, but you really need replicates for it to be useful. With the model above, you're restricted to finding genes that change in a more or less linear fashion with time (e.g., if expression doubles between 0h and 2h, then it should then double again every two hours), but you don't really need much in the way of replicates.
BTW, the fold changes produced are now "per hour".
Fold changes "per hour" - nice to know.
Ryan Devon: "but you don't really need much in the way of replicates."
I suppose that means that I don't really need replicates here.
One could even assume there is an quadratic (time2) fold change. I did this like that:
So this time I would like to know what genes fit either to a linear time model or a squared time model (or to no time).
I defined my model like this:
How is it done properly?
I could just do two separate runs
design = ~ time reduced= ~1
design = ~ timeSquared reduced= ~1
But I am sure there is a better way.
So anyway I did two separate runs one with
~ Lineartime' vs '~ 1'and one with
~ Squaredtime' vs '~ 1'
Here one can see sufficient low pvalues but then looking at padj it tells me otherwise.
Can these first few genes considered to be differentially expressed?
I am not sure how to handle the high padj values in that context...
Yeah, I'd take the first few genes of the first result set as being significant. For the quadratic time model, apparently there's simply not much that can reasonably fits that model.