I'm working with kallisto/sleuth and Prof. Love's DESeq2 for my PhD, where I am running an RNA-seq timecourse experiment comprising 6-8 samples per timepoint across four (unevenly spaced) developmental timepoints.
My current model is
~ batch + CutRIN + ns(ImputedAgeDays). That is, if I understand correctly, it corrects for batch effects, accounts for low/medium/high RIN (binned so as to avoid a numeric covariate, which DESeq2 models in a specific way; my apologies, Prof. Love, I know you discourage including RIN in DESeq2), and then fits a natural spline to the (estimated) age of the sample in days, which should give a smoother fit over time than modelling the counts without a spline would.
Running an LRT with
reduced = ~ batch + CutRIN, (again, if I understand correctly) I get a list of genes for which the spline term explains expression significantly better than the reduced model does (i.e. genes that change over time), with sleuth reporting more genes than DESeq2.
sleuth log-transforms the data before fitting (natural log with a 0.5 offset by default) and outputs beta, an effect size on that transformed scale, which is not the same as a fold change or a log2 fold change. Therefore, sleuth effect sizes cannot be compared directly to DESeq2's log2FCs.
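To make the scale difference concrete, here is a minimal base-R sketch of how I understand the two transforms to relate for a single transcript. The counts and variable names are made up for illustration (not from my data), and it ignores everything sleuth does around the transform, such as normalization, bootstrapping and the spline terms in my actual model.

```r
# Hypothetical normalized counts for one transcript in two conditions
# (made-up numbers, purely for illustration).
count_a <- 100
count_b <- 400

# Difference on the natural-log scale with a 0.5 offset.
diff_ln <- log(count_b + 0.5) - log(count_a + 0.5)

# The same difference using the base-2 transform, log(x + 0.5, 2).
diff_log2 <- log(count_b + 0.5, 2) - log(count_a + 0.5, 2)

# Changing the base only rescales the difference by 1 / ln(2).
rescaled <- diff_ln / log(2)

print(c(natural_log = diff_ln, log2 = diff_log2, rescaled = rescaled))
stopifnot(isTRUE(all.equal(diff_log2, rescaled)))
```

If that is all the change of base does, I would expect the two scales to differ only by a constant factor, which makes the warning all the more puzzling to me.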
By setting transformation_function=log(x+0.5, 2), sleuth can output log2 fold changes instead. However, there is a warning:
"be sure you know what you're doing before you change this."
What is that warning about?
There must be a good reason, but I've not found an explanation anywhere. Whilst I assume that the way sleuth models the data and uses the bootstraps rests on certain assumptions, I am not familiar enough with the matter to see which of them might be violated. What dangers are there if I switch sleuth to log2FCs, and would I be better off just ranking the results by beta/log2FC to compare sleuth results to DESeq2?