I have the following set of samples I would like to use to assess differential gene expression using DESeq2:
sample_name      molarity  batch
BF-2M-1          2         BF
BF-2M-2          2         BF
BF-3M-1          3         BF
BF-3M-2          3         BF
BFL0-5A          0.5       BFL
BFL0-5B          0.5       BFL
BFL0-5X          0.5       BFL
BFL2A            2         BFL
BFL2B            2         BFL
BFL2C            2         BFL
BFL4B            4         BFL
BFL4C            4         BFL
BFL4X            4         BFL
BrineFly-1M      1         BrineFly
BrineFly-200mM   0.2       BrineFly
BrineFly-3M      3         BrineFly
Based on a PCA of variance-stabilizing-transformed counts, the experimental batches have a strong effect and need to be included in the design formula. My main question is how to incorporate the gradient of treatment levels (molarity), from 0.2 M up to 4 M, into the design formula. I would like to test whether molarity affects gene expression.
My current design formula is ~ batch + molarity, but I'm not sure whether it does what I want or how to interpret the results. Any advice is appreciated.
In this setup, how will DESeq2 handle genes that are not strongly differentially expressed between low and intermediate molarity but show a large difference between intermediate and high? Is it robust to this kind of non-linear trend across a gradient of treatment levels?
If you don't expect linearity, code molarity as a factor and run a likelihood ratio test (LRT) to look for differences between any of the concentrations. You can then inspect the significant genes and see how their expression changes track with concentration.
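A minimal sketch of that LRT in R, under some assumptions: `counts` is a hypothetical genes x samples matrix of raw integer counts with columns in the same order as the sample table, and `molarity_f` is a name introduced here for molarity recoded as a factor, so that each concentration gets its own coefficient instead of a single slope. The 0.1 cutoff is only illustrative.

```r
library(DESeq2)

# Sample table copied from the question.
coldata <- data.frame(
  row.names = c("BF-2M-1", "BF-2M-2", "BF-3M-1", "BF-3M-2",
                "BFL0-5A", "BFL0-5B", "BFL0-5X",
                "BFL2A", "BFL2B", "BFL2C",
                "BFL4B", "BFL4C", "BFL4X",
                "BrineFly-1M", "BrineFly-200mM", "BrineFly-3M"),
  molarity = c(2, 2, 3, 3, 0.5, 0.5, 0.5, 2, 2, 2, 4, 4, 4, 1, 0.2, 3),
  batch = factor(c(rep("BF", 4), rep("BFL", 9), rep("BrineFly", 3)))
)

# Recode molarity as a factor so no linear trend is imposed.
# (molarity_f is a hypothetical column name.)
coldata$molarity_f <- factor(coldata$molarity)

# ASSUMPTION: `counts` is an integer matrix (genes x samples) whose
# columns match rownames(coldata) in order.
stopifnot(all(colnames(counts) == rownames(coldata)))

dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData   = coldata,
                              design    = ~ batch + molarity_f)

# LRT: compare the full model against a reduced model without molarity.
# A small p-value means expression differs between at least two
# concentrations after accounting for batch.
dds <- DESeq(dds, test = "LRT", reduced = ~ batch)
res <- results(dds)

# Genes with any molarity effect at an illustrative 10% FDR.
sig_genes <- rownames(res)[which(res$padj < 0.1)]
```

With `test = "LRT"`, the p-values in `res` refer to the whole set of molarity terms, while the `log2FoldChange` column reports only one of the level comparisons, so it should not be read as a summary of the trend. To see how a significant gene behaves across the gradient, plotting its normalized or variance-stabilized counts against molarity, colored by batch, is one option. If molarity is kept numeric, as in ~ batch + molarity, the fitted coefficient is instead a log2 fold change per unit of molarity, which assumes a log-linear trend.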