I'm working on a differential gene/transcript expression analysis in which I am interested in the effect of a treatment on patients, and in how their response differs across two age groups. The study includes a large number of patients.
So far, I have counted reads over the transcriptome and produced the classic clustered heatmap and PCA. Both show that age is the strongest effect in the experimental design I am analyzing.
For the differential expression analysis, which of the following formulas should I consider:
- ~treatment+age (according to the DESeq2 documentation, treatment is then handled like a batch effect. I don't think this is correct here.)
- ~age (forget the treatment)
- ~treatment:age (consider interactions between age and treatment)
- ~treatment*age (consider both the main effects and the interaction of age and treatment; it is equivalent to ~treatment+age+treatment:age)
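
To make the difference between these formulas concrete, here is a minimal sketch in plain Python (not DESeq2 itself) of the design-matrix row each of the four treatment/age groups would get under reference coding. The level names ("control", "treated", "young", "old"), the column names, and the helper functions are assumptions made for illustration. ~treatment:age on its own is left out, because its coding without the main effects is less straightforward.

```python
from itertools import product

# Hypothetical level names; the first level of each factor is the reference.
TREATMENT_LEVELS = ("control", "treated")
AGE_LEVELS = ("young", "old")

# Each formula is listed as the model terms it expands to.
FORMULAS = {
    "~age": ["age"],
    "~treatment+age": ["treatment", "age"],
    "~treatment*age": ["treatment", "age", "treatment:age"],
}


def design_row(treatment, age, terms):
    """Return the design-matrix row for one sample under reference coding."""
    is_treated = int(treatment != TREATMENT_LEVELS[0])
    is_old = int(age != AGE_LEVELS[0])
    indicators = {
        "treatment": is_treated,
        "age": is_old,
        # The interaction column is 1 only for treated AND old samples.
        "treatment:age": is_treated * is_old,
    }
    row = {"intercept": 1}
    for term in terms:
        row[term] = indicators[term]
    return row


def distinct_groups(terms):
    """Count how many of the four treatment x age groups get distinct rows."""
    rows = {
        tuple(design_row(t, a, terms).values())
        for t, a in product(TREATMENT_LEVELS, AGE_LEVELS)
    }
    return len(rows)


for formula, terms in FORMULAS.items():
    print(formula)
    for t, a in product(TREATMENT_LEVELS, AGE_LEVELS):
        print(f"  {t:>7} / {a:<5} -> {design_row(t, a, terms)}")
```

Under ~treatment+age, the treated-versus-control difference is the single treatment coefficient in both age groups. Only ~treatment*age has an extra column for treated and old samples, which lets the treatment effect differ between the age groups. ~age cannot distinguish treated from control samples at all.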
I am a bit confused, as until now I have only ever dealt with the classic "condition-and-batch-effect" type of experimental design.
I have tried all of these formulas. Some genes/transcripts are found in common, others differ, and the adjusted p-values and q-values sometimes diverge widely. Most of my fellow biologists' hypotheses are addressed by these formulas, but I guess that only one is correct.