Hi everyone,
I’m studying the transcriptional response of a specific cell line to a treatment, with a focus on short-term exposure. I’ve gathered multiple publicly available RNA-seq datasets (via SRA) from experiments that used similar—but not identical—growth and treatment protocols. These datasets span a range of time points: 6h, 18h, 22h, 1d, 2d, 3d, 4d, 5d, and 2 weeks, with an average of 3 replicates per condition.
I’ve processed all datasets through the same pipeline (nf-core/rnaseq) and performed differential expression analysis on each dataset separately using the nf-core/differentialabundance pipeline. This yielded, for each dataset, lists of DE genes and enriched/depleted pathways from both GSEA and g:Profiler.
Now I would like to derive robust conclusions about the dynamics of the treatment response over time. I have a few specific questions:
1. Is my current approach valid? Or would it be better to pool samples across studies for each time point (e.g., combine all “day 2” samples from different datasets), despite differences in experimental protocols?
2. If keeping datasets separate is acceptable, how should I identify consistently relevant pathways or genes when enrichment results differ across time points? Should I rely on metrics like average fold change or recurrence across datasets?
3. Are there tools or methods specifically designed for integrating DESeq2 and GSEA results over time or across studies (meta-analysis style)?
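To make the question about metrics concrete, here is a minimal Python sketch of what I mean by "recurrence" and "average fold change" across datasets. The dataset names, gene names, and values are invented for illustration, and the 0.05 adjusted p-value cutoff is only an assumption, not something I have settled on.

```python
from statistics import mean

# Hypothetical per-dataset DE results: gene -> (log2 fold change, adjusted p-value).
# All names and numbers below are made up for illustration only.
de_results = {
    "study_A_day2": {"GENE1": (1.8, 0.001), "GENE2": (-0.4, 0.30), "GENE3": (2.1, 0.01)},
    "study_B_day2": {"GENE1": (1.2, 0.020), "GENE2": (-1.5, 0.04), "GENE3": (0.3, 0.60)},
    "study_C_day2": {"GENE1": (2.0, 0.004), "GENE3": (1.1, 0.03)},
}


def summarize(de_results, padj_cutoff=0.05):
    """Per gene: datasets that tested it, datasets where it is significant, mean log2FC."""
    genes = sorted({gene for table in de_results.values() for gene in table})
    summary = {}
    for gene in genes:
        # Only datasets that actually report this gene contribute to its summary.
        hits = [table[gene] for table in de_results.values() if gene in table]
        summary[gene] = {
            "n_tested": len(hits),
            "n_significant": sum(1 for _, padj in hits if padj < padj_cutoff),
            "mean_log2fc": mean(lfc for lfc, _ in hits),
        }
    return summary


summary = summarize(de_results)
for gene, row in summary.items():
    print(gene, row)
```

A plain average like this ignores differences in sample size and variance between datasets, which is part of why I am asking whether a proper meta-analysis method would be more appropriate.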
This is my first time working on a study similar to a meta-analysis, so any guidance on strategy, methodology, or tools would be greatly appreciated.
Thank you in advance for your help!