First, I apologize if this question seems like a duplicate. I've searched extensively and read the previously asked questions, but the differing and sometimes contradictory opinions made it hard for me to reach a conclusion.
My objective is to generate a list of differentially expressed genes between tumor cells and their healthy counterparts for subsequent analysis. Based on what I have learned so far, I have the following analysis pipeline in mind:
1- Collect raw data (.CEL files) from different experiments run on the same platform (HG-U133_Plus_2).
2- Perform quality control, then preprocess and normalize the samples within each experiment separately.
3- Combine all of the normalized samples into a single dataset while accounting for the batch effect (either by using ComBat, or by including each sample's original experiment as a covariate in the limma analysis).
4- Perform differential gene expression analysis on the combined dataset.
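Steps 2 to 4 above can be sketched roughly as follows. This is a minimal, dependency-light Python illustration on simulated data, not the actual Bioconductor workflow: it assumes quantile normalization (the normalization step inside RMA) as a stand-in for the full preprocessing, and it replaces limma's moderated t-test with a plain per-gene ordinary least squares fit. The function names, the experiment labels `GSE_A`/`GSE_B` and the simulated effects are all hypothetical. The point is the design matrix: the experiment label enters as a covariate next to the tumor/normal indicator.

```python
import numpy as np


def quantile_normalize(x):
    """Give every column (array) of a genes x samples matrix the same distribution:
    the mean of the column-wise sorted values. Ties are ranked by position (a simplification)."""
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)
    mean_sorted = np.sort(x, axis=0).mean(axis=1)
    return mean_sorted[ranks]


def condition_t_stats(y, condition, batch):
    """Per-gene OLS fit of y ~ intercept + condition + experiment dummies.

    Returns the ordinary (unmoderated) t-statistic of the condition coefficient;
    limma would additionally shrink the per-gene variances (empirical Bayes)."""
    n = y.shape[1]
    levels = np.unique(batch)
    # One dummy column per experiment except the first (reference level).
    dummies = [(batch == lvl).astype(float) for lvl in levels[1:]]
    X = np.column_stack([np.ones(n), condition.astype(float), *dummies])
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = y @ X @ xtx_inv               # genes x coefficients
    resid = y - beta @ X.T               # genes x samples
    dof = n - X.shape[1]
    sigma2 = (resid ** 2).sum(axis=1) / dof
    return beta[:, 1] / np.sqrt(sigma2 * xtx_inv[1, 1])


# Simulated log2 intensities: 1000 genes, two hypothetical experiments of 8 arrays each.
rng = np.random.default_rng(0)
n_genes = 1000
batch = np.array(["GSE_A"] * 8 + ["GSE_B"] * 8)
condition = np.tile([1, 1, 1, 1, 0, 0, 0, 0], 2)   # 1 = tumor, 0 = normal
raw = rng.normal(8.0, 1.0, size=(n_genes, 1)) + rng.normal(0.0, 0.3, size=(n_genes, 16))
raw[:50, condition == 1] += 2.0          # first 50 genes truly up in tumor
raw[:, batch == "GSE_B"] += 1.0          # additive shift between experiments

# Step 2: normalize each experiment separately.
normalized = np.empty_like(raw)
for lvl in np.unique(batch):
    cols = batch == lvl
    normalized[:, cols] = quantile_normalize(raw[:, cols])

# Steps 3-4: keep the combined matrix and model the experiment as a covariate.
t_stats = condition_t_stats(normalized, condition, batch)
top = np.argsort(-t_stats)[:50]
print(len(set(top) & set(range(50))))    # how many true DE genes rank in the top 50
```

The gene-wise OLS fit only keeps the sketch self-contained; in limma the same idea corresponds to a design such as `model.matrix(~ condition + experiment)`, while ComBat would instead adjust the expression matrix before the fit.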
Is this approach valid? Or should I first combine the samples from every experiment into one dataset and then normalize them all together in a single step?
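To make the difference between the two options concrete, here is a small Python sketch on simulated data, again assuming quantile normalization as the normalization step (full RMA also summarizes probes across arrays with median polish, which this ignores). Normalizing per experiment only forces arrays within the same experiment onto a common distribution; normalizing everything together forces all arrays onto one shared distribution. The experiment labels and the simulated scale/shift are hypothetical, and the sketch does not by itself say which option is preferable.

```python
import numpy as np


def quantile_normalize(x):
    """Map every column of a genes x samples matrix onto the mean sorted distribution."""
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)
    return np.sort(x, axis=0).mean(axis=1)[ranks]


def shares_distribution(m):
    """True if all columns of m contain the same set of values."""
    s = np.sort(m, axis=0)
    return np.allclose(s, s[:, [0]])


rng = np.random.default_rng(1)
batch = np.array(["GSE_A"] * 8 + ["GSE_B"] * 8)  # hypothetical experiment labels
raw = rng.normal(8.0, 1.0, size=(500, 1)) + rng.normal(0.0, 0.3, size=(500, 16))
# Simulated scale and shift difference between the two experiments.
raw[:, batch == "GSE_B"] = raw[:, batch == "GSE_B"] * 1.1 + 0.5

# Option A: normalize within each experiment, then combine.
per_experiment = np.empty_like(raw)
for lvl in np.unique(batch):
    cols = batch == lvl
    per_experiment[:, cols] = quantile_normalize(raw[:, cols])

# Option B: combine first, then normalize all arrays together.
joint = quantile_normalize(raw)

print(shares_distribution(joint))           # True
print(shares_distribution(per_experiment))  # False
print(all(shares_distribution(per_experiment[:, batch == lvl]) for lvl in np.unique(batch)))  # True
```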
Thank you for your time. Regards.