Question: using different microarray datasets (meta-analysis?) for DEG pipeline
Moosa wrote (5 months ago):

First, I apologize if my question seems like a duplicate. I have searched extensively and read the previously asked questions, but the differing and sometimes contradictory opinions made it hard for me to reach a final conclusion.

The objective of my experiment is to generate a list of differentially expressed genes (DEGs) between tumor cells and their healthy counterparts for subsequent analysis. Based on what I have learned so far, I have this analysis pipeline in mind:

1- Collect raw (.CEL) data from different experiments, all "from the same platform" (HG-U133_Plus_2).

2- Quality control, preprocess and normalize the samples within each experiment separately.

3- Combine all of the "normalized" samples into a single dataset, but keep the batch effect in mind (either use ComBat, or just use the original experiment set name as a covariate while analyzing with limma).

4- Perform differential gene expression analysis on the combined dataset.

Is this approach valid? Or should I first combine all of the samples from every experiment into one dataset, and then normalize them together in a single step?

Thank you for your time. Regards.

Kevin Blighe wrote (5 months ago):

If they are all from the same platform, then my preference is to process them together, but still keep batch as a covariate in your statistical models. As you go through the analysis, you will learn which procedure is best. Keep in mind that there is no standard for what you are aiming to do.
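As a rough illustration of what "keep batch as a covariate" means, here is a minimal sketch in plain Python rather than limma. The expression values, the two-experiment layout and the helper names `solve` and `ols` are all invented for this example. It fits an ordinary least-squares model for a single gene, once ignoring batch and once with a batch indicator in the design.

```python
# Toy illustration only: one gene, two experiments (batches), made-up log2 values.
# Model: expression = b0 + b_tumor * is_tumor + b_batch * is_batch2

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ols(design, y):
    """Least-squares coefficients via the normal equations (X'X) beta = X'y."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * v for row, v in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)

# Hypothetical samples: (is_tumor, is_batch2, log2 expression)
samples = [
    (0, 0, 5.0), (0, 0, 5.2), (1, 0, 7.0),   # experiment 1: mostly normal tissue
    (0, 1, 8.0), (1, 1, 9.9), (1, 1, 10.1),  # experiment 2: mostly tumor, shifted upwards
]
y = [s[2] for s in samples]

naive = ols([[1, s[0]] for s in samples], y)           # design: intercept + tumor
adjusted = ols([[1, s[0], s[1]] for s in samples], y)  # design: intercept + tumor + batch

print("tumor effect, batch ignored :", round(naive[1], 2))
print("tumor effect, batch adjusted:", round(adjusted[1], 2))
```

In this made-up layout the tumor samples are concentrated in the experiment with the higher overall signal, so the estimate that ignores batch absorbs part of the batch shift. A real analysis would fit all genes at once with moderated statistics, which this sketch does not attempt.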

Kevin

Thank you for your help. : )

(reply by Moosa, 5 months ago)

Cool. Generally, you seem to be aware of where the pitfalls may lie with such a procedure, so, that is a good start. I'm only adding this further comment for others who may arrive at this thread:

After you normalise everything together, a check of the box-and-whisker plot will be immediately informative: if the experiments are grossly different, then this should be visible on such a plot, as the different experiments will likely not line up at their medians, even after quantile normalisation. For example, if one is brain tissue while the other is skin, then these will have grossly different expression profiles, and it would obviously be more appropriate to analyse them separately, even if they are the same chip type.
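The same check can be done numerically. The following is a minimal sketch in plain Python, assuming made-up normalised log2 values for six arrays from three hypothetical experiments; the helper names and the 0.5 log2-unit tolerance are arbitrary choices for illustration, not a recommended cutoff.

```python
from statistics import mean, median

# Hypothetical normalised log2 intensities: array name -> (experiment, values).
arrays = {
    "exp1_a": ("exp1", [6.1, 7.0, 7.9, 9.2, 10.5]),
    "exp1_b": ("exp1", [6.0, 7.1, 8.0, 9.0, 10.4]),
    "exp2_a": ("exp2", [6.2, 7.2, 8.1, 9.1, 10.6]),
    "exp2_b": ("exp2", [5.9, 6.9, 7.9, 9.3, 10.3]),
    "exp3_a": ("exp3", [8.0, 9.5, 10.4, 11.2, 12.0]),
    "exp3_b": ("exp3", [8.2, 9.4, 10.6, 11.0, 12.1]),
}

def experiment_medians(arrays):
    """Average the per-array medians within each experiment."""
    per_experiment = {}
    for experiment, values in arrays.values():
        per_experiment.setdefault(experiment, []).append(median(values))
    return {exp: mean(meds) for exp, meds in per_experiment.items()}

def misaligned(medians, tolerance):
    """Experiments whose median sits more than `tolerance` from the middle experiment."""
    centre = median(medians.values())
    return sorted(exp for exp, m in medians.items() if abs(m - centre) > tolerance)

medians = experiment_medians(arrays)
flagged = misaligned(medians, tolerance=0.5)  # arbitrary illustrative cutoff

print(medians)
print("experiments that do not line up:", flagged)
```

A plot is still more informative than a single number, since it also shows the spread and the tails of each array.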

(reply by Kevin Blighe, 5 months ago)

Thank you for your comment. I've actually learned a lot by reading your posts. One question that crosses my mind: because most normalization methods (for example, RMA) share information between arrays, would it not be safer to normalize the data from each experiment separately?

To explain myself better: there are always non-biological variations between samples, which we try to "minimize" using normalization. Because arrays from different experiments (labs) differ more from one another in these non-biological respects, can't we expect that normalizing all of them together would be an oversimplification and could lead to a loss of biological variation in the process? Can we consider "normalizing each experiment's samples separately" to be a more conservative approach?
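To make the "sharing information between arrays" point concrete, here is a minimal sketch in plain Python of quantile normalization only (RMA also includes background correction and probe-set summarization, which are not shown). The `quantile_normalize` helper and the tiny three-probe arrays are invented for illustration, and ties are not handled.

```python
def quantile_normalize(arrays):
    """Quantile-normalize equal-length arrays against their shared mean distribution.

    Ties are not treated specially in this simplified sketch.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across all arrays.
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized

# Hypothetical log2 intensities: two arrays per experiment, three probes each.
exp1 = [[4.0, 2.0, 6.0], [3.0, 5.0, 7.0]]
exp2 = [[10.0, 14.0, 12.0], [11.0, 13.0, 15.0]]

separate = quantile_normalize(exp1) + quantile_normalize(exp2)
together = quantile_normalize(exp1 + exp2)

print("separate:", separate)
print("together:", together)
```

Normalized separately, each experiment keeps its own reference distribution, so the overall offset between experiments remains and has to be handled later as a batch effect. Normalized together, every array is forced onto one common distribution, which removes that offset regardless of whether it was technical or biological.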

(reply by Moosa, 5 months ago)

"can't we expect that normalizing all of them together would be an oversimplification and could lead to a loss of biological variations in the process?"

Yes, this is why I also said this: "As you go through the analysis, you will learn which procedure is best."

Questions like yours have no single right answer... each will require a different approach based on many factors, which cannot all be properly covered in a single 'catch-all' answer.

(reply by Kevin Blighe, 5 months ago)

I see, thank you again for your time. : )

(reply by Moosa, 5 months ago)