Question: Is it problematic to use ANOVA in microarray meta-analysis?
Asked by Kristin Muench (United States), 4.9 years ago:


My lab is interested in doing a meta-analysis of two microarray datasets downloaded from a public repository. In each dataset, microarray analysis was done on tissue collected at two timepoints. Pooling both datasets, we want to ask whether gene expression changes between timepoint 1 and timepoint 2.

Originally, I approached this problem with RankProd and found a few genes of interest that were significantly up- or downregulated. A colleague took a different approach. First, she adjusted the background-corrected, normalized data for batch effects using ComBat. From this ComBat-adjusted expression matrix, she extracted ten genes of interest and ran a two-way ANOVA with the factors Gene (ten levels) and Timepoint (two levels). With this method, she found a significant Gene × Timepoint interaction, and Tukey's post-hoc test showed that several genes were significantly up- or downregulated. Some, but not all, of these were the same genes that RankProd had identified as up- or downregulated.
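For readers unfamiliar with RankProd, its core statistic is the rank product: within each timepoint-2 vs timepoint-1 comparison, genes are ranked by fold change, and each gene's ranks are combined as a geometric mean. The following is a minimal sketch of that statistic only, assuming a hypothetical matrix of per-comparison log2 fold changes; how those comparisons are formed is an assumption here, and the sketch does not reproduce RankProd's permutation-based p-values or its handling of multiple "origins".

```python
import numpy as np

def rank_product(log_fc):
    """Rank product for up-regulation (simplified sketch, not the RankProd package).

    log_fc: array of shape (n_genes, n_comparisons); each column holds the log2
    fold changes (timepoint 2 vs timepoint 1) from one comparison.
    Returns the geometric mean of each gene's ranks; small values mean the gene
    is consistently near the top of the up-regulated list.
    """
    log_fc = np.asarray(log_fc, dtype=float)
    n_genes, k = log_fc.shape
    # Within each column, rank 1 = largest fold change (stable sort for ties)
    order = np.argsort(-log_fc, axis=0, kind="stable")
    ranks = np.empty_like(order)
    for j in range(k):
        ranks[order[:, j], j] = np.arange(1, n_genes + 1)
    # Geometric mean of ranks across comparisons
    return np.exp(np.log(ranks).mean(axis=1))
```

For down-regulation, the same function can be applied to `-log_fc`.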

My concerns are (1) the different numbers of samples in each dataset (Dataset 1: 3 samples per timepoint; Dataset 2: 15 samples per timepoint), and (2) statistical problems we may be overlooking by running an ANOVA on ComBat output.
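To make concern (1) concrete, here is a small sketch assuming a naive pooled comparison that ignores which dataset each sample came from, with equal sample counts at both timepoints within each dataset (as in the counts given). Under those assumptions the pooled timepoint difference is a sample-size-weighted average of the per-dataset differences, so Dataset 1 contributes only 3/18 = 1/6 of it. The `pooled_difference` helper and the expression values are hypothetical.

```python
import numpy as np

def pooled_difference(groups):
    """Naive pooled mean difference (timepoint 2 minus timepoint 1) for one gene.

    groups: list of (t1_values, t2_values) pairs, one pair per dataset.
    Samples from all datasets are simply concatenated, ignoring dataset of origin.
    """
    t1 = np.concatenate([g[0] for g in groups])
    t2 = np.concatenate([g[1] for g in groups])
    return t2.mean() - t1.mean()

# Hypothetical gene: a change of 1.0 in Dataset 1 (3 samples per timepoint),
# no change in Dataset 2 (15 samples per timepoint), different baselines.
dataset1 = (np.full(3, 5.0), np.full(3, 6.0))
dataset2 = (np.full(15, 8.0), np.full(15, 8.0))
diff = pooled_difference([dataset1, dataset2])  # diluted to 1/6 of Dataset 1's change
```

Because the timepoints are balanced within each dataset, the baseline offset between datasets cancels in the mean difference, but it still inflates the within-group variance of the pooled samples.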

Is it okay to apply ANOVA to microarray data in this way? If not, why not?

Thank you for your help!

Tags: microarray, stats, anova
Answer by Devon Ryan (Freiburg, Germany), 4.9 years ago:

Normally a linear model is used on microarray data (an ANOVA is a special case of a linear model). RankProd is nice when you have non-normally distributed data and a large number of samples. I would strongly encourage you to use limma, which will have greater power than an ANOVA and will probably give more reliable results to boot. Do not include genes as a factor; there's no point in doing so. You should be fitting the model to the data one gene at a time.
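As an illustration of "fitting the data one gene at a time", here is a minimal sketch assuming a log-expression matrix (genes × samples), a 0/1 timepoint indicator, and dataset labels; the function name and inputs are hypothetical. Each gene gets its own model `expression ~ timepoint + dataset`, so the dataset offset is absorbed by a covariate rather than by a Gene factor. This is plain ordinary least squares, not limma: limma additionally moderates the per-gene variances with empirical Bayes, which is where much of its extra power comes from.

```python
import numpy as np

def per_gene_timepoint_t(expr, timepoint, dataset):
    """Fit expression ~ timepoint + dataset separately for every gene (row).

    expr: (n_genes, n_samples) log-expression matrix
    timepoint: length n_samples, 0 = timepoint 1, 1 = timepoint 2
    dataset: length n_samples dataset labels
    Returns (coef, t): the timepoint coefficient and its ordinary t statistic per gene.
    """
    expr = np.asarray(expr, dtype=float)
    timepoint = np.asarray(timepoint, dtype=float)
    dataset = np.asarray(dataset)
    labels = np.unique(dataset)
    # Design matrix: intercept, timepoint, one indicator per non-reference dataset
    cols = [np.ones_like(timepoint), timepoint]
    cols += [(dataset == lab).astype(float) for lab in labels[1:]]
    X = np.column_stack(cols)
    n, p = X.shape
    # Every gene shares the same design, so solving with one column per gene
    # gives exactly the same coefficients as separate per-gene fits.
    beta, _, _, _ = np.linalg.lstsq(X, expr.T, rcond=None)
    resid = expr.T - X @ beta
    sigma2 = (resid ** 2).sum(axis=0) / (n - p)  # per-gene residual variance
    xtx_inv = np.linalg.inv(X.T @ X)
    se = np.sqrt(sigma2 * xtx_inv[1, 1])
    return beta[1], beta[1] / se
```

In limma itself the analogous steps are building a design matrix with timepoint and dataset terms, then calling `lmFit` and `eBayes`.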

BTW, I hope you used ComBat before using RankProd, since RankProd can only handle two groups.


Kristin Muench replied:

Thank you for the helpful suggestion! To clarify: when you say RankProd can only handle two groups, do you mean two timepoints or two datasets? At one point I ran RankProd with three datasets and two timepoints, which seemed OK according to the vignette (3 "origins", 2 "conditions"). I do not think I used ComBat before RankProd in that case.


Devon Ryan replied:

Two timepoints; thankfully, it can handle multiple samples per timepoint :)

Give ComBat a try and then run RankProd on the results. I suspect you'll get results more similar to what your colleague got with the ANOVA. The point is that you're pretty much guaranteed to have a batch effect when doing a meta-analysis. If you don't correct for it, you end up tanking your statistical power.
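Why an uncorrected batch effect costs power can be seen in a toy example. The sketch below uses a deliberately simplified, location-only adjustment (subtract each dataset's per-gene mean, add back the gene's overall mean); it is not ComBat, which also adjusts scale and shrinks the batch parameters with empirical Bayes. The sketch assumes, as in this thread, that both timepoints are equally represented within each dataset; with an unbalanced design, centering on batch means would also remove part of the timepoint effect, which is why ComBat lets you pass the biological covariate through its `mod` argument. The function names and values are hypothetical.

```python
import numpy as np

def center_batches(expr, batch):
    """Location-only batch adjustment (a simplification, NOT ComBat).

    expr: (n_genes, n_samples) log-expression matrix; batch: length n_samples labels.
    For each gene, subtract every batch's mean and add back the gene's overall mean.
    """
    expr = np.asarray(expr, dtype=float)
    batch = np.asarray(batch)
    grand_mean = expr.mean(axis=1, keepdims=True)
    adjusted = expr.copy()
    for b in np.unique(batch):
        cols = batch == b
        adjusted[:, cols] -= expr[:, cols].mean(axis=1, keepdims=True)
    return adjusted + grand_mean

def pooled_t(x, y):
    """Equal-variance two-sample t statistic for mean(y) - mean(x)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    return (y.mean() - x.mean()) / np.sqrt(sp2 * (1 / nx + 1 / ny))

# Hypothetical gene: dataset d2 sits about 3 units above d1 at both timepoints.
expr = np.array([[5, 5, 5, 6, 6, 6, 8, 8, 8, 8.5, 8.5, 8.5]])
batch = np.array(["d1"] * 6 + ["d2"] * 6)
tp = np.array([0, 0, 0, 1, 1, 1] * 2)
adj = center_batches(expr, batch)
t_before = pooled_t(expr[0, tp == 0], expr[0, tp == 1])
t_after = pooled_t(adj[0, tp == 0], adj[0, tp == 1])
```

The mean timepoint difference (0.75) is the same before and after adjustment; what changes is the within-group variance, so the t statistic goes from roughly 0.86 to roughly 9.5.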
