Question

Is it problematic to use ANOVA in microarray meta-analysis?

8.1 years ago

Kristin Muench

Hello,

My lab is interested in doing a meta-analysis of two microarray datasets downloaded from a public repository. In each dataset, microarray analysis was done on tissue collected at two timepoints. Pooling both datasets, we want to ask whether genes change in expression between timepoint 1 and timepoint 2.

Originally, I approached this problem with RankProd and found a few genes of interest that were significantly up- or downregulated. A colleague took a different approach. First, the background-corrected, normalized data were adjusted for batch effects using ComBat. From this ComBat-adjusted expression matrix, she selected ten genes of interest. She then ran a two-way ANOVA with factors for Gene (ten levels) and Timepoint (two levels). With this method, she found a significant Gene x Timepoint interaction, and Tukey's post-hoc test showed several genes to be significantly up- or downregulated. Some, but not all, of these were the same genes that RankProd had identified as up- or downregulated.

My potential concerns are 1) the datasets have different numbers of samples (Dataset 1: 3 samples per timepoint; Dataset 2: 15 samples per timepoint), and 2) statistical problems we may be overlooking by running an ANOVA on ComBat output.

Is it okay to apply ANOVA to microarray data in this way? If not, why not?

Thank you for your help!

Accepted Answer · 2016-04-04

8.1 years ago

Devon Ryan

Normally a linear model is used on microarray data (an ANOVA is a special case of this). RankProd is nice when you have non-normally distributed datasets and a large number of samples. I would strongly encourage you to use limma, which will have greater power than an ANOVA and will probably give more reliable results to boot. Do not include gene as a factor; there's no point in doing so. You should fit the model to the data one gene at a time.
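To make "one gene at a time" concrete, here is a minimal Python sketch (limma itself is an R/Bioconductor package, and nothing below is its API). For a single gene it fits expression ~ dataset + timepoint by ordinary least squares, so the unequal dataset sizes (3 vs 15 per timepoint) are handled by the model rather than by naive pooling. It then computes a limma-style moderated t, where the gene's residual variance is shrunk toward a prior value s0_sq that carries d0 degrees of freedom. In limma those prior values are estimated from all genes by empirical Bayes; here they are simply assumed inputs. The function names (solve, fit_gene, moderated_t) and the example numbers are hypothetical.

```python
import math

def solve(a, b):
    """Solve the square system a @ x = b (Gauss-Jordan, partial pivoting)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_gene(y, batch, time):
    """OLS fit for ONE gene: y ~ 1 + batch + time (0/1 indicators per sample).

    Returns (timepoint coefficient, c = [(X'X)^-1]_tt, residual variance, residual df).
    """
    X = [[1.0, float(b), float(t)] for b, t in zip(batch, time)]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    resid = [yi - sum(bj * xj for bj, xj in zip(beta, r)) for r, yi in zip(X, y)]
    df = len(y) - p
    s2 = sum(e * e for e in resid) / df
    c = solve(xtx, [0.0, 0.0, 1.0])[2]  # diagonal element of (X'X)^-1 for "time"
    return beta[2], c, s2, df

def moderated_t(beta, c, s2, df, d0, s0_sq):
    """limma-style moderated t: shrink s2 toward an ASSUMED prior s0_sq with d0 df."""
    s2_post = (d0 * s0_sq + df * s2) / (d0 + df)
    return beta / math.sqrt(c * s2_post)

# Hypothetical design: dataset 1 = 3 samples per timepoint, dataset 2 = 15.
batch = [0] * 6 + [1] * 30
time = [0, 0, 0, 1, 1, 1] + [0] * 15 + [1] * 15
noise = [0.1, -0.1, 0.0] * 12  # sums to zero within every dataset/timepoint cell
# Made-up values for one gene: baseline 5, dataset offset 2, timepoint effect 1.
y = [5 + 2 * b + 1 * t + e for b, t, e in zip(batch, time, noise)]

beta, c, s2, df = fit_gene(y, batch, time)
t_ols = beta / math.sqrt(c * s2)
t_mod = moderated_t(beta, c, s2, df, d0=4.0, s0_sq=0.05)
print(f"time effect={beta:.3f}  t={t_ols:.2f}  moderated t={t_mod:.2f}")
```

In practice you would call fit_gene on every row of the expression matrix, which is why gene never appears as a factor. Putting the dataset indicator in the design, as here, is one common alternative to running ComBat first and then testing the adjusted values.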

BTW, I hope you used ComBat before using RankProd, since RankProd can only handle two groups.
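For reference, the core rank product statistic can be sketched as follows. This follows the original rank product formulation as I understand it and is not the RankProd package's code. Each comparison contrasts exactly two classes (here timepoint 2 vs timepoint 1), genes are ranked by log ratio within each comparison, and a gene's score is the geometric mean of its ranks. The function name and toy values are hypothetical, ties are broken arbitrarily, and the permutation-based significance estimate that RankProd reports is left out.

```python
import math

def rank_products(log_ratios):
    """Rank product per gene (sketch).

    log_ratios: list of K comparisons; each is a list of per-gene
    log2(timepoint 2 / timepoint 1) values, in the same gene order.
    Rank 1 = largest increase within a comparison. Returns the geometric
    mean of each gene's K ranks (small = consistently upregulated).
    """
    n_genes = len(log_ratios[0])
    k = len(log_ratios)
    log_rank_sum = [0.0] * n_genes
    for comparison in log_ratios:
        # Sort gene indices by log ratio, largest first; ties keep input order.
        order = sorted(range(n_genes), key=lambda g: comparison[g], reverse=True)
        for rank, g in enumerate(order, start=1):
            log_rank_sum[g] += math.log(rank)
    return [math.exp(s / k) for s in log_rank_sum]

# Hypothetical: 3 comparisons of 4 genes.
comparisons = [
    [2.0, 0.1, -1.5, 0.3],
    [1.8, -0.2, -1.0, 0.5],
    [2.2, 0.0, -2.0, 0.4],
]
rp = rank_products(comparisons)
print([round(x, 3) for x in rp])
```

A score near 1 means the gene ranked at the top in every comparison; to look for downregulation, negate the log ratios before ranking.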

Thank you for the helpful suggestion! To clarify: when you say RankProd can only handle two groups, do you mean two timepoints, or two datasets? I had used three datasets and two timepoints with RankProd at one point, but this seemed O.K. according to the vignette (3 "origins", 2 "conditions"). I do not think I used ComBat before RankProd in that case.

8.1 years ago by Kristin Muench

Two timepoints. Thankfully, it can handle multiple samples per timepoint :)

Give ComBat a try and then run RankProd on the results. I suspect you'll get results more similar to what your colleague got with the ANOVA. The point here is that you're pretty much guaranteed to have a batch effect when doing a meta-analysis. If you don't correct for it, you end up tanking your statistical power.
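A rough illustration of why an uncorrected batch effect costs power: the Python sketch below uses a location-only adjustment (for each gene, subtract each dataset's mean and add back the grand mean). This is deliberately simpler than ComBat, which also adjusts batch variances and shrinks batch parameters across genes with empirical Bayes. It is only sensible because timepoints are balanced within each dataset; if timepoint were confounded with dataset, the adjustment would remove the biological signal too. The function names and data are hypothetical.

```python
import math
from statistics import mean

def center_batches(values, batch):
    """Location-only batch adjustment for ONE gene (a simplification, NOT ComBat).

    Subtracts each batch's mean and adds back the grand mean.
    """
    grand = mean(values)
    batch_mean = {
        b: mean([v for v, bb in zip(values, batch) if bb == b]) for b in set(batch)
    }
    return [v - batch_mean[b] + grand for v, b in zip(values, batch)]

def pooled_t(values, group):
    """Student's two-sample t with pooled variance: group 1 minus group 0."""
    a = [v for v, g in zip(values, group) if g == 0]
    b = [v for v, g in zip(values, group) if g == 1]
    ma, mb = mean(a), mean(b)
    ss = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
    s2 = ss / (len(a) + len(b) - 2)
    return (mb - ma) / math.sqrt(s2 * (1 / len(a) + 1 / len(b)))

# Hypothetical gene: dataset 2 reads 3 units higher; timepoint 2 is 1 unit higher.
batch = [0] * 6 + [1] * 30
time = [0, 0, 0, 1, 1, 1] + [0] * 15 + [1] * 15
noise = [0.1, -0.1, 0.0] * 12
y = [5 + 3 * b + 1 * t + e for b, t, e in zip(batch, time, noise)]

t_before = pooled_t(y, time)
y_adj = center_batches(y, batch)
t_after = pooled_t(y_adj, time)
print(f"t before adjustment: {t_before:.2f}, after: {t_after:.2f}")
```

Left in, the dataset offset inflates the within-timepoint variance and shrinks t; after centering, the same one-unit difference stands out. One caveat relevant to the second concern in the question: a t-test or ANOVA run on adjusted values does not know that batch means were estimated, so it can overstate confidence, especially in unbalanced designs. That is another argument for putting batch in the limma design instead.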

8.1 years ago by Devon Ryan