Question

Anova + Tukey: Multiple Testing Correction

13.1 years ago

Rossella ▴ 370

Hi, I want to discover which genes are expressed in only one of five treatments. This is my pipeline:

1) ANOVA between the five treatments
2) Holm multiple testing correction
3) Tukey's test for the significant genes discovered in step 2

My question is: should I also correct the Tukey p-values, for example by multiplying each p-value by the number of significant ANOVA p-values (a Bonferroni correction on the number of tests performed), or should I only correct at the level of the ANOVA?

Thanks a lot in advance

Rossella

multiple • 10k views

Updated 9.1 years ago by Biostar 20 • written 13.1 years ago by Rossella ▴ 370

Is this a microarray or rna-seq experiment? Have you loaded this data into an existing package for expression analysis? Seems like you are trying to do a lot by hand.

Reply • 13.1 years ago by Jeremy Leipzig 22k

It is a microarray experiment. I am computing everything using the R functions aov and TukeyHSD. I usually prefer to do things by hand because then I can be sure of what is actually being done; a lot of software does not give exact details of what kind of correction it performs.

Reply • 13.1 years ago by Rossella ▴ 370

Ram · Answer 1 · 2011-03-09

I would say ask a statistician, but you already did (there is nothing wrong with cross-posting, but you should mention it) and received no proper answer. So I will try to give one. Handle it with caution: I am not a statistician, and it is backed only by a little thinking.

No, imho you do not need to further correct the Tukey p-values for multiple testing!

Why:
I assume that you use Tukey's test to find those pairs of means that are significantly different for each gene significant in ANOVA after correction for multiple testing. Correction for multiple testing is done to protect you against inflation of Type I errors by repeatedly performing a test. Whether or not you have to apply correction depends on the kind of error you are trying to minimise.

For example:

Bonferroni correction adjusts the p-values so that the probability of making even a single false rejection of a null hypothesis across all tests (the family-wise error rate) stays below the chosen cutoff.
With FDR correction and a cutoff of 0.05, you limit your set of discoveries to one in which the expected proportion of false rejections among all rejections is at most 5%.
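To make the difference between the two kinds of correction concrete, here is a minimal sketch in plain Python. The p-values are made up for illustration, and Benjamini-Hochberg is assumed as the FDR procedure (the text above does not name one). In R, `p.adjust` offers the same methods.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: p * m, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg (FDR) adjusted p-values, returned in input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from largest to smallest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # rank m for the largest p-value, down to 1
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values, purely for illustration.
example_pvalues = [0.001, 0.008, 0.012, 0.041, 0.2]
bonf = bonferroni(example_pvalues)
bh = benjamini_hochberg(example_pvalues)

# Number of tests that stay significant at a 0.05 cutoff under each method.
n_bonf = sum(p <= 0.05 for p in bonf)
n_bh = sum(p <= 0.05 for p in bh)
print(n_bonf, n_bh)
```

On these example values the FDR adjustment keeps one more test than Bonferroni, which reflects that it controls a less strict error criterion.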

The correction applied to the ANOVA p-values already limits the number of tests performed in the next step, provided you select genes on the corrected p-values, and so it already protects against Type I errors. A cutoff of, e.g., 0.05 after correction protects against inflation of Type I errors at the gene level, and that is the only place where I can see multiple testing in this setting. After finding N significant genes by ANOVA, you would perform N Tukey tests in the next step. (Tukey's test already corrects for the multiple pairwise comparisons among the groups, 5 in your case.) Thus no further significant genes are added, the number of comparisons stays the same, and no additional findings are generated at the gene level.
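The two-stage bookkeeping described above can be sketched as follows. This is only an illustration: the gene names and p-values are invented, `holm` is a hand-written helper rather than a library call, and the last lines show the extra Bonferroni factor proposed in the question purely for comparison, not as a recommendation.

```python
from math import comb


def holm(pvalues):
    """Holm step-down adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # smallest first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by m - k + 1.
        candidate = min(1.0, pvalues[i] * (m - rank))
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[i] = running_max
    return adjusted


# Hypothetical per-gene ANOVA p-values (invented for illustration).
anova_p = {"geneA": 0.0004, "geneB": 0.009, "geneC": 0.03, "geneD": 0.6}

names = list(anova_p)
adjusted = dict(zip(names, holm([anova_p[g] for g in names])))

# Stage 1: only genes passing the corrected ANOVA cutoff go on to Tukey's test.
significant = [g for g in names if adjusted[g] <= 0.05]

# Stage 2: one Tukey run per significant gene; each run handles all
# pairwise comparisons among the 5 treatment groups internally.
n_groups = 5
pairs_per_gene = comb(n_groups, 2)

# For comparison only: the additional Bonferroni step proposed in the question
# would multiply a Tukey p-value by the number of significant genes.
hypothetical_tukey_p = 0.02  # invented value
extra_corrected = min(1.0, hypothetical_tukey_p * len(significant))

print(significant, pairs_per_gene, extra_corrected)
```

With real data, the ANOVA and Tukey p-values would come from R's `aov` and `TukeyHSD` as in the question; only the selection and counting logic is shown here.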

I wish I could explain that a bit better, so feel free to do so or to discuss.