Anova + Tukey: Multiple Testing Correction
13.1 years ago
Rossella ▴ 370

Hi, I want to discover which genes are expressed in only one of five treatments. This is my pipeline:

1) ANOVA between the five treatments
2) Holm multiple testing correction
3) Tukey for the significant genes discovered in step 2
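
Step 2 of this pipeline can be sketched in a few lines. The following is a minimal pure-Python illustration of Holm's step-down adjustment, not the code used in the question (which relied on R). The function name `holm_adjust` and the example p-values are made up for illustration. In R, `p.adjust(p, method = "holm")` performs the same adjustment.

```python
def holm_adjust(pvalues):
    """Return Holm step-down adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (rank = k - 1) is multiplied by m - k + 1.
        candidate = min(1.0, (m - rank) * pvalues[idx])
        # Enforce monotonicity: adjusted values never decrease along the sort.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical per-gene ANOVA p-values (one per gene), for illustration only.
anova_p = [0.0001, 0.04, 0.03, 0.2, 0.0005]
holm_p = holm_adjust(anova_p)

# Genes that would be passed on to the Tukey step at a 0.05 cutoff.
selected = [i for i, p in enumerate(holm_p) if p < 0.05]
print(holm_p)
print(selected)
```

Only the genes in `selected` would go on to step 3, so the gene-level correction also determines how many Tukey tests are run.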

My question is: should I also correct the Tukey p-values, for example by multiplying each p-value by the number of significant ANOVA p-values (a Bonferroni correction on the number of tests performed), or should I correct only at the ANOVA level?

Thanks a lot in advance

Rossella


Is this a microarray or rna-seq experiment? Have you loaded this data into an existing package for expression analysis? Seems like you are trying to do a lot by hand.


It is a microarray experiment. I am computing everything using the R functions aov and TukeyHSD. I usually prefer to do things by hand because I can be sure of what is actually being done; a lot of software does not give exact details of what kind of correction it performs.

13.1 years ago
Michael 54k

I would say ask a statistician, but you already did and received no proper answer. (There is nothing wrong with cross-posting, but you should mention it.) So I will try to give an answer. Handle it with caution: I am not a statistician, and it is backed only by a little thinking.

No, imho you do not need to further correct the Tukey p-values for multiple testing!

Why:
I assume that you use Tukey's test to find those pairs of means that are significantly different for each gene significant in ANOVA after correction for multiple testing. Correction for multiple testing is done to protect you against inflation of Type I errors by repeatedly performing a test. Whether or not you have to apply correction depends on the kind of error you are trying to minimise.

For example:

  • Bonferroni correction adjusts the p-values to control the family-wise error rate: with a cutoff of 0.05, the probability of making even a single false rejection of a null hypothesis among all tests is at most 5%.
  • With FDR correction and a cutoff of 0.05, you limit your set of discoveries so that the expected proportion of false rejections among all rejections is at most 5%.
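
To make the difference between the two kinds of correction concrete, here is a minimal pure-Python sketch. It assumes that "FDR correction" means the Benjamini-Hochberg procedure, which is the most common choice but is not named in this answer. The function names and the p-values are hypothetical.

```python
def bonferroni_adjust(pvalues):
    """Multiply every p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Walk from the largest p-value down to the smallest.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, idx in enumerate(order):
        rank = m - pos  # 1-based rank of this p-value in ascending order
        candidate = pvalues[idx] * m / rank
        # Enforce monotonicity: adjusted values never increase as p shrinks.
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values from five tests, for illustration only.
p = [0.001, 0.008, 0.02, 0.03, 0.6]
n_bonferroni = sum(q < 0.05 for q in bonferroni_adjust(p))
n_bh = sum(q < 0.05 for q in bh_adjust(p))
print(n_bonferroni, n_bh)
```

On these toy values the Bonferroni adjustment keeps fewer rejections than the FDR adjustment, which reflects the stricter error criterion it targets.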

The correction applied to the ANOVA p-values already limits the number of tests performed in the next step, provided you select genes by the corrected p-value, and so it already protects against Type I errors. A cutoff such as 0.05 after correction protects against inflation of Type I errors at the gene level, and that is the only place where I can see multiple testing in this setting. After finding N significant genes by ANOVA, you would then perform N Tukey tests, one per gene. (Tukey's test already corrects for multiple comparisons across the groups: 5 in your case, which gives 10 pairwise comparisons per gene.) Thus the Tukey step adds no further significant genes, the number of comparisons per gene stays the same, and no additional findings are generated at the gene level.
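
The two-stage logic described here could be sketched as follows. This is only an illustration under stated assumptions: the treatment labels, gene names, p-values, and the helpers `unique_treatment`, `two_stage` and `toy_tukey` are all hypothetical, and the Tukey-adjusted p-values are taken as given inputs (in the question they would come from R's `TukeyHSD`). The Tukey p-values are used as they are, with no extra multiplication by the number of genes.

```python
from itertools import combinations
from math import comb

TREATMENTS = ["A", "B", "C", "D", "E"]  # hypothetical labels for the five treatments


def unique_treatment(tukey_p, alpha=0.05):
    """Return the one treatment that differs from all others while the others
    show no significant pairwise difference among themselves, else None."""
    significant = {frozenset(pair) for pair, p in tukey_p.items() if p < alpha}
    for t in TREATMENTS:
        others = [u for u in TREATMENTS if u != t]
        differs_from_all = all(frozenset((t, u)) in significant for u in others)
        others_alike = not any(
            frozenset(pair) in significant for pair in combinations(others, 2)
        )
        if differs_from_all and others_alike:
            return t
    return None


def two_stage(anova_holm_p, tukey_by_gene, alpha=0.05):
    """Gate on the corrected ANOVA p-value, then read the Tukey results as-is."""
    selected = [g for g, p in anova_holm_p.items() if p < alpha]
    return {g: unique_treatment(tukey_by_gene[g], alpha) for g in selected}


def toy_tukey(odd_one_out=None):
    """Made-up Tukey-adjusted p-values for every pair of treatments."""
    return {
        pair: (0.001 if odd_one_out in pair else 0.9)
        for pair in combinations(TREATMENTS, 2)
    }


# Hypothetical Holm-corrected ANOVA p-values; gene2 fails the gene-level gate.
anova_holm_p = {"gene1": 0.002, "gene2": 0.30, "gene3": 0.01}
tukey_by_gene = {"gene1": toy_tukey("A"), "gene3": toy_tukey(None)}

print(comb(len(TREATMENTS), 2))  # pairwise comparisons per gene
result = two_stage(anova_holm_p, tukey_by_gene)
print(result)
```

Note that a non-significant pairwise difference is not proof that two treatments are equal, so a "unique" call from a rule like this should be treated as a candidate rather than a conclusion.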

I wish I could explain that a bit better, so feel free to do so or discuss.


Thanks for the reply. That is more or less what I was thinking too, but I wanted to be sure. And thanks for mentioning that I should add a reference when I post the same question on other websites; I am new to this site and still don't know all the rules.
