Do P-Values From A Generalised Linear Model Need Correction For Multiple Testing?
11.0 years ago
Phis ★ 1.1k

After performing a GLM in R, I get back a set of p-values. Does anybody know whether these p-values still need to be adjusted using p.adjust or similar to account for repeated hypothesis testing or is this already accounted for by the GLM? If the values need to be corrected, which adjustment method(s) are suitable/recommended?

r statistics multiple • 7.4k views
11.0 years ago

Hi PhiS,

When you run multiple regressions in R, you typically fit one model at a time, repeat the process, and store the p-values as you go. If these regressions really are multiple tests, then it is your responsibility to apply some form of correction: the regression function you called is 'unaware' that you were running many such tests side by side.

In this case, many options are open to you, the most stringent being the infamous Bonferroni correction and the least stringent being procedures that control the False Discovery Rate (FDR), such as Benjamini-Hochberg, which are very appropriate for large numbers of parallel tests.
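In R both corrections are one call to p.adjust (method = "bonferroni" or "BH"). As a minimal illustration of the arithmetic behind them, here is a pure-Python sketch (not the R implementation, just the same calculation):

```python
def bonferroni(pvals):
    """Bonferroni: multiply each p-value by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def bh_fdr(pvals):
    """Benjamini-Hochberg step-up adjustment (controls the FDR)."""
    m = len(pvals)
    # Sort p-values, remembering their original positions.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    prev = 1.0
    # Walk from the largest p-value downwards, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        prev = min(prev, pvals[i] * m / rank)
        adjusted[i] = prev
    return adjusted

p = [0.001, 0.01, 0.02, 0.04, 0.3]
print(bonferroni(p))  # adjusted p-values, original order
print(bh_fdr(p))      # less stringent than Bonferroni on the same input
```

Note how BH divides by the rank rather than multiplying everything by m, which is why it stays usable when the number of tests is large.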

The qvalue package (available from Bioconductor) provides one such FDR method in R. Beware that this approach may be inappropriate for small numbers of parallel tests (fewer than, say, 30-50), since it estimates the correction from the distribution of the observed p-values.
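The small-sample warning comes from how the q-value method estimates the proportion of true null hypotheses (pi0) from the p-value distribution: under the null, p-values are uniform, so the tail above some cutoff estimates pi0. A crude sketch of that estimate (the qvalue package uses a smoothed variant of this, not this exact formula):

```python
def pi0_estimate(pvals, lam=0.5):
    """Crude Storey-style estimate of the proportion of true nulls:
    the fraction of p-values above `lam`, rescaled by the tail width.
    With only a handful of p-values this ratio is very noisy, which is
    why the approach is discouraged for small numbers of tests."""
    m = len(pvals)
    tail = sum(1 for p in pvals if p > lam)
    return min(1.0, tail / ((1.0 - lam) * m))
```

With, say, ten p-values, moving a single one across the cutoff shifts the estimate by 0.2, which illustrates the instability mentioned above.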

Hope this helps you!


Thanks, that's good information! But if I only have a single glm() call for a model like Y ~ X1 + X2 + X3, then would the resulting p-values for X1, X2, X3 also have to be adjusted?


No. What you get then is a single regression table containing the p-values for your terms X1 to X3. A standard approach is to build a full model containing all the factors of interest and their interactions, then look for the largest p-value in the table. If it is above your criterion (e.g. 0.05), you remove that factor or interaction and refit. You repeat the process until every remaining term is significant. What you are left with is the minimal model that explains the variation found in your data. Cheers!
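The backward-elimination loop just described can be sketched as below. This is a hypothetical skeleton, not R code: `refit` stands in for refitting the model and reading its coefficient table (in R, roughly update() followed by summary()), and the toy p-values are made up and held fixed, whereas a real refit would change them after each drop.

```python
def backward_eliminate(terms, refit, alpha=0.05):
    """Repeatedly drop the term with the largest p-value above alpha.

    `refit` is a caller-supplied function: given the current list of
    terms, it refits the model and returns {term: p_value}.
    """
    terms = list(terms)
    while terms:
        pvals = refit(terms)
        worst = max(terms, key=lambda t: pvals[t])
        if pvals[worst] <= alpha:
            break  # every remaining term is significant
        terms.remove(worst)
    return terms

# Toy stand-in with fixed, invented p-values (for illustration only).
fake_pvals = {"X1": 0.001, "X2": 0.30, "X3": 0.04, "X1:X2": 0.60}
refit = lambda terms: {t: fake_pvals[t] for t in terms}
print(backward_eliminate(["X1", "X2", "X3", "X1:X2"], refit))
# -> ['X1', 'X3']
```

Note that interactions (here "X1:X2") should be removed before the main effects they involve; the sketch leaves that marginality rule to the caller.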


What about when I have several glm() calls, each for a model like Y ~ X1 + X2 + X3? Do I then have to adjust the p-values of each joint model of X1 + X2 + X3 for the number of models tested?