2.0 years ago

rimgubaev

Hello everyone! I wonder if somebody could suggest an article about, or a solution to, the following problem. I have phenotype data collected over three years for my plant of interest, and I want to use it to run a GWAS. For some samples the phenotype data fail the Shapiro-Wilk test for normality, so I want to use the median phenotype values instead of the means. However, I have not found any articles or tutorials where a similar approach has been used. If you have faced this problem or know of such articles, please suggest them!

I wonder how many of the phenotypes failed the Shapiro-Wilk test (is it around 5%)? Wouldn't it be "cheaper" to find a transformation of the values (e.g. a Box-Cox transform)? In some cases even the median will not help (imagine zero-inflated data); there you would need linear models with special link functions to work with the data.
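
To make the transformation suggestion concrete, here is a minimal sketch of a Box-Cox transform using only the Python standard library. Everything in it is an assumption for illustration: the phenotype is simulated right-skewed data, and `box_cox`, `skewness` and `best_lambda` are hypothetical helper names. Lambda is chosen by a crude grid search that minimises absolute skewness, which is a stand-in for the usual maximum-likelihood estimate, not a replacement for it.

```python
import math
import random
import statistics


def box_cox(values, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        # The lambda -> 0 limit of (v**lam - 1) / lam is the natural log.
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def skewness(values):
    """Sample skewness: mean of cubed z-scores (population form)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def best_lambda(values, grid):
    """Pick the lambda on the grid whose transform has the smallest |skewness|."""
    return min(grid, key=lambda lam: abs(skewness(box_cox(values, lam))))


# Simulated right-skewed phenotype (assumption: log-normal, not real data).
random.seed(0)
phenotype = [random.lognormvariate(0.0, 0.8) for _ in range(500)]

grid = [i / 10 for i in range(-20, 21)]  # candidate lambdas from -2.0 to 2.0
lam = best_lambda(phenotype, grid)
transformed = box_cox(phenotype, lam)

print("chosen lambda:", lam)
print("skewness before:", skewness(phenotype))
print("skewness after: ", skewness(transformed))
```

Box-Cox only applies to strictly positive values, so zero-inflated phenotypes would still need a different treatment, as noted above.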

It's not that many, actually: 7%.

But you know that about 5% of tests will reject at alpha = 0.05 purely by chance, even if every phenotype is truly normal, just because that is how statistical tests work? So you have about 2% "unexpectedly non-normal" data, and there may be biological effects in those 2%. So I'd think twice about whether I need to invent another method to deal with the data.
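
The arithmetic behind this can be sketched as a binomial calculation. The number of tests is not given in the thread, so `n_tests = 100` below is purely a hypothetical value, and the calculation assumes the tests are independent and every phenotype is truly normal. `binom_tail` is a helper name introduced here for illustration.

```python
import math


def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(
        math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1)
    )


alpha = 0.05
n_tests = 100            # hypothetical: the thread does not say how many tests were run
observed_rejections = 7  # 7% of n_tests, the rate quoted above

# Rejections expected by chance alone if all phenotypes were normal.
expected = alpha * n_tests

# Probability of seeing at least the observed number of rejections by chance.
p_at_least = binom_tail(n_tests, observed_rejections, alpha)

print("expected false rejections:", expected)
print("P(at least", observed_rejections, "rejections by chance):", p_at_least)
```

Under these assumptions, 7 rejections out of 100 would not be unusual. With many more tests the same 7% rate would be harder to attribute to chance, so the real number of phenotypes matters.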

However, even having said this, pre-testing for normality is not always recommended practice: 1) it may accept as normal data that are clearly not normal (e.g. multi-modal data); 2) it adds a further burden of tests, and we know what happens when statistical tests multiply: https://en.wikipedia.org/wiki/Multiple_comparisons_problem