Normalization of Ordinal-Categorical data in GWAS
5.8 years ago
kakukeshi ▴ 80

Hi,

I want to perform a GWAS using the number of events (e.g. the number of epileptic attacks) as a trait. Do I need to normalize this type of data and, if so, which method would you recommend? The data has the following distribution:

[Image: distribution of the trait, not preserved]

GWAS • 2.1k views
5.8 years ago

Hello, not a common question but I've given 2 answers on it previously:

In a nutshell, you don't have to normalise, provided that your statistical test is set up to expect the distribution that your variable follows. Your epileptic attacks variable looks like it follows a Poisson or negative binomial distribution, in which case the test has to model that explicitly (for example, a Poisson or negative binomial regression rather than an ordinary linear regression). Otherwise, you could log-transform your variable so that it is closer to the normal distribution, which is what most tests assume by default.
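As a rough illustration of a test that "expects" count data, here is a minimal sketch of a single-variant Poisson regression, log(E[count]) = b0 + b1 * genotype, fitted by Newton-Raphson in plain Python. The function name `fit_poisson` and the toy data are made up for this example, and it ignores covariates; in a real analysis you would more likely use an established GLM implementation.

```python
import math

def fit_poisson(genotypes, counts, n_iter=50, tol=1e-10):
    """Fit log(E[count]) = b0 + b1 * genotype by Newton-Raphson.

    Returns (b0, b1, se_b1), where se_b1 is the Wald standard error of b1.
    """
    # Start from the intercept-only fit (b1 = 0), which keeps early steps stable.
    b0, b1 = math.log(sum(counts) / len(counts)), 0.0
    for _ in range(n_iter):
        g0 = g1 = 0.0          # score: gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # entries of the Fisher information matrix
        for x, y in zip(genotypes, counts):
            mu = math.exp(b0 + b1 * x)  # fitted mean for this individual
            g0 += y - mu
            g1 += (y - mu) * x
            h00 += mu
            h01 += mu * x
            h11 += mu * x * x
        # Newton step: solve the 2x2 system (information) * step = score.
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    # Var(b1) is the lower-right entry of the inverse information matrix.
    se_b1 = math.sqrt(h00 / det)
    return b0, b1, se_b1

# Toy data, invented for illustration: allele dosage (0/1/2) and event counts.
dosage = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
events = [0, 1, 0, 2, 1, 3, 2, 4, 2, 5]
b0, b1, se_b1 = fit_poisson(dosage, events)
print(f"log rate ratio per allele: {b1:.3f} (SE {se_b1:.3f}), z = {b1 / se_b1:.2f}")
```

Here exp(b1) is the estimated multiplicative change in the event rate per copy of the allele. If the variance of the counts is much larger than their mean, Poisson standard errors tend to be too small, which is the usual reason to prefer a negative binomial model.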

Kevin


Thank you very much for your answer, Kevin. I noticed your previous post about the normalization of quantitative variables, but I think the solution is more complex with categorical data. I think the data follows something like an exponential or geometric distribution. I've tried to log-normalize the data to get something normally distributed, without success (the closest I got was with something like log(x)+0.001).
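One likely reason the log transform falls short, sketched under assumptions: a monotone transform relabels values but cannot break ties, so if a large share of individuals have the same count, the same share sits on a single point afterwards. The simulated geometric trait, the 0.001 offset inside the log, and the helper `modal_fraction` are all illustrative choices, not taken from the actual data.

```python
import math
import random
from collections import Counter

def modal_fraction(values):
    """Share of observations that take the single most common value."""
    return Counter(values).most_common(1)[0][1] / len(values)

rng = random.Random(1)

# Hypothetical trait: geometric counts starting at 0 (about half are zero).
counts = []
for _ in range(5000):
    k = 0
    while rng.random() > 0.5:
        k += 1
    counts.append(k)

# Log transform with a small offset so that zero counts are defined.
logged = [math.log(c + 0.001) for c in counts]

# The spike at the most common count survives the transform unchanged.
print(modal_fraction(counts), modal_fraction(logged))
```

The two printed fractions are identical, so the transformed trait still has a point mass that no continuous normal distribution can match.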

I'm quite curious about this. I thought this was a common problem in GWAS, given how many traits look like this, but apparently that is not the case. By the way, do you know of any GWAS on traits with a similar distribution?

Many thanks


Yes, you're correct that it is more complex for categorical data; my previous answers were more for continuous variables. Also, this is most likely not a common theme on Biostars because it's definitely more in the realm of statistics, so you would likely get a better response on https://stats.stackexchange.com/

I don't recall reading any published manuscript specifically on the topic of the normalisation of categorical variables. I think that information on what people have done would be buried deep in the supplementary material. I have come across a few publications on this general topic, such as "Are your covariates under control? How normalization can re-introduce covariate effects".

I'm trying to think of the ways in which it would affect the statistical models, and of ways around it. I don't know exactly whether breaking up your data into strata and conducting conditional logistic regression would help. I don't know what other variables you have in your study, though.
