Question: Does a quantitative covariate have to be normally distributed to be used in linear regression when running a GWAS?
9 months ago, wenqianglongli wrote:

Hi all,

Hope this finds you well.

I am adding a covariate, a quantitative variable (~600 samples), to my linear regression when running a GWAS. The data turn out not to be normally distributed. I have tried many different ways to normalize them, and outliers have been removed as well. The histograms and QQ plots of a few versions (those treated with cube root, Tukey ladder, and Box-Cox transformations) looked fine, but the normality tests still suggested that they are not normally distributed.

So my question is: is it OK to use one of them (after normalizing) as a covariate even if it is not normally distributed (but close to it)?

Thank you for your time!

Aydem

linear regression gwas • 281 views
modified 7 months ago by Biostar • written 9 months ago by wenqianglongli

Did you try natural log and squaring? For example:

log(cov)
cov^2

Can you plot the histogram of the covariate?

Another option is to categorise it.

written 9 months ago by Kevin Blighe
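
To illustrate the transformations suggested above, here is a minimal Python sketch using only the standard library. It is not the poster's data or code: the covariate is simulated from a log-normal distribution (an assumption, chosen only because it is right-skewed), and `skewness`, `log_cov`, `sq_cov` and `cat_cov` are hypothetical names introduced for the example.

```python
import math
import random
import statistics

random.seed(1)

# Hypothetical right-skewed covariate for 600 samples (simulated, not real data).
cov = [random.lognormvariate(0.0, 0.6) for _ in range(600)]


def skewness(xs):
    """Sample skewness; close to 0 for a symmetric distribution."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


# Natural log: needs strictly positive values; typically pulls in a right tail.
log_cov = [math.log(x) for x in cov]

# Squaring: typically more useful when the data are left-skewed.
sq_cov = [x ** 2 for x in cov]

# Categorise: assign each sample to a quartile group 0..3.
cuts = statistics.quantiles(cov, n=4)  # three cut points
cat_cov = [sum(x > c for c in cuts) for x in cov]

print("skewness raw:    ", round(skewness(cov), 3))
print("skewness log:    ", round(skewness(log_cov), 3))
print("skewness squared:", round(skewness(sq_cov), 3))
print("group sizes:     ", [cat_cov.count(g) for g in range(4)])
```

Comparing the skewness before and after each transformation is one quick way to see whether a transformation helps for a given covariate. Categorising discards information, so it is usually a fallback rather than a first choice.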

Hi Kevin,

Many thanks for your reply. Yes, I did try them. However, like I said, even though some of the histograms look fine, the normality tests still suggest that they are not normally distributed.

Please find the histograms of the data at the link below.

https://www.dropbox.com/s/usz23ftwkkwbeay/my_plots_remove_outliers2.pdf?dl=0

written 9 months ago by wenqianglongli

Which normality tests are you using? What are the results? Only synthetic data will follow a perfect normal distribution. Looking at the histograms, they all visually look fine... like I said, no distribution of real data will follow a perfect normal.

modified 9 months ago • written 9 months ago by Kevin Blighe

Hi Kevin,

Thanks for your reply! Yes, I do get your point that real data won't follow a perfect normal distribution, especially in some disease cases. Not only do the histograms look fine to me, but the QQ plots look good as well.

I tried the Shapiro-Wilk test first, and the results are:

1) data: d$Question (without data transformation): W = 0.92559, p-value < 2.2e-16
2) data: d$T_sqrt: W = 0.98555, p-value = 1.551e-05
3) data: d$T_cub: W = 0.99179, p-value = 0.002566
4) data: d$T_log: W = 0.97304, p-value = 6.868e-09
5) data: d$T_tuk: W = 0.99192, p-value = 0.002894
6) data: d$T_box: W = 0.99192, p-value = 0.002894

As you can see, the test rejects normality for all of them.

I thought that this test might only apply to small sample sizes, so I tried the one-sample Kolmogorov-Smirnov test, and the results are:

1) data: d$Question (without data transformation): D = 0.079553, p-value = 0.001232
2) data: d$T_sqrt: D = 0.088076, p-value = 0.0002323
3) data: d$T_cub: D = 0.075268, p-value = 0.002675
4) data: d$T_log: D = 0.064056, p-value = 0.01658
5) data: d$T_tuk: D = 0.073257, p-value = 0.003791
6) data: d$T_box: D = 0.074277, p-value = 0.00318

However, normality is still rejected for all of them.

So, back to my original question: even though the normality tests do not suggest they are normally distributed, some of the histograms and QQ plots do look fine. Is it OK to use one of the data sets?

written 9 months ago by wenqianglongli

You may want to read the first answer in this very interesting thread: Is normality testing 'essentially useless'?

In light of everything, and based on the fact that the qq plots look fine, I'd say that you can go ahead.

written 9 months ago by Kevin Blighe
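
One argument often made against relying on normality tests alone is that their p-values depend heavily on sample size. The sketch below illustrates this in Python using only the standard library. It is an illustration under stated assumptions, not the poster's analysis: the data are simulated from a gamma distribution (chosen because it looks bell-shaped but is mildly skewed), and `ks_normal` and `mildly_skewed` are hypothetical helper names. The p-value uses the asymptotic Kolmogorov series with the mean and standard deviation estimated from the same data, so it is only approximate (and conservative).

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def ks_normal(xs):
    """KS distance D and approximate p-value against a normal fitted to xs."""
    n = len(xs)
    fitted = NormalDist(fmean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(sorted(xs)):
        f = fitted.cdf(x)
        # Largest gap between the empirical CDF (just after / just before x)
        # and the fitted normal CDF.
        d = max(d, (i + 1) / n - f, f - i / n)
    lam = math.sqrt(n) * d
    # Asymptotic Kolmogorov distribution (alternating series), clamped to [0, 1].
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


def mildly_skewed(n):
    """Simulated near-normal data: gamma with shape 20 (skewness about 0.45)."""
    return [random.gammavariate(20.0, 1.0) for _ in range(n)]


random.seed(42)
results = {}
for n in (60, 600, 60000):
    d, p = ks_normal(mildly_skewed(n))
    results[n] = (d, p)
    print(f"n={n:6d}  D={d:.4f}  p={p:.3g}")
```

In this setup the distance D stays small at every sample size, yet the p-value becomes very small in the largest sample. The same mild departure from normality is judged "significant" or not depending mostly on n, which is why a visual check such as a QQ plot is often more informative than the test alone.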

Hi Kevin,

That answer is very useful!! Thanks for sharing and for your help.

written 9 months ago by wenqianglongli