Question

Inflated p-values from QQ-plot for Lasso GWAS regression coefficients

5.2 years ago

madbadradscientist ▴ 20

I ran Lasso on a trait, with SNPs as predictors, to get sparse regression coefficients. Then I ran a permutation test (i.e., running Lasso on shuffled datasets) to get the null distribution, and thus a p-value for each regression coefficient. I have now created the QQ-plot for the p-values. Do these results show that there's genomic inflation that needs to be corrected?
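A minimal sketch of the permutation scheme described above, assuming an L1-penalised least-squares fit by coordinate descent on standardised genotypes. The function names (`lasso_cd`, `permutation_pvalues`), the penalty `alpha`, and the simulated data are illustrative assumptions, not the setup actually used for the plot. It illustrates that any SNP whose observed coefficient is exactly zero receives a permutation p-value of exactly 1.

```python
import numpy as np


def lasso_cd(X, y, alpha, n_iter=50):
    """Lasso by cyclic coordinate descent (no intercept; centre y first).

    Minimises (1 / (2n)) * ||y - X b||^2 + alpha * ||b||_1.
    """
    n, p = X.shape
    beta = np.zeros(p)
    resid = np.asarray(y, dtype=float).copy()  # residual y - X @ beta
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # skip monomorphic SNPs
            # Correlation of SNP j with the partial residual.
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            # Soft-thresholding step.
            new = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta


def permutation_pvalues(X, y, alpha, n_perm=100, seed=0):
    """Per-SNP p-values from refitting the Lasso on shuffled trait values."""
    rng = np.random.default_rng(seed)
    beta_obs = lasso_cd(X, y, alpha)
    obs = np.abs(beta_obs)
    exceed = np.zeros(X.shape[1])
    for _ in range(n_perm):
        null = np.abs(lasso_cd(X, rng.permutation(y), alpha))
        exceed += null >= obs  # always true when obs == 0
    return beta_obs, (1.0 + exceed) / (1.0 + n_perm)


# Simulated example (hypothetical data): 100 samples, 10 SNPs, SNP 0 is causal.
rng = np.random.default_rng(1)
n, p = 100, 10
G = rng.integers(0, 3, size=(n, p)).astype(float)  # genotypes coded 0/1/2
X = (G - G.mean(axis=0)) / G.std(axis=0)
y = X[:, 0] + rng.normal(size=n)
y = y - y.mean()

beta_obs, pvals = permutation_pvalues(X, y, alpha=0.3)
```

Because the comparison is `null >= obs`, a zero observed coefficient is matched or exceeded by every permutation, which is where the mass of p-values at 1 comes from.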

On the one hand, the slope of the curve doesn't look good. On the other hand (and this isn't apparent from the plot), the vast majority of coefficients (> 90%) were zero, and thus have p-values of 1. So the SNPs in the curve are actually atypical coefficients. This also means that if I try to do genomic control, lambda_gc, which is computed from the median test statistic, is actually 0, which would indicate deflated p-values! Is there another way to assess p-values for confounding when doing sparse regression for GWAS?
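To make the lambda_gc point concrete, here is a small sketch using only the Python standard library. It assumes the usual definition of the genomic inflation factor: convert each p-value to a 1-degree-of-freedom chi-square statistic and divide the median by the null median (about 0.455). The helper names are illustrative.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 degree of freedom (about 0.455).
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def p_to_chi2(p, eps=1e-300):
    """Convert a two-sided p-value to a 1-df chi-square statistic."""
    p = min(max(p, eps), 1.0)          # guard against p == 0
    z = _STD_NORMAL.inv_cdf(p / 2.0)   # lower-tail quantile; its sign is irrelevant
    return z * z


def lambda_gc(pvalues):
    """Genomic inflation factor: median chi-square over its null median."""
    return median(p_to_chi2(p) for p in pvalues) / CHI2_1DF_MEDIAN


# More than half of the p-values equal 1, so the median statistic is 0.
sparse_like = [1.0] * 9 + [0.01]
lam_sparse = lambda_gc(sparse_like)

# Evenly spaced p-values mimic a well-calibrated null and give a value near 1.
uniform_like = [(i + 0.5) / 1000 for i in range(1000)]
lam_uniform = lambda_gc(uniform_like)
```

Whenever more than half of the p-values are exactly 1, the median statistic is 0 and so is lambda_gc, regardless of how the remaining SNPs behave. The median-based statistic is therefore uninformative for this kind of output.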

https://imgur.com/y845ncm

gwas • 2.3k views


Can you give an idea of sample size and the balance between cases and controls in your study? Also, are all samples matched by ethnicity?

5.2 years ago by Kevin Blighe 87k

The sample size is 175. The samples are all asthma patients, and the output variable is an airway function testing quantitative trait. So it's linear regression, not logistic regression. And all the samples are non-Hispanic Caucasians.

5.2 years ago by madbadradscientist ▴ 20

Like FEV / FVC? I have published a few papers on asthma. Are you sure that the model assumptions are correct and that the lasso approach is the best one? Is the study balanced between cases and controls? What if you first test each variable independently and then collate those p-values? When you eventually come up with a panel of variables / markers, you can put them in a merged model and proceed from there (?). Just thinking out loud.
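One possible reading of "test each variable independently", sketched under assumptions: a simple linear regression of the quantitative trait on each SNP in turn, with the slope test computed from the Pearson correlation. The function name `marginal_pvalues` and the simulated data are hypothetical. The p-values use a normal approximation to the t distribution, which should be close at n = 175; an exact t tail (for example `scipy.stats.t.sf`) could be swapped in.

```python
from statistics import NormalDist

import numpy as np


def marginal_pvalues(X, y):
    """Two-sided p-value per SNP from a single-SNP linear regression.

    Uses t = r * sqrt((n - 2) / (1 - r^2)), where r is the Pearson
    correlation between the SNP and the trait, then a normal approximation
    to the t distribution. Assumes no SNP column is constant.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    r = np.clip(r, -0.999999, 0.999999)  # avoid division by zero at |r| = 1
    t = r * np.sqrt((n - 2) / (1.0 - r ** 2))
    std_normal = NormalDist()
    return np.array([2.0 * std_normal.cdf(-abs(ti)) for ti in t])


# Simulated example (hypothetical data): 175 samples, 50 SNPs, SNP 0 is causal.
rng = np.random.default_rng(2)
n, p = 175, 50
G = rng.integers(0, 3, size=(n, p)).astype(float)  # genotypes coded 0/1/2
trait = G[:, 0] + rng.normal(size=n)

marginal_p = marginal_pvalues(G, trait)
```

Unlike the Lasso permutation p-values, these marginal p-values are not piled up at 1, so a QQ-plot and a median-based lambda_gc are interpretable for them.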

5.2 years ago by Kevin Blighe 87k

Yes, so actually I was running multiple sparse regressions for different airway measurements, and then I grouped the p-values from the separate regressions together in the same plot. I think this was contributing to the problem, especially because I was using a group-lasso type penalty to share information across airway measurements. I'll instead need to analyze each regression model separately. Thanks for the helpful conversation!

5.2 years ago by madbadradscientist ▴ 20

Yes, you will want to keep the p-values separate from each test. Have a nice time analysing!

5.2 years ago by Kevin Blighe 87k