Question: Inflated p-values from QQ-plot for Lasso GWAS regression coefficients
16 months ago, madbadradscientist wrote:

I ran Lasso on a trait against SNPs to get sparse regression coefficients. Then I ran a permutation test (i.e. re-running Lasso on shuffled datasets) to get a null distribution, and thus a p-value, for each regression coefficient. I have now made a QQ-plot of those p-values. Do these results show genomic inflation that needs to be corrected?
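A minimal sketch of the permutation scheme described above, not the poster's actual code. Everything named here is an assumption for illustration: `lasso_cd` is a small stand-in coordinate-descent solver (a library solver such as scikit-learn's `Lasso` could take its place), `permutation_pvalues` is a hypothetical helper, the penalty `alpha=0.1` is arbitrary, and the genotypes and trait are simulated. The add-one correction in the p-value is also a choice made here, so that no p-value is exactly 0.

```python
import numpy as np

def lasso_cd(X, y, alpha, n_iter=50):
    """Coordinate-descent Lasso for (1/(2n))*||y - X b||^2 + alpha*||b||_1.

    Hypothetical stand-in solver; assumes X columns and y are centred.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # monomorphic SNP: coefficient stays at zero
            resid += X[:, j] * beta[j]   # add back SNP j's current fit
            rho = X[:, j] @ resid / n
            # Soft-thresholding sets small effects to exactly zero.
            beta[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            resid -= X[:, j] * beta[j]
    return beta

def permutation_pvalues(X, y, alpha, n_perm=50, seed=0):
    """Per-SNP p-values from refitting the Lasso on shuffled trait values."""
    rng = np.random.default_rng(seed)
    beta_obs = lasso_cd(X, y, alpha)
    obs = np.abs(beta_obs)
    exceed = np.zeros(X.shape[1])
    for _ in range(n_perm):
        null = np.abs(lasso_cd(X, rng.permutation(y), alpha))
        exceed += null >= obs            # null effect at least as large
    return beta_obs, (exceed + 1) / (n_perm + 1)

# Simulated data (assumption): 175 samples, 30 SNPs, one causal SNP.
rng = np.random.default_rng(1)
n, p = 175, 30
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
X -= X.mean(axis=0)
y = 0.8 * X[:, 0] + rng.normal(size=n)
y -= y.mean()

beta_obs, pvals = permutation_pvalues(X, y, alpha=0.1, n_perm=50)
print("fraction of SNPs with p == 1:", np.mean(pvals == 1.0))
```

Note that any SNP whose observed coefficient is exactly zero gets a p-value of exactly 1 under this scheme, because every permuted coefficient is at least as large in absolute value.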

On the one hand, the slope of the curve doesn't look good. On the other hand (and this isn't apparent from the plot), the vast majority of coefficients (> 90%) were zero, and thus have p-values of 1. So the SNPs on the curve are actually the atypical ones with non-zero coefficients. This also means that if I try genomic control, lambda_gc, which is based on the median test statistic, comes out as 0, which would indicate deflated p-values! Is there another way to check p-values for confounding when doing sparse regression for GWAS?
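To make the lambda_gc observation concrete, here is a small sketch of the usual genomic-control calculation: convert each p-value to a 1-df chi-square statistic and divide the median by the null median (about 0.455). The function name `lambda_gc` and the example p-values are assumptions for illustration, and the conversion assumes two-sided p-values strictly greater than 0.

```python
from statistics import NormalDist, median

def lambda_gc(pvalues):
    """Genomic inflation factor from a list of p-values in (0, 1]."""
    nd = NormalDist()
    # A two-sided p-value corresponds to z = Phi^{-1}(1 - p/2); z^2 is chi-square, 1 df.
    chi2 = [nd.inv_cdf(1 - p / 2) ** 2 for p in pvalues]
    null_median = nd.inv_cdf(0.75) ** 2   # median of chi-square with 1 df, ~0.455
    return median(chi2) / null_median

# Made-up example: 90% of p-values are exactly 1, as with a sparse fit.
sparse_pvals = [1.0] * 90 + [0.001 * k for k in range(1, 11)]
print(lambda_gc(sparse_pvals))

# Evenly spread p-values, for comparison.
uniform_pvals = [(i - 0.5) / 1000 for i in range(1, 1001)]
print(lambda_gc(uniform_pvals))
```

When more than half of the p-values equal 1, the median chi-square statistic is 0, so lambda_gc is 0 regardless of what the remaining SNPs look like. That is a property of the pile of p-values at 1, not evidence of deflation.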


Can you give an idea of sample size and the balance between cases and controls in your study? Also, are all samples matched by ethnicity?

written 16 months ago by Kevin Blighe

The sample size is 175. The samples are all asthma patients, and the outcome is a quantitative trait from airway function testing. So it's linear regression, not logistic regression. All the samples are non-Hispanic Caucasians.

written 16 months ago by madbadradscientist

Like FEV / FVC? I have published a few papers on asthma. Are you sure that the model assumptions are correct and that the lasso approach is the best one? Is the study balanced between cases and controls? What if you first test each variable independently and then collate those p-values? When you eventually come up with a panel of variables / markers, you can put them in a merged model and proceed from there (?). Just thinking out loud.
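One way the "test each variable independently" suggestion could look, as a sketch under stated assumptions: each SNP is regressed on the trait on its own and the slope's p-value is collected. The helper name `single_snp_pvalues` and the simulated data are hypothetical. To stay dependency-light, the p-value uses a normal approximation to the t distribution, which is an approximation chosen here (reasonable at n = 175); an exact t-test would normally be used instead. Monomorphic SNPs are assumed to have been removed.

```python
import numpy as np
from statistics import NormalDist

def single_snp_pvalues(X, y):
    """Two-sided p-value for the slope of y ~ SNP_j, one SNP at a time."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Pearson correlation between each SNP column and the trait.
    r = (Xc.T @ yc) / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # t statistic of the simple-regression slope, written in terms of r.
    t = r * np.sqrt((n - 2) / (1 - r ** 2))
    nd = NormalDist()
    # Normal approximation to the t distribution (assumption, see lead-in).
    return np.array([2 * (1 - nd.cdf(abs(ti))) for ti in t])

# Simulated data (assumption): 175 samples, 30 SNPs, SNP 0 is causal.
rng = np.random.default_rng(2)
X = rng.binomial(2, 0.3, size=(175, 30)).astype(float)
y = 0.8 * X[:, 0] + rng.normal(size=175)

snp_pvals = single_snp_pvalues(X, y)
print("smallest p-value at SNP index:", int(np.argmin(snp_pvals)))
```

Unlike the Lasso permutation p-values, these marginal p-values are not piled up at 1, so a QQ-plot and lambda_gc computed from them behave in the usual way.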

written 16 months ago by Kevin Blighe

Yes, so actually I was running multiple sparse regressions, one for each airway measurement, and then I grouped the p-values from the separate regressions together in the same plot. I think this was contributing to the problem, especially because I was using a group-lasso type penalty to share information across airway measurements. I'll instead need to analyze each regression model separately. Thanks for the helpful conversation!
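A short sketch of keeping the QQ-plot inputs separate per regression rather than pooling them. The helper `qq_points`, the trait labels, and the p-values are all made up for illustration; p-values are assumed to be strictly greater than 0.

```python
import math

def qq_points(pvalues):
    """(expected, observed) -log10(p) pairs for one set of p-values."""
    m = len(pvalues)
    observed = sorted((-math.log10(p) for p in pvalues), reverse=True)
    # Expected quantiles of a uniform distribution, smallest p first.
    expected = [-math.log10((i - 0.5) / m) for i in range(1, m + 1)]
    return list(zip(expected, observed))

# Hypothetical p-values from two separate regressions.
pvals_by_trait = {
    "trait_a": [0.5, 0.01, 0.2],
    "trait_b": [0.9, 0.3, 0.04, 0.6],
}

# One set of QQ points per trait, never pooled across traits.
qq_by_trait = {trait: qq_points(p) for trait, p in pvals_by_trait.items()}
for trait, pts in qq_by_trait.items():
    print(trait, pts[0])
```

Each entry of `qq_by_trait` can then be drawn as its own panel, so that inflation in one regression is not masked or exaggerated by another.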

written 16 months ago by madbadradscientist

Yes, you will want to keep the p-values separate from each test. Have a nice time analysing!

written 16 months ago by Kevin Blighe