Hello,
I have conducted a large-scale GWAS and found a few significantly associated SNPs. I ran the GWAS in GEMMA with the -lmm 1 option to obtain the beta and standard-error estimates. I want to estimate the percentage of phenotypic variation explained by each of the significant SNPs. I used the following procedure to estimate the variance explained in R:
fit <- lm(Phenotypic_value ~ SNP_data, data = a)
summary(fit)$r.squared
Here, the data file a contains three columns: sample_ID, Phenotypic_value for each sample, and the biallelic SNP_data. I got a value of 0.43, meaning that 43% of the phenotypic variation is explained by the SNP.
I also tried another formula: 2*f*(1-f)*b.alt^2. Here, f is the minor allele frequency and b.alt is the effect size, i.e. the beta estimate obtained from GEMMA. This gives me a value of 0.03, meaning 3% of the variation explained, which seems reasonable to me.
My question is: which of these two methods is correct? Or is there another way to estimate the percentage of variation explained?
Alternatively, from the GEMMA Google group, I got this formula: pve <- var(x) * (beta^2 + se^2)/var(y). But I do not understand how to obtain the values of var(x) and var(y).
It would be great to receive some feedback on this. Thank you.
In your case:
In linear regression involving no covariates (y = alpha + beta*x + e), the correlation coefficient between x and y can be expressed as
r = beta * sqrt(var(x)) / sqrt(var(y))
and then you want to take the square of this, i.e. r^2 = beta^2 * var(x) / var(y). I am not sure where the se^2 term comes from, and as far as I can see the author of GEMMA does not back up that claim. Generate some fake data in R and you'll see the formula is wrong and the se^2 does not belong there (for simple regression). There's no reason why an estimate with a higher se would explain a higher % of the variance. Maybe it has to do with the fact that GEMMA fits an LMM; I don't know, I am not familiar enough with it.
Since
var(x) = 2*f*(1-f)
for a 0/1/2 genotype coding (under Hardy-Weinberg equilibrium), your other formula is equivalent only if your y has unit variance.
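To make the fake-data check concrete, here is a minimal sketch in R. Everything in it is an assumption for illustration: the sample size, the allele frequency, the true effect of 0.4 and the noise are made up, and the model is a plain lm() with no covariates or kinship matrix, so it says nothing about what GEMMA's LMM does internally.

```r
# Simulated data only; n, f and the effect size are hypothetical values.
set.seed(1)
n <- 5000                      # number of samples
f <- 0.3                       # allele frequency
x <- rbinom(n, 2, f)           # genotype coded 0/1/2, drawn under Hardy-Weinberg
y <- 0.4 * x + rnorm(n)        # phenotype; var(y) is NOT forced to be 1

fit  <- lm(y ~ x)
beta <- unname(coef(fit)["x"])
se   <- summary(fit)$coefficients["x", "Std. Error"]

r2_lm       <- summary(fit)$r.squared               # R^2 reported by lm
pve_exact   <- beta^2 * var(x) / var(y)             # squared correlation
pve_hwe     <- beta^2 * 2 * f * (1 - f) / var(y)    # var(x) replaced by 2f(1-f)
pve_unit    <- 2 * f * (1 - f) * beta^2             # drops var(y); fine only if var(y) == 1
pve_with_se <- var(x) * (beta^2 + se^2) / var(y)    # formula including se^2

print(c(r2_lm = r2_lm, pve_exact = pve_exact, pve_hwe = pve_hwe,
        pve_unit = pve_unit, pve_with_se = pve_with_se))
```

In simple regression the first two numbers agree exactly, the 2*f*(1-f) version is close to them, and the se^2 version is always a little larger. With the data frame a from the question, var(x) and var(y) would simply be var(a$SNP_data) and var(a$Phenotypic_value).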
Hi @Lemire
OK, so the correct formula is then
pve <- sqrt(var(x))*beta/sqrt(var(y))
and then pve^2, where var(x) is 2*f*(1-f)? Do you have any reference sources for that?
Thank you very much.
What about the second formula? Is it correct this way?