22 months ago by
The percentage explained can be calculated in different ways. It's always a modelling exercise, but the exact details of the model vary (different covariates, different maths, etc.).
Roughly, you predict the phenotype using something like a * SNP 1 + b * SNP 2 + c * SNP 3 + d * SNP 4 = phenotype, and calculate how well the prediction agrees with the actual phenotype (for example, as the R² of the model). What a, b, c, d are and how they are calculated depends on the method used (they can also all be 1).
If the prediction fits the phenotype perfectly, then 100% of the observed variance is explained, but that never happens. It's always a much lower percentage than that (20% in your case).
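To make that concrete, here is a minimal sketch in Python using simulated data. It is only one way of getting a "percentage explained" (an ordinary least-squares R²), not necessarily what your paper did. The allele frequencies, effect sizes and the `variance_explained` helper are all made up for illustration, and the noise is scaled on purpose so that the SNPs explain roughly 20%.

```python
import numpy as np

def variance_explained(genotypes, phenotype):
    """R^2 of the linear model phenotype = intercept + a*SNP1 + b*SNP2 + ..."""
    # Intercept column plus one column per SNP (0/1/2 allele counts).
    X = np.column_stack([np.ones(len(phenotype)), genotypes])
    coef, *_ = np.linalg.lstsq(X, phenotype, rcond=None)
    predicted = X @ coef
    ss_res = np.sum((phenotype - predicted) ** 2)          # unexplained part
    ss_tot = np.sum((phenotype - phenotype.mean()) ** 2)   # total variance
    return 1.0 - ss_res / ss_tot

# Simulated example: 4 SNPs, 5000 individuals (all values are assumptions).
rng = np.random.default_rng(0)
n = 5000
freqs = np.array([0.1, 0.3, 0.5, 0.2])      # hypothetical allele frequencies
effects = np.array([0.5, 0.3, 0.2, 0.4])    # hypothetical a, b, c, d
genotypes = rng.binomial(2, freqs, size=(n, 4)).astype(float)
genetic = genotypes @ effects

# Noise variance = 4 x genetic variance, so SNPs account for about 20% in total.
noise = rng.normal(0.0, np.sqrt(4 * genetic.var()), size=n)
phenotype = genetic + noise

r2 = variance_explained(genotypes, phenotype)
print(f"Variance explained: {r2:.1%}")
```

If you pass `genetic` instead of `phenotype` (no noise at all), the same function returns essentially 1.0, which is the "fits perfectly" case.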
Here's one example on how to calculate it with adult height: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4250049/
We used GCTA-COJO analysis [7,8] to select the top associated SNPs. This
method uses the summary statistics from the meta-analysis and LD
correlations between SNPs estimated from a reference sample to perform
a conditional association analysis [7]. The method starts with an initial
model of the SNP that shows the strongest evidence of association
across the whole genome. It then implements the association analysis
conditioning on the selected SNP(s) to search for the top SNPs
one-by-one iteratively via a stepwise model selection procedure until
no SNP has a conditional P-value that passes the significance level.
Finally, all the selected SNPs are fitted jointly in the model for
effect size estimation.
Papers 7 and 8:
7. Yang J, et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat Genet. 2012;44:369–75. S1–3.
8. Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet. 2011;88:76–82.
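So it starts with a * SNP 1 = phenotype, using the most strongly associated SNP, and then keeps adding SNPs to the model one at a time until no remaining SNP passes the significance cutoff. All the selected SNPs are then fitted together to estimate their effect sizes.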
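The stepwise idea can be sketched in a few lines of Python. Note that this is not GCTA-COJO itself: the real method works from summary statistics plus LD from a reference sample, whereas this toy version assumes you have individual-level genotypes and just refits a regression at each step. The function names, the simulated data, the normal approximation for the p-value and the 5e-8 cutoff are all my own assumptions for illustration.

```python
import math
import numpy as np

def conditional_p(X_selected, candidate, y):
    """P-value of `candidate` when added to a model already holding X_selected."""
    n = len(y)
    X = np.column_stack([np.ones(n), X_selected, candidate])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - X.shape[1])
    se = math.sqrt(sigma2 * np.linalg.inv(X.T @ X)[-1, -1])
    z = coef[-1] / se
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided, large-sample normal approx.

def forward_select(genotypes, y, p_threshold=5e-8):
    """Add the best conditional SNP one at a time until none passes the cutoff."""
    selected, remaining = [], list(range(genotypes.shape[1]))
    while remaining:
        pvals = [conditional_p(genotypes[:, selected], genotypes[:, j], y)
                 for j in remaining]
        best = int(np.argmin(pvals))
        if pvals[best] >= p_threshold:
            break  # no remaining SNP is significant given the selected ones
        selected.append(remaining.pop(best))
    return selected

# Simulated data: 10 SNPs, of which SNPs 0, 3 and 7 truly affect the phenotype.
rng = np.random.default_rng(1)
n = 5000
genotypes = rng.binomial(2, 0.3, size=(n, 10)).astype(float)
true_effects = np.zeros(10)
true_effects[[0, 3, 7]] = [0.6, 0.4, 0.3]
y = genotypes @ true_effects + rng.normal(0.0, 1.0, size=n)

selected = forward_select(genotypes, y)

# Final step: fit all selected SNPs jointly to estimate their effect sizes.
X_joint = np.column_stack([np.ones(n), genotypes[:, selected]])
joint_effects = np.linalg.lstsq(X_joint, y, rcond=None)[0][1:]
print(selected, joint_effects.round(2))
```

The SNP with the largest simulated effect gets picked first, and the loop stops once only the null SNPs are left.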