In GWAS Studies, how to understand "97 SNPs explain 2.7% of BMI"?
4.7 years ago
Tao ▴ 460

Hi guys,

I'm a newbie to GWAS, and in a talk by John Quackenbush I saw these statements:

"97 SNPs explain 2.7% of BMI"

"All common SNPs may explain 20% of BMI"

What's the meaning of that percentage? How are the percentages calculated?

Thanks!

Tao

GWAS • 3.1k views

That probably means something like...

"You can determine someone's racial composition or location by looking at their SNPs. Those are both factors in BMI, which makes the SNPs correlated. These SNPs have no known causal relationship with BMI, but it's easy to use them to publish papers."


The rationale here is "heritability", which measures the proportion of the total phenotypic variance that is due to genetic variance (total phenotypic variance = genetic variance + environmental variance). The percentage describes the share of BMI variance in the study cohort that is attributable to genetic variance. But I still don't know how this is calculated.

4.7 years ago

The percentage explained can be calculated in different ways. It's always a modelling exercise, but the exact details of the model vary (different covariates, different maths, etc.).

Roughly, you predict the phenotype using something like a * SNP 1 + b * SNP 2 + c * SNP 3 + d * SNP 4 = phenotype, and calculate how well the prediction agrees with the actual phenotype. What a, b, c and d are, and how they are calculated, depends on the method used (they can also all be 1).

If the model fits the phenotype perfectly, 100% of the observed variance is explained, but that never happens; the percentage is always much lower than that (2.7% for the 97 SNPs and 20% for all common SNPs in your case).
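To make the idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the genotypes and the BMI-like phenotype are simulated, the SNPs are independent (no LD), and `variance_explained` is a hypothetical helper that fits all SNPs jointly by ordinary least squares. Real studies add covariates and use dedicated tools.

```python
import numpy as np

def variance_explained(genotypes, phenotype):
    """R^2 of an ordinary least squares fit of the phenotype on SNP dosages."""
    n = genotypes.shape[0]
    design = np.column_stack([np.ones(n), genotypes])  # intercept + one column per SNP
    coef, *_ = np.linalg.lstsq(design, phenotype, rcond=None)
    residuals = phenotype - design @ coef
    # Share of phenotypic variance not left in the residuals
    return 1.0 - residuals.var() / phenotype.var()

# Simulated data (assumption): 5000 people, 97 independent SNPs with small effects
rng = np.random.default_rng(0)
n_people, n_snps = 5000, 97
allele_freq = rng.uniform(0.1, 0.5, size=n_snps)
genotypes = rng.binomial(2, allele_freq, size=(n_people, n_snps)).astype(float)
true_effects = rng.normal(0.0, 0.03, size=n_snps)
phenotype = genotypes @ true_effects + rng.normal(0.0, 1.0, size=n_people)

r2 = variance_explained(genotypes, phenotype)
print(f"{n_snps} SNPs explain {100 * r2:.1f}% of the phenotypic variance")
```

Note that an R² computed in the same sample used to fit the coefficients tends to be optimistic, which is one reason published estimates are usually evaluated in an independent sample.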

Here's one example of how it is calculated, for adult height: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4250049/

We used GCTA-COJO analysis [7,8] to select the top associated SNPs. This method uses the summary statistics from the meta-analysis and LD correlations between SNPs estimated from a reference sample to perform a conditional association analysis [7]. The method starts with an initial model of the SNP that shows the strongest evidence of association across the whole genome. It then implements the association analysis conditioning on the selected SNP(s) to search for the top SNPs one-by-one iteratively via a stepwise model selection procedure until no SNP has a conditional P-value that passes the significance level. Finally, all the selected SNPs are fitted jointly in the model for effect size estimation.

Papers 7 and 8:

7. Yang J, et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat Genet. 2012;44:369–75. S1–3.

8. Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet. 2011;88:76–82.

So it starts with a * SNP 1 = phenotype, using the most strongly associated SNP, and then keeps adding SNPs to the model until no remaining SNP passes the significance cutoff.
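The stepwise idea can be sketched as follows. This is not GCTA-COJO: that method works from summary statistics plus LD from a reference sample and stops on a conditional P-value, whereas this toy version assumes individual-level data and stops when the best remaining SNP adds less than a hypothetical `min_gain` to R². The helper names and the simulated data are made up for illustration.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit of y on the columns of X (plus an intercept)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return 1.0 - (y - design @ coef).var() / y.var()

def forward_select(genotypes, phenotype, min_gain=0.01):
    """Add SNPs one by one, always taking the one that raises R^2 the most."""
    selected, current_r2 = [], 0.0
    remaining = list(range(genotypes.shape[1]))
    while remaining:
        # Gain in R^2 from adding each remaining SNP to the current model
        gain, best = max(
            (r_squared(genotypes[:, selected + [j]], phenotype) - current_r2, j)
            for j in remaining
        )
        if gain < min_gain:  # stand-in for the significance cutoff
            break
        selected.append(best)
        remaining.remove(best)
        current_r2 += gain
    return selected, current_r2

# Simulated data (assumption): 20 SNPs, of which SNPs 0, 5 and 10 affect the trait
rng = np.random.default_rng(0)
genotypes = rng.binomial(2, 0.3, size=(2000, 20)).astype(float)
phenotype = (0.5 * genotypes[:, 0] + 0.4 * genotypes[:, 5]
             + 0.3 * genotypes[:, 10] + rng.normal(0.0, 1.0, size=2000))

selected, r2 = forward_select(genotypes, phenotype)
print("selected SNPs:", sorted(selected), "joint R^2:", round(r2, 3))
```

The R² returned at the end is that of the joint model with all selected SNPs, which mirrors the final step in the quoted passage where the selected SNPs are fitted jointly.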


Hi Philipp, I think what you describe as variance explained is closer to prediction accuracy (i.e., constructing a polygenic score (PGS) and calculating the squared correlation between the PGS and the phenotype in an independent target sample). In my understanding, variance explained is a proportion of variance: the variance of the estimated PGS divided by the variance of the phenotype. It is more like heritability.

The Yengo 2018 height and BMI paper actually distinguishes the two (i.e., variance explained and prediction accuracy):

For height, the variance explained increased from ~24.6% using 3,290 GWS SNPs to ~34.7% (s.e. 1.9%) using ~15,000 SNPs with p < 0.001. The prediction R² also increased from ~19.7% to ~24.4%.

Let's see how "97 SNPs explain 2.7% of BMI" without using any software. First, we use the training sample to estimate the effect sizes of all SNPs (e.g., by ordinary least squares). Then, in an independent sample (the target sample), we construct the PGS for those 97 SNPs as the sum of each estimated effect size multiplied by the allele dosage. Each individual in the target sample then has a corresponding PGS. Finally, we calculate the variance of the PGS and divide it by the phenotypic variance in the target sample.
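As a rough sketch of these steps in Python, under assumptions of my own: the data are simulated, the SNPs are independent (no LD), effects are estimated one SNP at a time, and the helper names `simulate` and `marginal_ols` are hypothetical. The prediction R² is computed alongside the variance ratio so the two quantities can be compared.

```python
import numpy as np

rng = np.random.default_rng(1)
n_train, n_target, n_snps = 20000, 5000, 97
allele_freq = rng.uniform(0.1, 0.5, size=n_snps)
true_effects = rng.normal(0.0, 0.03, size=n_snps)

def simulate(n):
    """Simulate allele dosages (0/1/2) and a phenotype for n individuals."""
    dosages = rng.binomial(2, allele_freq, size=(n, n_snps)).astype(float)
    phenotype = dosages @ true_effects + rng.normal(0.0, 1.0, size=n)
    return dosages, phenotype

def marginal_ols(dosages, phenotype):
    """Per-SNP OLS slope: cov(SNP, phenotype) / var(SNP)."""
    centered = dosages - dosages.mean(axis=0)
    return centered.T @ (phenotype - phenotype.mean()) / (centered ** 2).sum(axis=0)

g_train, y_train = simulate(n_train)
g_target, y_target = simulate(n_target)

# Step 1: estimate effect sizes in the training sample
beta_hat = marginal_ols(g_train, y_train)

# Step 2: PGS in the target sample = sum of estimated effect * allele dosage
pgs = g_target @ beta_hat

# Step 3: variance of the PGS divided by the phenotypic variance
variance_ratio = pgs.var() / y_target.var()

# For comparison: prediction accuracy as the squared correlation
prediction_r2 = np.corrcoef(pgs, y_target)[0, 1] ** 2

print(f"var(PGS) / var(phenotype): {100 * variance_ratio:.1f}%")
print(f"prediction R^2:            {100 * prediction_r2:.1f}%")
```

The two numbers need not agree: noise in the estimated effect sizes adds to the variance of the PGS without improving its correlation with the phenotype.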

Please correct me if I have any misunderstanding. Thank you.