Question

In GWAS studies, how should I understand "97 SNPs explain 2.7% of BMI"?


7.4 years ago

Tao ▴ 530

Hi guys,

I'm new to GWAS, and I saw these statements in a talk by John Quackenbush:

"97 SNPs explain 2.7% of BMI"

"All common SNPs may explain 20% of BMI"

What do these percentages mean, and how are they calculated?

Thanks!

Tao

Tag: GWAS

Written 7.4 years ago by Tao; last updated 7.4 years ago by Philipp Bayer.


That probably means something like...

"You can determine someone's racial composition or location by looking at their SNPs. Those are both factors in BMI, which makes the SNPs correlated. These SNPs have no known causal relationship with BMI, but it's easy to use them to publish papers."

Comment by Brian Bushnell, 7.4 years ago


The underlying concept here is "heritability", which measures the proportion of total phenotypic variance that is due to genetic variance. The percentage describes how much of the BMI variance in the study cohort is due to genetic variance (total phenotypic variance = genetic variance + environmental variance). But I still don't know how this is calculated.
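The variance decomposition described in this reply can be written out as below. This is a minimal sketch that assumes a simple additive model in which genetic and environmental effects are independent (no gene-environment interaction or covariance). The symbols V_P, V_G, V_E, x_j (allele count at SNP j) and β_j (its effect size) are labels introduced here for illustration. The second line shows why a specific set of SNPs can explain far less than the full heritability: it only counts the genetic variance captured by those particular SNPs.

```latex
V_P = V_G + V_E, \qquad h^2 = \frac{V_G}{V_P}

% Share of V_P captured by one specific set of SNPs (e.g. the 97 BMI SNPs):
R^2_{\text{SNPs}} = \frac{\operatorname{Var}\left(\sum_j \beta_j x_j\right)}{V_P}
```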

Reply by Tao, 7.4 years ago


Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

Reply by GenoMax, 7.4 years ago

Answer (score 1), 2017-05-25

The percentage explained can be calculated in different ways. It always comes from fitting a model, but the exact details of the model vary (different covariates, different maths, etc.).

Roughly, you predict the phenotype using something like phenotype ≈ a * SNP 1 + b * SNP 2 + c * SNP 3 + d * SNP 4, where each SNP term is the individual's genotype at that SNP (e.g. the number of effect alleles: 0, 1 or 2), and then calculate how much of the variance in the actual phenotype the prediction captures. What a, b, c and d are and how they are estimated depends on the method used (they can also all be 1).
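The idea in this paragraph can be sketched in a few lines of Python. This is a minimal, hypothetical illustration on simulated data, not the procedure of any particular study: the weights, the allele frequency of 0.5 and the Gaussian "environmental" noise are all assumptions, and the helper names (`polygenic_score`, `variance_explained`, `simulate_cohort`) are made up for this example. Variance explained is measured as the squared correlation between prediction and phenotype, which equals the R² of a linear regression of the phenotype on the score.

```python
import random

def polygenic_score(genotypes, weights):
    """a * SNP 1 + b * SNP 2 + ...: allele counts (0/1/2) times per-SNP weights."""
    return sum(w * g for w, g in zip(weights, genotypes))

def variance_explained(predicted, observed):
    """Fraction of phenotype variance captured by the prediction: the R^2 of a
    linear regression of observed on predicted (= squared Pearson correlation)."""
    n = len(observed)
    mp, mo = sum(predicted) / n, sum(observed) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed))
    var_p = sum((p - mp) ** 2 for p in predicted)
    var_o = sum((o - mo) ** 2 for o in observed)
    return cov * cov / (var_p * var_o)

def simulate_cohort(weights, n=1000, noise_sd=1.0, seed=42):
    """Hypothetical cohort: genotypes drawn with allele frequency 0.5
    (0/1/2 with probabilities 1/4, 1/2, 1/4), phenotype = genetic score + noise."""
    rng = random.Random(seed)
    cohort = [[rng.choice([0, 1, 1, 2]) for _ in weights] for _ in range(n)]
    scores = [polygenic_score(g, weights) for g in cohort]
    phenotype = [s + rng.gauss(0, noise_sd) for s in scores]
    return scores, phenotype

weights = [0.3, -0.2, 0.15, 0.1]  # hypothetical values for a, b, c, d
scores, bmi = simulate_cohort(weights)
print(f"variance explained: {variance_explained(scores, bmi):.1%}")
```

With `noise_sd=0.0` the score reproduces the phenotype exactly and the function returns 1.0 (100%). With the default noise, most of the simulated variance is non-genetic, so the result is far below 100%, mirroring the point made in the next paragraph.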

If the prediction fit the phenotype perfectly, it would explain 100% of the observed variance. That never happens; the percentage is always much lower (2.7% for the 97 SNPs, and perhaps 20% for all common SNPs, in your example).
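One common way to write the "proportion of observed variance explained" for a fitted model is the coefficient of determination below, where y_i is the observed phenotype of individual i, ŷ_i is the model's prediction and ȳ is the mean phenotype (notation introduced here). It reaches 1 (100%) only when every residual y_i − ŷ_i is zero. For a least-squares fit with an intercept it equals the squared correlation between y and ŷ; individual studies may use other variants (adjusted R², held-out samples, covariates), so treat this as a sketch of the idea.

```latex
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
```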

Here's one example of how it is calculated, for adult height: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4250049/

We used GCTA-COJO analysis (7, 8) to select the top associated SNPs. This method uses the summary statistics from the meta-analysis and LD correlations between SNPs estimated from a reference sample to perform a conditional association analysis (7). The method starts with an initial model of the SNP that shows the strongest evidence of association across the whole genome. It then implements the association analysis conditioning on the selected SNP(s) to search for the top SNPs one-by-one iteratively via a stepwise model selection procedure until no SNP has a conditional P-value that passes the significance level. Finally, all the selected SNPs are fitted jointly in the model for effect size estimation.

Papers 7 and 8:

7. Yang J, et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat Genet. 2012;44:369–75. S1–3.
8. Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet. 2011;88:76–82.

So the model starts as phenotype ≈ a * SNP 1, using the single most strongly associated SNP, and then keeps adding SNPs one at a time, each tested conditional on the SNPs already selected, until no remaining SNP passes the significance threshold.
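The stepwise idea quoted above can be illustrated with a toy forward-selection loop. This is a hypothetical sketch, not GCTA-COJO: COJO works from GWAS summary statistics plus LD estimated from a reference sample and stops on a conditional P-value threshold, whereas this example uses simulated individual-level genotypes and swaps in a simpler stopping rule (stop when the best remaining SNP raises R² by less than `min_gain`). The causal SNPs, effect sizes and all function names here are invented for illustration.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def ols_r2(columns, y):
    """R^2 of an ordinary least-squares fit of y on an intercept plus the SNP columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    XtX = [[sum(X[i][a] * X[i][j] for i in range(n)) for j in range(k)] for a in range(k)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    beta = solve(XtX, Xty)
    mean_y = sum(y) / n
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    ss_res = sum((y[i] - sum(bj * xj for bj, xj in zip(beta, X[i]))) ** 2 for i in range(n))
    return 1.0 - ss_res / ss_tot

def forward_select(snps, y, min_gain=0.01):
    """Greedy forward selection: start with the best single SNP, then repeatedly
    add the SNP that most increases the joint-model R^2; stop when the gain < min_gain."""
    selected, current_r2 = [], 0.0
    while len(selected) < len(snps):
        best_j, best_r2 = None, current_r2
        for j in range(len(snps)):
            if j in selected:
                continue
            try:
                r2 = ols_r2([snps[i] for i in selected + [j]], y)
            except ValueError:
                continue  # collinear with SNPs already in the model
            if r2 > best_r2:
                best_j, best_r2 = j, r2
        if best_j is None or best_r2 - current_r2 < min_gain:
            break
        selected.append(best_j)
        current_r2 = best_r2
    return selected, current_r2

def simulate(n=2000, n_snps=10, noise_sd=1.0, seed=7):
    """Hypothetical data: SNPs 0, 3 and 7 affect the phenotype, the rest do not."""
    rng = random.Random(seed)
    effects = {0: 1.0, 3: 0.6, 7: 0.5}
    snps = [[rng.choice([0, 1, 1, 2]) for _ in range(n)] for _ in range(n_snps)]
    y = [sum(b * snps[j][i] for j, b in effects.items()) + rng.gauss(0, noise_sd)
         for i in range(n)]
    return snps, y

snps, bmi = simulate()
selected, r2 = forward_select(snps, bmi)
print("selected SNPs:", selected, "joint R^2:", round(r2, 3))
```

Like the procedure described in the quote, the final R² comes from fitting all selected SNPs jointly, so it is the "percentage explained" by that set of SNPs in this toy model.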