Question

Polygenic Risk Score Methods

2.3 years ago

Hanso • 0

Dear all,

I have a general question regarding the statistical methods behind PRS.

Am I understanding correctly that many of the older PRS models (especially for breast cancer, for example https://jmg.bmj.com/content/56/9/581.long) simply sum up the GWAS-identified SNPs weighted by their respective GWAS effect sizes, without considering factors such as linkage disequilibrium (as newer methods like the Bayesian models do)? Or is LD modeling no longer necessary due to the advancements in GWAS?

Some of the publications state that if multiple SNPs are in high LD, they select the SNP with the best p-value. Does this already qualify as pruning and thresholding?

Thank you :)

PRS GWAS • 946 views

updated 2.3 years ago by LChart 5.2k • written 2.3 years ago by Hanso • 0

score 3 · Accepted Answer · 2023-07-08

Some of the publications state that if multiple SNPs are in high LD, they select the SNP with the best p-value. Does this already qualify as pruning and thresholding?

Yes; this is just taking the "lead SNP" for a locus and assuming that there is only a single well-tagged effect. However, there are several loci and phenotypes that have multiple independent effects (PMC3487134), so while this is one method of pruning and thresholding, it could be considered overly aggressive.
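To make the "lead SNP" idea concrete, here is a minimal sketch of greedy p-value-based clumping. It is only an illustration, not how any particular tool implements it: the function name `clump`, the thresholds, and the toy r² matrix are all assumptions, and real tools additionally restrict clumping to a physical distance window.

```python
import numpy as np

def clump(pvals, r2, p_threshold=5e-8, r2_threshold=0.1):
    """Greedy clumping sketch (hypothetical helper, not a library API).

    Repeatedly keep the most significant remaining SNP and discard every
    SNP whose r^2 with it is at or above r2_threshold. SNPs with a p-value
    above p_threshold are never selected (the "thresholding" part).
    """
    pvals = np.asarray(pvals, dtype=float)
    r2 = np.asarray(r2, dtype=float)
    order = np.argsort(pvals, kind="stable")  # most significant first
    removed = np.zeros(len(pvals), dtype=bool)
    kept = []
    for i in order:
        if pvals[i] > p_threshold:
            break  # everything after this is less significant still
        if removed[i]:
            continue  # already tagged by a stronger lead SNP
        kept.append(int(i))
        removed |= r2[i] >= r2_threshold  # drop LD partners of the lead SNP
    return kept

# Toy data (made up): SNP0 and SNP1 are in high LD, SNP2 and SNP3 in moderate LD.
pvals = [1e-10, 1e-9, 1e-12, 0.2]
r2 = np.array([
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.5, 1.0],
])
lead_snps = clump(pvals, r2)
```

With a strict r² threshold, a second independent signal that is in partial LD with the lead SNP would be discarded along with the redundant ones, which is the sense in which this can be overly aggressive.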

without considering factors such as linkage disequilibrium

It's not unreasonable to assume that, unless pruning or LD modeling is explicitly mentioned or presented in the text, the trivial sum (geno * beta) was used. However, most studies would have used PLINK, PRSice, or an equivalent tool to calculate these scores, and these either recommend or enforce by default LD pruning prior to score calculation.
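For reference, the trivial (geno * beta) score is just a weighted sum of allele counts. A minimal sketch, assuming genotypes are coded as effect-allele dosages (0, 1, 2) aligned to the same allele as the GWAS betas; the function name `polygenic_score` and the numbers are made up for illustration:

```python
import numpy as np

def polygenic_score(dosages, betas):
    """Unweighted-by-LD PRS sketch: sum over SNPs of dosage * beta.

    dosages: (n_individuals, n_snps) effect-allele counts (assumed already
             aligned to the GWAS effect allele).
    betas:   (n_snps,) GWAS effect sizes, e.g. log odds ratios.
    """
    dosages = np.asarray(dosages, dtype=float)
    betas = np.asarray(betas, dtype=float)
    return dosages @ betas  # one score per individual

# Toy example: two individuals, three SNPs (values are invented).
dosages = [[0, 1, 2],
           [2, 0, 1]]
betas = [0.1, -0.2, 0.3]
scores = polygenic_score(dosages, betas)
```

Nothing in this sum accounts for LD: if two of the SNPs tag the same causal variant, its effect is counted twice, which is why the SNP list is normally pruned or clumped first.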