Question about Risk Scores

3.4 years ago

lisaks ▴ 30

Hello everybody,

I am very new to bioinformatics and come from a quantitative economics background. I was asked to help with a project on creating a risk score consisting of multiple SNPs and since then I have been doing some research on this.

From what I've read, one option is an unweighted approach (like a TGS), which uses a simple math formula.

I am interested in the approach of giving weights to the SNPs as this should make the results more accurate(?). My main question is to ask if I understand the process correctly and if my ideas are possible:

1. I have predetermined SNPs of interest that are associated with a specific trait.
2. Calculate/get the betas for the chosen SNPs from a GWAS.
3. Use these betas to weight and calculate a risk score for my own sample.
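The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: the SNP IDs, beta values, and genotypes are made-up placeholders, and the function names `weighted_score` and `unweighted_score` are hypothetical. It assumes each genotype is coded as the count (0, 1 or 2) of the same effect allele that the GWAS beta refers to.

```python
# Hypothetical GWAS effect sizes (betas), one per SNP of interest.
# These numbers are invented for illustration only.
betas = {"rs0001": 0.12, "rs0002": -0.05, "rs0003": 0.30}

# Hypothetical genotypes: count of the effect allele (0, 1 or 2) per SNP.
# Assumption: the counted allele matches the allele the beta was estimated for.
genotypes = {
    "person_A": {"rs0001": 2, "rs0002": 0, "rs0003": 1},
    "person_B": {"rs0001": 0, "rs0002": 1, "rs0003": 0},
}


def weighted_score(dosages, betas):
    """Sum of beta * effect-allele count over all chosen SNPs."""
    return sum(betas[snp] * dosages[snp] for snp in betas)


def unweighted_score(dosages, betas):
    """Plain count of effect alleles over the chosen SNPs (no weights)."""
    return sum(dosages[snp] for snp in betas)


for person, dosages in genotypes.items():
    print(person, weighted_score(dosages, betas), unweighted_score(dosages, betas))
```

In this toy example the unweighted score treats every SNP the same, while the weighted score lets SNPs with larger betas contribute more. With real data, the alleles in the sample would first need to be matched to the effect alleles reported in the GWAS.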

Is this plausible?

I've read about PLINK and tools such as lassosum, PRSice, LDpred and PRS-CS. I don't fully understand what the best way is to calculate/get the betas from a GWAS.

I would be really thankful for any tips and help regarding this. Thanks in advance for taking the time to read and respond to my message :)

SNP PRS risk scores R GWAS • 1.1k views

updated 3.4 years ago by Kevin Blighe 87k • written 3.4 years ago by lisaks ▴ 30

score 2 · Answer 1 · 2020-11-23

There is no standard in this area, but you have the general idea correct. I have already seen people do:

summing / totalling the beta coefficients
multiplying the beta coefficients by some other weight
summing and scaling the beta coefficients to be between 0-1, 0-10, or 0-100, etc.
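As a rough sketch of the last option, the summed scores can be rescaled to a fixed range with min-max scaling. This is only one possible way to do it and is an assumption on my part, not a standard: the function name `rescale` and the example scores are hypothetical, and the result depends on the minimum and maximum in the sample at hand.

```python
def rescale(scores, upper=100.0):
    """Min-max scale a list of raw scores to the range 0..upper.

    Assumption: scaling is done relative to the sample itself, so the
    scaled values are not comparable across different samples.
    """
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores identical: no spread to scale, return zeros.
        return [0.0 for _ in scores]
    return [upper * (s - lo) / (hi - lo) for s in scores]


# Hypothetical raw risk scores for three individuals.
raw_scores = [1.0, 2.0, 3.0]
print(rescale(raw_scores))             # scaled to 0-100
print(rescale(raw_scores, upper=1.0))  # scaled to 0-1
```

The `upper` parameter covers the 0-1, 0-10 and 0-100 variants; the ordering of individuals is unchanged by the scaling.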

In my own approach in the private sector, years ago, I used a Bayesian logistic regression and 'pre-adjusted' the beta coefficients by supplying conservation scores (log-scale) as priors - conservation score is the single best predictor of pathogenicity / functionality of a genetic variant. If a region is highly conserved, the effect would be to increase the beta coefficient.

Using a PRS may not necessarily be any more accurate than just a standard model that includes, e.g., the minor alleles of the SNPs. You could add the computed PRS to this model, which may improve accuracy. I am just not sure that any PRS can account for the complexity of how the genome works.

Kevin