Question: Polygenic Risk Score analysis
5 weeks ago, pedro.raposo3 wrote:


I'm very new to GWAS and PRS analysis, so my question is simple, but I cannot find a straightforward answer anywhere: is it possible, with a comprehensive database of risk scores associated with traits, to calculate polygenic risk scores for a specific genome? By this, I mean whether I can "diagnose" a genome for all diseases already studied by previous genome-wide studies.

I'd assume it's not that simple. If that were the case, every paper would reference it.

Thank you for your time

bioinformatics prs gwas • 157 views
5 weeks ago, Kevin Blighe wrote:

Technically, it is possible, by doing the following:

  1. constructing predictive models using all statistically significant GWAS hits for each condition / phenotype
  2. cross-validating and refining the models on training and testing data
  3. making model predictions on new data

Some extra points to consider:

  • statistically significant GWAS hits may not necessarily result in disease or confer a particular phenotype; instead, they may only increase / decrease risk (that is to say, many of these variants have incomplete penetrance)
  • getting samples to do this work will be difficult
  • 'polygenic risk score' is a generic term and there are many ways to construct one. Most are built from the beta coefficients of the regression model fit
  • you should consider how you are going to build and fit the model. Perhaps something along the lines of elastic-net or ridge regression would be a start. Others have used lasso-penalised regression in the past to do something similar for breast cancer somatic variants.
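To make the penalised-regression idea concrete, here is a minimal sketch of ridge regression fitted by gradient descent on allele dosages. Everything in it is an assumption for illustration: the toy dosage matrix `X`, the phenotype vector `y`, the function name `fit_ridge`, and the penalty values are made up, and no intercept or covariates are modelled. A real analysis would more likely use an established package (for example glmnet in R or scikit-learn in Python) with cross-validation to pick the penalty.

```python
from typing import List


def fit_ridge(X: List[List[float]], y: List[float], lam: float,
              lr: float = 0.05, n_iter: int = 2000) -> List[float]:
    """Minimise mean squared error + lam * sum(w_j^2) by gradient descent.

    X holds allele dosages (0, 1 or 2) per sample and SNP; y is the phenotype.
    No intercept is fitted, which is a simplification for this toy example.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        # Residual of the current linear prediction for every sample.
        resid = [sum(w[j] * X[i][j] for j in range(p)) - y[i] for i in range(n)]
        # Gradient of the penalised loss with respect to each SNP weight.
        grad = [
            (2.0 / n) * sum(resid[i] * X[i][j] for i in range(n)) + 2.0 * lam * w[j]
            for j in range(p)
        ]
        w = [w[j] - lr * grad[j] for j in range(p)]
    return w


# Hypothetical data: 6 samples, 2 SNPs, quantitative phenotype.
X = [[0, 1], [1, 0], [2, 1], [1, 2], [0, 0], [2, 2]]
y = [0.1, 0.5, 1.1, 0.7, 0.0, 1.2]

w_unpenalised = fit_ridge(X, y, lam=0.0)  # ordinary least squares fit
w_ridge = fit_ridge(X, y, lam=1.0)        # weights shrunk towards zero

print("no penalty:", w_unpenalised)
print("ridge     :", w_ridge)
```

The penalised weights are smaller in magnitude than the unpenalised ones, which is the shrinkage that makes these models more stable when there are many correlated SNPs.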

Note that replacing 'predictive models' with 'AI' or 'machine learning algorithm' will likely increase your chance of funding for the work, if that is ultimately what you want.



So, it's not as straightforward as simply calculating the PRS based on our genome's mapped SNPs, I see.

Thank you for the comprehensive answer!

written 5 weeks ago by pedro.raposo3

Ah, if you want a more automated way to do it, then I would recommend taking a look at PRSice by Sam.

written 5 weeks ago by Kevin Blighe

Thank you for both of your answers

written 5 weeks ago by pedro.raposo3

You can also look into our tutorial. However, I guess what you are asking is slightly different, in that you already have a PRS associated with a disease and a new genome that you want to calculate the score on. For that, you'll need to know which SNPs were used to construct the score and what their weights are. The weights are usually the beta coefficients from the GWAS, either used as is (e.g. PRSice) or regularized / shrunk (e.g. LDpred, lassosum, PRS-CS). Once you have both pieces of information, you'll be able to recalculate the score.
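As a rough illustration of that recalculation, a score for one genome is the sum over SNPs of weight times effect-allele dosage. The sketch below assumes a hypothetical weight table and genotype (the rsIDs, alleles, and betas are invented), assumes alleles are already on the same strand, and simply skips SNPs missing from the genome. Real tools also handle strand flips, imputation, and score normalisation.

```python
from typing import Dict, Tuple

# Hypothetical published score: SNP id -> (effect allele, weight / beta).
weights: Dict[str, Tuple[str, float]] = {
    "rs0001": ("A", 0.12),
    "rs0002": ("G", -0.05),
    "rs0003": ("T", 0.30),
}

# Hypothetical genotype calls for one genome: SNP id -> the two alleles carried.
genotype: Dict[str, Tuple[str, str]] = {
    "rs0001": ("A", "G"),
    "rs0002": ("G", "G"),
    # rs0003 was not genotyped in this sample.
}


def polygenic_score(weights: Dict[str, Tuple[str, float]],
                    genotype: Dict[str, Tuple[str, str]]) -> Tuple[float, int]:
    """Return (score, number of SNPs used) for a single genome."""
    score = 0.0
    used = 0
    for snp, (effect_allele, beta) in weights.items():
        alleles = genotype.get(snp)
        if alleles is None:
            continue  # SNP missing from this genome: skipped, not imputed
        # Dosage = number of copies of the effect allele (0, 1 or 2).
        dosage = sum(1 for allele in alleles if allele == effect_allele)
        score += beta * dosage
        used += 1
    return score, used


score, n_used = polygenic_score(weights, genotype)
print(f"score = {score:.4f} from {n_used} of {len(weights)} SNPs")
```

A raw score like this only becomes interpretable when compared against the distribution of scores in a suitable reference population.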

written 4 weeks ago by Sam

Thank you Sam. So, it seems that I can achieve that with GWAS catalog since a collection of different GWAS are present, and most of the SNPs have a beta-coefficient associated with them.

written 22 days ago by pedro.raposo3

Yes, you can, but beware that using only the significant SNPs tends to generate an underpowered PRS. Also, if the study of interest uses SNPs that do not reach the genome-wide significance threshold, it is likely that you won't have the information required to regenerate the score.
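The gap can be seen in a small sketch. It assumes hypothetical summary statistics (invented rsIDs, betas, and p-values) and compares the SNP set kept at the conventional genome-wide threshold of 5e-8 with the set kept at a more relaxed threshold of 0.05, which is an arbitrary choice here. A resource that only lists genome-wide significant hits would lack the weights for the second group.

```python
from typing import Dict, List, Set

# Hypothetical GWAS summary statistics.
summary_stats: List[Dict] = [
    {"snp": "rs0001", "beta": 0.12, "p": 3e-10},
    {"snp": "rs0002", "beta": -0.05, "p": 2e-4},
    {"snp": "rs0003", "beta": 0.30, "p": 4e-9},
    {"snp": "rs0004", "beta": 0.02, "p": 0.03},
]


def select_snps(stats: List[Dict], p_threshold: float) -> Set[str]:
    """Return the ids of SNPs whose p-value is at or below the threshold."""
    return {row["snp"] for row in stats if row["p"] <= p_threshold}


genome_wide = select_snps(summary_stats, 5e-8)  # what a hits-only list would hold
relaxed = select_snps(summary_stats, 0.05)      # what a published PRS might use
missing = relaxed - genome_wide                 # weights you could not recover

print("genome-wide significant:", sorted(genome_wide))
print("needed but unavailable :", sorted(missing))
```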

written 19 days ago by Sam