How to estimate a polygenic risk score (PRS) for one individual using the scoring files from the PGS Catalog?
Hi all,

I have an annotated VCF file for one individual, for whom I want to estimate a polygenic risk score (PRS) for a certain trait using the scoring files from the PGS Catalog. The scoring file contains the SNP ID, the reference and alternative alleles, and the weights. How can I estimate the PRS from the scoring file without the classical approach that requires GWAS summary statistics, target data, etc.?

The PGS Catalog scoring file has everything you need to calculate the score: if the individual carries the effect allele, multiply the number of copies by the beta. Do the same for all SNPs, then sum.
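Written as a formula, that recipe is the usual additive score. This is a sketch that assumes each weight is the per-copy effect of the effect allele; M is the number of SNPs in the scoring file, β_i the weight of SNP i, and x_i the number of copies of the effect allele the individual carries at that SNP:

```latex
\mathrm{PRS} = \sum_{i=1}^{M} \beta_i \, x_i, \qquad x_i \in \{0, 1, 2\}
```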

Thank you very much for your answer. One last question: since the alleles represent single point mutations, should I use dummy coding to transform the nucleotides before multiplying by the betas, or is there another approach?

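Yes. Code each genotype as the number of copies of the effect allele (0, 1 or 2): if the effect allele is "A" and the genotype is "A A", the contribution is 2 * beta.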
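So no dummy coding is needed under this additive coding: each genotype becomes a single number, the count of effect alleles. A minimal Python sketch, assuming genotypes are given as pairs of allele letters (the function name `effect_allele_dosage` and the 0.25 weight are hypothetical, for illustration only):

```python
def effect_allele_dosage(genotype: tuple[str, str], effect_allele: str) -> int:
    """Count copies (0, 1 or 2) of the effect allele in a diploid genotype."""
    return sum(1 for allele in genotype if allele == effect_allele)


# Hypothetical example: effect allele "A" with an assumed weight of 0.25
beta = 0.25
for genotype in [("A", "A"), ("A", "G"), ("G", "G")]:
    dosage = effect_allele_dosage(genotype, "A")
    print(genotype, dosage, dosage * beta)
```

For "A A", "A G" and "G G" this gives dosages 2, 1 and 0, i.e. contributions of 0.5, 0.25 and 0.0 to the score.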

Does anyone have any software or script that performs these calculations?

It is a one-liner: multiply the genotypes (effect-allele counts) by the coefficients and sum them.
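A minimal Python sketch of that one-liner, assuming the genotypes have already been converted to effect-allele counts and lined up SNP by SNP with the weights (the function name `polygenic_score` is hypothetical):

```python
def polygenic_score(dosages: list[float], weights: list[float]) -> float:
    """Sum of effect-allele dosage times weight over all scored SNPs."""
    if len(dosages) != len(weights):
        # The one-liner only makes sense if both lists refer to the same SNPs in the same order
        raise ValueError("dosages and weights must be aligned SNP by SNP")
    return sum(x * w for x, w in zip(dosages, weights))


print(polygenic_score([2, 1, 0], [0.25, -0.5, 1.0]))
```

Here the score is 2 × 0.25 + 1 × (−0.5) + 0 × 1.0 = 0.0. With NumPy the same sum is `numpy.dot(dosages, weights)`. The sum itself is trivial; building the aligned dosage list from the VCF is where the real work lies.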

Maybe try PRSice-2: https://www.prsice.info/


Hi, there's a new Nextflow module on nf-core, imputeme, that can do this. It's built for exactly your use case, and I believe it handles the key issues raised here. I disagree that this is a "one-liner", as some comments suggest, for several reasons. A main one is that you have an annotated VCF file, and VCF files typically contain no record at positions where the individual is homozygous for the reference allele, whereas PGS Catalog scoring files do not necessarily have the effect allele matched to the REF/ALT notation.

Oh, and don't let the name fool you: when you input whole-genome sequencing data, no imputation takes place. That's only for microarray-based inputs. Here's the link; it should fit right into any Nextflow pipeline: https://github.com/nf-core/modules/tree/master/modules/imputeme/vcftoprs
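A rough Python sketch of the two issues above. This is not how the imputeme module works internally, just an illustration under stated assumptions: (1) a scoring SNP with no VCF record is treated as homozygous reference, which is only safe if that position was actually covered by sequencing; (2) the effect allele is matched against whichever of REF or ALT it equals, rather than assumed to be ALT. The record layouts `ScoringSNP` and `VcfCall` and the functions `dosage_for` and `score` are hypothetical; the genome reference base at each scoring position is assumed to be known (e.g. looked up in the reference FASTA); strand flips, genome build mismatches and chromosome naming ("chr1" vs "1") are not handled.

```python
from typing import NamedTuple, Optional


class ScoringSNP(NamedTuple):
    chrom: str
    pos: int
    ref: str            # genome reference base here (assumed known, e.g. from the FASTA)
    effect_allele: str
    weight: float


class VcfCall(NamedTuple):
    ref: str
    alts: tuple[str, ...]
    gt: tuple[Optional[int], ...]  # GT allele indices (0 = REF); None = missing call


def dosage_for(snp: ScoringSNP, calls: dict[tuple[str, int], VcfCall]) -> Optional[int]:
    """Effect-allele count for one scoring SNP, or None if the genotype is missing."""
    call = calls.get((snp.chrom, snp.pos))
    if call is None:
        # No VCF record: assume homozygous reference (only valid if the site was covered)
        return 2 if snp.effect_allele == snp.ref else 0
    if any(index is None for index in call.gt):
        return None
    alleles = (call.ref,) + call.alts  # index 0 is REF, 1.. are the ALT alleles
    # Match the effect allele against REF or ALT, whichever it is
    return sum(1 for index in call.gt if alleles[index] == snp.effect_allele)


def score(snps: list[ScoringSNP], calls: dict[tuple[str, int], VcfCall]) -> tuple[float, int]:
    """Return (PRS, number of SNPs skipped because the genotype was missing)."""
    total, n_missing = 0.0, 0
    for snp in snps:
        dosage = dosage_for(snp, calls)
        if dosage is None:
            n_missing += 1
        else:
            total += dosage * snp.weight
    return total, n_missing


# Hypothetical example data
snps = [
    ScoringSNP("1", 100, "A", "G", 0.5),   # effect allele is ALT, heterozygous call
    ScoringSNP("1", 200, "C", "C", 0.25),  # effect allele is REF, no VCF record
    ScoringSNP("2", 300, "T", "T", 1.0),   # effect allele is REF, homozygous ALT call
]
calls = {
    ("1", 100): VcfCall("A", ("G",), (0, 1)),
    ("2", 300): VcfCall("T", ("C",), (1, 1)),
}
print(score(snps, calls))
```

For this toy data the score is 1 × 0.5 + 2 × 0.25 + 0 × 1.0 = 1.0 with no missing genotypes. Note that the second SNP contributes only because the missing record is read as homozygous reference; a naive "multiply what is in the VCF" approach would drop it. Returning the count of skipped SNPs, rather than silently ignoring them, makes it easier to judge how complete the score is.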