Hi all,

I am wondering how PLINK calculates polygenic risk scores (PRS) from base GWAS data. Specifically, I am unclear on how allelic scoring is performed, because the result seems to depend on which allele is chosen as the effect allele. My question is therefore how PLINK chooses the effect and non-effect alleles.

Say we have SNP1 with alleles C and T, where the odds ratio for T is ORt = 3 and the odds ratio for C is ORc = 1/3.

We can approach this in two ways. In the first, T is the effect allele and C is the non-effect allele. This means that T increases risk for disease, and relative to the homozygote of the non-effect allele, C, the odds are: genotype CC = 1, genotype TC = 3, and genotype TT = 9. Therefore, if we score using ln(OR), the three possible scores are: CC = ln(1), TC = ln(3), or TT = 2*ln(3).

In the second, C is the effect allele and T is the non-effect allele. This means that C decreases risk for disease, and relative to the homozygote of the non-effect allele, T, the odds are: genotype TT = 1, genotype TC = 1/3, and genotype CC = 1/9. Therefore, if we score using ln(OR), the three possible scores are: TT = ln(1), TC = ln(1/3), or CC = 2*ln(1/3).
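To make the arithmetic in the two cases concrete, here is a minimal Python sketch. It is not PLINK's implementation (how PLINK does this is exactly what I am asking); the function name `genotype_score` is made up for illustration, and the only inputs are the alleles and odds ratios from the SNP1 example.

```python
import math

def genotype_score(genotype, effect_allele, odds_ratio):
    """Hypothetical helper: (copies of the effect allele) * ln(OR of that allele)."""
    return genotype.count(effect_allele) * math.log(odds_ratio)

genotypes = ["CC", "TC", "TT"]

# Case 1: T is the effect allele, with OR = 3
scores_t = {g: genotype_score(g, "T", 3.0) for g in genotypes}

# Case 2: C is the effect allele, with OR = 1/3
scores_c = {g: genotype_score(g, "C", 1.0 / 3.0) for g in genotypes}

# Print the score of each genotype under both choices of effect allele
for g in genotypes:
    print(f"{g}: T as effect allele = {scores_t[g]:.3f}, "
          f"C as effect allele = {scores_c[g]:.3f}")
```

Each genotype gets a different score depending on which allele is treated as the effect allele, which is the source of my confusion.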

In these two cases, then, the per-allele weights have the same magnitude but opposite signs, depending on which allele we choose as the effect allele. Therefore, when summing the weights across SNPs, if we always choose the effect allele to be the one that increases risk, the total will be a large positive number, and if we always choose the effect allele to be the one that decreases risk, the total will be a large negative number.

So my question is, when scoring, how do we choose the effect and non-effect allele?

Thanks :)