Correlation of PRS obtained with PLINK and PRSice
2.4 years ago

I want to know why the PRS computed with PLINK and with PRSice correlate so poorly. First, I ran plink --clump with the same parameters as in the PRSice run:

    plink --bfile HCP --clump gwas --clump-p1 0.00145 --clump-r2 0.1 --clump-kb 250

Then I used the clumped SNPs (with their OR and effect allele) as input to compute the score:

    plink --bfile HCP --score gwas_0.00145P_clumped 2 4 6

This gave me a PLINK PRS for every subject.
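One thing worth checking on the PLINK side: --score multiplies the chosen score column (column 6 in the command above) by the allele count exactly as given, whereas PRSice-2, when told the statistic is an odds ratio, works with log(OR). If that column holds raw ORs, the two scores will not agree. A minimal sketch of converting OR to log(OR) before scoring, assuming a hypothetical whitespace-delimited file with a header containing columns named SNP, A1 and OR (these names and the layout are assumptions, not the actual format of gwas_0.00145P_clumped):

```python
import math


def or_to_log_or(lines):
    """Convert a score table with an OR column into (SNP, A1, log(OR)) rows.

    Assumes (hypothetically) a header line with columns SNP, A1 and OR,
    whitespace-delimited, one variant per line.
    """
    header = lines[0].split()
    i_snp, i_a1, i_or = header.index("SNP"), header.index("A1"), header.index("OR")
    out = []
    for line in lines[1:]:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        odds_ratio = float(fields[i_or])
        if odds_ratio <= 0:
            continue  # log(OR) is undefined, so drop the variant
        # PLINK --score multiplies this weight by the allele count, so it
        # should be on the additive (log-odds) scale.
        out.append((fields[i_snp], fields[i_a1], math.log(odds_ratio)))
    return out


if __name__ == "__main__":
    for snp, a1, beta in or_to_log_or(["SNP A1 OR", "rs1 A 1.2", "rs2 T 0.8"]):
        print(f"{snp}\t{a1}\t{beta:.6f}")
```

Written out without a header, the resulting three-column file would be scored with --score <file> 1 2 3.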

For PRSice, I followed the manual exactly.

Then I checked the correlation between the PLINK and PRSice scores: the Pearson correlation coefficient is 0.49 (p = 0).
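When comparing the two scores, it helps to merge the files on FID/IID first, so individuals are paired correctly and anyone missing from one file is dropped. A minimal plain-Python sketch, assuming PLINK 1.9 .profile-style columns (FID, IID, ..., SCORE) and PRSice-2 .best-style columns (FID, IID, ..., PRS); adjust the column names if your files differ:

```python
import math


def read_scores(lines, score_col):
    """Map (FID, IID) -> score from a whitespace-delimited table with a header."""
    header = lines[0].split()
    i_fid, i_iid = header.index("FID"), header.index("IID")
    i_score = header.index(score_col)
    scores = {}
    for line in lines[1:]:
        fields = line.split()
        if fields:
            scores[(fields[i_fid], fields[i_iid])] = float(fields[i_score])
    return scores


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def compare(plink_lines, prsice_lines):
    """Return (number of shared individuals, Pearson r) for the two score files."""
    a = read_scores(plink_lines, "SCORE")
    b = read_scores(prsice_lines, "PRS")
    common = sorted(a.keys() & b.keys())  # only individuals present in both files
    return len(common), pearson([a[k] for k in common], [b[k] for k in common])
```

For example, compare(open("plink.profile").read().splitlines(), open("prsice.best").read().splitlines()) would return the overlap size and r; the file names are placeholders.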

2.4 years ago
Sam

Which version of PRSice did you use? Also, in PRSice we perform SNP filtering, flipping and matching before clumping, which could have contributed to the low correlation. If you want to reproduce the PRSice workflow manually with PLINK, you can follow our tutorial here
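To give a feel for what matching and flipping involve, here is a minimal sketch of one common way to harmonise a base allele pair with a target allele pair. It is an illustration, not PRSice's actual code: the function name and return values are hypothetical, and it drops strand-ambiguous A/T and C/G SNPs, which, as far as I know, mirrors PRSice-2's default of excluding ambiguous SNPs unless --keep-ambig is given.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def harmonise(base_a1, base_a2, target_a1, target_a2):
    """Decide how a base SNP maps onto the target alleles.

    Returns "match", "swap" (the base effect allele is the target's second
    allele, so the effect direction must be reversed) or None (drop the SNP).
    """
    base = (base_a1.upper(), base_a2.upper())
    target = (target_a1.upper(), target_a2.upper())
    if any(a not in COMPLEMENT for a in base + target):
        return None  # indels or unusual allele codes are not handled here
    # A/T and C/G SNPs look identical on both strands, so a strand flip
    # cannot be detected from the alleles alone: drop them.
    if COMPLEMENT[base[0]] == base[1]:
        return None
    flipped = (COMPLEMENT[base[0]], COMPLEMENT[base[1]])
    for b in (base, flipped):  # try the reported strand, then the other strand
        if b == target:
            return "match"
        if b == (target[1], target[0]):
            return "swap"
    return None  # the alleles do not correspond at all
```

Under this rule an A/T SNP such as rs181193408, raised in the follow-up question, would be dropped rather than matched.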

Hi Sam, thank you for your reply. I have another question. Before calculating the PRS with PRSice-2, we need to confirm that the two data sets have the same variants and effect alleles.

In both the base and target data the variant names are the same, but the effect alleles differ.

For example:

In target data: 1 rs181193408 83084 A T
In base data:   1 rs181193408 83084 T A

Some SNPs have swapped effect alleles as above, while others have the same effect allele.

Should I change "A T" to "T A" for the SNPs whose effect alleles differ, and leave the allele order unchanged for SNPs that already have the same effect allele in the target data?

As long as the encoding is consistent, PRSice should be able to handle them correctly.
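One way to see why the order of the two allele columns does not matter by itself: a score only needs, for each SNP, the number of copies of the base effect allele that a person carries, multiplied by its weight. A minimal sketch, assuming hypothetical genotypes stored as allele pairs and a weight table keyed by SNP ID (the names are illustrative only):

```python
def score(genotypes, weights):
    """Sum of weight * (copies of the effect allele) over SNPs.

    genotypes: {snp_id: (allele1, allele2)} for one individual.
    weights:   {snp_id: (effect_allele, weight)} from the base data.
    SNPs absent from the genotypes are skipped.
    """
    total = 0.0
    for snp, (effect, w) in weights.items():
        if snp in genotypes:
            # Copies are counted by allele identity, not by column position.
            total += w * genotypes[snp].count(effect)
    return total
```

Because copies are counted by allele identity, writing a target genotype as A/T or T/A gives the same score; what must be consistent is that the effect allele named in the base data refers to the same strand as the target genotypes.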

2.4 years ago
Yean

In your case, I suspect the low correlation between the two genetic risk scores could come from the following:

1. The SNPs from the summary statistics that end up in the PLINK and PRSice scores may differ, for example because different thresholds were used for LD clumping of the summary statistics in PLINK and PRSice.

2. If not, have you checked the default genetic risk score formulas in PLINK and PRSice? Do they use the same formula by default, both for calculating the score and for imputing missing genotypes? I suspect these differences contribute to the low correlation.
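On point 2: if I remember the PLINK 1.9 documentation correctly, --score reports the average per-allele score by default (a 'sum' modifier gives sums) and mean-imputes missing genotypes unless 'no-mean-imputation' is given, while PRSice-2 has its own --score option (avg, sum, std, con-std). The sketch below, with hypothetical inputs, shows how the same SNPs and weights give different per-person values depending on these choices. Imputing a missing dosage as 2 × effect-allele frequency is an assumption of this sketch.

```python
def prs(dosages, weights, freqs, mode="sum", impute=True):
    """Polygenic score for one person.

    dosages: effect-allele counts (0, 1, 2), or None if missing.
    weights: per-SNP effect sizes (beta / log-OR) for the effect allele.
    freqs:   effect-allele frequencies, used for mean imputation.
    mode:    "sum", or "avg" (score per allele over the alleles used).
    impute:  replace a missing dosage by 2 * freq; otherwise skip the SNP.
    """
    total, n_alleles = 0.0, 0
    for d, w, f in zip(dosages, weights, freqs):
        if d is None:
            if not impute:
                continue  # skipping the SNP also shrinks this person's denominator
            d = 2 * f
        total += w * d
        n_alleles += 2
    if mode == "sum":
        return total
    return total / n_alleles if n_alleles else 0.0
```

Whenever every person is scored over the same number of alleles, sum and average differ only by a constant factor and correlate perfectly; skipping missing genotypes instead of imputing them changes the denominator from person to person, which is one way two otherwise similar runs can diverge.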

Hope this helps.