Polygenic Risk Score Calculation: Do we need to apply the same p-value threshold on all 22 chromosomes?
2.3 years ago
Mengna Zhang ▴ 10

Hi there,

I am using PRSice-2 to calculate the polygenic risk score for the 22 chromosomes one at a time. My understanding is that the 22 chromosomes are independent of each other, and that our goal is to find the best set of SNPs on each chromosome and then merge them into the final best "representative" set of SNPs for the phenotype. So we can use different C+T thresholds on different chromosomes to achieve that goal, right?

For example, my phenotype is height. On chromosome 1, PRSice-2 gave me the best set of SNPs (snp1, snp2, snp3) with C+T thresholds r2 = 0.01, p-value threshold = 0.001;
on chromosome 2, PRSice-2 gave me the best set of SNPs (snp4, snp5) with C+T thresholds r2 = 0.01, p-value threshold = 0.01.

Can I then report that snp1, snp2, snp3, snp4, and snp5 are associated with height? Do I need to apply the same C+T threshold on every chromosome?

PRS PRSice-2 • 1.7k views
2.3 years ago
Sam ★ 4.7k

While it is faster to do the per-chromosome calculation, there are a number of gotchas that might invalidate your results. Perhaps the #1 problem with this approach is that PRSice's default is --score avg, which divides the PRS by the number of alleles used to calculate it (this helps account for individual genotype missingness). As a result, you cannot reliably add up the per-chromosome PRS to generate a genome-wide score. For that you need --score sum, which can be affected by individual genotype missingness.
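To make the averaging issue concrete, here is a minimal Python sketch with made-up numbers. It is not PRSice's own code; the dictionary layout and the `average_score` helper are assumptions for illustration only. It shows that adding per-chromosome averages does not give the genome-wide average, while the underlying sums do add up.

```python
# Toy numbers, made up for illustration: for one individual, the weighted
# allele sum and the number of alleles scored on each chromosome.
per_chrom = {
    "chr1": {"weighted_sum": 0.30, "n_alleles": 6},
    "chr2": {"weighted_sum": 0.40, "n_alleles": 4},
}


def average_score(weighted_sum, n_alleles):
    """Average-style score: weighted sum divided by the alleles scored."""
    return weighted_sum / n_alleles


# Sum-style scores are additive across chromosomes.
total_sum = sum(c["weighted_sum"] for c in per_chrom.values())
total_alleles = sum(c["n_alleles"] for c in per_chrom.values())

# Genome-wide average computed from all alleles at once.
genome_wide_avg = average_score(total_sum, total_alleles)

# Naively adding the per-chromosome averages gives a different number.
sum_of_chrom_avgs = sum(
    average_score(c["weighted_sum"], c["n_alleles"]) for c in per_chrom.values()
)

print(f"genome-wide average:        {genome_wide_avg:.3f}")
print(f"sum of per-chrom averages:  {sum_of_chrom_avgs:.3f}")
```

With these toy values the genome-wide average is about 0.07, whereas the per-chromosome averages add up to about 0.15, so the two are not interchangeable.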

Once you have handled that, you can use --all-score to generate the PRS for all samples at all p-value thresholds on every chromosome. You can then add up the per-chromosome PRS at each p-value threshold and perform the required regression to identify the best threshold.
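The add-up-then-regress step could look roughly like the Python sketch below. Everything here is an assumption for illustration: the in-memory `per_chrom` layout, the function names, and the toy values are hypothetical, and reading the actual PRSice output files is left out. For a single predictor, the R2 of a simple linear regression equals the squared Pearson correlation, which is what the sketch uses; a real analysis would normally include covariates.

```python
def pearson_r2(x, y):
    """Squared Pearson correlation (R2 of a one-predictor linear regression)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def genome_wide_scores(per_chrom, threshold):
    """Add sum-style per-chromosome scores, sample by sample, at one threshold."""
    vectors = [scores[threshold] for scores in per_chrom.values()]
    return [sum(values) for values in zip(*vectors)]


def best_threshold(per_chrom, phenotype, thresholds):
    """Return the threshold whose genome-wide score best explains the phenotype."""
    r2 = {
        t: pearson_r2(genome_wide_scores(per_chrom, t), phenotype)
        for t in thresholds
    }
    return max(r2, key=r2.get), r2


# Toy data for four samples: per_chrom[chromosome][threshold] -> scores.
per_chrom = {
    "chr1": {"0.001": [0.1, 0.2, 0.3, 0.4], "0.01": [0.4, 0.1, 0.3, 0.2]},
    "chr2": {"0.001": [0.0, 0.1, 0.1, 0.2], "0.01": [0.1, 0.3, 0.0, 0.2]},
}
phenotype = [1.0, 2.0, 3.0, 4.0]

best, r2_by_threshold = best_threshold(per_chrom, phenotype, ["0.001", "0.01"])
print(best, r2_by_threshold)
```

Note that the same threshold is applied to every chromosome before summing, so the regression compares genome-wide scores rather than per-chromosome ones.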

The main problem with this per-chromosome approach is that, while parallelizing across chromosomes does speed up the analysis, it significantly increases the potential for error. In fact, in the latest version of PRSice-2 (v2.3.5), if you can use multi-threading, I don't think running by chromosome and then merging will give you any speed advantage over running PRSice directly, unless you are working with imputed data, where, because of a bug, we might require far more memory than is available when performing clumping.

Hope this helps.


Thank you, Sam! So when you said "And then you can add up the PRS for each p-value threshold, and perform the required regression to identify the best threshold.", did you mean that the best p-value threshold must be the same for all chromosomes?
I did use --score sum to calculate the PRS. I chose to calculate the PRS by chromosome because the data I was given was organized by chromosome and is very large. With --score sum, why can't I apply different p-value thresholds to different chromosomes? Since they are independent of each other, can I merge the PRS on chr1 at p-value threshold 0.000001 with the PRS on chr2 at p-value threshold 0.001?


In theory you can do that, but the interpretation will be more difficult. PRSice natively supports per-chromosome input: use --target chr# if your input is organized as chr1, chr2, etc.
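As a rough illustration of what the # placeholder stands for, the Python sketch below expands a template such as chr# into one file prefix per autosome. The `expand_target` helper is hypothetical and is not how PRSice implements it; it only shows the naming pattern the input files are expected to follow.

```python
def expand_target(template, chromosomes=range(1, 23)):
    """Replace '#' in the template with each chromosome number (1 to 22)."""
    return [template.replace("#", str(chrom)) for chrom in chromosomes]


prefixes = expand_target("chr#")
print(prefixes[:3], "...", prefixes[-1])
```

Checking that every expanded prefix exists on disk before launching a run is a cheap way to catch a missing chromosome.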
