Question: Validating Polygenic Risk Scores from PRSice
ml1 wrote, 7 weeks ago:

Hi,

Hope someone may be able to point me in the right direction!

I've generated some polygenic risk scores for schizophrenia in a cohort of individuals with borderline intellectual disability. I'm trying to figure out a way to validate the scores I'm getting out of PRSice. Is there a way to do this?

My current results (with --all-score) look something like this: https://ibb.co/jv9YQP5 (sorry couldn't place the table here in an easily legible format)

There are some patterns in my results, with some individuals having identical scores at particular p-values (is this just due to the stats involved in generating the scores?), and the scores are pretty low (clustering around 0) with some changing from positive to negative depending on p-value threshold. Is this typical of polygenic risk scores?

I've followed Marees et al. 2018 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6001694/) and Sam Choi's tutorial (https://choishingwan.github.io/PRS-Tutorial/) to generate this, but I'm thinking I may have made errors somewhere along the line and would like to check my results somehow!

Any advice would be much appreciated!

Tags: plink, genetics, prs, prsice
written 7 weeks ago by ml1 • modified 6 weeks ago by Sam

Mind giving more information about your sample and the command you used for PRSice? If you are only giving PRSice a few SNPs, then it is very likely that you will get zero or near-zero scores (because there isn't much information). As for the positive and negative results, that is normal for PRS analyses, as your betas / log(OR)s are not all positive: when you add those values up, you shouldn't necessarily expect an all-positive number.
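The summation described above can be sketched in a few lines (a toy illustration only, not PRSice itself; all SNP counts, odds ratios, and dosages are invented):

```python
# Toy sketch of a polygenic risk score: the sum of effect sizes
# (beta, i.e. log(OR) for a binary trait) weighted by allele dosage.
# All numbers below are made up for illustration.
import math

# Hypothetical summary statistics: odds ratios for 4 SNPs
odds_ratios = [1.10, 0.92, 1.05, 0.85]
betas = [math.log(orr) for orr in odds_ratios]  # log(OR) gives mixed signs

# Hypothetical effect-allele dosages (0, 1, or 2) for one individual
dosages = [2, 1, 0, 2]

prs = sum(b * d for b, d in zip(betas, dosages))
# Negative betas ("protective" alleles) can pull the total below zero,
# so scores clustering around 0, with some negative, are expected.
print(round(prs, 4))
```

With mostly-protective alleles, as here, the individual's score comes out below zero, which is why scores flipping sign across individuals (or thresholds) is not by itself a red flag.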

written 7 weeks ago by Sam

Hi Sam,

Thanks for your reply; it's great to get advice from you!

The cohort I'm using has about 80 cases and 150 controls. I removed the controls who were related to cases, and then after other QC, I ended up with 49 cases and 56 controls. Perhaps this is just too small? It's a wholly white (Northern European) sample. I've used the PGC SCZ2 GWAS as the base data (https://www.med.unc.edu/pgc/download-results/scz/)

I've used the following command for PRSice:

```shell
Rscript PRSice.R \
    --prsice PRSice_linux \
    --base base.QC.gz \
    --A1 a1 \
    --A2 a2 \
    --bp bp \
    --chr hg19chrc \
    --pvalue p \
    --snp snpid \
    --stat or \
    --target X \
    --keep X.QC.rel.id \
    --extract X.QC.snplist \
    --binary-target F \
    --pheno sample_pheno \
    --cov X.cov \
    --cov-col @PC[1-20] \
    --clump-kb 50 \
    --clump-p 1 \
    --clump-r2 0.2 \
    --bar-levels 1e-06,0.0001,0.001,0.01,0.05,0.1,0.2,0.3,0.4,0.5,1 \
    --out Xnew \
    --all-score \
    --fastscore
```

I think you are right that perhaps I haven't given PRSice enough SNPs. The log file indicates there are 265 variants after clumping. Is this typical?

Thanks again for your help,

Mo

written 6 weeks ago by ml1

Hello, I was hoping to learn more about how to handle negative betas. I think a negative beta means that the coded allele is reversed. For example, if you have some negative betas in your base GWAS data and the reference allele is the opposite one in your target population, perhaps you need to fix this problem before constructing the PRS? How can we do that? Thank you in advance.

written 6 weeks ago by Fadoua

A negative beta isn't a problem: you can view an effect allele with a negative beta as "protective", which is fine.

written 6 weeks ago by Sam

As Sam states, a negative value would suggest a protective score. I'd just been querying the likely validity of scores which move from increasing risk to being protective at different p-value thresholds (for the same individual). I think I just don't have enough data for it to be meaningful.

If you know that your GWAS and target datasets have the reference allele in different columns, I think you can specify the headers with --A1 and --A2.
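The allele-matching step that PRSice performs when it is given the --A1/--A2 headers can be sketched like so (a toy illustration with invented allele and beta values; the function name is hypothetical):

```python
# Hedged sketch of allele harmonisation between base GWAS and target data:
# if the target's effect allele (A1) is the base's other allele (A2) for
# the same SNP, the association is being viewed from the opposite allele,
# so the sign of beta flips. This does not handle strand flips.

def harmonise(base_a1, base_a2, target_a1, target_a2, beta):
    """Return beta aligned to the target's A1, or None if alleles mismatch."""
    if (base_a1, base_a2) == (target_a1, target_a2):
        return beta      # already aligned
    if (base_a1, base_a2) == (target_a2, target_a1):
        return -beta     # alleles swapped: flip the sign of the effect
    return None          # irreconcilable mismatch: drop the SNP

print(harmonise("A", "G", "A", "G", 0.12))   # 0.12
print(harmonise("A", "G", "G", "A", 0.12))   # -0.12
print(harmonise("A", "G", "C", "T", 0.12))   # None
```

So a negative beta in the base data needs no manual fixing; only SNPs whose alleles genuinely cannot be matched get dropped.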

written 6 weeks ago by ml1
Sam (New York) wrote, 6 weeks ago:

Your sample size is simply too small; that will usually not give you any reasonable result. I suspect that, as you have fewer than 100 samples, the clumping LD estimates vary quite a lot, leading to more variants being removed than normally would be (which leads to the small variant count). One way to get around that is to use a reference panel (e.g. 1000 Genomes), which has a larger sample size. Though at the end of the day, with only ~100 samples, I don't expect you to get any meaningful results.

Given there are only 265 variants after clumping, it is very possible that only one or two SNPs are included at a particular threshold, which can then lead to identical PRS across thresholds. So I am not too surprised by your results.
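A toy illustration of why neighbouring p-value thresholds can produce identical scores (all SNP p-values, betas, and dosages below are invented):

```python
# If no additional SNP passes between two p-value cut-offs, the score
# at those thresholds is computed from the same SNPs and so is identical.
snps = [  # (p-value, beta, dosage) for one individual
    (2e-07, 0.30, 1),
    (3e-03, -0.10, 2),
    (4e-01, 0.05, 2),
]

def prs_at(threshold):
    return sum(b * d for p, b, d in snps if p <= threshold)

for t in [1e-06, 1e-04, 1e-03, 1e-02, 5e-02, 0.5]:
    print(t, round(prs_at(t), 3))
# Thresholds 1e-06 through 1e-03 include only the first SNP, so the
# score is identical across them; it first changes at 1e-02.
```

With only 265 variants spread over eleven bar levels, runs of identical scores like this are expected rather than a sign of error.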

written 6 weeks ago by Sam

I see! Reassuring to know there aren't any obvious errors at least.

I used the 1000 Genomes data to show the limited population stratification in the sample; is utilising a reference panel similar? Just for my own learning, could you point me to any resource about using the --ld option? I'm not sure how to generate the required file.

It sounds like there isn't really any way to get meaningful data from my sample set due to its size. Presumably there is no way of utilising the genetic data from about 50 related controls which I removed?

ADD REPLYlink written 6 weeks ago by ml110

The 1000 Genomes data can be used as the reference. The usage and requirements for the --ld flag are exactly the same as for the --target flag: you will need a plink or bgen file.

Unfortunately, yes: with your sample size, you will need to think of something clever to have any hope of getting something meaningful out of PRS, due to the lack of power.

written 6 weeks ago by Sam

Thanks Sam. Just to get some final advice: if I did manage to get a bigger sample, how would I validate the results I get out of PRSice if I'm unable to do out-of-sample prediction or cross-validation?

written 6 weeks ago by ml1
1

You can use the --perm option, which will generate an empirical p-value for you (preferably --perm 10000 or above). If the empirical p-value is less than 0.05, then you are good to go (but do note that the R2 is still inflated, as you are not doing out-of-sample validation).
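The idea behind the permutation test can be sketched as follows (a toy version on simulated data: it uses a plain correlation in place of PRSice's regression, and all sample sizes and effect sizes are invented):

```python
# Permutation-based empirical p-value: shuffle the phenotype many times,
# re-measure the PRS-phenotype association each time, and ask how often
# a permuted association is at least as strong as the observed one.
import random

random.seed(0)
n = 105
prs = [random.gauss(0, 1) for _ in range(n)]
pheno = [p * 0.3 + random.gauss(0, 1) for p in prs]  # weak simulated signal

def corr(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

observed = abs(corr(prs, pheno))
n_perm = 1000  # PRSice recommends 10000+; fewer here for speed
exceed = 0
shuffled = pheno[:]
for _ in range(n_perm):
    random.shuffle(shuffled)  # break the PRS-phenotype pairing
    if abs(corr(prs, shuffled)) >= observed:
        exceed += 1

# (exceed + 1) / (n_perm + 1) is the standard unbiased estimator
emp_p = (exceed + 1) / (n_perm + 1)
print(emp_p)
```

This guards against the optimism of picking the best-fitting threshold, though, as noted above, the R2 itself remains inflated without out-of-sample validation.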

written 6 weeks ago by Sam

Thanks for your help!

written 6 weeks ago by ml1