I am reading about a new PRS method, BridgePRS (https://www.biorxiv.org/content/10.1101/2023.02.17.528938v1). It is a very interesting paper on the transferability of PRS. I am trying to understand the methods (my statistics and mathematics background is quite weak), but one thing I don't understand is why the SNPs are ranked in both stages. What is the rationale behind it? If Bridge regression is being used, why not just include all the variants? Any thoughts are appreciated.
Hi Mengna,
Thanks for your interest in BridgePRS. The idea behind using different ranking metrics in the two stages is that each stage draws on different information. In stage 1 we use information from a single population only, i.e. the minimum p-value at the locus, and in stage 2 we combine information from the two populations, using the pseudo F-statistic to rank loci. Rather than including all loci in the PRS, BridgePRS applies many p-value and pseudo F-statistic locus inclusion criteria, each of which gives its own PRS. These PRS are then optimally combined in a ridge regression fit using test data.
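To make the last step more concrete, here is a minimal sketch of combining several candidate PRS with ridge regression. It is not the BridgePRS code: the function names `ridge_combine` and `combined_prs`, the penalty `lam`, and the simulated data are all assumptions made for illustration, and the actual software chooses its inclusion criteria and penalty differently.

```python
import numpy as np

def ridge_combine(prs_matrix, phenotype, lam=1.0):
    """Ridge weights for combining candidate PRS (illustrative, not BridgePRS itself)."""
    X = np.asarray(prs_matrix, dtype=float)   # rows: individuals, columns: candidate PRS
    y = np.asarray(phenotype, dtype=float)
    # Standardise each candidate PRS so the penalty treats them on the same scale.
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Xs = (X - mu) / sd
    yc = y - y.mean()
    # Closed-form ridge solution: (X'X + lam * I)^{-1} X'y
    k = Xs.shape[1]
    w = np.linalg.solve(Xs.T @ Xs + lam * np.eye(k), Xs.T @ yc)
    return w, mu, sd

def combined_prs(prs_matrix, w, mu, sd):
    """Apply the fitted weights to get one combined score per individual."""
    return ((np.asarray(prs_matrix, dtype=float) - mu) / sd) @ w

# Toy data (hypothetical): three candidate PRS, each a noisier version of the
# same underlying signal, standing in for PRS built under different
# locus inclusion criteria.
rng = np.random.default_rng(0)
n = 500
signal = rng.normal(size=n)
prs = np.column_stack(
    [signal + rng.normal(scale=s, size=n) for s in (0.5, 1.0, 2.0)]
)
pheno = signal + rng.normal(size=n)

w, mu, sd = ridge_combine(prs, pheno, lam=10.0)
score = combined_prs(prs, w, mu, sd)
```

Each column of `prs` plays the role of one PRS obtained under a particular inclusion criterion, and the ridge penalty keeps the weights stable when the candidate PRS are highly correlated with each other, which they usually are.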
I hope this helps.
Clive
Thank you so much, Clive, for the clear explanation!