Hi everyone, I have several questions about polygenic risk scores. I have created them once using the traditional PLINK method, where I had to do all of the data cleaning steps manually, but I recently came across PRSice, which looks WAY easier. I've always been confused about which steps need to be done on the discovery (base) sample and which on the target (replication) sample (e.g., does data cleaning apply to both?). It looks like PRSice would eliminate this confusion for me, since it seems to do everything in one step. Is anyone familiar with this method? Am I understanding it correctly? Also, how does PRSice compare to LDpred? What are the pros and cons of these different methods?
I'm also wondering about phenotypes: how similar does the phenotype in the discovery sample need to be to the phenotype in the target sample? Do they have to match exactly, or can they just be related phenotypes?
Thanks in advance!!