14 months ago

wmei1997

I've simulated a test genotype and phenotype dataset. I used the betas from the simulated summary statistics to calculate the sign for the correlation matrix, and I did not use a reference panel. After running LASSOSUM on my dataset, the correlation coefficient in best.validation.result came out negative, which does not make sense to me. What could cause the correlation coefficient to be less than zero? How can I simulate my genotype and phenotype so that the correlation is positive?

How did you simulate your phenotype? And have you separated your base and target samples, or are you using the same samples for both beta estimation and lassosum?

I simulated both genotype and phenotype using PhenotypeSimulator. We simulated our summary statistics from GWAS data: the effect is the estimated beta from the GWAS, and P is the p-value of the variant. I then mapped the betas to z-scores and got the p-values from the z-scores (I'm not sure if that's correct). For lassosum, we calculated the sign of our correlation using the sign of the estimated betas. However, no matter how we simulate, we get a negative correlation.
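For reference, here is a minimal sketch of the beta to z-score to p-value mapping described above. It assumes the z-score is the beta divided by its standard error and that the p-value is two-sided under a standard normal; the standard error, the sample size, and the function names `z_and_p` and `signed_cor` are hypothetical and not taken from the post. The signed-correlation step uses a common approximation and is not necessarily the exact formula lassosum applies internally.

```python
import math

def z_and_p(beta, se):
    """Return the z-score and its two-sided normal p-value."""
    z = beta / se
    # Two-sided tail probability of a standard normal: P(|Z| > |z|)
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

def signed_cor(z, n):
    """Approximate SNP-trait correlation from a test statistic and sample
    size n (assumed approximation); the sign follows the sign of beta."""
    return z / math.sqrt(n - 2 + z * z)

# Hypothetical variant: beta = -0.098, SE = 0.05, GWAS sample size = 1000
z, p = z_and_p(-0.098, 0.05)
r = signed_cor(z, 1000)
print(f"z = {z:.2f}, p = {p:.4f}, r = {r:.4f}")
```

Note that the p-value alone carries no direction, so the sign has to come from the beta, and it must refer to the same effect allele that is counted in the target genotypes.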

I have never tried PhenotypeSimulator and will have to read their paper to know how they simulate the phenotype. The easiest way is usually to simulate the phenotype from real genotype data (UK Biobank is good for that because of its sample size) using a program such as GCTA.

If you use the same input (summary statistics and target genotype) with a different program (e.g. PRSice or LDpred) and then manually correlate the PRS with the phenotype, do you still get a negative result? (PRSice reports R2; to my knowledge, LDpred does not automatically report a final correlation.)
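The manual check suggested here could look roughly like the sketch below. Everything in it is simulated for illustration: the genotypes, betas, and phenotype are made up, the true betas are used as scoring weights for simplicity, and `pearson_r` and `polygenic_score` are hypothetical helper names. It also shows one thing such a check can reveal: if the betas refer to the opposite allele from the one counted in the target genotypes, the correlation flips sign.

```python
import random

def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def polygenic_score(genotypes, betas):
    """Weighted sum of allele dosages (0/1/2) per individual."""
    return [sum(g * b for g, b in zip(row, betas)) for row in genotypes]

rng = random.Random(1)  # fixed seed so the run is reproducible
n_samples, n_snps = 500, 50

# Simulated effect sizes and dosages (allele frequency 0.5 under HWE)
betas = [rng.gauss(0, 1) for _ in range(n_snps)]
genotypes = [[rng.choice([0, 1, 1, 2]) for _ in range(n_snps)]
             for _ in range(n_samples)]

# Phenotype = genetic score + noise
phenotype = [s + rng.gauss(0, 1) for s in polygenic_score(genotypes, betas)]

# Score with correctly aligned betas, then with every beta sign-flipped
r_ok = pearson_r(polygenic_score(genotypes, betas), phenotype)
r_flipped = pearson_r(polygenic_score(genotypes, [-b for b in betas]), phenotype)
print(f"aligned: {r_ok:.3f}, flipped: {r_flipped:.3f}")
```

With real data, the scores would come from the PRS program's per-individual output instead of `polygenic_score`, and the base and target samples should be different individuals.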