Question: Conditioning association summary statistics using a correlation matrix.
I am conducting a meta-analysis of genetic data, and I am also fine-mapping a few loci to try to determine which variants are the pathogenic or "causal" variants.

In some of these loci, previous studies have demonstrated at least one pathogenic variant, but in several of these cases there are unlinked variants that are also very strongly associated, which leads me to expect that there may be additional pathogenic variants to identify.

What I would like to do is condition the other variants' association summary statistics on that of the known pathogenic variant. However, I do not have the raw genotyping data; I only have Z-statistics, p-values, and an LD matrix (a matrix of signed correlation coefficients).

I have seen a lot of imputation methods, fine-mapping methods, and other methods in recent years that suggest to me that it should be possible to condition using correlation coefficients and p-values. For instance, there is this manuscript, but it answers a slightly different question as it does not relate to conditioning.

Can anyone provide either a reference or an expression for how to condition the association summary statistics (either Z-statistic or one-sided p-value preferred) of linked genetic variants on the association summary statistic of the known pathogenic variant?
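
To make the goal concrete, here is a minimal sketch of the kind of expression I have in mind. It assumes that the Z-statistics are approximately jointly normal with covariance equal to the LD matrix, which is only reasonable when effect sizes are small, all variants were tested in the same samples, and the LD matrix matches the study population. Under those assumptions, the Z-statistic of variant i conditioned on variant k would be (z_i - r_ik * z_k) / sqrt(1 - r_ik^2). The function name `condition_z` and the toy numbers are hypothetical, chosen only for illustration.

```python
import numpy as np
from scipy.stats import norm


def condition_z(z, R, k):
    """Approximate Z-statistics conditioned on variant k.

    Assumes z ~ N(mu, R), with R the signed LD (correlation) matrix.
    Variant k itself and its perfect proxies (|r| = 1) are returned as NaN,
    because no information remains for them after conditioning.
    """
    z = np.asarray(z, dtype=float)
    R = np.asarray(R, dtype=float)
    r = R[:, k]                                    # correlation of each variant with k
    denom = np.sqrt(np.clip(1.0 - r**2, 0.0, None))
    z_cond = np.full_like(z, np.nan)
    ok = denom > 1e-8                              # skip k and perfect proxies
    z_cond[ok] = (z[ok] - r[ok] * z[k]) / denom[ok]
    return z_cond


# Toy example (made-up numbers): variant 0 is the known pathogenic variant,
# variant 1 is in LD with it (r = 0.8), variant 2 is unlinked (r = 0).
z = np.array([5.0, 4.0, 3.0])
R = np.array([[1.0, 0.8, 0.0],
              [0.8, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

z_cond = condition_z(z, R, k=0)
p_one_sided = norm.sf(z_cond)                      # upper-tail one-sided p-values
```

In this toy case the signal at the linked variant is fully explained by the known variant, while the unlinked variant keeps its original Z-statistic. Conditioning on several variants at once would need the matrix form of the same idea, which is not shown here.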


This looks like you could use a partial least squares approach, considering a kernelized version with the correlation matrix as the kernel.

@Jean-Karim - Thank you for your post. I will reflect on this. I will conduct a search on my own. However, if you have any manuscripts or even textbooks in mind, I will be sure to take a look at those too.

Note, I posted a more technical (mathematical) description of the goal here:

