Question: r2 correlation interpretation snp in plink pruning
Floris Brenk970 wrote:

Hi all,

PLINK has the function "Linkage disequilibrium based SNP pruning", invoked as --indep 50 5 2, where the 2 stands for the VIF threshold (VIF is 1/(1-R^2)), which in this case means r2 = 0.50.
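
The arithmetic behind that conversion can be sketched as follows. This is only an illustration of the formula VIF = 1/(1-R^2) quoted above; the function name `vif_to_r2` is made up for this example and is not part of PLINK. As far as the PLINK documentation describes it, the R^2 in the VIF is the multiple correlation coefficient from regressing one SNP on all other SNPs in the window, so it is not necessarily the same quantity as a pairwise r2.

```python
def vif_to_r2(vif):
    """Invert VIF = 1 / (1 - R^2) to get the R^2 implied by a VIF threshold.

    Hypothetical helper for illustration only, not a PLINK function.
    """
    if vif < 1:
        raise ValueError("VIF is always >= 1, since R^2 lies in [0, 1)")
    return 1.0 - 1.0 / vif


# The threshold from the command in the question: --indep 50 5 2
print(vif_to_r2(2))
```

A VIF of 1 corresponds to R^2 = 0 (the SNP is not predicted at all by the others in the window), and larger VIF thresholds correspond to R^2 values closer to 1.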

So linkage disequilibrium is the non-random association of alleles. I'm struggling a bit with what, for example, an r2 of 0.5 means and how PLINK calculates it. Does 0.5 just mean a correlation of 0.5 between two SNPs? Can anyone explain a bit more what this 0.5 actually means in real numbers? For example, when I have 100 samples, how many SNPs need to be in perfect LD to reach an r2 of 0.5?

modified 6.6 years ago by chrchang5237.6k • written 6.6 years ago by Floris Brenk970
chrchang5237.6k wrote:

It's the squared Pearson correlation coefficient between the two markers' 0/1/2 allele counts, i.e. (Cov(marker 1 allele counts, marker 2 allele counts))^2 / (Var(marker 1 allele counts) * Var(marker 2 allele counts)).
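
A minimal sketch of that formula in Python with NumPy, to make the definition concrete. The function name `allele_count_r2` and the toy genotype vectors are invented for illustration, and the sketch assumes there are no missing genotypes; it is not PLINK's actual implementation.

```python
import numpy as np


def allele_count_r2(a, b):
    """Squared Pearson correlation between two vectors of 0/1/2 allele counts.

    Assumes no missing genotypes. Returns NaN if either marker is
    monomorphic (zero variance), where the ratio is undefined.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    cov = np.mean((a - a.mean()) * (b - b.mean()))  # Cov(a, b)
    var_a = a.var()                                 # Var(a)
    var_b = b.var()                                 # Var(b)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return cov ** 2 / (var_a * var_b)


# Toy data: minor-allele counts for 8 samples at two markers (made up).
marker1 = [0, 1, 2, 0, 1, 2, 0, 0]
marker2 = [0, 1, 2, 0, 1, 1, 0, 1]
print(allele_count_r2(marker1, marker2))
```

The same value can be obtained with `np.corrcoef(marker1, marker2)[0, 1] ** 2`; the explicit version is written out here only to mirror the Cov/Var expression above.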

The number of samples that must be in "perfect LD" to reach a given r^2 depends on the exact distribution of allele counts for each marker; for example, if both markers have an empirical MAF of 0.01, even 99 of the 100 samples being in "perfect LD" is not enough to guarantee r^2 >= 0.5.
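
One configuration consistent with that statement can be checked numerically. The genotypes below are a constructed example, not necessarily the case the answer had in mind: with 100 samples, each marker carries 2 minor alleles out of 200 (MAF 0.01), 99 samples fall exactly on a straight line, and a single discordant sample is enough to pull r^2 below 0.5.

```python
import numpy as np

n = 100
m1 = np.zeros(n)
m2 = np.zeros(n)

# Sample 0: homozygous minor at marker 1, heterozygous at marker 2.
m1[0] = 2
m2[0] = 1
# Sample 1: the single discordant sample, carrying the remaining
# minor allele of marker 2 and none of marker 1.
m2[1] = 1

# Both markers have 2 minor alleles among 2 * 100 chromosomes.
maf1 = m1.sum() / (2 * n)
maf2 = m2.sum() / (2 * n)

# r^2 over all 100 samples.
r2_all = np.corrcoef(m1, m2)[0, 1] ** 2

# r^2 over the 99 samples that lie on the line m2 = m1 / 2
# (everything except the discordant sample 1).
keep = np.arange(n) != 1
r2_99 = np.corrcoef(m1[keep], m2[keep])[0, 1] ** 2

print(maf1, maf2, r2_all, r2_99)
```

Working it through by hand gives r^2 = 49/99 across all 100 samples, just under 0.5, even though the other 99 samples on their own give r^2 = 1. With rare alleles, each carrier has a large influence on the correlation.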