r2 correlation interpretation for SNPs in PLINK pruning
7.9 years ago
Floris Brenk

Hi all,

PLINK has the function "Linkage disequilibrium based SNP pruning", which is --indep 50 5 2, where the 2 is the VIF threshold (VIF = 1/(1-R^2)). In this case that corresponds to R^2 = 0.50.

So linkage disequilibrium is the non-random association of alleles. I'm struggling a bit to understand what, for example, an r2 of 0.5 means and how PLINK calculates it. Does 0.5 just mean a correlation of 0.5 between two SNPs? Can anyone explain a bit more what this 0.5 actually means in real numbers? For example, when I have 100 samples, how many SNPs need to be in perfect LD to reach an r2 of 0.5?
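For reference, here is a minimal Python/NumPy sketch of the VIF arithmetic. Inverting VIF = 1/(1-R^2) shows why a threshold of 2 corresponds to R^2 = 0.5. The helpers vif_to_r2 and vif are hypothetical names for illustration. vif() follows the textbook variance inflation factor, where R^2 comes from regressing one SNP's 0/1/2 allele counts on the other SNPs in the window (plus an intercept). It is an assumption that PLINK's --indep computes it exactly this way.

```python
import numpy as np


def vif_to_r2(vif_threshold: float) -> float:
    """Invert VIF = 1 / (1 - R^2) to get the equivalent R^2 cut-off."""
    return 1.0 - 1.0 / vif_threshold


def vif(genotypes: np.ndarray, j: int) -> float:
    """Textbook VIF for SNP column j (hypothetical helper, not PLINK code).

    genotypes: (n_samples, n_snps) array of 0/1/2 allele counts.
    SNP j is regressed on all other SNPs plus an intercept, and
    1 / (1 - R^2) of that regression is returned.
    Assumes SNP j is polymorphic (non-zero variance).
    """
    y = genotypes[:, j].astype(float)
    others = np.delete(genotypes, j, axis=1).astype(float)
    # Design matrix: intercept column followed by the other SNPs.
    X = np.column_stack([np.ones(len(y)), others])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 / (1.0 - r2)


print(vif_to_r2(2))  # threshold used in --indep 50 5 2
```

With only two SNPs in the window, the regression R^2 reduces to the squared pairwise correlation, so vif_to_r2(vif(G, 0)) matches np.corrcoef(G[:, 0], G[:, 1])[0, 1] ** 2 up to rounding. With more SNPs, it measures how well one SNP is predicted by all the others jointly.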

7.9 years ago

It's the squared correlation coefficient between the 0/1/2 allele counts, i.e. r^2 = Cov(X1, X2)^2 / (Var(X1) * Var(X2)), where X1 and X2 are the allele counts of marker 1 and marker 2 across samples.
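The formula can be checked directly with a short Python/NumPy sketch. genotype_r2 is a hypothetical helper name, not a PLINK function. It uses population (1/n) variances and covariance. The normalisation cancels in the ratio, so the result equals the squared Pearson correlation from np.corrcoef.

```python
import numpy as np


def genotype_r2(x1, x2) -> float:
    """Squared correlation of two markers' 0/1/2 allele counts:
    Cov(X1, X2)^2 / (Var(X1) * Var(X2)).
    Assumes both markers are polymorphic (non-zero variance)."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    cov = np.mean((x1 - x1.mean()) * (x2 - x2.mean()))
    return cov ** 2 / (np.var(x1) * np.var(x2))


# Same genotypes, but marker 2 counts the other allele (x2 = 2 - x1):
# the correlation is -1, so r^2 is 1.
print(genotype_r2([0, 1, 2, 0, 1, 2], [2, 1, 0, 2, 1, 0]))
```

The example gives 1 (up to floating-point rounding). Recoding which allele is counted flips the sign of the correlation but not r^2.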

The "number of SNPs in perfect LD" required depends on the exact distribution of allele counts for each marker; for example, if both markers have empirical MAF 0.01, even 99 samples in "perfect LD" is not enough to guarantee r^2 >= 0.5.
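To make the rare-variant point concrete, the sketch below builds a hypothetical 100-sample example. The genotypes are invented for illustration only. Both markers have MAF 0.01, and 98 of 100 samples carry identical allele counts at the two markers. Even so, r^2 is 49/99 ≈ 0.495, which is just under 0.5.

```python
import numpy as np


def genotype_r2(x1, x2) -> float:
    """Cov(X1, X2)^2 / (Var(X1) * Var(X2)) for 0/1/2 allele counts."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    cov = np.mean((x1 - x1.mean()) * (x2 - x2.mean()))
    return cov ** 2 / (np.var(x1) * np.var(x2))


n = 100
# Hypothetical marker 1: sample 0 is homozygous for the minor allele.
x1 = np.zeros(n)
x1[0] = 2
# Hypothetical marker 2: samples 0 and 1 are heterozygous.
x2 = np.zeros(n)
x2[0] = 1
x2[1] = 1

# Frequency of the counted allele; it is below 0.5, so it is the MAF.
maf1 = x1.sum() / (2 * n)
maf2 = x2.sum() / (2 * n)
# Samples with identical allele counts at both markers.
concordant = int((x1 == x2).sum())
r2 = genotype_r2(x1, x2)

print(maf1, maf2, concordant, r2)
```

Only samples 0 and 1 differ between the markers. Because nearly all of the variance sits in those two samples, they dominate r^2.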