It is a common task in research papers to "control for linkage disequilibrium". Let's use GWAS as an example. When are two SNPs considered independent? I've often seen an R^2 threshold of 0.2 used for this: a pair of SNPs with R^2 below 0.2 is treated as independent. This doesn't make sense to me. Here is why:

Let's say we have 3 SNPs sorted by their position on the genome: SNP1, then SNP2, then SNP3. Consider the following case:

  • R^2 between SNP1 and SNP2 is 0.18
  • R^2 between SNP1 and SNP3 is 0.23

Are we to conclude that SNP1 and SNP3 are linked, but that SNP1 and SNP2 are NOT linked, even though SNP2 lies between SNP1 and SNP3? Is it necessary to look for the last SNP such that R^2 between SNP1 and SNPX is less than 0.2 and then proceed to remove all SNPs in between SNP1 and SNPX? If so, wouldn't this be overly conservative, considering that an R^2 of 0.2 can be seen between SNPs many millions of base pairs apart?
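To make the two readings concrete, here is a minimal Python sketch using the numbers from the example above. The function names `pairwise_rule` and `span_rule` are hypothetical labels for this post, not functions from any LD tool, and `span_rule` reflects one reading of the second question (remove everything up to the farthest SNP that still reaches the threshold).

```python
# Hypothetical r^2 values between the index SNP (SNP1) and its neighbours,
# listed in genomic order. The numbers are the ones from the example above.
r2_to_snp1 = [("SNP2", 0.18), ("SNP3", 0.23)]
THRESHOLD = 0.2  # the commonly used cutoff discussed in this post


def pairwise_rule(r2_list, threshold):
    """Remove only the SNPs whose own r^2 with the index SNP reaches the threshold."""
    return [name for name, r2 in r2_list if r2 >= threshold]


def span_rule(r2_list, threshold):
    """Remove every SNP up to and including the farthest one that reaches the threshold."""
    last = -1  # index of the farthest SNP at or above the threshold
    for i, (_, r2) in enumerate(r2_list):
        if r2 >= threshold:
            last = i
    return [name for name, _ in r2_list[: last + 1]]


print(pairwise_rule(r2_to_snp1, THRESHOLD))  # ['SNP3']
print(span_rule(r2_to_snp1, THRESHOLD))      # ['SNP2', 'SNP3']
```

Under the pairwise reading, SNP2 is kept even though it sits between SNP1 and SNP3; under the span reading, SNP2 is removed only because of where it sits. That difference is exactly what I am asking about.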

