Question: formula to calculate LD in plink
maxixian19900 wrote:

Could someone tell me the formula PLINK uses to calculate linkage disequilibrium? The results calculated by PLINK are different from those calculated by a script I wrote, which uses r^2 = (ad - bc)^2 / ((a + b)(a + c)(c + d)(b + d)). Thank you.
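For reference, here is a minimal Python sketch of the formula in the question. The question does not define a, b, c and d, so this assumes they are the counts of the four two-locus haplotypes AB, Ab, aB and ab. The formula needs phased haplotype counts, which unphased genotype data does not give directly (double heterozygotes are ambiguous). The function name is hypothetical.

```python
def r2_from_haplotype_counts(a: int, b: int, c: int, d: int) -> float:
    """r^2 from a 2x2 table of haplotype counts.

    Assumed (hypothetical) layout:
        a = count of AB, b = count of Ab,
        c = count of aB, d = count of ab.
    """
    numerator = (a * d - b * c) ** 2
    # Product of the four margins, i.e. the counts of alleles A, B, a and b.
    denominator = (a + b) * (a + c) * (c + d) * (b + d)
    if denominator == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return numerator / denominator


if __name__ == "__main__":
    print(r2_from_haplotype_counts(10, 0, 0, 10))  # 1.0 (perfect LD)
    print(r2_from_haplotype_counts(5, 5, 5, 5))    # 0.0 (no LD)
```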

modified 5.8 years ago by chrchang523 • written 5.8 years ago by maxixian19900
chrchang523 wrote:

See the "correlation coefficient" definition under http://en.wikipedia.org/wiki/Linkage_disequilibrium#Definition , and the discussion at http://pngu.mgh.harvard.edu/~purcell/plink/ld.shtml#ld2 . The basic r^2 computation is the squared correlation between the 0/1/2 allele counts rather than a function of haplotype frequencies, but you can also tell PLINK to estimate haplotype frequencies and apply the standard formula to them (the results will rarely differ by much).
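To illustrate the genotype-based computation, here is a minimal Python sketch (not PLINK's own code) that squares the Pearson correlation of two 0/1/2 allele-count vectors, one value per sample. The function name and the example genotypes are hypothetical.

```python
from typing import Sequence


def r2_from_genotype_counts(x: Sequence[int], y: Sequence[int]) -> float:
    """Squared Pearson correlation between two 0/1/2 allele-count vectors."""
    if len(x) != len(y) or not x:
        raise ValueError("need two non-empty vectors of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Centered cross-product and centered sums of squares.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r^2 is undefined when a SNP has no genotype variation")
    return sxy * sxy / (sxx * syy)


if __name__ == "__main__":
    snp_a = [0, 1, 2, 1, 0, 2]  # hypothetical genotypes for 6 samples
    snp_b = [0, 1, 2, 2, 0, 1]
    print(r2_from_genotype_counts(snp_a, snp_b))  # 0.5625
```

Because no haplotype phasing is involved, this works directly on unphased genotypes, which is one reason its value can differ slightly from the haplotype-frequency formula.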

@chrchang523 Could you please provide a numerical example? How does PLINK recode a SNP (two allele columns) into one numeric value? Is 11 = 0, 22 = 2, 12 = 21 = 1? And then how is the correlation calculated? Thank you for your help!

EDIT: I think I found out how PLINK calculates r2. It counts the number of copies of the minor allele in each SNP and then calculates the correlation between these counts:

```
snp11 snp12   snp21 snp22   counts_in_snp1   counts_in_snp2
1     2       2     2       1                0
2     2       2     2       0                0
1     1       1     2       2                1
1     2       2     2       1                0
```

....

Then it computes the correlation between counts_in_snp1 and counts_in_snp2; r2 is the square of that correlation.
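A minimal Python sketch of this recoding, assuming alleles are coded 1/2 and that allele 1 is the one counted (which matches the counts_in_snp1 and counts_in_snp2 columns). It uses only the four rows shown in the table, since the rest were elided, so the printed value is not what PLINK would report for the full data. The helper names `allele_count` and `r2` are hypothetical.

```python
from typing import Sequence, Tuple


def allele_count(genotype: Tuple[int, int], allele: int) -> int:
    """Number of copies (0, 1 or 2) of `allele` in a two-column genotype."""
    return sum(1 for a in genotype if a == allele)


def r2(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation of two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Only the four visible rows of the table (the remaining rows were elided).
snp1 = [(1, 2), (2, 2), (1, 1), (1, 2)]
snp2 = [(2, 2), (2, 2), (1, 2), (2, 2)]

counts1 = [allele_count(g, 1) for g in snp1]  # [1, 0, 2, 1]
counts2 = [allele_count(g, 1) for g in snp2]  # [0, 0, 1, 0]
print(round(r2(counts1, counts2), 4))          # 0.6667

# Counting allele 2 instead maps each count v to 2 - v: the correlation
# changes sign, but r^2 stays the same.
alt1 = [allele_count(g, 2) for g in snp1]      # [1, 2, 0, 1]
print(round(r2(alt1, counts2), 4))             # 0.6667
```

So 11 = 2, 12 = 21 = 1, 22 = 0 when counting allele 1, and the reverse when counting allele 2; either choice gives the same r2.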