How to calculate SNP LD values from haplotype data

I have a haplotype matrix derived from genotype data of bi-allelic SNPs.

The genotype-to-haplotype "conversion" was done using the algorithm described in this link.

The SNPs were encoded with the values 0, 1 and 2: a site gets a 0 or a 1 if both chromosomes carry the same base there (homozygous), and a 2 if the two bases differ (heterozygous). All these {0,1,2} sequences were stored in a genotype matrix.
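A minimal sketch of this encoding for one individual, assuming the genotype is given as one pair of bases per SNP and that, for each SNP, you decide which base is coded as 0 (the `allele0` list below is a hypothetical input, e.g. the reference allele). At a bi-allelic site the other base is then coded as 1.

```python
def encode_genotype(base_pairs, allele0):
    """Encode one genotype as a {0, 1, 2} sequence.

    base_pairs: one (base_on_chromosome_1, base_on_chromosome_2) tuple per SNP.
    allele0:    per SNP, the base coded as 0 (a hypothetical choice, e.g. the
                reference allele); the other base is coded as 1.
    """
    if len(base_pairs) != len(allele0):
        raise ValueError("need one allele0 entry per SNP")
    encoded = []
    for (b1, b2), ref in zip(base_pairs, allele0):
        if b1 != b2:
            encoded.append(2)  # heterozygous: the two chromosomes differ
        elif b1 == ref:
            encoded.append(0)  # homozygous for the allele coded 0
        else:
            encoded.append(1)  # homozygous for the other allele
    return encoded


print(encode_genotype([("A", "A"), ("C", "T"), ("G", "G")], ["A", "C", "A"]))
# [0, 2, 1]
```

Stacking one such sequence per individual as rows gives the genotype matrix.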

The haplotype matrix is composed of {0,1} sequences and was built so that, for every sequence in the genotype matrix, it contains two haplotype sequences that are compatible* with that genotype sequence.

*Two sequences a and b are compatible with a sequence c if for every site i the following holds:

if a(i) = b(i) then a(i) = b(i) = c(i)

if a(i) ≠ b(i) then c(i) = 2
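The compatibility rule above translates directly into a check like the following sketch, assuming haplotypes and genotypes are plain Python lists of the same length.

```python
def is_compatible(a, b, c):
    """Return True if haplotypes a and b are compatible with genotype c.

    a, b: {0, 1} sequences; c: {0, 1, 2} sequence; all of the same length.
    """
    if not (len(a) == len(b) == len(c)):
        return False
    for ai, bi, ci in zip(a, b, c):
        if ai == bi:
            if ci != ai:   # equal alleles must match the homozygous call
                return False
        elif ci != 2:      # differing alleles require a heterozygous site
            return False
    return True


print(is_compatible([0, 1, 1], [0, 0, 1], [0, 2, 1]))  # True
print(is_compatible([0, 1, 1], [0, 1, 1], [0, 2, 1]))  # False
```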

The genotype matrix is a matrix where the columns represent the SNPs and the rows represent the genotype sequences. The haplotype matrix is a matrix where the columns represent the SNPs and the rows represent the haplotype sequences (each genotype sequence from the genotype matrix produced two haplotype sequences).
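A quick sanity check on this layout is to collapse each pair of haplotype rows back into a genotype row and compare it with the genotype matrix. The sketch below assumes (this is not stated in the question) that rows 2k and 2k+1 of the haplotype matrix are the pair produced for genotype row k; the toy matrix `H` is made up for illustration.

```python
def haplotypes_to_genotypes(H):
    """Collapse a haplotype matrix back into a genotype matrix.

    Assumes (hypothetically) that rows 2k and 2k+1 of H are the two
    haplotypes produced for genotype row k.
    """
    if len(H) % 2 != 0:
        raise ValueError("expected an even number of haplotype rows")
    G = []
    for k in range(0, len(H), 2):
        a, b = H[k], H[k + 1]
        # Same allele -> keep it (0 or 1); different alleles -> 2
        G.append([ai if ai == bi else 2 for ai, bi in zip(a, b)])
    return G


# Hypothetical toy haplotype matrix: 4 haplotypes (2 individuals) x 3 SNPs
H = [
    [0, 0, 1],  # haplotype 1 of genotype 1
    [1, 1, 0],  # haplotype 2 of genotype 1
    [0, 0, 1],  # haplotype 1 of genotype 2
    [1, 1, 1],  # haplotype 2 of genotype 2
]
print(haplotypes_to_genotypes(H))  # [[2, 2, 2], [2, 2, 1]]
```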

For example:

[example image not available]

I need to calculate the linkage disequilibrium (LD) from these data using the r² metric. To do that, I need to calculate the haplotype and allele frequencies. Unfortunately, I do not know how to do that.
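For reference, the standard definition of r² for a pair of bi-allelic SNPs i and j, written in terms of a phased {0,1} haplotype matrix with N rows (N = 2 × number of genotype rows). Here allele 1 at SNP i is called A and allele 1 at SNP j is called B; which allele is labelled 1 does not change r², because swapping it only flips the sign of D.

$$
p_A = \frac{\#\{\text{rows with } 1 \text{ in column } i\}}{N}, \qquad
p_B = \frac{\#\{\text{rows with } 1 \text{ in column } j\}}{N}, \qquad
p_{AB} = \frac{\#\{\text{rows with } 1 \text{ in both columns}\}}{N}
$$

$$
D = p_{AB} - p_A\,p_B, \qquad
r^2 = \frac{D^2}{p_A(1-p_A)\,p_B(1-p_B)}
$$

r² is undefined when either SNP is monomorphic (p_A or p_B equal to 0 or 1).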

How to calculate Linkage Disequilibrium from this haplotype matrix using r² metric?
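A minimal pure-Python sketch of those formulas applied to the haplotype matrix, assuming the inferred haplotypes are treated as the true phase (each row counts as one observed haplotype). `r_squared` and `ld_matrix` are hypothetical helper names, and `H` is the same made-up toy matrix used above, not real data.

```python
def r_squared(H, i, j):
    """r^2 between SNP columns i and j of a phased {0, 1} haplotype matrix H.

    Allele 1 is counted as 'A' at SNP i and 'B' at SNP j.
    Returns None when either SNP is monomorphic (r^2 is undefined).
    """
    n = len(H)
    p_a = sum(row[i] for row in H) / n             # freq. of allele 1 at SNP i
    p_b = sum(row[j] for row in H) / n             # freq. of allele 1 at SNP j
    p_ab = sum(row[i] * row[j] for row in H) / n   # freq. of haplotype 1-1
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None
    d = p_ab - p_a * p_b                           # LD coefficient D
    return d * d / denom


def ld_matrix(H):
    """Pairwise r^2 for all SNP columns of H (symmetric matrix)."""
    m = len(H[0])
    return [[r_squared(H, i, j) for j in range(m)] for i in range(m)]


# Hypothetical toy haplotype matrix: 4 haplotypes x 3 SNPs
H = [
    [0, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
    [1, 1, 1],
]
print(r_squared(H, 0, 1))  # 1.0 (columns 0 and 1 are identical)
print(r_squared(H, 0, 2))  # about 0.333
```

For many SNPs the same arithmetic can be vectorized (e.g. with NumPy), but the loop version keeps each frequency visible.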

How to estimate the haplotype and allelic frequencies used in the r² calculation?
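Because the matrix is already phased, the frequencies can be estimated by simple counting: the four two-locus haplotype frequencies for SNPs i and j are the fractions of rows showing 00, 01, 10 and 11, and each allele frequency is the sum of the two haplotype frequencies that carry that allele. The sketch below (hypothetical helper `pair_frequencies`, made-up toy `H`) also checks the equivalent form D = p₁₁·p₀₀ − p₁₀·p₀₁. It assumes the inferred phase is correct; if phase uncertainty matters, a common alternative is to estimate haplotype frequencies directly from the genotypes with an EM algorithm.

```python
from collections import Counter


def pair_frequencies(H, i, j):
    """Allele and two-locus haplotype frequencies for SNP columns i and j.

    Each row of the phased matrix H is one observed haplotype, so every
    frequency is a count divided by the number of rows.
    """
    n = len(H)
    counts = Counter((row[i], row[j]) for row in H)
    hap = {h: counts[h] / n for h in [(0, 0), (0, 1), (1, 0), (1, 1)]}
    p_i = hap[(1, 0)] + hap[(1, 1)]  # freq. of allele 1 at SNP i
    p_j = hap[(0, 1)] + hap[(1, 1)]  # freq. of allele 1 at SNP j
    return p_i, p_j, hap


# Hypothetical toy haplotype matrix: 4 haplotypes x 3 SNPs
H = [
    [0, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
    [1, 1, 1],
]
p_i, p_j, hap = pair_frequencies(H, 0, 2)
d_from_alleles = hap[(1, 1)] - p_i * p_j
d_from_haplotypes = hap[(1, 1)] * hap[(0, 0)] - hap[(1, 0)] * hap[(0, 1)]
print(p_i, p_j)                            # 0.5 0.75
print(d_from_alleles, d_from_haplotypes)   # -0.125 -0.125
```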

