I have a haplotype matrix inferred from genotype data of bi-allelic SNPs.
The genotype-to-haplotype "conversion" was done using the algorithm described in this link.
The SNPs were converted into a 0, 1, 2 representation: a polymorphic site of the genotype holds a 0 or a 1 if the two chromosomes carry the same base at this site (homozygous), and a 2 if the bases at this site differ (heterozygous). All these {0,1,2} sequences were stored in a genotype matrix.
The haplotype matrix is composed of {0,1} sequences and was produced so that it contains, for every sequence in the genotype matrix, two haplotype sequences that are compatible* with that genotype sequence.
*Two sequences a and b are compatible with a sequence c if for every site i the following holds:
if a(i) = b(i) then a(i) = b(i) = c(i)
if a(i) <> b(i) then c(i) = 2
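To make the compatibility rule concrete, here is a minimal Python sketch of the check. The function name `is_compatible` and the example sequences are hypothetical and only illustrate the definition; they are not taken from the actual data.

```python
def is_compatible(a, b, c):
    """Return True if haplotypes a and b are compatible with genotype c.

    a and b are sequences over {0, 1}; c is a sequence over {0, 1, 2}.
    """
    if not (len(a) == len(b) == len(c)):
        raise ValueError("all three sequences must have the same length")
    for a_i, b_i, c_i in zip(a, b, c):
        if a_i == b_i:
            # Homozygous site: the genotype must carry the shared allele.
            if c_i != a_i:
                return False
        elif c_i != 2:
            # Heterozygous site: the genotype must be coded as 2.
            return False
    return True


# Hypothetical example: one genotype and a pair of candidate haplotypes.
genotype = [0, 2, 1, 2]
hap_a = [0, 0, 1, 1]
hap_b = [0, 1, 1, 0]
print(is_compatible(hap_a, hap_b, genotype))
```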
The genotype matrix is a matrix where the columns represent the SNPs and the rows represent the genotype sequences. The haplotype matrix is a matrix where the columns represent the SNPs and the rows represent the haplotype sequences (each genotype sequence from the genotype matrix produces two haplotype sequences).
For example:
I need to calculate linkage disequilibrium (LD) from these data using the r² metric. To do that, I need the haplotype and allele frequencies. Unfortunately, I do not know how to obtain them.
How can I calculate linkage disequilibrium from this haplotype matrix using the r² metric?
How should I estimate the haplotype and allele frequencies used in the r² calculation?
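For reference, the standard definition of r² between two bi-allelic sites A and B is r² = D² / (p_A (1 - p_A) p_B (1 - p_B)), with D = p_AB - p_A p_B, where p_A and p_B are the frequencies of allele 1 at each site and p_AB is the frequency of haplotypes carrying allele 1 at both sites. The sketch below shows one possible way to apply it by simple counting. It assumes that every row of the haplotype matrix is treated as one phased chromosome and that the inferred phase is taken as given; the function name `r_squared` and the example matrix are hypothetical.

```python
def r_squared(haplotypes, i, j):
    """Estimate r^2 between SNP columns i and j of a 0/1 haplotype matrix.

    haplotypes: list of rows, each row a sequence over {0, 1}.
    Frequencies are estimated by counting rows (assumption: each row
    is one phased chromosome and the phasing is correct).
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("haplotype matrix is empty")
    # Allele frequencies: fraction of haplotypes with allele 1 at each site.
    p_a = sum(h[i] for h in haplotypes) / n
    p_b = sum(h[j] for h in haplotypes) / n
    # Haplotype frequency: fraction carrying allele 1 at both sites.
    p_ab = sum(1 for h in haplotypes if h[i] == 1 and h[j] == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        # One of the sites is monomorphic, so r^2 is undefined.
        return float("nan")
    return d * d / denom


# Hypothetical haplotype matrix: 4 haplotypes (rows) x 3 SNPs (columns).
H = [
    [0, 0, 1],
    [0, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
]
print(r_squared(H, 0, 1))  # columns 0 and 1 are identical
print(r_squared(H, 0, 2))  # columns 0 and 2 are independent in this sample
```

Because the haplotypes were inferred from unphased genotypes, any value computed this way depends on the phase chosen by the inference algorithm.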