Question

Linkage Disequilibrium to Rare SNPs


9.2 years ago

Shicheng Guo ★ 9.6k

Hi colleagues,

Just a very simple question. When we calculate LD, we need P(A), P(a), P(B), P(b) and P(AB), and then

D = P(AB) - P(A)P(B)
r = (P(AB) - P(A)P(B)) / sqrt(P(A)P(a)P(B)P(b))

However, for rare SNPs or mutations, P(A) or P(B) might be 0. In that situation the denominator is 0 and r cannot be calculated. Is there any workaround for calculating r or D' in such a case?

Put another way, suppose I observe only one haplotype for the two loci across all samples (a very large sample). Can I treat this as complete linkage, or should I record LD as 'NA'?

Thanks

LD Linkage Disequilibrium • 2.1k views

updated 8.7 years ago by Biostar 20 • written 9.2 years ago by Shicheng Guo ★ 9.6k

score 1 · Answer 1 · 2016-04-23

Technically, if the frequency of one of the alleles is zero, that locus is not polymorphic, and it makes no sense to compute LD for the pair of loci. So NA is appropriate. Do not record 1 as the value; that would be a mistake. LD expresses the difference between the observed haplotype frequencies and those expected from the allele frequencies. When one locus is not polymorphic, the haplotype frequencies are by definition exactly those expected from the allele frequencies: for example, if P(A) = 1 then P(AB) = P(B) = P(A)P(B), so D = 0 and the denominator of r is 0 as well. You therefore have no information to test for non-random association of alleles at the two loci, and again NA is the appropriate answer.
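As a rough illustration of the point above, here is a minimal Python sketch that computes D, r and |D'| from frequencies and returns NaN (i.e. NA) for r and |D'| when either locus is monomorphic. The function name `ld_stats` and its arguments are hypothetical, chosen for this example only, and it assumes phased haplotype frequencies are already available; it is not taken from any particular LD package.

```python
import math

def ld_stats(p_ab, p_a, p_b):
    """Return (D, r, |D'|) for two biallelic loci.

    p_ab: frequency of the A-B haplotype
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    (hypothetical helper; inputs are assumed to be phased frequencies)
    """
    d = p_ab - p_a * p_b
    # P(A)P(a)P(B)P(b): zero whenever either locus is monomorphic
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom <= 0:
        # No variation at one locus: LD is undefined, report NA
        return d, math.nan, math.nan
    r = d / math.sqrt(denom)
    # Largest |D| attainable given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return d, r, abs(d) / d_max

# Both loci polymorphic and perfectly associated
print(ld_stats(0.5, 0.5, 0.5))
# Locus 1 monomorphic (P(A) = 1): D is 0, r and |D'| are NaN
print(ld_stats(0.3, 1.0, 0.3))
```

Returning NaN rather than 0 or 1 keeps monomorphic pairs out of downstream summaries instead of silently counting them as unlinked or as completely linked.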