Question

Question about the correlation coefficient for Linkage Disequilibrium (LD)

4.2 years ago

userabc ▴ 10

For the sake of notation, I'll write subscripts as parentheses, i.e. P_AB = P(AB), P_A = P(A), etc.

The correlation coefficient depends on the choice of "target alleles". For example, you have:

r(AB) = [P(AB) - P(A)P(B)] / sqrt(P(A)P(B)P(a)P(b))

and

r(Ab) = [P(Ab) - P(A)P(b)] / sqrt(P(A)P(B)P(a)P(b))

The denominator is the same, but the numerator changed.
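As a quick numerical illustration of the two formulas, here is a minimal Python sketch. The function name `ld_r` and the example haplotype frequencies are made up for illustration; it assumes the allele frequencies are obtained as marginal sums of the four haplotype frequencies.

```python
import math

def ld_r(p_AB, p_Ab, p_aB, p_ab):
    """Return (r_AB, r_Ab) from four haplotype frequencies that sum to 1."""
    # Allele frequencies are the marginal sums of the haplotype frequencies.
    p_A, p_a = p_AB + p_Ab, p_aB + p_ab
    p_B, p_b = p_AB + p_aB, p_Ab + p_ab
    # Both coefficients share this denominator.
    denom = math.sqrt(p_A * p_a * p_B * p_b)
    r_AB = (p_AB - p_A * p_B) / denom
    r_Ab = (p_Ab - p_A * p_b) / denom
    return r_AB, r_Ab

# Hypothetical haplotype frequencies, chosen only for illustration.
r_AB, r_Ab = ld_r(0.4, 0.1, 0.2, 0.3)
print(r_AB, r_Ab)
```

With these inputs the two values have the same magnitude and opposite signs, which is what you would expect when only the numerator changes sign.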

As far as I know, P(A), P(B), P(a), P(b) etc are all allele frequencies. P(AB), P(aB), P(Ab), P(ab) are haplotype frequencies.

When you have independence, which also means no LD (I think?), then P(AB) = P(A)P(B). When you don't have independence, on the other hand, you have:

P(AB) = P(A)P(B) + D,

P(Ab) = P(A)P(b) - D,

P(aB) = P(a)P(B) - D,

P(ab) = P(a)P(b) + D.
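These four relations can be checked numerically. The sketch below is only an illustration: the helper name `check_d_relations` and the input frequencies are hypothetical, and D is taken to be P(AB) - P(A)P(B) with allele frequencies computed as marginal sums.

```python
import math

def check_d_relations(p_AB, p_Ab, p_aB, p_ab):
    """Return D and whether all four haplotype relations hold."""
    # Allele frequencies as marginal sums of haplotype frequencies.
    p_A, p_a = p_AB + p_Ab, p_aB + p_ab
    p_B, p_b = p_AB + p_aB, p_Ab + p_ab
    # Assumed definition of D, taken from the first relation.
    D = p_AB - p_A * p_B
    ok = all(
        math.isclose(lhs, rhs, abs_tol=1e-12)
        for lhs, rhs in [
            (p_AB, p_A * p_B + D),
            (p_Ab, p_A * p_b - D),
            (p_aB, p_a * p_B - D),
            (p_ab, p_a * p_b + D),
        ]
    )
    return D, ok

# Hypothetical haplotype frequencies summing to 1.
D, ok = check_d_relations(0.4, 0.1, 0.2, 0.3)
print(D, ok)
```

The same D appears in all four lines, with a plus sign for AB and ab and a minus sign for Ab and aB.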

OK, so this is actually where the question starts. The book says r^2 is zero if and only if D is zero. That makes sense, because if D is zero, the numerator is zero (since you can rewrite the numerator as P(AB) - P(A)P(B) = D). But then it says r^2 = 1 only if either P(AB) = P(ab) = 0 or P(Ab) = P(aB) = 0.

How do I go about showing that? I've been trying to just plug in different stuff, but I'm not sure if that's how you're supposed to do it?

Example of what I've been trying to do. I'll try to get r = 1 since that would imply r^2 = 1.

Assume P(AB)=P(ab) = 0.

This immediately makes the numerator -P(A)P(B), which I can rewrite as D since P(AB) = 0.

For the denominator, which is the square root of P(A)P(B)P(a)P(b), I group the factors in pairs and rewrite P(A)P(B) = P(AB) - D and P(a)P(b) = P(ab) - D. That gives the square root of (P(AB) - D)(P(ab) - D). Since P(AB) = P(ab) = 0 by assumption, the denominator becomes the square root of D^2, so the correlation coefficient comes out as D/D, which is 1. Is this argument correct?

One thing confuses me, though, and makes me think this argument doesn't hold: how do I even get to -1 in this situation? r ranges from -1 to 1 and r^2 ranges from 0 to 1. Somewhere I read that r(aB) = -r(AB). How would I go about showing that? Either way, that would imply that whenever r(AB) = 1, r(aB) = -1. So I tried an argument similar to the one above, but on r(aB) instead, using either P(AB) = P(ab) = 0 or P(Ab) = P(aB) = 0, but couldn't get it right.

Any suggestions?

Updated 4.2 years ago by Fabio Marroni ★ 3.0k • written 4.2 years ago by userabc ▴ 10

score 1 · Answer 1 · 2020-02-29

If P(Ab) = P(aB) = 0, then it follows that P(A) = P(B) and P(a) = P(b), because alleles A and B are only found in the AB haplotype, and alleles a and b are only found in the ab haplotype.

Thus, since P(Ab) = 0,

r(Ab) = [P(Ab) - P(A)P(b)] / sqrt(P(A)P(B)P(a)P(b)) = -P(A)P(b) / sqrt(P(A)P(B)P(a)P(b))

becomes

-P(A)P(a) / sqrt(P(A)P(A)P(a)P(a))

i.e.

-P(A)P(a) / sqrt(P(A)^2 P(a)^2)

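and you can easily see that this is always -1 (as long as both P(A) and P(a) are nonzero, since frequencies are non-negative and the square root is then just P(A)P(a)), and when you square it, it is always 1.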
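To see the sign of r in both extreme cases, here is a small Python sketch. The function name `r_AB` and the frequencies are hypothetical, picked only for illustration; it computes r(AB) = D / sqrt(P(A)P(B)P(a)P(b)) with D = P(AB) - P(A)P(B). Because P(Ab) - P(A)P(b) = -D, r(Ab) is just the negative of r(AB).

```python
import math

def r_AB(p_AB, p_Ab, p_aB, p_ab):
    """Correlation coefficient r(AB) from four haplotype frequencies."""
    # Allele frequencies as marginal sums of haplotype frequencies.
    p_A, p_a = p_AB + p_Ab, p_aB + p_ab
    p_B, p_b = p_AB + p_aB, p_Ab + p_ab
    D = p_AB - p_A * p_B
    return D / math.sqrt(p_A * p_a * p_B * p_b)

# Case 1: P(Ab) = P(aB) = 0, so only AB and ab haplotypes exist.
r_case1 = r_AB(0.7, 0.0, 0.0, 0.3)
# Case 2: P(AB) = P(ab) = 0, so only Ab and aB haplotypes exist.
r_case2 = r_AB(0.0, 0.7, 0.3, 0.0)
print(r_case1, r_case2)
```

Up to floating-point rounding, the first case gives r(AB) close to 1 (so r(Ab) is close to -1, as derived above) and the second gives r(AB) close to -1. In both cases r^2 is 1.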