linkage disequilibrium: difference between D' and r-squared
3
8.3 years ago
wkreinen ▴ 60

Hello,

I have a question concerning the difference between the linkage disequilibrium measures D' and r-squared. I know the formal definitions, but I have trouble understanding the different concepts behind D' and r-squared. And what does it mean if D' is low and r-squared is high (and vice versa)?

Thanks wim

ld r-squared • 37k views
3

Yes, sorry. It is not a classical bioinformatics question. Thanks for the Wikipedia link. Yes, I know that material. Unfortunately I did not understand it, and obviously I could not make clear what I did not understand ...

In my understanding of bioinformatics it is not a fault if one tries to explain some basic conceptual differences that make a difference in the end of the day. My impression is that D' and r-squared are used (quite often) arbitrarily.

I try to be more detailed in asking my question ...

D' and r-squared are two different (and, among others, popular) approaches to normalising D. D' normalises D by its theoretical maximum given the allele frequencies; r-squared is the squared correlation coefficient between the alleles at the two loci. So far so good ... But I have trouble understanding which conditions should influence the choice of D' or r-squared as a measure of LD. What are the important criteria for choosing D' or r-squared? Maybe this is the wrong forum for a question like this. If so, I apologise. Anyway, I would appreciate it if somebody could give me a productive hint.

All the best

Wim

0

"In my understanding of bioinformatics it is not a fault if one tries to explain some basic conceptual differences that make a difference in the end of the day."

The problem is that the question isn't about bioinformatics at all. It's about population genetics. This group is for bioinformatics questions - issues specifically related to data handling, usage, and interpretation of biological datasets, not necessarily introductory population genetic theory outside the scope of a program. Also, you should consider posting a follow-up as a comment and not an answer to your question.

12
8.3 years ago
Felix Francis ▴ 580

D' : A scaled version of D

• Ranges between -1 and +1
• ±1 implies at least one of the four possible haplotypes was not observed
• If allele frequencies are similar, high D' means the markers are good surrogates for each other
• D' estimates are inflated in small samples (a drawback)
• D' estimates are inflated when one allele is rare (a drawback)

r²: Ranges between 0 and 1. It is the measure preferred by population geneticists

• 1 when the two markers provide identical information.
• 0 when they are in perfect linkage equilibrium.
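To make the two normalisations above concrete, here is a minimal sketch that computes D, D' and r² from haplotype and allele frequencies for two biallelic loci. The function name `ld_measures` and its arguments are made up for illustration and are not taken from any particular package; it uses the standard textbook definitions and assumes both loci are polymorphic.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab : frequency of the haplotype carrying allele A and allele B
    p_a  : frequency of allele A at the first locus
    p_b  : frequency of allele B at the second locus
    (Hypothetical helper for illustration only.)
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")

    # Raw disequilibrium: observed haplotype frequency minus the
    # frequency expected under independence.
    d = p_ab - p_a * p_b

    # Largest |D| that these allele frequencies allow; depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))

    d_prime = d / d_max  # scaled to lie between -1 and +1
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))  # squared correlation
    return d, d_prime, r2


# Identical markers: A always travels with B, equal allele frequencies.
print(ld_measures(p_ab=0.5, p_a=0.5, p_b=0.5))

# Linkage equilibrium: haplotype frequency equals the product of allele frequencies.
print(ld_measures(p_ab=0.25, p_a=0.5, p_b=0.5))
```

In the first call both D' and r² equal 1, and in the second both equal 0, matching the two end points listed above. The measures only disagree when allele frequencies differ between the loci.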
1
8.3 years ago

Yes, I don't think this is the right place to ask those questions. However, here is my answer.

The big difference between D' and r² is that a high value of D' does not mean that one locus predicts the other with high accuracy, which can be a major issue when, say, imputing SNPs. An r² of 1, on the other hand, implies perfect predictability: if we know the allele at one locus, we can predict the allele at the second locus perfectly, and vice versa.
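A small worked example may help with the predictability point. The haplotype counts below are invented for illustration: allele A is rare and only ever occurs on the same haplotype as allele B, so one of the four haplotypes is missing and |D'| is 1, yet the loci predict each other poorly. The helper names are hypothetical; r² is computed here directly as the squared Pearson correlation of 0/1 allele indicators across haplotypes.

```python
# Made-up haplotype counts for two biallelic loci (alleles A/a and B/b).
counts = {("A", "B"): 10, ("A", "b"): 0, ("a", "B"): 40, ("a", "b"): 50}


def missing_haplotypes(counts):
    """Haplotypes that were never observed; any such gap gives |D'| = 1."""
    return [hap for hap, k in counts.items() if k == 0]


def pearson_r2(counts):
    """Squared correlation between the indicators 'carries A' and 'carries B'."""
    xs, ys = [], []
    for (first, second), k in counts.items():
        xs += [1 if first == "A" else 0] * k
        ys += [1 if second == "B" else 0] * k
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    return cov * cov / (var_x * var_y)


def p_first_given_second(counts, first, second):
    """P(first-locus allele | second-locus allele) from the haplotype counts."""
    total = sum(k for (_, s), k in counts.items() if s == second)
    return counts[(first, second)] / total


print("missing haplotypes:", missing_haplotypes(counts))
print("r^2:", round(pearson_r2(counts), 3))
print("P(A | B):", p_first_given_second(counts, "A", "B"))
print("P(a | b):", p_first_given_second(counts, "a", "b"))
```

With these counts r² is about 0.11 even though |D'| is 1. Seeing b guarantees a, but seeing B leaves the first locus largely undetermined (A is present on only 20% of B haplotypes), which is the situation that causes trouble for imputation.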

A new and, in my opinion, very good book is The Fundamentals of Modern Statistical Genetics by Laird and Lange. You can find more information there.

0
8.3 years ago
Brice Sarver ★ 3.7k

This isn't a bioinformatics question (and might be a homework question), but everything you want to know, including derivations and follow-through links for more in-depth explanation, is here.