linkage disequilibrium: difference between D' and r-squared
9.1 years ago
wkreinen ▴ 60

Hello,

I have a question concerning the difference between the linkage disequilibrium measures D' and r-squared. I know the formal definitions, but I have trouble understanding the different concepts behind D' and r-squared. And what does it mean if D' is low and r-squared is high (and vice versa)?

Thanks, Wim

ld r-squared • 40k views

Yes, sorry. It is not a classical bioinformatics question. Thanks for the Wikipedia link. Yes, I know that material. Unfortunately I did not understand it, and obviously I could not make clear what I did not understand ...

In my understanding of bioinformatics it is not a fault if one tries to explain some basic conceptual differences that make a difference in the end of the day. My impression is that D' and r-squared are quite often used arbitrarily.

I will try to ask my question in more detail ...

D' and r-squared are two different (and popular, among others) approaches to normalising D. D' normalises D by its theoretical maximum given the allele frequencies; r-squared is the squared correlation coefficient between the alleles at the two loci. So far so good ... But I have trouble understanding which conditions should influence the choice of D' or r-squared as a measure of LD. What are the important criteria for choosing one over the other? Maybe this is the wrong forum for a question like this. If so, I apologise. Anyway, I would appreciate it if somebody could give me a productive hint.

All the best

Wim


"In my understanding of bioinformatics it is not a fault if one tries to explain some basic conceptual differences that make a difference in the end of the day."

The problem is that the question isn't about bioinformatics at all. It's about population genetics. This group is for bioinformatics questions - issues specifically related to data handling, usage, and interpretation of biological datasets, not necessarily introductory population genetic theory outside the scope of a program. Also, you should consider posting a follow-up as a comment and not an answer to your question.

9.1 years ago
Felix Francis ▴ 600

D': a scaled version of D

  • Ranges between -1 and +1
  • ±1 implies that at least one of the four possible haplotypes was not observed
  • If allele frequencies are similar, high D' means the markers are good surrogates for each other
  • D' estimates are inflated in small samples (a drawback)
  • D' estimates are inflated when one allele is rare (a drawback)

r²: ranges between 0 and 1. It is the measure preferred by population geneticists.

  • 1 when the two markers provide identical information.
  • 0 when they are in perfect linkage equilibrium.
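
To make the points above concrete, here is a minimal sketch that computes D, D' and r² from counts of the four two-locus haplotypes. The function name `ld_stats` and the example counts are made up for illustration, and the sketch assumes phased haplotype counts are available. It uses D = p_AB - p_A·p_B, scales D by its maximum possible magnitude given the allele frequencies to get D', and divides D² by the product of the four allele frequencies to get r².

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, D', r^2) from counts of the four two-locus haplotypes."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n                 # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / n         # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / n         # frequency of allele B at locus 2

    D = p_AB - p_A * p_B            # raw disequilibrium coefficient

    # Largest magnitude D can reach given the allele frequencies;
    # the bound depends on the sign of D.
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = D / d_max if d_max > 0 else 0.0

    # Squared correlation between the alleles at the two loci.
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = D * D / denom if denom > 0 else 0.0
    return D, d_prime, r2


# Rare allele A, haplotype A-b never observed: D' = 1 but r^2 is about 0.05.
print(ld_stats(5, 0, 45, 50))

# Only A-B and a-b observed: the markers carry identical information,
# so both D' and r^2 are 1.
print(ld_stats(30, 0, 0, 70))

# All four haplotypes equally frequent: linkage equilibrium, D' = r^2 = 0.
print(ld_stats(25, 25, 25, 25))
```

The first example is the case flagged in the list: one haplotype is missing and one allele is rare, so D' hits its maximum even though the two markers are poor surrogates for each other.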
9.1 years ago

Yes, I don't think this is the right place to ask those questions. However here is my answer.

D' and r² differ in an important way: a high value of D' does not mean that one locus can predict the other with high accuracy, which can be a major issue when, for example, imputing SNPs. An r² of 1, on the other hand, implies perfect predictability: if we know the allele at one locus, we can perfectly predict the allele at the second locus, and vice versa.
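
A short derivation may help with the "D' low, r² high" part of the question. It is a sketch under the usual definitions, in my own notation rather than anything taken from the posts above: p_A and p_B are the allele frequencies at the two loci, q = 1 - p, and D = p_AB - p_A p_B.

```latex
\begin{align}
r^2 &= \frac{D^2}{p_A q_A p_B q_B}, \qquad D' = \frac{D}{D_{\max}}, \\
D_{\max} &=
  \begin{cases}
    \min(p_A q_B,\; q_A p_B) & \text{if } D > 0 \\
    \min(p_A p_B,\; q_A q_B) & \text{if } D < 0
  \end{cases} \\
D_{\max}^2 &\le p_A q_A p_B q_B
  \quad\text{since } \min(x, y)^2 \le x y, \\
r^2 &= D'^2 \, \frac{D_{\max}^2}{p_A q_A p_B q_B} \;\le\; D'^2 .
\end{align}
```

So r² can never exceed D'². A high D' with a low r² is common, for instance when the allele frequencies at the two loci differ a lot, whereas a low D' with a high r² cannot occur under these definitions.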

A new and, in my opinion, very good book is The Fundamentals of Modern Statistical Genetics by Laird and Lange. You can find more information there.

9.1 years ago
Brice Sarver ★ 3.8k

This isn't a bioinformatics question (and might be a homework question), but everything you want to know, including derivations and follow-through links for more in-depth explanation, is here.
