Question

Discrepancy Between LD And R-Sq

5

12.0 years ago

Motor Genetic ▴ 110

As the number of SNPs on arrays increases, I have noticed many discrepancies between LD and r-squared correlations among nearby SNPs. I know r-sq is supposedly independent of minor allele frequency, but this does not seem to be the only reason. Does anyone have a clue? Thank you.

updated 12.0 years ago by Pablo Marin-Garcia ★ 2.0k • written 12.0 years ago by Motor Genetic ▴ 110

1

Please clarify your question. R-squared is a measure of LD, so how is there a discrepancy? There is variation in recombination across the genome causing changes in LD.

12.0 years ago by Zev.Kronenberg 12k

2

Hi Zev, my question was about the difference between LD measured as D prime and LD measured as the r-sq correlation between two SNPs. There are many SNP pairs with high D prime (> 0.90) but very low correlation (r2 < 0.05) between them. This is becoming more and more of an issue as the number of SNPs on arrays grows. I hope it is clearer now. Thanks.

12.0 years ago by Motor Genetic ▴ 110

score 8 · Answer 1 · 2012-04-11

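LD is a very blurry concept, and there are many different ways of measuring it. If the two loci have very different minor allele frequencies, you will find differences between D' and r-sq because of the way D is normalized: by Dmax in D', or by the product of all four allele frequencies in r-sq (see the formulas below). A rare allele that occurs only on haplotypes carrying a common allele at the other locus gives D' close to 1, but r-sq stays low, because r-sq can only reach 1 when the allele frequencies at the two loci match.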
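As a rough illustration of that normalization effect, here is a minimal Python sketch. The function name `ld_stats` and the example frequencies are made up for illustration (they are not from any real dataset), and the formulas are the usual definitions for two biallelic loci with known haplotype frequencies.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_ab : frequency of the haplotype carrying allele A and allele B
    p_a  : frequency of allele A at the first locus
    p_b  : frequency of allele B at the second locus
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")

    # Raw disequilibrium coefficient.
    d = p_ab - p_a * p_b

    # D' divides by the largest |D| possible given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r^2 divides D^2 by the product of all four allele frequencies.
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical pair: a rare allele (2%) found only on haplotypes
# that carry a common allele (40%) at the neighbouring SNP.
d, d_prime, r2 = ld_stats(p_ab=0.02, p_a=0.02, p_b=0.40)
print(f"D = {d:.4f}, D' = {d_prime:.2f}, r^2 = {r2:.3f}")
```

With these made-up numbers D' comes out at about 1 while r-sq is about 0.03, which is the same pattern as the pairs described in the question (D' > 0.90, r2 < 0.05). Denser arrays carry more low-frequency SNPs, so this kind of pair is expected to become more common.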

The following paper gives a deeper view of the problem:

Slatkin M. Linkage disequilibrium--understanding the evolutionary past and mapping the medical future. Nat Rev Genet. 2008 Jun;9(6):477-85. Review. PubMed PMID: 18427557.

In case you don't have access, here is its opening paragraph, a rant about the term itself:

Linkage disequilibrium (LD) is one of those unfortunate
terms that does not reveal its meaning. As every
instructor of population genetics knows, the term is a
barrier not an aid to understanding. LD means simply
a nonrandom association of alleles at two or more loci,
and detecting LD does not ensure either linkage or a
lack of equilibrium. The term was first used in 1960 by
Lewontin and Kojima1 and it persists because LD was
initially the concern of population geneticists who were
not picky about terminology as long as the mathematical
definition was clear. At first, there were few data with
which to study LD, and its importance to evolutionary
biology and human genetics was unrecognized outside
of population genetics. However, interest in LD grew
rapidly in the 1980s once the usefulness of LD for gene
mapping became evident and large-scale surveys of
closely linked loci became feasible. By then, the term
was too well established to be replaced.

And here are some images of the calculations:

[image not preserved: LD calculations]

and from Wikipedia:

[image not preserved: LD formulas from Wikipedia]
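For reference, the standard textbook definitions for two biallelic loci can be written as below. This is a sketch of the usual formulas, not a transcription of any particular figure. The symbols are assumptions chosen here: p_A and p_B are the allele frequencies at the two loci, and p_AB is the frequency of the haplotype carrying both alleles.

```latex
D = p_{AB} - p_A \, p_B

|D'| = \frac{|D|}{D_{\max}}, \qquad
D_{\max} =
\begin{cases}
\min\bigl(p_A (1 - p_B),\; (1 - p_A)\, p_B\bigr) & \text{if } D > 0 \\
\min\bigl(p_A \, p_B,\; (1 - p_A)(1 - p_B)\bigr) & \text{if } D < 0
\end{cases}

r^2 = \frac{D^2}{p_A (1 - p_A)\, p_B (1 - p_B)}
```

The two measures share the numerator D and differ only in the denominator. D' reaches 1 whenever at least one of the four possible haplotypes is absent, whereas r^2 reaches 1 only when just two haplotypes are present, which forces the allele frequencies at the two loci to be equal.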