Please help me understand linkage disequilibrium
2.2 years ago

I've read many explanations of LD, but I'm still not comfortable with them.

As you can see in these videos, most of them explain LD as if it were about the relationship between the genotype frequencies of a parental cell and the gametes it produces. But if so, I would need the genotype frequency of the parental cell and the frequencies of all the gametes it makes, which is impossible.

Or is it the relationship between the allele frequencies at each locus in the parental population (such as 0.6 for A and 0.4 for a at one locus, and 0.6 for B and 0.4 for b at another) and the combined two-locus frequencies in the offspring population (such as 0.36 for AB and 0.16 for ab)? If so, isn't it also impossible to get the frequencies of the parental population?

I suppose I'm not getting it right; I'm so confused.

In my situation, the VCF file I made has a genotype of 0|0, 1|0, 0|1 or 1|1 for each SNP.

For simplicity, if there are two samples (A and B) and I want to see how strongly two SNP positions, 1 and 2, are linked, how do I find out?

Let's say that in sample A, SNP position 1 has a genotype of 0|0 and position 2 has a genotype of 1|0, and in sample B, position 1 has a genotype of 0|1 and position 2 has a genotype of 1|1.

Is it possible to calculate the linkage relationship from this, or are other values required?

Please help me

2.2 years ago

In the beginning was the Gene. As originally envisaged, genes were atomic (i.e. indivisible) and inherited independently. That means if gene 1 has alleles A and a, and gene 2 has alleles B and b, then the allele you inherit for gene 1 should not depend on the allele you inherit for gene 2. But this isn't true, because in physical reality genes are linked to each other on chromosomes.

In the case of a single cross, this clearly isn't true. Consider the following parents:

Mother:
chromosome copy 1:  ----A--------B-----
chromosome copy 2:  ----A--------B-----

Father:
chromosome copy 1:  ----a--------b-----
chromosome copy 2:  ----A--------B-----

Possible offspring:
----A--------B-----      or     ----A--------B-----
----A--------B-----             ----a--------b-----

Offspring from this cross will always inherit an A and a B from the mother, but from the father 50% will inherit A and 50% a, and 50% will inherit B and 50% b.

Under independent inheritance, the genotypes AABB, AaBB, AABb and AaBb should be equally likely (you can draw a Punnett square to check). But that isn't the case, because the offspring inherits either chromosome copy 1 or chromosome copy 2 from the father, so the only possible offspring genotypes are AABB (if the offspring inherits copy 2 from the father) and AaBb (if it inherits copy 1). This is the phenomenon of linkage.
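A small Python sketch, not part of the original answer, that enumerates the offspring of the cross drawn above twice: once assuming the two genes assort independently and once assuming complete linkage with no recombination. The haplotype strings and function names are illustrative only.

```python
from collections import Counter
from itertools import product

# Each parent is two chromosome copies; each copy is written as (locus 1 allele)(locus 2 allele).
MOTHER = ("AB", "AB")
FATHER = ("ab", "AB")


def genotype(h1, h2):
    """Combine two haplotypes into a genotype string such as 'AaBb'."""
    # sorted() puts the uppercase allele first because 'A' < 'a' in ASCII.
    return "".join("".join(sorted(x + y)) for x, y in zip(h1, h2))


def gametes_linked(parent):
    # Complete linkage, no recombination: a gamete carries one intact chromosome copy.
    return list(parent)


def gametes_independent(parent):
    # Independent assortment: locus 1 and locus 2 alleles are drawn from either copy separately.
    locus1 = (parent[0][0], parent[1][0])
    locus2 = (parent[0][1], parent[1][1])
    return [a + b for a, b in product(locus1, locus2)]


def offspring(gamete_fn):
    """Genotype frequencies among offspring, treating every gamete pairing as equally likely."""
    counts = Counter(
        genotype(m, f) for m in gamete_fn(MOTHER) for f in gamete_fn(FATHER)
    )
    total = sum(counts.values())
    return {g: n / total for g, n in sorted(counts.items())}


independent = offspring(gametes_independent)
linked = offspring(gametes_linked)
print(independent)  # {'AABB': 0.25, 'AABb': 0.25, 'AaBB': 0.25, 'AaBb': 0.25}
print(linked)       # {'AABB': 0.5, 'AaBb': 0.5}
```

Under complete linkage the AaBB and AABb classes disappear, because producing them would need a recombinant A-b or a-B chromosome from the father.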

All of this assumes that recombination won't form a ------A-------b----- chromosome in the father. While this is unlikely to happen in one cross, over a population and across evolutionary time the association of a with b and A with B will break down, and you will get lots of -----A--------b---- and -----a------B---- chromosomes. When the frequency of each two-locus haplotype equals the product of the frequencies of its alleles (for example, P(AB) = P(A) × P(B)), the loci are said to be in linkage equilibrium.
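The gradual breakdown described above can be sketched with the standard textbook recursion for an idealised, infinitely large, randomly mating population (an assumption: real populations also have drift, selection and so on). With recombination fraction c between the loci, a fraction 1 − c of gametes carry a parental haplotype intact and a fraction c carry a recombinant, so D = P(AB) − P(A)P(B) shrinks by a factor (1 − c) each generation while the allele frequencies stay fixed. The function names and the value c = 0.01 are illustrative.

```python
def linkage_d(hap):
    """D = P(AB) - P(A) * P(B); zero means linkage equilibrium."""
    p_A = hap["AB"] + hap["Ab"]
    p_B = hap["AB"] + hap["aB"]
    return hap["AB"] - p_A * p_B


def next_generation(hap, c):
    """One round of random mating with recombination fraction c (infinite population assumed)."""
    p_A = hap["AB"] + hap["Ab"]
    p_B = hap["AB"] + hap["aB"]
    p_a, p_b = 1 - p_A, 1 - p_B
    # Frequency each haplotype would have if the two loci were independent.
    independent = {"AB": p_A * p_B, "Ab": p_A * p_b, "aB": p_a * p_B, "ab": p_a * p_b}
    # Non-recombinant gametes (1 - c) copy a parental haplotype; recombinant gametes (c)
    # take their two alleles from independently chosen chromosomes.
    return {h: (1 - c) * hap[h] + c * independent[h] for h in hap}


# Start with only A-B and a-b chromosomes, as in the cross above (maximal association).
hap = {"AB": 0.5, "Ab": 0.0, "aB": 0.0, "ab": 0.5}
d_by_generation = []
for generation in range(201):
    d_by_generation.append(linkage_d(hap))
    hap = next_generation(hap, c=0.01)

for generation in (0, 50, 100, 150, 200):
    print(generation, round(d_by_generation[generation], 4))
```

With c = 0.01, D roughly halves every 69 generations (0.99^69 ≈ 0.5), which is why tightly linked loci can stay in LD for a very long time.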

If this is not the case, and the probability of having b rather than B at locus 2 depends on whether the same chromosome carries a rather than A at locus 1, then the loci are said to be in linkage disequilibrium. At the extreme, for two loci in perfect LD (r² = 1), if you tell me the allele at locus 1, I can tell you the allele at locus 2.
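The conditional-probability view of LD can be checked directly from haplotype frequencies. The sketch below (illustrative helper name, two biallelic loci with both alleles present assumed) computes P(b | a), P(b | A), D and the common r² measure for two example populations: one whose haplotype frequencies are simply the products of allele frequencies of 0.6/0.4 at each locus, and one containing only A-B and a-b chromosomes.

```python
def ld_summary(hap):
    """Return P(b|a), P(b|A), D and r^2 for haplotype frequencies over loci A/a and B/b."""
    p_A = hap["AB"] + hap["Ab"]
    p_B = hap["AB"] + hap["aB"]
    p_a, p_b = 1 - p_A, 1 - p_B
    d = hap["AB"] - p_A * p_B
    # r^2 divides by all four allele frequencies, so both alleles must be present at both loci.
    r2 = d * d / (p_A * p_a * p_B * p_b)
    return {
        "P(b|a)": hap["ab"] / p_a,  # chance of b on a chromosome that carries a
        "P(b|A)": hap["Ab"] / p_A,  # chance of b on a chromosome that carries A
        "D": d,
        "r2": r2,
    }


equilibrium = {"AB": 0.36, "Ab": 0.24, "aB": 0.24, "ab": 0.16}
perfect_ld = {"AB": 0.6, "Ab": 0.0, "aB": 0.0, "ab": 0.4}

for name, hap in (("equilibrium", equilibrium), ("perfect LD", perfect_ld)):
    stats = ld_summary(hap)
    print(name, {k: round(v, 3) for k, v in stats.items()})
```

In the first population the allele at locus 1 tells you nothing about locus 2 (both conditional probabilities are 0.4); in the second it tells you everything, which is what r² = 1 expresses.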

EDIT: after rereading the question.

  1. What I've drawn above are haplotypes, not genotypes. In your question you write 0|1, which suggests your data is phased. Unphased data would normally be written 0/1. So if your phased genotypes at two SNPs are 0|1 and 0|1, then your haplotypes are 00 and 11.

  2. LD is a property of a population, not of a single individual or even two individuals; you can only calculate LD if you have the genotypes of a large population. However, linkage is not the same as linkage disequilibrium, and you can calculate linkage from a collection of parents and their offspring (how often do you see a haplotype in the offspring that is not present in the parents?). But no one really calculates linkage any more, since genome sequences have made genetic mapping unnecessary.
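Reading phased calls into haplotypes, as in point 1, amounts to taking the left-hand allele at every site as one chromosome and the right-hand allele as the other. A minimal sketch, assuming the GT strings have already been pulled out of the VCF, that sites are biallelic and that there are no missing calls; the function name is made up for illustration:

```python
def phased_haplotypes(genotypes):
    """Split one sample's phased GT strings (one per SNP, in order) into its two haplotypes."""
    left, right = [], []
    for gt in genotypes:
        if "|" not in gt:
            # '0/1'-style calls are unphased: we know the alleles, not which chromosome carries them.
            raise ValueError(f"{gt!r} is unphased, so haplotypes cannot be read off directly")
        first, second = gt.split("|")
        left.append(first)
        right.append(second)
    return "".join(left), "".join(right)


# SNP positions 1 and 2 for the two samples in the question, plus the 0|1, 0|1 example.
samples = {"A": ["0|0", "1|0"], "B": ["0|1", "1|1"], "example": ["0|1", "0|1"]}
for name, gts in samples.items():
    print(name, phased_haplotypes(gts))
# A ('01', '00')
# B ('01', '11')
# example ('00', '11')
```

The two samples in the question therefore contribute four haplotypes (01, 00, 01 and 11): enough to illustrate the bookkeeping, but, as point 2 says, far too few to say anything about LD in the population.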


Thank you very much, now I understand clearly what LD is. But to find the LD coefficient D = Pab − Pa × Pb, where Pab is the frequency of the ab haplotype in my data and Pa and Pb are the frequencies of a and b in the parental population, aren't Pa and Pb required? Or are Pa and Pb also taken from my current VCF data, since it is assumed to be a Mendelian population?
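For reference, this is how D is usually estimated from a set of phased haplotypes: Pa, Pb and Pab are all counted from the same haplotypes, which are treated as a sample from the population. This is a sketch for two biallelic SNPs; the function name and the default choice of allele 0 at both SNPs are assumptions for illustration. The example input is the four haplotypes carried by samples A (01, 00) and B (01, 11) in the question, far too few for the result to mean anything.

```python
from collections import Counter


def estimate_d(haplotypes, allele1="0", allele2="0"):
    """Estimate D = P(allele1 at SNP 1 and allele2 at SNP 2) - P(allele1) * P(allele2).

    `haplotypes` are two-character strings such as '01' (SNP 1 allele, SNP 2 allele).
    All three frequencies are estimated from the same list of haplotypes.
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    p_joint = counts[allele1 + allele2] / n
    p1 = sum(h[0] == allele1 for h in haplotypes) / n
    p2 = sum(h[1] == allele2 for h in haplotypes) / n
    return p_joint - p1 * p2


# Haplotypes of samples A ('01', '00') and B ('01', '11') from the question.
haplotypes = ["01", "00", "01", "11"]
print(estimate_d(haplotypes))            # 0.0625
print(estimate_d(haplotypes, "1", "1"))  # 0.0625: same value for the 1/1 pair
print(estimate_d(haplotypes, "0", "1"))  # -0.0625: mixing alleles flips the sign
```

That the estimate is non-zero here says nothing about the population: with only four chromosomes, almost any pair of SNPs gives a non-zero D.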


As you have correctly deduced, you can't calculate the extent of linkage disequilibrium from a single sample or a small number of samples. It's a property of a population, and you need thousands, ideally tens of thousands, of correctly sampled individual genotypes to calculate it accurately.

You need the population frequency Pa, the population frequency Pb and the population frequency Pab. None of these can be estimated reliably from your VCF unless it contains the genotypes of thousands of individuals.
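To see why sample size matters so much, the sketch below simulates a population that is truly in linkage equilibrium (haplotype frequencies equal to the products of allele frequencies of 0.6 and 0.4, as in the question) and estimates D from random samples of different sizes. The true D is zero, so the spread of the estimates is pure sampling noise. The sample sizes, number of replicates and seed are arbitrary choices, and the simulation ignores how individuals are sampled in practice.

```python
import random
import statistics

# Haplotype frequencies of a population in linkage equilibrium: P(AB) = P(A) * P(B), etc.
HAPLOTYPES = ["AB", "Ab", "aB", "ab"]
FREQS = [0.36, 0.24, 0.24, 0.16]


def estimate_d(sample):
    """Estimate D = P(AB) - P(A) * P(B) from a list of sampled haplotype strings."""
    n = len(sample)
    p_A = sum(h[0] == "A" for h in sample) / n
    p_B = sum(h[1] == "B" for h in sample) / n
    p_AB = sum(h == "AB" for h in sample) / n
    return p_AB - p_A * p_B


def spread_of_d(n_haplotypes, replicates=200, seed=1):
    """Standard deviation of D estimates across repeated random samples of a given size."""
    rng = random.Random(seed)
    estimates = [
        estimate_d(rng.choices(HAPLOTYPES, weights=FREQS, k=n_haplotypes))
        for _ in range(replicates)
    ]
    return statistics.pstdev(estimates)


spreads = {n: spread_of_d(n) for n in (4, 40, 400, 4000)}
for n, sd in spreads.items():
    print(f"{n:>5} haplotypes: SD of estimated D = {sd:.4f}")
```

The spread shrinks roughly in proportion to 1/√n, so a handful of samples can easily produce a sizeable D even when the loci are in equilibrium.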

