Calculate r^2 from allele frequencies
6.9 years ago

Hello,

I have VCF files and would like to calculate LD r^2 values from allele frequencies, since my data are not phased.

VCFtools doesn't seem to work: it reports "-nan" values. Can anyone suggest other tools for this?


Maybe my intuition is wrong, but I think one cannot compute LD from the allele frequencies alone.
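A small worked example backs up this intuition: two sets of haplotypes can have identical allele frequencies at both SNPs but completely different r^2. The sketch below uses hypothetical toy data, where "AB" means allele A at SNP 1 and allele B at SNP 2 (lowercase letters are the other alleles), and computes r^2 = D^2 / (pA(1-pA) pB(1-pB)) with D = pAB - pA*pB.

```python
from collections import Counter


def allele_freqs(haplotypes):
    """Frequency of allele A at SNP 1 and allele B at SNP 2."""
    n = len(haplotypes)
    p_a = sum(h[0] == "A" for h in haplotypes) / n
    p_b = sum(h[1] == "B" for h in haplotypes) / n
    return p_a, p_b


def r_squared(haplotypes):
    """r^2 computed from phased haplotypes via D = pAB - pA * pB."""
    n = len(haplotypes)
    p_ab = Counter(haplotypes)["AB"] / n
    p_a, p_b = allele_freqs(haplotypes)
    d = p_ab - p_a * p_b
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Hypothetical toy data: both sets have pA = pB = 0.5.
coupled = ["AB", "AB", "ab", "ab"]        # A always travels with B
independent = ["AB", "Ab", "aB", "ab"]    # every combination once

print(allele_freqs(coupled), allele_freqs(independent))
print(r_squared(coupled), r_squared(independent))
```

Both sets print the same allele frequencies, (0.5, 0.5), yet r^2 is 1.0 for the first and 0.0 for the second, so the extra information has to come from haplotypes (or genotypes) rather than from frequencies.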


Michael Dondrup and chrchang523 are correct. Allele frequency alone is NOT sufficient to calculate LD. If a program claims to use only allele counts, it is using a phasing algorithm.
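To illustrate what "using a phasing algorithm" can mean when only unphased genotypes are available, here is a textbook expectation-maximization (EM) sketch for two-locus haplotype frequencies. It is a generic illustration, not the implementation of any particular program. The key point: only double heterozygotes (Aa/Bb) have ambiguous phase (AB/ab vs Ab/aB); EM repeatedly splits them between the two phases according to the current haplotype frequency estimates. The uniform starting point, the tolerance, and the function names are assumptions, and like any EM it can stop at a local optimum.

```python
def em_haplotype_freqs(genotypes, max_iter=100, tol=1e-10):
    """Estimate haplotype frequencies [AB, Ab, aB, ab] from unphased genotypes.

    genotypes: list of (g1, g2), g1 = number of A alleles (0-2) at SNP 1,
    g2 = number of B alleles (0-2) at SNP 2.
    """
    known = [0.0, 0.0, 0.0, 0.0]  # haplotype counts from unambiguous genotypes
    n_double_het = 0
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            n_double_het += 1  # phase cannot be read off the genotype
            continue
        # At least one SNP is homozygous, so both haplotypes are determined.
        alleles1 = [1] * g1 + [0] * (2 - g1)  # 1 = A, 0 = a
        alleles2 = [1] * g2 + [0] * (2 - g2)  # 1 = B, 0 = b
        for a1, a2 in zip(alleles1, alleles2):
            known[2 * (1 - a1) + (1 - a2)] += 1

    total = 2 * len(genotypes)
    freqs = [0.25, 0.25, 0.25, 0.25]
    for _ in range(max_iter):
        # E-step: probability that a double heterozygote is AB/ab (cis).
        cis = freqs[0] * freqs[3]
        trans = freqs[1] * freqs[2]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: re-count haplotypes with fractional double-het assignments.
        counts = list(known)
        counts[0] += n_double_het * w
        counts[3] += n_double_het * w
        counts[1] += n_double_het * (1 - w)
        counts[2] += n_double_het * (1 - w)
        new = [c / total for c in counts]
        done = max(abs(a - b) for a, b in zip(new, freqs)) < tol
        freqs = new
        if done:
            break
    return freqs


def r2_from_haplotype_freqs(f):
    """r^2 from haplotype frequencies ordered [AB, Ab, aB, ab]."""
    p_a = f[0] + f[1]
    p_b = f[0] + f[2]
    d = f[0] * f[3] - f[1] * f[2]
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Hypothetical samples: two double heterozygotes whose phase is inferred.
samples = [(2, 2), (0, 0), (1, 1), (1, 1)]
f = em_haplotype_freqs(samples)
print(f, r2_from_haplotype_freqs(f))
```

Because the unambiguous samples carry only AB and ab haplotypes, EM assigns the double heterozygotes almost entirely to the AB/ab phase and r^2 approaches 1.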


Found an interesting article here:

What confuses me is that although the authors say the program calculates r^2 values, their approach suggests they compute a different parameter. They only calculate LD for pairs of bi-allelic SNPs close enough to both be present in a single read, so their calculation is based on haplotype frequencies observed directly in the reads. On the other hand, they introduce an ML approach that integrates allele frequency information to better estimate r^2, which makes sense if I understand it correctly.
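When both SNPs are covered by the same reads, each read is effectively an observed haplotype, and r^2 can be computed from the 2x2 table of haplotype counts as (nAB*nab - nAb*naB)^2 / (nA*na*nB*nb). The sketch below is the plain counting estimator, not the ML estimator described in the article; it assumes each spanning read is an independent, error-free haplotype observation (no duplicates, no sequencing errors), and the counts are made up.

```python
def r2_from_counts(n_AB, n_Ab, n_aB, n_ab):
    """r^2 from counts of the four two-SNP haplotypes (e.g. reads spanning both SNPs)."""
    n_A = n_AB + n_Ab  # reads carrying A at SNP 1
    n_a = n_aB + n_ab
    n_B = n_AB + n_aB  # reads carrying B at SNP 2
    n_b = n_Ab + n_ab
    denom = n_A * n_a * n_B * n_b
    if denom == 0:
        return float("nan")  # one SNP is monomorphic among these reads
    cross = n_AB * n_ab - n_Ab * n_aB  # N^2 * D
    return cross * cross / denom


# Hypothetical read counts for the haplotypes AB, Ab, aB, ab.
print(r2_from_counts(30, 10, 10, 50))
```

The formula is the squared phi coefficient of the 2x2 table, which is algebraically the same as D^2 / (pA(1-pA) pB(1-pB)).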

One disadvantage of this approach is that it only considers pairs of SNPs very close to one another, so it cannot calculate r^2 decay along a chromosome. I am still trying to grasp the basics of r^2 and the caveats of using it instead of the measure D.
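One caveat that shows up when comparing the measures: D depends on allele frequencies, its normalized form D' = |D| / Dmax equals 1 whenever one of the four haplotypes is absent, while r^2 can only reach 1 when the two SNPs have matching allele frequencies. The sketch below computes all three from hypothetical haplotype counts, using the standard Dmax bounds (min(pA*pb, pa*pB) for D > 0, min(pA*pB, pa*pb) for D < 0).

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, D', r^2) from counts of the haplotypes AB, Ab, aB, ab."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    d = p_AB - p_A * p_B
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        nan = float("nan")
        return d, nan, nan  # a monomorphic SNP: D' and r^2 are undefined
    # Largest |D| allowed by these allele frequencies, given the sign of D.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max
    r2 = d * d / denom
    return d, d_prime, r2


# Hypothetical counts: the rare allele A (10%) always sits on a B haplotype.
print(ld_stats(10, 0, 40, 50))   # D' = 1, but r^2 is only about 0.11
print(ld_stats(50, 0, 0, 50))    # matching frequencies: D' = 1 and r^2 = 1
```

In the first case the Ab haplotype is absent, so D' is 1, yet knowing the genotype at one SNP predicts the other poorly because the allele frequencies differ (0.1 vs 0.5); r^2 reflects that, D' does not.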

6.9 years ago

plink --vcf [VCF file] --r2 [other options, depending on what output format you want]

See https://www.cog-genomics.org/plink2/ld#r for details. The baseline computation simply looks at allele count correlations, while '--r2 dprime' uses results from basic pairwise ML phasing (which is not competitive with the likes of SHAPEIT, but you may find it better than nothing).
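The idea behind "allele count correlations" can be sketched as the squared Pearson correlation between per-sample ALT allele counts (0, 1 or 2) at two SNPs, which needs no phasing. This is a minimal illustration, not PLINK's actual code: it assumes bi-allelic sites, drops any sample with a missing call at either SNP, and ignores the phase separator in GT. When one SNP is monomorphic among the remaining samples the correlation is undefined, and the sketch returns NaN.

```python
import math


def gt_to_dosage(gt):
    """Convert a VCF GT string such as '0/1' or '1|1' to an ALT allele count.

    Returns None for missing calls ('./.', '.|.'). Assumes bi-allelic sites.
    """
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return sum(int(a) for a in alleles)


def dosage_r2(gts1, gts2):
    """Squared Pearson correlation of allele counts at two SNPs (unphased)."""
    pairs = [(gt_to_dosage(a), gt_to_dosage(b)) for a, b in zip(gts1, gts2)]
    pairs = [(x, y) for x, y in pairs if x is not None and y is not None]
    n = len(pairs)
    if n == 0:
        return math.nan
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return math.nan  # no variation at one SNP: correlation undefined
    return sxy * sxy / (sxx * syy)


# Hypothetical GT columns for five samples at two SNPs.
snp1 = ["0/0", "0/1", "1/1", "0/1", "./."]
snp2 = ["0|0", "0|1", "1|1", "0|1", "0/1"]
print(dosage_r2(snp1, snp2))
```

Here the fifth sample is dropped for its missing call, and the remaining allele counts agree perfectly, so r^2 is 1.0.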