Question: Calculate r^2 from allele frequencies
6.4 years ago by Adrian Pelin (Canada):

Hello,

I have VCF files and would like to calculate LD r^2 values from allele frequencies since my data is not phased.

VCFtools doesn't seem to be working: it reports "-nan" values. Can anyone suggest other tools to do this?


Maybe my intuition is wrong, but I think one cannot compute LD from the allele frequencies alone.

written 6.4 years ago by Michael Dondrup

Michael Dondrup and chrchang523 are correct. Allele frequencies alone are NOT sufficient to calculate LD. If a program claims to use only allele counts, it is using a phasing algorithm.
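To make the point concrete, here is a minimal sketch with two toy sets of phased haplotypes that have identical allele frequencies at both sites but very different r^2. The allele labels (A/a, B/b) and the function names are made up for illustration.

```python
def r_squared(p_ab, p_a, p_b):
    """r^2 from the AB haplotype frequency and the two allele frequencies."""
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def summarize(haplotypes):
    """Return (freq of A, freq of B, r^2) for a list of two-site haplotypes."""
    n = len(haplotypes)
    p_a = sum(h[0] == "A" for h in haplotypes) / n
    p_b = sum(h[1] == "B" for h in haplotypes) / n
    p_ab = sum(h == "AB" for h in haplotypes) / n
    return p_a, p_b, r_squared(p_ab, p_a, p_b)


# Both samples have freq(A) = freq(B) = 0.5, yet the LD differs completely.
linked = ["AB", "AB", "ab", "ab"]
unlinked = ["AB", "Ab", "aB", "ab"]

print(summarize(linked))
print(summarize(unlinked))
```

The allele frequencies are the same in both cases; only the haplotype frequency of AB differs, and that is exactly the quantity you lose without phase information.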

written 6.2 years ago by Zev.Kronenberg

Found an interesting article here:

http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0048588#s3

What confuses me is that, although the authors say the program calculates r^2 values, their approach suggests they are computing a different parameter. They only calculate LD for pairs of bi-allelic SNPs that are close enough to both be present in one read, so their calculation is based on haplotype frequencies. On the other hand, they introduce an ML approach in which allele frequency information is integrated to better estimate r^2, so that makes sense if I understand it right.
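As a rough illustration of why reads spanning both SNPs give haplotype frequencies directly, here is a naive plug-in estimate of r^2 from per-read allele pairs. This is not the ML estimator from the article; the function name and the input format (one tuple of observed alleles per read) are assumptions made up for this sketch.

```python
from collections import Counter


def r2_from_read_haplotypes(read_haplotypes, allele1, allele2):
    """Plug-in r^2 from (allele at SNP 1, allele at SNP 2) pairs, one pair per read.

    allele1 / allele2 are the alleles tracked at each site (e.g. the reference).
    Returns nan if there are no reads or either site shows no variation.
    """
    counts = Counter(read_haplotypes)
    n = sum(counts.values())
    if n == 0:
        return float("nan")
    p1 = sum(c for (a1, _), c in counts.items() if a1 == allele1) / n
    p2 = sum(c for (_, a2), c in counts.items() if a2 == allele2) / n
    p12 = counts[(allele1, allele2)] / n  # observed haplotype frequency
    denom = p1 * (1 - p1) * p2 * (1 - p2)
    if denom == 0:
        return float("nan")
    d = p12 - p1 * p2
    return d * d / denom


# Toy data: 10 reads covering both SNPs.
reads = [("A", "C")] * 6 + [("G", "T")] * 3 + [("A", "T")] * 1
print(r2_from_read_haplotypes(reads, "A", "C"))
```

Because each read comes from a single molecule, the pair of alleles on it is already phased, which is why no separate phasing step is needed for SNPs this close together.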

One disadvantage of this approach is that it only looks at pairs of SNPs very close to one another, so it cannot calculate r^2 decay along a chromosome. I am still trying to grasp the basics of r^2 and what the caveats are in using it as opposed to the measure D.
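For reference, the standard textbook definitions of the three measures are written out below. The symbols are my own notation, not from the article: p_A and p_B are the frequencies of one allele at each site, and p_AB is the frequency of the haplotype carrying both.

```latex
D = p_{AB} - p_A \, p_B

r^2 = \frac{D^2}{p_A (1 - p_A) \, p_B (1 - p_B)}

|D'| = \frac{|D|}{D_{\max}}, \qquad
D_{\max} =
\begin{cases}
  \min\{\, p_A (1 - p_B),\ (1 - p_A) \, p_B \,\} & \text{if } D > 0 \\
  \min\{\, p_A \, p_B,\ (1 - p_A)(1 - p_B) \,\}  & \text{if } D < 0
\end{cases}
```

The range of D depends on the allele frequencies, so raw D values are hard to compare between SNP pairs. Both r^2 and |D'| are normalized to lie between 0 and 1, but r^2 can only reach 1 when the two sites have matching allele frequencies, whereas |D'| reaches 1 whenever at most three of the four possible haplotypes are present.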

written 6.4 years ago by Adrian Pelin
6.4 years ago by chrchang523 (United States):

plink --vcf [VCF file] --r2 [other options, depending on what output format you want]

See https://www.cog-genomics.org/plink2/ld#r for details.  The baseline computation simply looks at allele count correlations, while '--r2 dprime' uses results from basic pairwise ML phasing (which is not competitive with the likes of SHAPEIT, but you might find it to be better than nothing).
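As a minimal sketch of what an "allele count correlation" looks like on unphased data, the function below computes the squared Pearson correlation between per-sample alternate-allele counts (0, 1 or 2) at two sites. It is only meant to illustrate the idea; it does not reproduce PLINK's actual implementation or its handling of missing calls, and the function name and the use of None for missing genotypes are assumptions.

```python
import math


def genotype_r2(counts_x, counts_y):
    """Squared Pearson correlation between two allele-count vectors (0/1/2).

    Samples with a missing call (None) at either site are skipped.
    Returns nan if no samples remain or either site has zero variance.
    """
    pairs = [(x, y) for x, y in zip(counts_x, counts_y)
             if x is not None and y is not None]
    n = len(pairs)
    if n == 0:
        return math.nan
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return math.nan  # monomorphic site: correlation is undefined (0/0)
    return cov * cov / (var_x * var_y)


# Toy genotypes for six samples at two sites.
site1 = [0, 1, 2, 1, 0, 2]
site2 = [0, 1, 2, 1, 0, 2]
print(genotype_r2(site1, site2))
```

Note that a site with no variation among the samples gives an undefined correlation, which is one generic way nan values can show up in correlation-based LD statistics.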
