PLINK: calculating LD with the --r2 command
5.2 years ago
janhuang.cn ▴ 190

I am calculating r2 between SNPs from the 1000G phase 3 VCF files.

I first converted the VCF to PLINK binary (bed) format and then ran --r2. However, my results disagree with LDlink, an online tool that calculates LD from 1000G phase 3. I am doing the calculation myself because I need a full LD table for all available SNPs in 1000G.

Both rs138281 and rs138285 are in the bed file, but --r2 returned no value for this pair, as if they were not in LD. I did not set an r2 cutoff, so I expected all r2 values to be returned.

plink --bfile chr22_1000Gphase3_EUR_snp_maf_rmvsnp --r2 --out chr22_1000Gphase3_EUR_ldtable


However, according to LDlink they are highly correlated (R2 = 0.9413 in the EUR population).

Does anyone have an idea what might be going wrong?

ld plink r2 • 8.4k views
5.2 years ago
pfs ▴ 280

The default output is the limited-window table format. Since you did not request a different format, that is what you are getting. With the defaults, LD is only calculated for pairs of variants that are close together: fewer than 9 (10-1) other variants between them and no more than 1000 kb apart. My guess is that there are 9 or more other SNPs between these two SNPs, so LD for this pair is never calculated.
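One way to check whether the window is the cause is to ask PLINK about this pair directly with --ld, which takes two variant IDs and does not depend on the window settings. This is only a sketch: the fileset prefix is taken from the question, the output prefix `pair_check` is a made-up name, and the command only runs if `plink` and the .bed file are actually present.

```shell
#!/bin/sh
# Fileset prefix from the question; "pair_check" is a hypothetical output prefix.
BFILE=chr22_1000Gphase3_EUR_snp_maf_rmvsnp
SNP_A=rs138281
SNP_B=rs138285
PAIR_OUT=pair_check

if command -v plink >/dev/null 2>&1 && [ -f "${BFILE}.bed" ]; then
    # --ld reports LD statistics for this single pair in the console output and log.
    plink --bfile "$BFILE" --ld "$SNP_A" "$SNP_B" --out "$PAIR_OUT"
else
    # Without plink or the data, just show the command that would be run.
    echo "would run: plink --bfile $BFILE --ld $SNP_A $SNP_B --out $PAIR_OUT"
fi
```

If the pair shows a high R-sq here but is missing from the .ld table, the window (or r2 filter) is what dropped it.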

The following is copied from the PLINK documentation:

By default, --r calculates and reports raw inter-variant allele count correlations, while --r2 reports squared correlations. You can request values for all pairs in matrix format (if you specify 'bin' and/or one of the matrix shape modifiers), all pairs in table format (with 'inter-chr'), or a limited window in table format (this is the default). Results are saved to plink.ld{.gz/.bin}.

By default, when a limited window report is requested, every pair of variants with at least (10-1) variants between them, or more than 1000 kilobases apart, is ignored. You can change the first threshold with --ld-window, and the second threshold with --ld-window-kb.
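A rough sketch of how the original command could be widened using the two flags named above. The window values are illustrative assumptions, not recommendations, so tune them to your data. Also note that in limited-window mode PLINK filters out pairs with r2 below 0.2 by default, so even without an explicit cutoff not every pair is reported; --ld-window-r2 0 turns that filter off. Expect the output to get large.

```shell
#!/bin/sh
# Fileset and output prefixes are taken from the question.
BFILE=chr22_1000Gphase3_EUR_snp_maf_rmvsnp
OUT=chr22_1000Gphase3_EUR_ldtable

# Illustrative values (assumptions): effectively remove the variant-count limit,
# keep the 1000 kb distance limit, and report pairs regardless of their r2.
LD_WINDOW=99999
LD_WINDOW_KB=1000
LD_WINDOW_R2=0

LD_ARGS="--r2 --ld-window $LD_WINDOW --ld-window-kb $LD_WINDOW_KB --ld-window-r2 $LD_WINDOW_R2"

if command -v plink >/dev/null 2>&1 && [ -f "${BFILE}.bed" ]; then
    # $LD_ARGS is left unquoted on purpose so it splits into separate flags.
    plink --bfile "$BFILE" $LD_ARGS --out "$OUT"
else
    # Without plink or the data, just show the command that would be run.
    echo "would run: plink --bfile $BFILE $LD_ARGS --out $OUT"
fi
```

If you really need every pair with no distance limit, the documentation quoted above says the 'inter-chr' modifier gives all pairs in table format, at the cost of a much bigger file.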


Thank you. That is the reason.