VCFtools LD calculations only for pairs that include a single SNP of interest
7.0 years ago
Scott ▴ 90

I would like to calculate LD statistics for a VCF file using VCFtools. For the 1 Mb window I am interested in, the calculation takes quite a long time, even with an R^2 minimum of 0.2.

I am ultimately only interested in LD statistics that include a single SNP of interest. Is there a way to have VCFtools compute LD stats only for pair-wise comparisons that include my SNP of interest, but still over the whole 1Mb region?

My understanding is that the "ld-window" options can define only the entire region to use, so they are not useful for this application.

SNP LD R^2 linkagedisequilibrium VCFtools • 5.3k views
2
7.0 years ago

The yet-to-be-released version of vcftools (in the SVN) has a new option that allows you to do something like this. The option is called --hap-r2-positions, and it lets you specify a list of sites to be tested for LD against all other sites. To use it, pass --hap-r2-positions <positions_filename>.
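As a rough sketch of how this might be used for a single SNP: the file names, chromosome, and coordinates below are made up, and the positions file layout (tab-separated chromosome and position, with a leading `#` comment line) is an assumption based on how vcftools position lists are usually written, so check the manual of your version before relying on it.

```shell
# Hypothetical SNP of interest; substitute your own chromosome and position.
chrom="1"
pos="1500000"

# Positions file: a '#' comment line, then one tab-separated CHROM/POS row.
# (Assumed layout; confirm against the vcftools manual for your version.)
printf '#CHROM\tPOS\n%s\t%s\n' "$chrom" "$pos" > snp_of_interest.txt

# Run vcftools only if it is installed and the (hypothetical) input exists.
if command -v vcftools >/dev/null 2>&1 && [ -f input.vcf ]; then
    # Restrict to the 1 Mb region, then test the listed site against all others.
    vcftools --vcf input.vcf \
        --chr "$chrom" --from-bp 1000000 --to-bp 2000000 \
        --hap-r2-positions snp_of_interest.txt \
        --out snp_ld
fi
```

The region filters (--chr, --from-bp, --to-bp) keep the comparison inside the 1 Mb window, while the positions file limits the pairs to those involving the SNP of interest.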

0

That's useful, and means the conversion to plink is now unnecessary, cheers.

1
7.0 years ago
smilefreak ▴ 420

Hi Scott,

Plink has this functionality. You could use VCFtools to extract your region of interest and convert it to the plink format.
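The extraction and conversion step might look roughly like the sketch below. The input file name, chromosome, SNP position, and output prefix are placeholders, not values from the question; the window is simply taken as 500 kb on either side of the SNP.

```shell
# Hypothetical inputs; replace with your own file, chromosome and SNP position.
vcf="input.vcf"
chrom="1"
snp_pos=1500000
half_window=500000   # 500 kb each side gives a 1 Mb region

# Region boundaries centred on the SNP of interest.
start=$((snp_pos - half_window))
end=$((snp_pos + half_window))
echo "Extracting chr${chrom}:${start}-${end}"

# Run vcftools only if it is installed and the input file exists.
if command -v vcftools >/dev/null 2>&1 && [ -f "$vcf" ]; then
    # --plink writes mydata.ped and mydata.map for the selected region.
    vcftools --vcf "$vcf" \
        --chr "$chrom" --from-bp "$start" --to-bp "$end" \
        --plink --out mydata
fi
```

The output prefix mydata is chosen so that the resulting files can be read with plink --file mydata.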

Then use Plink for the LD calculations for your SNP. The command would look similar to the one below, which I copied from the plink documentation.

plink --file mydata \
      --r2 \
      --ld-snp rs12345 \
      --ld-window-kb 1000 \
      --ld-window 99999 \
      --ld-window-r2 0
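Since --ld-window-r2 0 reports every pair, you may want to filter the result afterwards. The sketch below assumes the usual plink --r2 table layout (CHR_A, BP_A, SNP_A, CHR_B, BP_B, SNP_B, R2, written to plink.ld unless --out is given) and uses a small made-up file in its place; the partner SNP names and R2 values are invented purely for illustration.

```shell
# Toy stand-in for plink.ld (made-up values) so the filter can be tried without plink.
cat > toy.ld <<'EOF'
 CHR_A      BP_A    SNP_A  CHR_B      BP_B  SNP_B    R2
     1   1500000  rs12345      1   1400000    rsA  0.15
     1   1500000  rs12345      1   1510000    rsB  0.85
     1   1500000  rs12345      1   1600000    rsC  0.40
EOF

# Skip the header, keep pairs with R2 >= 0.2, and print the partner SNP,
# its position and R2, sorted from highest to lowest R2.
awk 'NR > 1 && $7 >= 0.2 { print $6, $5, $7 }' toy.ld | sort -k3,3nr > partners.txt
cat partners.txt
```

On real output, replace toy.ld with the .ld file plink produced and adjust the 0.2 cutoff as needed.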

0

What is --ld-window 99999?

I ran the commands with --ld-window 99999 and with --ld-window 999. The result was smaller with the latter, and the output SNPs are totally different, with no overlap. Why?