Using vcftools to filter SNPs by a linkage disequilibrium (r2) threshold
4.5 years ago
shyamie ▴ 30

I'm using 1000 Genomes VCFs, and I'm trying to thin out SNPs in moderate linkage disequilibrium (r2) using vcftools. In plink, I would do this with the --indep-pairwise parameter and then exclude the SNPs it lists:

plink --bfile DATA --indep-pairwise 50 10 0.8 --out OUTPUT --noweb
plink --bfile DATA --exclude OUTPUT.prune.out --noweb --make-bed --out DATA_FILTERED


Does anyone know if there is an equivalent one or two step solution to do this using vcftools? I would like to avoid having to convert to plink format and back to vcf, if possible.

Thanks!

4.5 years ago

Yes, this is usually done using something like:

vcftools --vcf MyVariants.vcf --hap-r2 --ld-window-bp 10000 --out MyVariants.LD.10Kbp


By the way, if you have issues importing 1000 Genomes data into PLINK, I cover that (including pruning based on LD) in my tutorial: "Produce PCA bi-plot for 1000 Genomes Phase III in VCF format"

Kevin


Hi Kevin, Thanks for this response, it is helpful. However, from what I understand, this command will just output a file containing the r2, D, and D’ statistics. Is there a way to actually filter based on r2 after we have this file?
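One way to go from that LD table to an actual filter is sketched below. It is only a rough greedy approximation, not PLINK's --indep-pairwise algorithm: for every pair above the r2 threshold, the second site is dropped unless one of the two has already been dropped. It assumes the --hap-r2 output has the columns CHR, POS1, POS2, N_CHR, R^2, D, Dprime. The toy table, the file names toy.hap.ld and exclude_positions.txt, and the 0.8 threshold are made up for illustration.

```shell
# Toy stand-in for the *.hap.ld table written by vcftools --hap-r2
# (assumed columns: CHR POS1 POS2 N_CHR R^2 D Dprime; values are invented).
printf 'CHR\tPOS1\tPOS2\tN_CHR\tR^2\tD\tDprime\n'  > toy.hap.ld
printf '1\t100\t200\t100\t0.95\t0.2\t1\n'         >> toy.hap.ld
printf '1\t100\t300\t100\t0.10\t0.01\t0.3\n'      >> toy.hap.ld
printf '1\t200\t300\t100\t0.90\t0.2\t1\n'         >> toy.hap.ld
printf '1\t300\t400\t100\t0.85\t0.2\t1\n'         >> toy.hap.ld

# Greedy pass: for each pair with r2 above thr, drop the second site
# unless either site of the pair has already been dropped.
awk -v thr=0.8 'BEGIN { FS = OFS = "\t" }
NR > 1 && $5 != "nan" && $5 != "-nan" && $5 + 0 > thr {
    a = $1 OFS $2          # first site of the pair  (CHR, POS1)
    b = $1 OFS $3          # second site of the pair (CHR, POS2)
    if (!(a in drop) && !(b in drop)) {
        drop[b] = 1
        print b            # one "CHR<TAB>POS" line per excluded site
    }
}' toy.hap.ld > exclude_positions.txt

cat exclude_positions.txt

# With a real table, the list could then be applied to the VCF, e.g.:
#   vcftools --vcf MyVariants.vcf --exclude-positions exclude_positions.txt \
#            --recode --recode-INFO-all --out MyVariants.pruned
```

Note that only pairs inside the --ld-window-bp window appear in the table, so the window size also limits what this pass can prune.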


Hi everyone! I've got the same question and am wondering how you can actually prune for LD using VCFtools (not just identify the SNPs that are in LD). I'm wondering if you could use the command --hap-r2-positions <positions list file> to create a list of positions that are out of LD, and then use --exclude-positions to prune out the SNPs that are in or out of LD. I'm going to give this a go, but if there are any other suggestions, that would be greatly appreciated!


VCFtools has long been superseded by BCFtools. Please use that. If you have other questions, you may open your own question.
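For anyone landing here, a minimal sketch of what that could look like, assuming the BCFtools prune plugin is what is meant and that your build ships it. The file names are hypothetical, and the 0.8 threshold and 50-site window are borrowed from the PLINK command in the question. Option names for this plugin have changed between releases, so check `bcftools +prune -h` on your installation first.

```shell
# Hypothetical input/output names; replace with your own files.
in_vcf="MyVariants.vcf.gz"
out_vcf="MyVariants.pruned.vcf.gz"

# -m: discard sites whose r2 with another site in the window exceeds 0.8
# -w: window of 50 sites (a bp/kb suffix would make it a physical distance)
if command -v bcftools >/dev/null 2>&1 && [ -f "$in_vcf" ]; then
    bcftools +prune -m 0.8 -w 50 "$in_vcf" -Oz -o "$out_vcf"
else
    echo "bcftools or $in_vcf not found; nothing pruned" >&2
fi
```

This writes the pruned VCF directly, so no intermediate list of positions is needed.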


Kia ora (thank you) Kevin! I just saw your other post here. It was very helpful!

VCFtools version for LD calculations specifying bin size


Kia ora bro / dudette!