Filtering SNPs by minimum LD value in bcftools?
2.6 years ago
RNAseqer

Hello everyone,

I have just started using bcftools to 'prune' some VCF files, and I have found some helpful examples of how to discard SNPs in high LD, e.g.:

 bcftools +prune -l 0.6 -w 1000 frag.vcf -Ov -o output1.vcf


However, I would like to do the opposite: keep the SNPs with r2 values higher than 0.6 and discard the others. Is there a straightforward way to do this?


If this functionality is not built directly into bcftools +prune, then I would, for example, compare the lists of SNPs in the filtered and unfiltered files and infer which ones were removed. bcftools query can print selected fields (e.g. CHROM and POS) from a VCF in a neat tabular format, and you could then use awk arrays to compare the lists.
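A minimal sketch of that list comparison as a bash function using an awk array. It assumes uncompressed VCFs and identifies each site by CHROM:POS; the function name `removed_sites` is hypothetical. It reads the VCF text directly rather than calling `bcftools query`; for bgzipped VCF or BCF input, `bcftools query -f '%CHROM:%POS\n' file` would produce equivalent site lists to compare.

```shell
#!/usr/bin/env bash
# Print CHROM:POS of sites present in the unpruned VCF but absent from the
# pruned one. Assumes uncompressed VCFs; removed_sites is a hypothetical name.
set -euo pipefail

removed_sites() {
  local all_vcf=$1 pruned_vcf=$2
  awk '
    /^#/ { next }                        # skip VCF header lines in both files
    { key = $1 ":" $2 }                  # identify each site by CHROM:POS
    NR == FNR { kept[key]; next }        # first file: sites kept by +prune
    !(key in kept) { print key }         # second file: sites that were dropped
  ' "$pruned_vcf" "$all_vcf"
}
```

For example, `removed_sites frag.vcf output1.vcf > removed.txt` would list the sites that +prune dropped. Keep in mind that pruning retains one SNP from each high-LD pair, so the retained partners may also exceed the threshold without appearing in this list.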


I was thinking along the same lines, and I think that would work. However, I found that vcftools has a command-line option for a minimum r2:

vcftools --vcf frag.vcf --hap-r2 --min-r2 .7 --ld-window-bp 50000 --out minr2_ld_window_50000


This outputs a file of pairwise r2 values rather than the VCF data lines, but I think it may be most efficient to pull out these SNPs with a custom Perl script that takes the vcftools output as input and extracts the matching lines from the original VCF file. I am also just starting to look at the Tagger program in the Broad's Haploview software package, since what I am really interested in is the tagging SNPs alone...
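A sketch of the extraction script described above, written in bash/awk rather than Perl to match the other commands in this thread. It assumes an uncompressed VCF and that the `--hap-r2` output (here `minr2_ld_window_50000.hap.ld`, since vcftools appends `.hap.ld` to the `--out` prefix) is a whitespace-separated table with a single header row and CHR, POS1, POS2 as its first three columns. Because `--min-r2` has already filtered the pairs, every position listed as POS1 or POS2 is kept. The function name `keep_ld_sites` is hypothetical.

```shell
#!/usr/bin/env bash
# Keep VCF records whose position appears in at least one SNP pair reported by
# vcftools --hap-r2 --min-r2. Assumes an uncompressed VCF and a .hap.ld table
# with one header row and CHR, POS1, POS2 as its first three columns.
# keep_ld_sites is a hypothetical name.
set -euo pipefail

keep_ld_sites() {
  local ld_file=$1 vcf=$2
  awk '
    NR == FNR && FNR == 1 { next }              # skip the .hap.ld header row
    NR == FNR { in_ld[$1 ":" $2]; in_ld[$1 ":" $3]; next }  # both SNPs of each pair
    /^#/ { print; next }                        # pass VCF header lines through unchanged
    { key = $1 ":" $2 }                         # CHROM:POS of the VCF record
    key in in_ld                                # print records found in any pair
  ' "$ld_file" "$vcf"
}
```

Under these assumptions, `keep_ld_sites minr2_ld_window_50000.hap.ld frag.vcf > high_ld.vcf` would write the VCF header plus every record that takes part in at least one pair with r2 above the `--min-r2` cutoff.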