Question: Filtering SNPs by minimum LD value in bcftools?
Asked 3 months ago by RNAseqer:

Hello everyone,

I have just started using bcftools to prune some VCF files. I have found some helpful examples of how to discard SNPs in high LD, such as:

 bcftools +prune -l 0.6 -w 1000 frag.vcf -Ov -o output1.vcf

What I would actually like is the opposite: output in which the SNPs kept are those with r2 values above 0.6, and all other SNPs are discarded. Is there a straightforward way to do this?
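Because r² is a pairwise measure, "SNPs with r² above 0.6" presumably means SNPs that have at least one partner within the window whose r² with them exceeds 0.6. For reference, the textbook haplotype-based definition for two biallelic sites A and B, with allele frequencies p_A and p_B and haplotype frequency p_AB, is given below. bcftools and vcftools may estimate it differently (for example from genotypes when the data are unphased), so treat this as the standard definition rather than either tool's exact computation.

```latex
D = p_{AB} - p_A\,p_B, \qquad
r^2 = \frac{D^2}{p_A\,(1 - p_A)\,p_B\,(1 - p_B)}
```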


If this functionality is not built into bcftools +prune, I would compare the lists of SNPs in the filtered and unfiltered files and infer which ones were removed. bcftools query can extract fields (such as CHROM and POS) from a VCF in a neat way, and you could then use awk arrays to compare the lists.
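The list comparison suggested above could be sketched roughly as follows. The function name `removed_sites` and the file names are hypothetical; the sketch assumes both lists have CHROM and POS in their first two columns, as `bcftools query -f '%CHROM\t%POS\n'` would write them, and uses an awk array keyed on chromosome and position. The output is the set of sites +prune dropped, which should roughly correspond to the SNPs it judged to be in LD above the `-l` cutoff; it will not include the retained partner of each such pair.

```shell
#!/bin/sh
# Hypothetical helper: removed_sites ALL KEPT
# ALL and KEPT are site lists with CHROM and POS in the first two columns,
# e.g. as written by: bcftools query -f '%CHROM\t%POS\n' file.vcf
# Prints the lines of ALL whose CHROM/POS pair does not appear in KEPT.
removed_sites() {
  awk '
    FILENAME == ARGV[1] { kept[$1, $2] = 1; next }  # first file read: the kept sites
    !(($1, $2) in kept)                             # second file: print sites not kept
  ' "$2" "$1"
}
```

For the +prune run in the question, something like `bcftools query -f '%CHROM\t%POS\n' frag.vcf > all.txt`, the same command on `output1.vcf` written to `kept.txt`, and then `removed_sites all.txt kept.txt` should list the pruned sites. Testing `FILENAME == ARGV[1]` rather than the common `FNR == NR` idiom keeps the function correct even when the kept list is empty.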

Reply written 3 months ago by Kevin Blighe

I was thinking along the same lines, and I think that would work. However, I also found that vcftools has a command-line option for a minimum r2:

vcftools --vcf frag.vcf --hap-r2 --min-r2 .7 --ld-window-bp 50000 --out minr2_ld_window_50000

This outputs a table of SNP pairs with their r2 values rather than the VCF records themselves, but I think the most efficient route may be a custom Perl script that takes the vcftools output as input and pulls the corresponding lines from the original VCF file. I am also starting to look at the Tagger program in the Broad Institute's Haploview package, since what I really want is the tag SNPs alone.
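The step of pulling the matching records out of the original VCF could be sketched in awk instead of Perl. This is a minimal sketch under stated assumptions: the function name `keep_ld_snps` is hypothetical, and the pair table is assumed to be the vcftools `--hap-r2` output with one header line followed by CHR, POS1, POS2, N_CHR and R^2 in columns 1 to 5. Every SNP that appears on either side of a listed pair is kept, which matches the reading of the goal as "SNPs with at least one high-r2 partner".

```shell
#!/bin/sh
# Hypothetical helper: keep_ld_snps PAIRS VCF [MIN_R2]
# PAIRS is assumed to be a vcftools --hap-r2 table: one header line, then
# CHR, POS1, POS2, N_CHR, R^2, ... in columns 1-5.
# Prints the VCF header plus every record whose CHROM/POS appears as POS1 or
# POS2 in a pair with R^2 >= MIN_R2 (default 0, i.e. every listed pair).
keep_ld_snps() {
  awk -v min="${3:-0}" '
    FILENAME == ARGV[1] {                 # first file: the pair table
      if (FNR > 1 && $5 + 0 >= min + 0) { # skip the header; apply the optional cutoff
        keep[$1, $2] = 1                  # first SNP of the pair
        keep[$1, $3] = 1                  # second SNP of the pair
      }
      next
    }
    /^#/ { print; next }                  # pass VCF meta and header lines through
    ($1, $2) in keep                      # print records at a retained site
  ' "$1" "$2"
}
```

Assuming vcftools names the table `minr2_ld_window_50000.hap.ld`, a call such as `keep_ld_snps minr2_ld_window_50000.hap.ld frag.vcf > ld_snps.vcf` would write a VCF containing only the SNPs involved in at least one pair above the `--min-r2` cutoff. The optional third argument lets one vcftools run be re-filtered at a stricter threshold without recomputing LD.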

Reply written 3 months ago by RNAseqer