VCF files: counting the number of variants in genomic windows of a chosen size
Is there a tool to count the number of variants in each genomic window of a user-specified size? I am looking for something along the lines of vcftools --TajimaD, which takes the window size as an argument and then calculates Tajima's D in each window. I would simply like to count the number of variants in each window.

Via BEDOPS:

$ bedmap --echo --count genes.bed <(vcf2bed < variants.vcf) > answer.bed

If your genes are in another format, say GFF:

$ bedmap --echo --count <(gff2bed < genes.gff) <(vcf2bed < variants.vcf) > answer.bed


If you have generic windows, replace genes.bed with a windows.bed of your design.
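
As a rough sketch of how such a windows.bed could be built, the snippet below tiles each chromosome with fixed-size windows using awk. The file chrom.sizes (chromosome name and length, tab-separated), its toy contents, and the window size of 100 are assumptions made for illustration; substitute the real lengths for your reference genome. The output is sorted because bedmap expects sorted input.

```shell
# Hypothetical input: chrom.sizes, a tab-separated file of
# chromosome name and length in bases (toy values, not real lengths).
printf 'chr21\t250\nchr22\t120\n' > chrom.sizes

# Window size in bases; pick whatever suits your analysis.
window=100

# Emit zero-based, half-open BED intervals [start, end) that tile each
# chromosome; the last window is truncated at the chromosome end.
awk -v w="$window" 'BEGIN { OFS = "\t" }
{
    for (start = 0; start < $2; start += w) {
        end = (start + w < $2) ? start + w : $2
        print $1, start, end
    }
}' chrom.sizes | LC_ALL=C sort -k1,1 -k2,2n > windows.bed
```

The resulting windows.bed can then take the place of genes.bed in the bedmap command above.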

Hi, I am getting a "Segmentation fault: 11" error when using the first bedmap command, like so:

bedmap --echo --count windows.bed <(vcf2bed < chr21.vcf.gz) > chr21.coverage.txt


The final output keeps giving me a count of 0 for each window. How should I interpret this?

The file chr21.vcf.gz is not a plain-text VCF file; it is gzip-compressed binary data, which vcf2bed cannot parse directly. Decompress it and pipe the decompressed data to vcf2bed, e.g.:

$ bedmap --echo --count --delim '\t' windows.bed <(gunzip -c chr21.vcf.gz | vcf2bed -) > windows_with_counts_of_variants.bed
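
If you are not sure whether a given VCF is compressed, a small helper along these lines may be useful. It is only a sketch: vcf_cat is a hypothetical function name, and toy.vcf and toy.vcf.gz are throwaway files created for the demonstration. It relies on gzip -t, which tests whether its input is valid gzip data.

```shell
# Toy files for demonstration only.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\nchr21\t150\t.\tA\tG\n' > toy.vcf
gzip -c toy.vcf > toy.vcf.gz

# vcf_cat (hypothetical helper): print a VCF as plain text whether or
# not the file is gzip-compressed.
vcf_cat() {
    if gzip -t < "$1" 2>/dev/null; then
        gunzip -c "$1"    # valid gzip data: decompress to stdout
    else
        cat "$1"          # not gzip: assume it is already plain text
    fi
}

# Both calls print the same plain-text VCF.
vcf_cat toy.vcf
vcf_cat toy.vcf.gz

# Possible use with the command above (not run here; needs BEDOPS):
#   bedmap --echo --count --delim '\t' windows.bed <(vcf_cat chr21.vcf.gz | vcf2bed -)
```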


Interpretation: If some of your windows are not on chr21, and all the variants in chr21.vcf are from chr21, then expect zero-counts over those windows which are not on that chromosome.
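
To avoid those expected zero-count rows, one option is to keep only the windows on chromosomes that actually appear in the VCF before running bedmap. The awk sketch below illustrates the idea on toy data; the file names variants.toy.vcf, windows.toy.bed and windows.on_target.bed are made up for this example, and the VCF is assumed to be uncompressed.

```shell
# Toy inputs for demonstration only.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\nchr21\t150\t.\tA\tG\nchr21\t900\t.\tC\tT\n' > variants.toy.vcf
printf 'chr21\t0\t1000\nchr22\t0\t1000\n' > windows.toy.bed

# First pass (the VCF): record every chromosome name on non-header lines.
# Second pass (the windows): print only windows on a recorded chromosome.
awk 'NR == FNR { if ($0 !~ /^#/) seen[$1] = 1; next }
     $1 in seen' variants.toy.vcf windows.toy.bed > windows.on_target.bed

cat windows.on_target.bed
```

Here only the chr21 window survives, since the toy VCF contains no chr22 variants.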