Confused by the `--ld-window` flag in Vcftools. What does the number of SNPs between SNPs mean?
20 days ago

If something reads wrong, assume the error is on my part.

I am working with a codebase that is a somewhat niche wrapper around vcftools.

There is a line of code as follows:

vcftools --vcf <vcf_subset_with_headers_generated_by_tabix> \
    --geno-r2-positions "<positions_file>" \
    --ld-window 500 \
    --out <output_name>

The vcftools manual says this about the flag:

--ld-window <integer>

This optional parameter defines the maximum number of SNPs between the SNPs being tested for LD in the "--hap-r2", "--geno-r2", and "--geno-chisq" functions.
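
Read literally, that sentence describes a window over site indices rather than over base pairs. Below is a minimal sketch of that literal reading. It assumes sites are indexed in file order and that a pair is tested when at most `window` sites lie strictly between the two. `pairs_within_window` is a hypothetical helper, and none of this has been checked against the vcftools source.

```python
def pairs_within_window(n_sites, window):
    """Yield index pairs (i, j), i < j, with at most `window` sites strictly between them.

    Assumption: a literal reading of the manual sentence, not verified vcftools behaviour.
    """
    for i in range(n_sites):
        # j - i - 1 sites lie strictly between site i and site j
        for j in range(i + 1, min(n_sites, i + window + 2)):
            yield i, j


# With 5 sites and window=1, each site is paired with up to its next two neighbours.
print(list(pairs_within_window(5, 1)))
```

Under this reading, a pair with more than 500 intervening sites would never be reported with `--ld-window 500`.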

Out of curiosity, I opened the output file produced by the above command and saw that its first line is this:

Chr01 78285061 Chr01 78240548 305 0.000272532

So locus one is 78285061 and locus two is 78240548.
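
To keep the fields straight, here is a small sketch that splits that output line into named columns. The column names (CHR1, POS1, CHR2, POS2, N_INDV, R^2) are an assumption about the vcftools LD output header and do not appear anywhere above. Under that assumption the fifth field, 305, would be a count of individuals rather than a count of SNPs.

```python
from typing import NamedTuple


class LdRecord(NamedTuple):
    chr1: str
    pos1: int
    chr2: str
    pos2: int
    n_indv: int  # assumed meaning of column 5
    r2: float    # assumed meaning of column 6


def parse_ld_line(line):
    """Split one whitespace-delimited LD output line into typed fields."""
    chr1, pos1, chr2, pos2, n_indv, r2 = line.split()
    return LdRecord(chr1, int(pos1), chr2, int(pos2), int(n_indv), float(r2))


rec = parse_ld_line("Chr01 78285061 Chr01 78240548 305 0.000272532")
print(abs(rec.pos1 - rec.pos2))  # physical distance in base pairs
```

For the line above, the two loci are 44513 bp apart. That is a distance in base pairs and says nothing by itself about how many sites lie between them.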

If I then run tabix <vcf_subset_with_headers_generated_by_tabix> Chr01:78240548-78285061 | wc -l, the output is 516.

What explains the discrepancy between 500 and 516 here?

I suspect tabix <vcf_subset_with_headers_generated_by_tabix> Chr01:78240548-78285061 | wc -l might not be the right way to count "number of SNPs between the SNPs being tested".

  1. Is line index not the right way to count the number of SNPs?
  2. Is the --ld-window flag a bit lax with the way it applies the limit?
  3. Is a different data field from vcftools used to calculate the number of SNPs between two positions?
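
On question 1, one thing that can be checked without vcftools is how the endpoints are counted. tabix region queries are inclusive at both ends, so if both tested positions are records in the file, `wc -l` counts them along with everything in between. Here is a minimal sketch of the two counts, using a hypothetical `count_records` helper and a made-up list of positions:

```python
from bisect import bisect_left, bisect_right


def count_records(positions, start, end):
    """Return (inclusive, strictly_between) counts for sorted record positions."""
    lo, hi = min(start, end), max(start, end)
    inclusive = bisect_right(positions, hi) - bisect_left(positions, lo)
    between = bisect_left(positions, hi) - bisect_right(positions, lo)
    return inclusive, max(between, 0)


# Toy positions, not taken from the real VCF.
positions = [100, 105, 110, 120, 130, 150]
print(count_records(positions, 130, 105))  # -> (4, 2)
```

If both loci are present as single records, 516 lines would correspond to 514 records strictly between them. That is still above 500, so endpoint counting alone does not account for the gap. The tabix count also includes every record in the region (for example indels, or sites that vcftools might filter out), whereas the manual speaks of SNPs. Whether that explains the remaining difference is an open guess.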

I am fairly sure I am missing something here but don't quite know what. Any help is appreciated.
