reza, 3.9 years ago
I have a multi-sample VCF file (20 individuals) and I want to calculate Pi (nucleotide diversity) in each population to detect signatures of selection. I do this with the following commands:
vcftools --gzvcf Whole.vcf --keep pop1_list --window-pi 40000 --window-pi-step 20000 --out pop1.pi
vcftools --gzvcf Whole.vcf --keep pop2_list --window-pi 40000 --window-pi-step 20000 --out pop2.pi
These commands produced two output files with different numbers of windows (86415 vs 86430) and different SNP counts in the same windows, for example:
pop1
CHROM   BIN_START   BIN_END N_VARIANTS  PI
NC_044511.1 1   40000   49  0.000265416
NC_044511.1 20001   60000   24  0.000146456
NC_044511.1 40001   80000   38  0.000386449
NC_044511.1 60001   100000  68  0.000650799
NC_044511.1 80001   120000  96  0.000888518
pop2
CHROM   BIN_START   BIN_END N_VARIANTS  PI
NC_044511.1 1   40000   39  0.00030515
NC_044511.1 20001   60000   7   2.97E-05
NC_044511.1 40001   80000   39  0.000375541
NC_044511.1 60001   100000  78  0.000694135
NC_044511.1 80001   120000  102 0.000900462
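To see exactly which windows account for the difference (86415 vs 86430), the two outputs can be compared by window coordinates. This is only a minimal sketch: the function name `windows_only_in_one` is made up for illustration, and it assumes both files have the five-column layout with one header line shown above. With `--out pop1.pi`, vcftools should write the table to `pop1.pi.windowed.pi`, but check the actual file names in your directory.

```shell
# Hypothetical helper: list windows (CHROM, BIN_START, BIN_END) that occur
# in only one of two window tables. Assumes a single header line per file.
windows_only_in_one() {
    # usage: windows_only_in_one FIRST_TABLE SECOND_TABLE
    awk '
        FNR == 1 { file++; next }               # new file: bump counter, skip its header
        { key = $1 "\t" $2 "\t" $3 }            # a window is identified by its coordinates
        file == 1 { seen[key] = 1; next }       # remember every window of the first table
        {
            if (key in seen) delete seen[key]   # present in both tables: drop it
            else print "only_in_second\t" key
        }
        END { for (k in seen) print "only_in_first\t" k }
    ' "$1" "$2"
}

# Example call (file names are assumptions based on the --out prefixes):
# windows_only_in_one pop1.pi.windowed.pi pop2.pi.windowed.pi | sort
```

If the windows missing from one file have small N_VARIANTS in the other, that would suggest that windows with no counted sites in a population are simply not written out. That is a guess to check against the output, not something confirmed here.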
However, when I run the following command for the first window, I get 60 SNPs:
bcftools stats -r NC_044511.1:1-40000
Why don't the results correspond?
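One possible explanation to test is that a site counted over all 20 individuals is not necessarily variable within each population's subset of samples. Whether vcftools counts only such sites in N_VARIANTS is an assumption here, not something verified. The sketch below counts, for one region, the sites where the listed samples carry more than one distinct allele. The function name `count_segregating` is hypothetical, and the code assumes an uncompressed, tab-delimited VCF with GT as the first FORMAT field and a sample list with one ID per line (the format used with `--keep`).

```shell
# Hypothetical helper: count sites in a region that are polymorphic among
# the listed samples only. Reads a plain-text (uncompressed) VCF.
count_segregating() {
    # usage: count_segregating VCF SAMPLE_LIST CHROM START END
    awk -v chr="$3" -v s="$4" -v e="$5" '
        BEGIN { FS = "\t" }
        NR == FNR { if ($1 != "") keep[$1] = 1; next }   # first input: sample IDs
        /^##/ { next }                                    # skip meta-information lines
        /^#CHROM/ {                                       # header: find columns of kept samples
            for (i = 10; i <= NF; i++) if ($i in keep) cols[i] = 1
            next
        }
        $1 == chr && $2 + 0 >= s + 0 && $2 + 0 <= e + 0 {
            split("", alleles); n = 0                     # reset per-site allele set
            for (i in cols) {
                split($i, fmt, ":")                       # GT assumed to be the first field
                m = split(fmt[1], gt, "[/|]")             # unphased or phased separators
                for (j = 1; j <= m; j++)
                    if (gt[j] != "." && !(gt[j] in alleles)) { alleles[gt[j]] = 1; n++ }
            }
            if (n > 1) count++                            # more than one allele: segregating
        }
        END { print count + 0 }
    ' "$2" "$1"
}

# Example call (file names taken from the commands above):
# count_segregating Whole.vcf pop1_list NC_044511.1 1 40000
```

Running this for `pop1_list` and `pop2_list` on the first window and comparing the results with 49 and 39 would show whether this explanation fits. If the numbers still disagree, the cause lies elsewhere, for example in how missing genotypes or multi-allelic sites are handled.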