Question about VCFtools --window-pi --window-pi-step
4.3 years ago
60343011s ▴ 50

Hi all

I'm using VCFtools (v0.1.17) for estimating nucleotide diversity of my study species.

I already have a VCF file that was produced by mapping reads to a draft genome, and I used it to calculate pi.

The output below shows the bin coordinates and the number of variants per bin (I used --window-pi 60000 --window-pi-step 24000). In these examples the reported pi equals the number of variants divided by the bin size, e.g. 11/60000 = 0.000183333:

CHROM   BIN_START   BIN_END N_VARIANTS  PI    
scaffold22988   1   60000   11  0.000183333

The problem is that scaffold22988 is only 1015 bp long, but the full bin size was used as the denominator for pi instead of the length of the scaffold. As a result, the average pi across the genome is underestimated when a large bin size is used.

The same thing happens in the last window of a large scaffold, for example:

CHROM   BIN_START   BIN_END N_VARIANTS  PI    
scaffold14  18960001    19020000    14  0.000233333

Scaffold14 is in fact only 18,967,204 bp long, so the pi value of its last window is again underestimated (the bin size should be 18967204 - 18960001 + 1 = 7204 here).
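To make the correction I have in mind concrete, here is a minimal Python sketch that rescales each window's pi by the real window length. It assumes pi scales linearly with the bin-size denominator (which is what the two rows above suggest) and that the scaffold lengths are known, e.g. from the assembly index. The function name `rescale_window_pi` and the `scaffold_lengths` dictionary are hypothetical and not part of VCFtools, and the sketch does not account for missing or uncallable sites inside a window.

```python
def rescale_window_pi(rows, scaffold_lengths):
    """Rescale windowed pi so the denominator is the real window length.

    rows: iterable of (chrom, bin_start, bin_end, n_variants, pi) tuples,
          with 1-based inclusive coordinates as in the output shown above.
    scaffold_lengths: dict mapping scaffold name to its length in bp
          (hypothetical input, not produced by VCFtools).
    """
    corrected = []
    for chrom, start, end, n_variants, pi in rows:
        nominal_len = end - start + 1
        # Clip the window end to the end of the scaffold.
        true_end = min(end, scaffold_lengths[chrom])
        effective_len = true_end - start + 1
        if effective_len <= 0:
            # Window lies entirely beyond the scaffold; nothing to rescale.
            continue
        # Assumption: reported pi = (sum of per-site diversity) / nominal_len,
        # so multiplying by nominal_len / effective_len swaps the denominator.
        new_pi = pi * nominal_len / effective_len
        corrected.append((chrom, start, true_end, n_variants, new_pi))
    return corrected


# The two windows from this post, with the scaffold lengths given above.
rows = [
    ("scaffold22988", 1, 60000, 11, 0.000183333),
    ("scaffold14", 18960001, 19020000, 14, 0.000233333),
]
scaffold_lengths = {"scaffold22988": 1015, "scaffold14": 18967204}

corrected = rescale_window_pi(rows, scaffold_lengths)
for chrom, start, end, n_variants, pi in corrected:
    print(chrom, start, end, n_variants, f"{pi:.6f}")
```

With these inputs the rescaled values come out close to 11/1015 and 14/7204, i.e. the number of variants divided by the real window length rather than by the nominal 60000 bp.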

Is there any way to tell the program not to overestimate the bin size? I have read the VCFtools manual but did not find an option for this.

I would be grateful for any suggestions.

Assembly genome alignment • 3.0k views
2.5 years ago
VeeKoo • 0

Hi,

I would also like a suggestion for this. Alternatively, does it matter in my case? I am comparing the nucleotide diversity of two different populations from the same joint-called VCF, so I would expect the pi values of both populations to be affected in the same way.
