Different estimates of nucleotide diversity (pi) from two pipelines: pixy vs vcftools
4 months ago
nitinra ▴ 10

Hello all,

I am trying to calculate nucleotide diversity (pi) for 192 samples and have run both vcftools and pixy on the same filtered VCF. However, the two tools give very different results. Is there a way to evaluate which one is the accurate estimate of nucleotide diversity?

Here is the pipeline I used:

vcftools --vcf input.vcf --max-missing 0.1 --minQ 30 --maf 0.1 --remove lowdepthindividuals --recode --recode-INFO-all --out output_filtered
bcftools +prune -l 0.2 -w 50kb output_filtered.recode.vcf -Ov -o output_filtered_ldpruned.vcf

Pi calculations:

VCFtools:

vcftools --vcf output_filtered_ldpruned.vcf --window-pi 10000 --out pi

Pixy:

pixy --stats pi --vcf output_filtered_ldpruned.vcf --zarr_path ./zarr \
    --window_size 10000 --populations allpop.list --bypass_filtration yes \
    --bypass-invariant-sites yes --outfile_prefix results/combined

The pi estimates from VCFtools range from 0 to 0.020, whereas the estimates from pixy range from 0.1 to 0.4. What could be causing the discrepancy between the two methods?
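
One possible source of the gap, sketched here under stated assumptions rather than confirmed from either tool's source code, is the denominator. The input VCF contains only variant sites (filtered at MAF 0.1 and LD-pruned). If pi is averaged over the sites present in the VCF, it is roughly the mean heterozygosity at those variant sites. If every other position in the 10 kb window is instead treated as invariant and fully genotyped, the same differences are spread over the whole window. The function names, allele counts, and number of sites per window below are all hypothetical and only illustrate the arithmetic:

```python
from math import comb

def site_diffs_and_comps(ref_count, alt_count):
    """Pairwise differences and comparisons at one biallelic site."""
    n = ref_count + alt_count          # chromosomes genotyped at this site
    diffs = ref_count * alt_count      # pairs carrying different alleles
    comps = comb(n, 2)                 # all pairs of chromosomes
    return diffs, comps

def pi_over_sites_in_vcf(sites):
    """Pi averaged only over the sites listed (assumed: variant sites only)."""
    pairs = [site_diffs_and_comps(r, a) for r, a in sites]
    return sum(d for d, _ in pairs) / sum(c for _, c in pairs)

def pi_over_window(sites, window_size, n_chrom):
    """Pi over the full window, ASSUMING all unlisted positions are
    invariant and genotyped in all n_chrom chromosomes."""
    pairs = [site_diffs_and_comps(r, a) for r, a in sites]
    n_invariant = window_size - len(sites)
    total_comps = sum(c for _, c in pairs) + n_invariant * comb(n_chrom, 2)
    return sum(d for d, _ in pairs) / total_comps

# Hypothetical window: 50 retained SNPs in 10 kb, 192 diploid samples
# (384 chromosomes), each SNP with an alternate allele frequency of 0.25.
n_chrom = 384
sites = [(288, 96)] * 50

print(pi_over_sites_in_vcf(sites))            # variant sites only
print(pi_over_window(sites, 10_000, n_chrom)) # whole 10 kb window
```

With these made-up numbers the first value is about 0.376 and the second is 200 times smaller (50 sites out of 10,000 positions), which is the same order of difference as in the results above. Neither value would be a clean estimate of genome-wide pi here, because the MAF filter and LD pruning remove real variant sites before the calculation.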

vcftools nucleotide diversity pixy • 289 views