2.3 years ago
elizabeth
I am trying to calculate per-site Fst for two samples in a VCF file, but I get -nan for the mean Fst estimate and for every site. This is what I ran:
vcftools --gzvcf ${VCF} --weir-fst-pop DBFCU --weir-fst-pop BBMCU --out ./cu_pops
The output from the run is:
VCFtools - 0.1.16
(C) Adam Auton and Anthony Marcketta 2009
Parameters as interpreted:
--gzvcf /panfs/pfs.local/scratch/sjmac/e284e911/variantcalling/WildPops_combined.vcf.gz
--weir-fst-pop DBFCU
--weir-fst-pop BBMCU
--keep DBFCU
--keep BBMCU
--out ./cu_pops
Using zlib version: 1.2.11
Warning: Expected at least 2 parts in INFO entry: ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=DP4,Number=4,Type=Integer,Description="Number of high-quality ref-forward , ref-reverse, alt-forward and alt-reverse bases">
Warning: Expected at least 2 parts in INFO entry: ID=DP4,Number=4,Type=Integer,Description="Number of high-quality ref-forward , ref-reverse, alt-forward and alt-reverse bases">
Keeping individuals in 'keep' list
After filtering, kept 2 out of 9 Individuals
Outputting Weir and Cockerham Fst estimates.
Weir and Cockerham mean Fst estimate: -nan
Weir and Cockerham weighted Fst estimate: -nan
After filtering, kept 1886530 out of a possible 1886530 Sites
Run Time = 26.00 seconds
Thanks for any assistance.
It looks like vcftools is dropping your sample columns.
Judging by those warning lines, your VCF seems to have some issues.
It seems to me that dropping 7 of the 9 individuals is expected. I only wanted to calculate fst using 2 of the samples which are specified.
I'm not sure what is causing the warnings except that there is a formatting issue with a misplaced comma. I've seen some posts that suggest the warnings aren't important for downstream analyses, but I'm not sure.
How should the formatting issue be fixed?
Oh, so filtering the samples is intended. Yes, sometimes you can ignore the warnings, but it's best not to have them. In your case, I have never seen those warnings before, and your problem might be caused by them, since genotypes are used when calculating Fst. Try removing the lines that trigger the warnings and running again.
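One way to try the suggestion above, as a rough sketch: the warnings name the INFO header entries with ID=AC and ID=DP4, so this assumes "the lines with warnings" are those `##INFO` header lines. `strip_info_headers` is a hypothetical helper name, and `cleaned.vcf.gz` is only an illustrative output file. Variant records are not touched.

```shell
#!/bin/sh
# Drop the INFO header definitions named in the warnings (AC and DP4).
# Reads an uncompressed VCF on stdin and writes the result to stdout.
# Only '##INFO=<ID=AC,...' and '##INFO=<ID=DP4,...' header lines are removed.
strip_info_headers() {
    grep -v -e '^##INFO=<ID=AC,' -e '^##INFO=<ID=DP4,'
}

# Hypothetical usage (cleaned.vcf.gz is an illustrative name):
#   gzip -dc "$VCF" | strip_info_headers | gzip > cleaned.vcf.gz
```

Keep the original file, since removing header definitions may affect other tools that read those INFO fields.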
I tried removing the lines and no warnings were produced when I re-ran. However, Fst estimates are still -nan, so this doesn't seem to have solved the problem causing the calculation to fail.
Have you tried using 4 samples in each group? It seems that Fst needs at least 3 or 4 samples in each population to give a sensible result.
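A likely explanation, given that the log reports only 2 of 9 individuals kept: each file passed to --weir-fst-pop is read as a list of sample IDs, one per line, so each population here appears to contain a single individual. Weir and Cockerham's estimator includes terms divided by (n - 1), where n is the average number of individuals per population. With one individual per population that divisor is zero, which would explain the nan. Below is a minimal sketch of a check to run before vcftools, assuming the population files are plain text with one sample ID per line. `count_samples` and `check_pop` are hypothetical helper names, not part of vcftools.

```shell
#!/bin/sh
# Count the non-blank lines (one sample ID per line) in a population file.
count_samples() {
    grep -c '[^[:space:]]' "$1" || true
}

# Report whether a population file lists at least two individuals.
# Returns 0 if it does, 1 if it lists fewer, 2 if the file is unreadable.
check_pop() {
    if [ ! -r "$1" ]; then
        echo "ERROR: cannot read $1"
        return 2
    fi
    n=$(count_samples "$1")
    if [ "$n" -lt 2 ]; then
        echo "WARNING: $1 lists $n individual(s); Fst is likely to be nan"
        return 1
    fi
    echo "OK: $1 lists $n individuals"
}

# Hypothetical usage with the file names from this thread:
#   check_pop DBFCU && check_pop BBMCU && \
#     vcftools --gzvcf "$VCF" --weir-fst-pop DBFCU --weir-fst-pop BBMCU --out ./cu_pops
```

If either check warns, add the other individuals from the same population to that file before rerunning.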