vcftools --weir-fst-pop returns -nan
16 months ago
elizabeth • 0

I am trying to calculate per-site Fst between two samples in a VCF file, but I get -nan for the mean Fst estimate and for every site. This is what I ran:

vcftools --gzvcf ${VCF} --weir-fst-pop DBFCU --weir-fst-pop BBMCU --out ./cu_pops

The output from the run is:

    VCFtools - 0.1.16
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
    --gzvcf /panfs/pfs.local/scratch/sjmac/e284e911/variantcalling/WildPops_combined.vcf.gz
    --weir-fst-pop DBFCU
    --weir-fst-pop BBMCU
    --keep DBFCU
    --keep BBMCU
    --out ./cu_pops

Using zlib version: 1.2.11
Warning: Expected at least 2 parts in INFO entry: ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=DP4,Number=4,Type=Integer,Description="Number of high-quality ref-forward , ref-reverse, alt-forward and alt-reverse bases">
Warning: Expected at least 2 parts in INFO entry: ID=DP4,Number=4,Type=Integer,Description="Number of high-quality ref-forward , ref-reverse, alt-forward and alt-reverse bases">
Keeping individuals in 'keep' list
After filtering, kept 2 out of 9 Individuals
Outputting Weir and Cockerham Fst estimates.
Weir and Cockerham mean Fst estimate: -nan
Weir and Cockerham weighted Fst estimate: -nan
After filtering, kept 1886530 out of a possible 1886530 Sites
Run Time = 26.00 seconds

Thanks for any assistance.

fst vcftools • 1.4k views

After filtering, kept 2 out of 9 Individuals

It looks like vcftools is dropping your sample columns.

Judging by those warning lines, your VCF seems to have some issues.


It seems to me that dropping 7 of the 9 individuals is expected. I only want to calculate Fst between the 2 samples I specified.

I'm not sure what is causing the warnings except that there is a formatting issue with a misplaced comma. I've seen some posts that suggest the warnings aren't important for downstream analyses, but I'm not sure.

How should the formatting issue be fixed?


Oh, so filtering the samples was intended. Yes, sometimes you can ignore the warnings, but it's best not to have them. In your case, I have never seen those warnings before, and your problem might be caused by them, since genotypes are used when calculating Fst. Try removing the header lines that trigger the warnings and running again.
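One way to try this without editing the file by hand is to filter the header lines out of the stream. This is only a sketch: it assumes the warnings come from the `##INFO` definitions for `AC` and `DP4` named in the log, `strip_info_headers` is a hypothetical helper name, and `cleaned.vcf.gz` is a placeholder. The per-site INFO values are left in place, so other tools may complain about undefined tags.

```shell
# Hypothetical helper: drop the ##INFO header lines for AC and DP4
# from an uncompressed VCF read on stdin; all other lines pass through.
strip_info_headers() {
  grep -v -E '^##INFO=<ID=(AC|DP4),'
}

# Usage sketch (output name is a placeholder):
# gzip -dc "${VCF}" | strip_info_headers | gzip -c > cleaned.vcf.gz
```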


I tried removing the lines and no warnings were produced when I re-ran. However, Fst estimates are still -nan, so this doesn't seem to have solved the problem causing the calculation to fail.
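To see whether any site produced a numeric estimate, the per-site output can be tallied. This sketch assumes the run wrote `cu_pops.weir.fst` (vcftools appends `.weir.fst` to the `--out` prefix) with a header line and the estimate in the third column; `count_nan_sites` is a hypothetical helper name.

```shell
# Hypothetical helper: tally nan vs. numeric estimates in a .weir.fst file.
# Assumes one header line and the Fst estimate in column 3.
count_nan_sites() {
  awk 'NR > 1 { if ($3 ~ /nan/) bad++; else ok++ }
       END { printf "nan=%d numeric=%d\n", bad, ok }' "$1"
}

# Usage sketch:
# count_nan_sites cu_pops.weir.fst
```

If every site is nan, the cause is more likely something global, such as the number of individuals per population, than a problem with individual records.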


Have you tried using 4 samples in each group? It seems that Fst needs at least 3 or 4 samples in each population to give a valid result.
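A quick way to check the group sizes, as a sketch: `--weir-fst-pop` takes a file listing one sample ID per line, so counting the non-empty lines gives the number of individuals in each population. The helper name `pop_size` is hypothetical. The log above reports 2 individuals kept across two populations, which suggests one per group. Assuming vcftools follows the published Weir and Cockerham (1984) estimator, the formula divides by (mean sample size - 1), which is zero with one individual per population and would explain the -nan.

```shell
# Hypothetical helper: count sample IDs (non-empty lines) in a population file.
pop_size() {
  grep -c '[^[:space:]]' "$1"
}

# Usage sketch with the population files from the question:
# for pop in DBFCU BBMCU; do
#   echo "$pop: $(pop_size "$pop") individual(s)"
# done
```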
