Question: Negative Fst values in Lositan
5.6 years ago by
cecilia.villacorta50 wrote:

Hi there,

I am using Lositan to detect outlier SNPs from a set of 556 SNPs. When I first upload my dataset, I get an overall Fst value of -0.025. After running the simulation (with default settings), 40% of all SNPs are listed as outliers. When I exclude the candidate outliers, I get almost the same Fst value (-0.026).

Does anyone know why I could be getting negative Fst values? Also, is it normal to have almost half of the SNPs listed as outliers?


lositan snp fst forum • 5.6k views
5.6 years ago by
London, UK
Giovanni M Dall'Olio27k wrote:

This problem of negative Fst scores is not limited to Lositan; it also happens with BioPerl, vcftools, and other tools.

In principle, negative Fst scores are not impossible: they mean that there is more variation within the populations than between the two populations compared. I believe it is common practice to set all negative Fst scores to 0 and treat them as loci with no population differentiation.
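The "set negatives to 0" convention can be sketched in a few lines of Python. This is only an illustration: the function name clamp_fst and the example values are made up for the sketch, and NaN entries are deliberately left alone because they mean "no estimate" rather than "no differentiation".

```python
import math

def clamp_fst(values):
    """Replace negative Fst estimates with 0.0; leave NaN (no estimate) untouched."""
    return [v if math.isnan(v) else max(v, 0.0) for v in values]

# Hypothetical per-locus estimates, for illustration only.
per_locus = [-0.0047, float("nan"), 0.12, -0.0079]
clamped = clamp_fst(per_locus)
```

Note that clamping before averaging over loci shifts the mean upwards, so it is probably safer to clamp only for reporting or plotting and to keep the raw values for any genome-wide summary.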

Regarding the problem of too many outliers, I am not sure which demographic model is implemented in Lositan, or which types of simulations are done. I would plot the site frequency spectra of both the simulations and the real data, and make sure they do not differ significantly (e.g. that they have the same shape), especially for the SNPs at low frequency.
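A rough way to make that comparison, assuming you can get per-SNP allele counts out of both the real data and the simulations (Lositan may not export these directly, so treat the inputs here as hypothetical): compute a folded site frequency spectrum for each and look at where they differ. The function name folded_sfs and the toy counts are illustrative only.

```python
from collections import Counter

def folded_sfs(allele_counts, n_chrom):
    """Folded SFS as proportions, indexed by minor allele count 1..n_chrom//2."""
    minor = [min(c, n_chrom - c) for c in allele_counts]
    tally = Counter(m for m in minor if m > 0)  # skip monomorphic sites
    total = sum(tally.values())
    if total == 0:
        return [0.0] * (n_chrom // 2)
    return [tally.get(k, 0) / total for k in range(1, n_chrom // 2 + 1)]

# Toy allele counts out of 10 sampled chromosomes (made up for the sketch).
observed = [1, 1, 2, 9, 5, 3]
simulated = [1, 2, 2, 3, 4, 5]

obs_sfs = folded_sfs(observed, 10)
sim_sfs = folded_sfs(simulated, 10)

# Largest per-bin difference; the first bins are the low-frequency SNPs.
max_diff = max(abs(o - s) for o, s in zip(obs_sfs, sim_sfs))
```

In this toy case the biggest gap is in the first bin (singletons), which is the kind of low-frequency mismatch worth checking before trusting the outlier list.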


EDIT: I just discovered that, when you calculate Fst using vcftools between a population and itself, it returns some negative Fst scores:

$: vcftools --weir-fst-pop ACB.pop --weir-fst-pop ACB.pop --gzvcf (1000genomes phase3 data)

11      61395   -0.00465518
11      73015   nan
11      73048   nan
11      77250   nan
11      87150   nan
11      87203   nan
11      87209   -0.00512243
11      87268   -0.00574944
11      87293   nan
11      87341   -0.0052356
11      90692   nan
11      90697   nan
11      90964   nan
11      102905  -0.00794515
11      103253  -0.00704929
11      103365  -0.00517962
11      103367  -0.00517962
11      103368  -0.00517962
11      103604  nan

This basically tells us that you can't trust negative Fst scores, and that you should treat them as software errors due to rounding or something else.
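To see how many sites fall into each category, a small Python sketch like the following can tally the rows. It assumes three whitespace-separated columns (chromosome, position, Fst) as in the listing above, and skips a header row starting with CHROM if one is present; the function name summarise_fst is made up for the example, and the rows are a subset of the output shown above.

```python
import math

def summarise_fst(text):
    """Count NaN, negative and non-negative Fst values in 3-column rows."""
    counts = {"nan": 0, "negative": 0, "non_negative": 0}
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) != 3 or fields[0] == "CHROM":
            continue  # skip header or malformed rows
        fst = float(fields[2])  # float() parses the string "nan" as NaN
        if math.isnan(fst):
            counts["nan"] += 1
        elif fst < 0:
            counts["negative"] += 1
        else:
            counts["non_negative"] += 1
    return counts

rows = """
11      61395   -0.00465518
11      73015   nan
11      87209   -0.00512243
11      87293   nan
11      102905  -0.00794515
"""
summary = summarise_fst(rows)
```

Keeping NaN and negative values in separate bins matters because they mean different things: NaN means no estimate could be computed at that site, while a negative value is an actual (if uninformative) estimate.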


This is useful. Have you published on this? I'm looking for a citation of an example that has been through peer review.

written 23 months ago by mcmullan00
5.6 years ago by
confusedious420 wrote:

Folks on Biostars in the past helped me with some similar questions.

Check this thread out:

Wright's Fst and Weir & Cockerham's Fst Estimator - Simple Explanation of the Difference

5.6 years ago by
United States
Zev.Kronenberg11k wrote:

As Giovanni M Dall'Olio pointed out, negative values are possible and common for the Weir and Cockerham 1984 estimator (equations A, B and C).
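Here is a minimal Python sketch of why that happens, using the single-locus, biallelic variance components a, b and c as I understand them from Weir and Cockerham (1984); please check the formulas against the paper before relying on them. The function name wc_fst, the sample size of 96 and the allele frequencies are all arbitrary choices for the illustration.

```python
def wc_fst(samples):
    """Weir & Cockerham theta for one biallelic locus.

    samples: list of (n_individuals, allele_freq, observed_het_freq),
    one tuple per population.
    """
    r = len(samples)
    n_total = sum(n for n, _, _ in samples)
    n_bar = n_total / r
    n_c = (n_total - sum(n * n for n, _, _ in samples) / n_total) / (r - 1)
    p_bar = sum(n * p for n, p, _ in samples) / n_total
    s2 = sum(n * (p - p_bar) ** 2 for n, p, _ in samples) / ((r - 1) * n_bar)
    h_bar = sum(n * h for n, _, h in samples) / n_total

    core = p_bar * (1 - p_bar) - (r - 1) / r * s2
    a = n_bar / n_c * (s2 - (core - h_bar / 4) / (n_bar - 1))  # between populations
    b = n_bar / (n_bar - 1) * (core - (2 * n_bar - 1) / (4 * n_bar) * h_bar)
    c = h_bar / 2
    return a / (a + b + c)

# The same sample compared with itself (Hardy-Weinberg heterozygosity 2pq).
pop = (96, 0.3, 2 * 0.3 * 0.7)
same = wc_fst([pop, pop])

# Two clearly differentiated samples for contrast.
different = wc_fst([(96, 0.1, 0.18), (96, 0.9, 0.18)])
```

With identical allele frequencies the between-population component a reduces to a small negative sample-size correction, so the estimate works out to -1/(2(n-1)), about -0.005 for n = 96 in this sketch. That is a property of the estimator being unbiased, not a numerical glitch.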




To avoid excessive outliers, try removing very rare variants and sites where there are many missing genotypes. 
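As a sketch of that filtering step in Python, assuming genotypes are coded as alternate-allele counts (0, 1, 2) with None for a missing call: the thresholds of 5% minor allele frequency and 20% missingness are placeholders, not recommendations, and the function name site_passes is made up for the example.

```python
def site_passes(genotypes, min_maf=0.05, max_missing=0.2):
    """Keep a site only if it is not too rare and not too sparsely genotyped."""
    if not genotypes:
        return False
    called = [g for g in genotypes if g is not None]
    missing_rate = 1 - len(called) / len(genotypes)
    if not called or missing_rate > max_missing:
        return False
    alt_freq = sum(called) / (2 * len(called))  # diploid: two alleles per call
    maf = min(alt_freq, 1 - alt_freq)
    return maf >= min_maf

# Toy sites (made up): a common variant, a very rare one, and a sparse one.
common = [0, 1, 2, 1, 0, 1, None, 1, 2, 0]
rare = [0] * 19 + [1]
sparse = [None] * 5 + [1] * 5

kept = [site_passes(s) for s in (common, rare, sparse)]
```

It is probably worth rerunning the outlier scan with a couple of different thresholds to see how sensitive the 40% figure is to them.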

If you are looking for an alternative tool, I've written a suite for association testing that has Fst and smoothing.




A tool to calculate Fst taking into account the genotype likelihood directly from the VCF file. That's wonderful! :-)

written 5.6 years ago by Giovanni M Dall'Olio27k