4.2 years ago by

London, UK

Negative Fst scores are not limited to Lositan; they also occur with BioPerl, vcftools, and other tools.

In principle, negative Fst scores are not impossible: they mean that there is more variation within populations than between the two populations compared. In general, I believe it is common practice to set all negative Fst scores to 0 and treat them as loci with no population differentiation.
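As a rough illustration of that convention, here is a minimal Python sketch that sets negative values to 0 and leaves `nan` values alone. It assumes whitespace-separated lines in the three-column layout `CHROM POS WEIR_AND_COCKERHAM_FST`; the function names `parse_weir_fst` and `clamp_negative_fst` are made up for this example and are not part of any tool.

```python
import math


def parse_weir_fst(lines):
    """Parse 'CHROM POS FST' lines into (chrom, pos, fst) tuples.

    Assumes whitespace-separated columns and skips the header line.
    """
    records = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "CHROM":
            continue  # skip blank lines and the header
        records.append((fields[0], int(fields[1]), float(fields[2])))
    return records


def clamp_negative_fst(values):
    """Return a copy of values with negative Fst set to 0.0.

    nan values (undefined Fst) are kept as nan rather than turned into 0.
    """
    clamped = []
    for value in values:
        if math.isnan(value):
            clamped.append(value)
        else:
            clamped.append(max(value, 0.0))
    return clamped


if __name__ == "__main__":
    example = [
        "CHROM POS WEIR_AND_COCKERHAM_FST",
        "11 61395 -0.00465518",
        "11 73015 nan",
    ]
    records = parse_weir_fst(example)
    print(clamp_negative_fst([fst for _, _, fst in records]))
```

Keeping `nan` separate from 0 matters here: a `nan` means Fst could not be computed at that site, which is not the same as "no differentiation".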

Regarding the problem of too many outliers, I am not sure which demographic model is implemented in Lositan, or which types of simulations it runs. I would plot the site frequency spectra of both the simulations and the real data, and make sure they do not differ significantly (e.g. they have the same shape), especially for the SNPs at low frequency.
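To make that comparison concrete, here is a small sketch of how one might compute a folded site frequency spectrum from per-SNP allele counts and compare two spectra class by class. It assumes you have already extracted, for each SNP, the count of one allele and the total number of sampled chromosomes; `folded_sfs` and `sfs_difference` are hypothetical helper names, not functions from Lositan or any other package.

```python
from collections import Counter


def folded_sfs(allele_counts, n_chromosomes):
    """Proportion of SNPs in each minor-allele-count class 1..n//2.

    allele_counts: count of one allele at each SNP (folded internally).
    Monomorphic sites (count 0 or n_chromosomes) are ignored.
    """
    max_class = n_chromosomes // 2
    counts = Counter(min(c, n_chromosomes - c) for c in allele_counts)
    total = sum(counts[k] for k in range(1, max_class + 1))
    if total == 0:
        raise ValueError("no polymorphic sites")
    return [counts[k] / total for k in range(1, max_class + 1)]


def sfs_difference(sfs_a, sfs_b):
    """Per-class difference (a - b) between two normalised spectra."""
    if len(sfs_a) != len(sfs_b):
        raise ValueError("spectra must have the same number of classes")
    return [a - b for a, b in zip(sfs_a, sfs_b)]


if __name__ == "__main__":
    # Toy counts for 10 sampled chromosomes, purely illustrative.
    real = folded_sfs([1, 1, 1, 2, 9, 5], 10)
    simulated = folded_sfs([1, 2, 3, 4, 5, 0, 10], 10)
    for k, diff in enumerate(sfs_difference(real, simulated), start=1):
        print(k, round(diff, 3))
```

A large difference in the first one or two classes would point to a mismatch in the low-frequency SNPs, which is the part of the spectrum I would check first.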

EDIT: I just discovered that, when you calculate Fst using vcftools between a population and itself, it returns some negative Fst scores:

$: vcftools --weir-fst-pop ACB.pop --weir-fst-pop ACB.pop --gzvcf (1000genomes phase3 data)
CHROM POS WEIR_AND_COCKERHAM_FST
11 61395 -0.00465518
11 73015 nan
11 73048 nan
11 77250 nan
11 87150 nan
11 87203 nan
11 87209 -0.00512243
11 87268 -0.00574944
11 87293 nan
11 87341 -0.0052356
11 90692 nan
11 90697 nan
11 90964 nan
11 102905 -0.00794515
11 103253 -0.00704929
11 103365 -0.00517962
11 103367 -0.00517962
11 103368 -0.00517962
11 103604 nan

This basically tells you that you can't trust negative Fst scores, and that you should treat them as software errors due to rounding or something else.