Question: How to detect outliers from either (a) SNP-Fst or (b) Window-Fst distributions?
16 months ago, serpalma.v wrote:


I want to find the SNPs that could be responsible for the phenotypic differences observed among three populations. To that end, I computed Fst (Weir and Cockerham) using vcftools.

One population represents the founder population (line0), from which the other two populations (line1 and line2) were selected, each for a different trait. The phenotypes of the lines are highly divergent.

Computing per-SNP Fst produces the following representative distributions.

Computing windowed Fst (window = 500 kb; step = 250 kb; minimum 20 SNPs per window) produces the following representative distributions.
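To make the windowing scheme concrete, here is a minimal Python sketch of a sliding-window mean of per-SNP Fst using the parameters above. It is not vcftools' own implementation: the function name `windowed_mean_fst`, the zero-based half-open windows, and the skipping of NaN values (assuming undefined per-SNP estimates are encoded that way) are all assumptions made for illustration.

```python
import math
from statistics import mean


def windowed_mean_fst(snps, window=500_000, step=250_000, min_snps=20):
    """Sliding-window mean of per-SNP Fst on a single chromosome.

    snps: iterable of (position, fst) pairs.
    Returns a list of (start, end, n_snps, mean_fst) tuples, one per
    window [start, end) that contains at least `min_snps` usable SNPs.
    """
    # Drop undefined estimates (assumption: they are encoded as NaN).
    usable = sorted((p, f) for p, f in snps if not math.isnan(f))
    if not usable:
        return []

    last_pos = usable[-1][0]
    results = []
    start = 0
    while start <= last_pos:
        end = start + window
        values = [f for p, f in usable if start <= p < end]
        # Windows with too few SNPs are skipped, not reported as zero.
        if len(values) >= min_snps:
            results.append((start, end, len(values), mean(values)))
        start += step
    return results


# Toy data (hypothetical): one SNP every 10 kb, low Fst then high Fst.
toy = [(i * 10_000, 0.1 if i < 30 else 0.3) for i in range(60)]
for row in windowed_mean_fst(toy):
    print(row)
```

Because the step is half the window size, each SNP contributes to two windows, so neighbouring window values are not independent. That overlap is one reason the windowed distributions look smoother than the per-SNP ones.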

First, line1 vs line2 yields a different Fst distribution compared to (line1 | line2) vs line0.

Second, the windowed Fst calculation (mean per window) yields smoother distributions.

I would like to seek advice on the following:

(1) How should outliers be defined, given the two types of observed Fst distributions?

(2) Is windowed Fst more suitable to identify outliers?

(3) How should the size and step of the sliding window be defined? (What I chose for this example is based on a similar study, but I guess it might require optimization.)

(4) Do I need to do some type of SNP pruning (these SNPs are derived from WGS variant discovery analysis following GATK best practices)?

vcftools fst • 779 views