Question: How to detect outliers from either (a) SNP-Fst or (b) Window-Fst distributions?
7 weeks ago, serpalma.v wrote:


I want to find the SNPs that could be responsible for the phenotypic differences observed between three populations. To do so, I computed Fst (Weir and Cockerham) using vcftools.

One population represents the founder population (line0), from which the other two populations (line1 and line2) were selected, each for a different trait. The phenotypes of the lines are highly divergent.
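For reference, the three pairwise comparisons can be set up as in the sketch below. It only builds the vcftools argument lists and does not run them. The VCF name and the per-population sample list files (`variants.vcf.gz`, `line0.txt`, ...) are hypothetical placeholders, not the actual files used here.

```python
from itertools import combinations


def build_fst_commands(vcf_path, pop_files, window_size=None, window_step=None):
    """Return one vcftools argument list per pair of populations.

    pop_files maps a population name to a text file listing its sample IDs
    (one per line), as expected by --weir-fst-pop.
    """
    commands = []
    for (name_a, file_a), (name_b, file_b) in combinations(pop_files.items(), 2):
        cmd = [
            "vcftools", "--gzvcf", vcf_path,
            "--weir-fst-pop", file_a,
            "--weir-fst-pop", file_b,
            "--out", f"{name_a}_vs_{name_b}",
        ]
        # Without window options vcftools reports per-SNP Fst;
        # with them it reports windowed Fst.
        if window_size is not None:
            cmd += ["--fst-window-size", str(window_size)]
            if window_step is not None:
                cmd += ["--fst-window-step", str(window_step)]
        commands.append(cmd)
    return commands


# Hypothetical file names, for illustration only.
pop_files = {"line0": "line0.txt", "line1": "line1.txt", "line2": "line2.txt"}
for cmd in build_fst_commands("variants.vcf.gz", pop_files):
    print(" ".join(cmd))
```

Each argument list can be passed to `subprocess.run(cmd, check=True)` once the paths point to real files.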

Computing per-SNP Fst produces the following representative distributions.

Computing windowed Fst (window = 500 kb; step = 250 kb; minimum of 20 SNPs per window) produces the following representative distributions.
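To make the windowing explicit, here is a minimal sketch of a sliding-window mean over per-SNP Fst values for a single chromosome. It is an illustration of the idea, not vcftools' own implementation: the function name, the 1-based window grid, and the choice to drop undefined (NaN) values before counting SNPs are all assumptions.

```python
import math


def windowed_mean_fst(positions, fst_values, window=500_000, step=250_000, min_snps=20):
    """Mean per-SNP Fst in sliding windows along one chromosome.

    Returns a list of (start, end, n_snps, mean_fst) tuples, keeping only
    windows that contain at least min_snps SNPs with a defined Fst.
    """
    # Drop SNPs with undefined Fst and make sure the rest are sorted by position.
    snps = sorted((p, f) for p, f in zip(positions, fst_values) if not math.isnan(f))
    if not snps:
        return []

    last_pos = snps[-1][0]
    results = []
    start, lo = 1, 0
    while start <= last_pos:
        end = start + window - 1
        # Move the left pointer to the first SNP inside the current window.
        while lo < len(snps) and snps[lo][0] < start:
            lo += 1
        # Accumulate every SNP that falls inside [start, end].
        hi, total = lo, 0.0
        while hi < len(snps) and snps[hi][0] <= end:
            total += snps[hi][1]
            hi += 1
        n = hi - lo
        if n >= min_snps:
            results.append((start, end, n, total / n))
        start += step
    return results
```

Because the step is half the window size, every SNP contributes to two consecutive windows, so neighbouring window values are not independent.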

First, line1 vs line2 yields a different Fst distribution compared to (line1 | line2) vs line0.

Second, the windowed (mean) Fst calculation yields smoother distributions.

I would like to seek advice on the following:

(1) How should outliers be defined, considering the two types of observed Fst distributions?

(2) Is windowed Fst more suitable to identify outliers?

(3) How should the size and step of the sliding window be defined? (What I chose for this example is based on a similar study, but I guess it might require optimization.)

(4) Do I need to do some type of SNP pruning (these SNPs are derived from WGS variant discovery analysis following GATK best practices)?
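As a concrete baseline for question (1), the sketch below flags the top fraction of an empirical Fst distribution, per-SNP or windowed. This is only one simple convention and not an established answer to the question: the 1% default is an arbitrary assumption, the function name is hypothetical, and a rank-based cutoff always returns "outliers" whether or not selection acted.

```python
import math


def empirical_outliers(fst_values, top_fraction=0.01):
    """Flag values in the upper tail of the empirical Fst distribution.

    Returns (cutoff, flags), where flags[i] is True if fst_values[i] is
    defined and at or above the cutoff. Tied values at the cutoff are all
    flagged, so slightly more than top_fraction may be returned.
    """
    clean = sorted(f for f in fst_values if not math.isnan(f))
    if not clean:
        raise ValueError("no defined Fst values")
    # Number of values in the upper tail (at least one).
    k = max(1, math.ceil(top_fraction * len(clean)))
    cutoff = clean[-k]
    flags = [(not math.isnan(f)) and f >= cutoff for f in fst_values]
    return cutoff, flags
```

Applying the same function separately to each comparison (line1 vs line2, line1 vs line0, line2 vs line0) gives a comparison-specific cutoff, which is one way to handle the differently shaped distributions.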

vcftools fst

