How to detect outliers in (a) per-SNP Fst or (b) windowed Fst distributions?
3.5 years ago
serpalma.v


I want to find SNPs that could be responsible for the phenotypic differences observed among three populations. To do so, I computed Fst (Weir and Cockerham's estimator) using vcftools.
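A minimal Python sketch of how the vcftools calls could be assembled, assuming one plain-text file of sample IDs (one ID per line) per population. The helper `vcftools_fst_args` and the file names `calls.vcf.gz`, `line1.txt`, `line2.txt` are hypothetical; `--vcf`, `--gzvcf`, `--weir-fst-pop`, `--fst-window-size`, `--fst-window-step` and `--out` are existing vcftools options.

```python
from typing import List, Optional


def vcftools_fst_args(vcf: str, pop_files: List[str], out_prefix: str,
                      window_size: Optional[int] = None,
                      window_step: Optional[int] = None) -> List[str]:
    """Build a vcftools argument list for Weir & Cockerham Fst (hypothetical helper).

    pop_files: one sample-ID file per population; pass two files for a pairwise comparison.
    If window_size is given, vcftools reports windowed Fst instead of per-SNP Fst.
    """
    # Compressed input needs --gzvcf, plain text needs --vcf
    args = ["vcftools", "--gzvcf" if vcf.endswith(".gz") else "--vcf", vcf]
    for pop in pop_files:
        args += ["--weir-fst-pop", pop]
    if window_size is not None:
        args += ["--fst-window-size", str(window_size)]
        if window_step is not None:
            args += ["--fst-window-step", str(window_step)]
    args += ["--out", out_prefix]
    return args
```

For example, `subprocess.run(vcftools_fst_args("calls.vcf.gz", ["line1.txt", "line2.txt"], "line1_vs_line2"), check=True)` would run one pairwise per-SNP comparison; adding `window_size=500000, window_step=250000` would request the windowed version.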

One population is the founder population (line0), from which the other two populations (line1 and line2) were selected, each for a different trait. The phenotypes of the lines are highly divergent.

Computing per-SNP Fst produces the following representative distributions.
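A minimal sketch for loading per-SNP results, assuming the tab-separated `.weir.fst` table that vcftools writes (columns `CHROM`, `POS`, `WEIR_AND_COCKERHAM_FST`). Sites reported as `nan`/`-nan` (typically sites with no variation among the compared samples) are dropped; slightly negative estimates are kept, since the Weir and Cockerham estimator can fall below zero by chance. The function name `read_weir_fst` is hypothetical.

```python
import csv
import math
from typing import Iterable, List, Tuple


def read_weir_fst(lines: Iterable[str]) -> List[Tuple[str, int, float]]:
    """Parse a vcftools per-SNP .weir.fst table into (chrom, pos, fst) tuples."""
    reader = csv.DictReader(lines, delimiter="\t")
    records = []
    for row in reader:
        # float() accepts "nan" and "-nan"; such sites carry no usable estimate
        fst = float(row["WEIR_AND_COCKERHAM_FST"])
        if math.isnan(fst):
            continue
        records.append((row["CHROM"], int(row["POS"]), fst))
    return records
```

Usage, with a hypothetical file name: `with open("line1_vs_line2.weir.fst") as fh: snps = read_weir_fst(fh)`.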

Computing windowed Fst (window size = 500 kb; step = 250 kb; minimum of 20 SNPs per window) produces the following representative distributions.
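To make the windowing explicit, here is a sketch that recomputes an unweighted windowed mean from per-SNP values given as (chrom, pos, fst) tuples. It assumes windows start at position 1 and advance by `step`; it is not guaranteed to reproduce vcftools' `MEAN_FST` exactly (boundary and missing-site handling may differ), and it does not compute the ratio-based `WEIGHTED_FST`. The function name is hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from typing import Dict, List, Tuple


def sliding_window_mean(snps: List[Tuple[str, int, float]], size: int, step: int,
                        min_snps: int) -> List[Tuple[str, int, int, int, float]]:
    """Unweighted mean Fst in sliding windows along each chromosome.

    snps: (chrom, pos, fst) tuples with 1-based positions and non-missing Fst.
    Windows span [start, start + size - 1] for start = 1, 1 + step, 1 + 2*step, ...
    Returns (chrom, start, end, n_snps, mean_fst) for windows with >= min_snps SNPs.
    """
    by_chrom: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
    for chrom, pos, fst in snps:
        by_chrom[chrom].append((pos, fst))

    windows = []
    for chrom in sorted(by_chrom):
        sites = sorted(by_chrom[chrom])
        positions = [pos for pos, _ in sites]
        start = 1
        while start <= positions[-1]:
            end = start + size - 1
            # SNPs with start <= pos <= end, found by binary search on sorted positions
            lo, hi = bisect_left(positions, start), bisect_right(positions, end)
            if hi - lo >= min_snps:
                values = [fst for _, fst in sites[lo:hi]]
                windows.append((chrom, start, end, len(values), sum(values) / len(values)))
            start += step
    return windows
```

With the settings above this would be `sliding_window_mean(snps, size=500_000, step=250_000, min_snps=20)`. Averaging over many SNPs is also why the windowed distributions look smoother: each window value is a mean, so its spread is smaller than that of single-SNP estimates.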

First, line1 vs. line2 yields a different Fst distribution from either selected line (line1 or line2) vs. line0.

Second, windowed Fst (the mean of per-SNP values within each window) yields smoother distributions.

I would like advice on the following:

(1) How should outliers be defined, given the two types of Fst distributions observed?

(2) Is windowed Fst more suitable for identifying outliers?

(3) How should the size and step of the sliding window be chosen? (The values in this example are based on a similar study, but I suspect they may need optimization.)

(4) Do I need to do some kind of SNP pruning? (These SNPs come from WGS variant discovery following the GATK Best Practices.)
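For question (1), one common and purely empirical option is to flag the top fraction of each distribution, computed separately for each comparison, since line1 vs. line2 and lineX vs. line0 have different distributions. The sketch below assumes a 1% default, which is an arbitrary choice; an empirical cutoff always flags some observations even without selection, so it ranks candidates rather than testing them. Z-transforming (a ZFst-style standardization) does not change the ranking within one comparison but puts different comparisons on a common scale. All function names are hypothetical.

```python
import statistics
from typing import List, Sequence


def empirical_threshold(values: Sequence[float], top_fraction: float) -> float:
    """Smallest value among the top `top_fraction` of observations (e.g. 0.01 = top 1%)."""
    ordered = sorted(values)
    n_top = max(1, round(len(ordered) * top_fraction))
    return ordered[-n_top]


def outlier_indices(values: Sequence[float], top_fraction: float = 0.01) -> List[int]:
    """Indices of observations at or above the empirical cutoff (ties are all included)."""
    cutoff = empirical_threshold(values, top_fraction)
    return [i for i, v in enumerate(values) if v >= cutoff]


def z_transform(values: Sequence[float]) -> List[float]:
    """ZFst-style standardization: (x - mean) / sample standard deviation."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]
```

The same functions apply unchanged to per-SNP values or to windowed means.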
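Related to question (2): with overlapping windows (step smaller than window size), a single differentiated region typically shows up as a run of consecutive outlier windows. A hypothetical helper like the one below merges overlapping or abutting outlier windows into candidate regions; it assumes windows are (chrom, start, end) tuples with inclusive 1-based coordinates.

```python
from typing import List, Tuple


def merge_windows(windows: List[Tuple[str, int, int]]) -> List[Tuple[str, int, int]]:
    """Merge overlapping or abutting (chrom, start, end) intervals into regions."""
    merged: List[Tuple[str, int, int]] = []
    for chrom, start, end in sorted(windows):
        # Extend the previous region if it is on the same chromosome and touches this window
        if merged and merged[-1][0] == chrom and start <= merged[-1][2] + 1:
            _, prev_start, prev_end = merged[-1]
            merged[-1] = (chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged
```

Reporting merged regions rather than individual windows avoids counting one signal several times.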
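For question (3), window size is usually guided by how far linkage disequilibrium extends in the lines studied, which this sketch does not estimate. What it does show, under stated assumptions, is the SNP-density side of the trade-off: for each candidate window size (with 50% overlap, as in the 500 kb / 250 kb setting), the median number of SNPs per window and the fraction of windows that would pass a minimum-SNP filter. It works on the SNP positions of one chromosome; the function names are hypothetical.

```python
import statistics
from bisect import bisect_left, bisect_right
from typing import Dict, List, Sequence, Tuple


def snps_per_window(positions: Sequence[int], size: int, step: int) -> List[int]:
    """SNP count in each window [start, start + size - 1], start = 1, 1 + step, ... (one chromosome)."""
    pos = sorted(positions)
    counts = []
    start = 1
    while start <= pos[-1]:
        counts.append(bisect_right(pos, start + size - 1) - bisect_left(pos, start))
        start += step
    return counts


def summarize_window_sizes(positions: Sequence[int], sizes: Sequence[int],
                           min_snps: int) -> Dict[int, Tuple[float, float]]:
    """For each candidate size (step = size // 2, i.e. 50% overlap), return
    (median SNPs per window, fraction of windows with >= min_snps SNPs)."""
    summary = {}
    for size in sizes:
        counts = snps_per_window(positions, size, size // 2)
        passing = sum(c >= min_snps for c in counts) / len(counts)
        summary[size] = (statistics.median(counts), passing)
    return summary
```

Smaller windows localize signals more precisely but contain fewer SNPs, so their means are noisier; this summary only quantifies the second half of that trade-off.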
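For question (4), if one decides to reduce the redundancy of tightly linked SNPs, one simple option is distance-based thinning. vcftools offers `--thin <int>` for this; the sketch below is a hypothetical re-implementation of the same idea (greedy, left to right along each chromosome) and is not guaranteed to match vcftools' exact boundary handling. LD-based pruning (for example PLINK's `--indep-pairwise`) is an alternative not shown here.

```python
from typing import Dict, List, Tuple


def thin_by_distance(snps: List[Tuple[str, int]], min_dist: int) -> List[Tuple[str, int]]:
    """Keep a SNP only if it lies at least `min_dist` bp after the last kept SNP
    on the same chromosome (the first SNP of each chromosome is always kept)."""
    kept: List[Tuple[str, int]] = []
    last_kept: Dict[str, int] = {}
    for chrom, pos in sorted(snps):
        if chrom not in last_kept or pos - last_kept[chrom] >= min_dist:
            kept.append((chrom, pos))
            last_kept[chrom] = pos
    return kept
```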

Fst vcftools
