How to detect outliers from either (a) SNP-Fst or (b) Window-Fst distributions?
2.7 years ago
serpalma.v ▴ 70

Hello

I want to find the SNPs that could be responsible for the phenotypic differences observed between three populations. To do that, I computed Fst (Weir and Cockerham) using vcftools.

One population (line0) is the founder population from which the other two (line1 and line2) were selected, each for a different trait. The phenotypes of the selected lines are highly divergent.

Computing per-SNP Fst produces the following representative distributions.

Computing windowed Fst (window = 500 kb; step = 250 kb; minimum of 20 SNPs per window) produces the following representative distributions.
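For reference, the windowing described above can be sketched roughly as follows. This is only a minimal illustration and not vcftools' own implementation: the function name `windowed_mean_fst` and its arguments are hypothetical, it assumes per-SNP positions and Fst values for a single chromosome are already loaded into lists, and it takes the unweighted mean of per-SNP Fst in each window.

```python
from statistics import mean


def windowed_mean_fst(positions, fst_values,
                      window=500_000, step=250_000, min_snps=20):
    """Sliding-window mean of per-SNP Fst on one chromosome.

    Hypothetical helper for illustration only.
    Returns a list of (start, end, n_snps, mean_fst) tuples.
    """
    # Drop SNPs whose Fst is undefined (NaN is the only value != itself).
    snps = sorted((p, f) for p, f in zip(positions, fst_values) if f == f)
    if not snps:
        return []

    results = []
    last_pos = snps[-1][0]
    start = 1
    while start <= last_pos:
        end = start + window - 1
        in_window = [f for p, f in snps if start <= p <= end]
        # Skip windows with too few SNPs to give a stable mean.
        if len(in_window) >= min_snps:
            results.append((start, end, len(in_window), mean(in_window)))
        start += step
    return results
```

With the defaults, consecutive windows overlap by half, so a single strongly differentiated region can contribute to two adjacent windows.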

First, line1 vs line2 yields a different Fst distribution from either line1 vs line0 or line2 vs line0.

Second, the windowed (mean) Fst yields smoother distributions than the per-SNP Fst.

I would like to seek advice on the following:

(1) How should outliers be defined, considering the two types of observed Fst distributions?

(2) Is windowed Fst more suitable to identify outliers?

(3) How should the size and step of the sliding window be defined? (What I chose for this example is based on a similar study, but I guess it might require optimization.)

(4) Do I need to do some type of SNP pruning (these SNPs are derived from WGS variant discovery analysis following GATK best practices)?

Fst vcftools • 1.4k views