Question: How to detect outliers from either (a) SNP-Fst or (b) Window-Fst distributions?
serpalma.v (Germany) wrote 7 weeks ago:

Hello

I want to find the SNPs that could be responsible for the phenotypic differences observed between three populations. To that end, I computed Fst (Weir and Cockerham) using vcftools.

One population is the founder population (line0), from which the other two populations (line1 and line2) were derived by selection, each for a different trait. The phenotypes of the lines are highly divergent.

Computing per-SNP Fst produces the following representative distributions.

Computing windowed Fst (window = 500 kb; step = 250 kb; minimum 20 SNPs per window) produces the following representative distributions.

First, line1 vs line2 yields a different Fst distribution than (line1 | line2) vs line0.

Second, the windowed Fst calculation (mean per window) yields smoother distributions.
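To make the windowing explicit, here is a minimal sketch of how a mean Fst per sliding window could be computed from per-SNP values. It is not what vcftools does internally; the function name `windowed_mean_fst` and the `(chrom, pos, fst)` tuple layout are assumptions for illustration, and the defaults simply mirror the settings above (500 kb window, 250 kb step, at least 20 SNPs).

```python
from collections import defaultdict
from statistics import mean


def windowed_mean_fst(snps, window=500_000, step=250_000, min_snps=20):
    """Mean Fst per sliding window.

    snps: iterable of (chrom, pos, fst) tuples (hypothetical input layout).
    Returns a list of (chrom, start, end, n_snps, mean_fst) tuples.
    """
    by_chrom = defaultdict(list)
    for chrom, pos, fst in snps:
        if fst == fst:  # NaN != NaN, so this skips undefined Fst values
            by_chrom[chrom].append((pos, fst))

    results = []
    for chrom, sites in by_chrom.items():
        sites.sort()
        last_pos = sites[-1][0]
        start = 1
        while start <= last_pos:
            end = start + window - 1
            # Simple linear scan; fine for a sketch, slow for whole genomes
            values = [f for p, f in sites if start <= p <= end]
            if len(values) >= min_snps:
                results.append((chrom, start, end, len(values), mean(values)))
            start += step
    return results


# Toy example: 30 SNPs spaced 10 bp apart, small windows for readability
toy = [("chr1", i * 10, 0.1) for i in range(1, 31)]
for row in windowed_mean_fst(toy, window=100, step=50, min_snps=5):
    print(row)
```

Because overlapping windows share SNPs, neighbouring window values are not independent, which is part of why the windowed distribution looks smoother.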

I would like to seek advice on the following:

(1) How should outliers be defined, given the two types of Fst distributions observed?

(2) Is windowed Fst more suitable to identify outliers?

(3) How should the size and step of the sliding window be chosen? (What I chose for this example is based on a similar study, but I guess it might require optimization.)

(4) Do I need to do some type of SNP pruning? (These SNPs come from a WGS variant discovery analysis following GATK best practices.)
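For context on question (1), one convention I have seen is a purely empirical cutoff: flag the top fraction of the Fst distribution (SNPs or windows) as candidate outliers. The sketch below assumes a top 1% cutoff by default, which is an arbitrary choice and not a formal test; the function name `empirical_outliers` is made up for illustration.

```python
import math


def empirical_outliers(values, top_fraction=0.01):
    """Return (cutoff, indices) for the top `top_fraction` of Fst values.

    values: list of Fst values (per SNP or per window); NaNs are ignored.
    The cutoff is the smallest value that still falls in the top fraction.
    """
    clean = sorted(v for v in values if v == v)  # drop NaN
    if not clean:
        return math.nan, []
    # Number of values to flag; the small epsilon guards against
    # floating-point noise pushing ceil() one step too high
    k = max(1, math.ceil(top_fraction * len(clean) - 1e-9))
    cutoff = clean[-k]
    indices = [i for i, v in enumerate(values) if v == v and v >= cutoff]
    return cutoff, indices


# Toy example with an exaggerated fraction so the result is easy to follow
fst = [0.1, 0.5, 0.2, 0.9, 0.3, 0.05, 0.8, 0.4]
cutoff, idx = empirical_outliers(fst, top_fraction=0.25)
print(cutoff, idx)
```

A cutoff like this always flags something, even under neutrality, so it only ranks candidates. Since the line1 vs line2 distribution differs from the comparisons against line0, the cutoff would presumably have to be computed per comparison.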

Tags: vcftools, fst • 138 views
modified 7 weeks ago by h.mon • written 7 weeks ago by serpalma.v