Question: Estimating FST per genome
2.8 years ago by GabrielMontenegro530 (United Kingdom) wrote:

Hi,

I am interested in computing a genome-wide FST estimate. I am implementing the FST estimator of Reynolds et al. (1983). I found this paper in Genetics, which gives a formula for a per-site as well as a per-region FST measure:

Here a stands for the between-population component of genetic differentiation and b for the within-population component. The per-region formula is easy to apply: you sum a, and separately a + b, over all the sites within your region and take the ratio of the two sums.
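For reference, here is a sketch of the two formulas in the form usually given for the Reynolds et al. (1983) estimator. The subscript $s$ (site index) and $n$ (number of sites in the region) are notation assumed here; this is a reconstruction from the description above, not a quotation from the paper.

```latex
F_{ST}^{(s)} = \frac{a_s}{a_s + b_s},
\qquad
F_{ST}^{(\mathrm{region})} = \frac{\sum_{s=1}^{n} a_s}{\sum_{s=1}^{n} \left(a_s + b_s\right)}
```

Under this reading, the per-region value is a ratio of sums and not an average of the per-site ratios, so sites with larger $a_s + b_s$ contribute more to the result.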

My question is: if you want a per-genome estimate, is it OK to just apply this second formula to all the sites in your genome?

Also, several programs, such as PLINK, report both a weighted and an unweighted estimate of FST. What is the difference between these two? I assume the weighted estimate corresponds to the second formula I am showing, whereas the unweighted one is just the mean of the per-site values?

2.8 years ago by Zev.Kronenberg11k (United States) wrote:

For a genome-wide average I would compute Weir and Cockerham's (1984) FST at each site and then look at the distribution across the genome. You can also simply take the mean of the per-site FST values.

I've implemented this method in VCFLIB. If you're interested in learning more about FST, I've tried to name all the variables to match the paper.

Thanks for the reply! I will check the method in VCFLIB. Since you implemented that FST estimator yourself, I was wondering what to do with sites that are fixed between two populations. With the Reynolds FST I was getting undefined values; I assume it would be sensible to treat those as zero. Would you agree?

You can only calculate FST for segregating sites, so sites where both populations are fixed for the same allele are skipped, not set to zero:

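```cpp
// Skip sites where the allele frequency is unavailable in either population
// (af == -1 appears to be used as a "no data" sentinel).
if (populationTarget->af == -1 || populationBackground->af == -1) {
    delete populationTarget;
    delete populationBackground;
    continue;
}
// Skip sites fixed for the same allele in both populations (af == 1 in both).
if (populationTarget->af == 1 && populationBackground->af == 1) {
    delete populationTarget;
    delete populationBackground;
    continue;
}
// Skip sites where the allele is absent from both populations (af == 0 in both).
if (populationTarget->af == 0 && populationBackground->af == 0) {
    delete populationTarget;
    delete populationBackground;
    continue;
}
```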
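The three checks above can be folded into a single predicate, which makes the rule easier to test in isolation. This is only a sketch: `isUsableSite` is a hypothetical helper, not a VCFLIB function, and it assumes, as the snippet suggests, that `-1` marks a missing allele frequency.

```cpp
// Returns true when a site passes the three checks shown above.
// afTarget / afBackground: allele frequency in each population,
// with -1 assumed to mean "no frequency available".
bool isUsableSite(double afTarget, double afBackground) {
    // Missing data in either population.
    if (afTarget == -1 || afBackground == -1) return false;
    // Both populations fixed for the same allele.
    if (afTarget == 1 && afBackground == 1) return false;
    // Allele absent from both populations.
    if (afTarget == 0 && afBackground == 0) return false;
    return true;
}
```

Note that a fixed difference, for example a frequency of 1 in one population and 0 in the other, passes these checks: the site still segregates in the pooled sample, so it is kept.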