Question: Calculating Fst between different populations at a specific locus
Pinter0 wrote:

Hello everyone,

I have the allele frequencies at many loci for 26 populations. I was wondering if anybody knows how to calculate the Fst for each variation. I am trying to find variations that are characteristic of a certain population/ethnicity (by selecting those variations with Fst < 0.05).

So for example, if I have 5 populations (A, B, C, D, E) and the allele frequencies at one locus (see below), how can I find the Fst for each variation at that position?

```
       A->G    A->T    A->AC
A      0       0.94    0.05
B      0.07    0.15    0.1
C      0.8     0.1     0.04
D      0.1     0.05    0.2
E      0.03    0.1     0.15
------------------------------
Fst    ??      ??      ??
```

I hope my understanding of Fst is correct. If not, please correct me.

Thanks

statistics bioinformatics • 3.9k views
modified 2.5 years ago • written 2.5 years ago by Pinter0
grant.hovhannisyan1.8k wrote:

Fst is a genetic distance that is usually calculated between pairs of populations, so if you have many populations you get a pairwise matrix of Fst values. You can calculate the Fst matrix using the Arlequin software: http://cmpg.unibe.ch/software/arlequin35/Arlequin35.html. On the other hand, I am not sure whether Fst based on only one locus can give any biologically relevant results. Nevertheless, in the Arlequin manual you can check locus-by-locus AMOVA, which might be similar to what you need. Additionally, for the whole dataset you can also do a correspondence analysis (using SPSS, for example, or I am sure there are packages in R), which is similar to PCA but also shows you the impact of each locus on the distribution of populations.
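
To illustrate what a pairwise Fst matrix looks like for a single variant, here is a minimal Python sketch. It is not what Arlequin does internally. It assumes the simple heterozygosity-based definition Fst = (H_T - H_S) / H_T, treats the variant as biallelic (the variant versus all other alleles), weights both populations equally, and applies no sample-size correction. The names `freq_a_to_t`, `pairwise_fst`, and `matrix` are illustrative only; the frequencies are the A->T column from the question.

```python
from itertools import combinations

# A->T allele frequencies copied from the table in the question.
freq_a_to_t = {"A": 0.94, "B": 0.15, "C": 0.10, "D": 0.05, "E": 0.10}


def pairwise_fst(p1, p2):
    """Fst for one biallelic variant between two equally weighted populations.

    Assumption: Fst = (H_T - H_S) / H_T with no sample-size correction.
    """
    p_bar = (p1 + p2) / 2
    # Expected heterozygosity if the two populations were pooled.
    h_total = 2 * p_bar * (1 - p_bar)
    # Mean expected heterozygosity within each population.
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    if h_total == 0:
        # Variant absent or fixed in both populations: no differentiation.
        return 0.0
    return (h_total - h_within) / h_total


pops = list(freq_a_to_t)
# Symmetric matrix with zeros on the diagonal.
matrix = {a: {b: 0.0 for b in pops} for a in pops}
for a, b in combinations(pops, 2):
    matrix[a][b] = matrix[b][a] = pairwise_fst(freq_a_to_t[a], freq_a_to_t[b])

for a in pops:
    print(a, "  ".join(f"{matrix[a][b]:.3f}" for b in pops))
```

With these frequencies, every pair involving population A gives a large value (about 0.63 for A versus B), while pairs among B, C, D, and E stay close to zero, which matches the impression that A->T is mostly a feature of population A.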

Pinter0 wrote:

Do you think a chi-squared test would be appropriate here, if I get the number of alleles instead of allele frequencies?

I'm not sure what you mean. What is your ultimate goal, considering my reply above? Just to clarify: you have mentioned ethnicity, so I guess this is human data. What type of data is it: mtDNA, Y, autosomal? P.S. If you need to clarify or ask something additional, you should add a comment instead of posting a new answer with a question :)

I am trying to find loci with significant variation in the genomes. For example, in the data I posted in the question, I would say there is significant variation at this locus because only population A has a high frequency for the A->T variation and only population C has a high frequency for the A->G variation. I am trying to find an appropriate statistical test to analyze my data (in the form given in the question above), and I thought Fst (or a chi-squared test) would be good (maybe I am wrong). There are many R packages that calculate Fst, but they take alignments as input (which I don't have). Maybe there is another statistical test I should use that you guys know of?

Thanks for your help and time :)
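
Since the data in this thread are allele frequencies rather than alignments, a per-variant Fst across all five populations can be sketched directly from the table in the question. The Python below is only a rough illustration under stated assumptions: it uses Wright's variance form, Fst = Var(p) / (p_bar * (1 - p_bar)), treats each variant as biallelic (the variant versus everything else), gives every population equal weight, and ignores sample sizes, so it is not a corrected estimator such as Weir and Cockerham's. The names `freqs`, `global_fst`, and `fst_by_variant` are made up for this example.

```python
# Allele frequencies copied from the table in the question:
# one list per variant, ordered by population A, B, C, D, E.
freqs = {
    "A->G":  [0.00, 0.07, 0.80, 0.10, 0.03],
    "A->T":  [0.94, 0.15, 0.10, 0.05, 0.10],
    "A->AC": [0.05, 0.10, 0.04, 0.20, 0.15],
}


def global_fst(p):
    """Fst for one biallelic variant across equally weighted populations.

    Assumption: Wright's variance form, Var(p) / (p_bar * (1 - p_bar)),
    using the population variance and no sample-size correction.
    """
    n = len(p)
    p_bar = sum(p) / n
    denom = p_bar * (1 - p_bar)
    if denom == 0:
        # Variant absent or fixed everywhere: no differentiation to measure.
        return 0.0
    var_p = sum((x - p_bar) ** 2 for x in p) / n
    return var_p / denom


fst_by_variant = {name: global_fst(p) for name, p in freqs.items()}
for name, value in fst_by_variant.items():
    print(f"{name}: Fst = {value:.3f}")
```

Under these assumptions A->G and A->T come out at roughly 0.57 and 0.58, and A->AC at roughly 0.04. Higher Fst means stronger differentiation between populations, so the variants that stand out in one population are the ones with the high values, not the ones below 0.05.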