Question: Calculating Fst between different populations at a specific locus
Pinter wrote (21 months ago):

Hello everyone,

I have allele frequencies at many loci for 26 populations. Does anybody know how to calculate Fst for each variation? I am trying to find variations that are characteristic of a certain population/ethnicity (by selecting those variations with Fst < 0.05).

So for example, if I have 5 populations (A, B, C, D, E) and the allele frequencies at one locus (see below), how can I find the Fst for each variation at that position?

      A->G       A->T     A->AC
A      0         0.94     0.05
B     0.07       0.15     0.1  
C     0.8        0.1      0.04
D     0.1        0.05     0.2
E     0.03       0.1      0.15
------------------------------
Fst    ??         ??       ??

I hope my understanding of Fst is correct. If not, please correct me.

Thanks

statistics bioinformatics • 2.7k views
grant.hovhannisyan wrote (21 months ago):

Fst is a genetic distance that is usually calculated between pairs of populations, so if you have many populations you get a pairwise matrix of Fst values. You can calculate an Fst matrix with the Arlequin software: http://cmpg.unibe.ch/software/arlequin35/Arlequin35.html. On the other hand, I am not sure whether Fst based on only one locus can give any biologically relevant results. Nevertheless, the Arlequin manual describes locus-by-locus AMOVA, which might be close to what you need. Additionally, for the whole dataset you can also do a correspondence analysis (using SPSS, for example; I am sure there are R packages too), which is similar to PCA but also shows the impact of each locus on the distribution of populations.
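
To make the "pairwise matrix" idea concrete, here is a minimal sketch in plain Python that builds one for a single variant directly from population allele frequencies. It is not what Arlequin does internally: it assumes a simple Nei-style estimate, Fst = (Ht - Hs) / Ht, treats the variant as biallelic (variant vs. everything else), weights both populations equally, and applies no sample-size correction. The function name `pairwise_fst` is made up for illustration.

```python
from itertools import combinations

# Frequencies of the A->T variant in populations A-E, copied from the question.
freqs = {"A": 0.94, "B": 0.15, "C": 0.10, "D": 0.05, "E": 0.10}

def pairwise_fst(p1, p2):
    """Nei-style Fst = (Ht - Hs) / Ht for one biallelic variant, two populations.

    Assumption: equal population weights, no sample-size correction.
    """
    p_bar = (p1 + p2) / 2                      # pooled allele frequency
    ht = 2 * p_bar * (1 - p_bar)               # expected heterozygosity, pooled
    hs = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population
    return 0.0 if ht == 0 else (ht - hs) / ht

# One entry per unordered pair of populations (10 pairs for 5 populations).
fst_matrix = {
    (a, b): pairwise_fst(freqs[a], freqs[b])
    for a, b in combinations(freqs, 2)
}

for (a, b), value in fst_matrix.items():
    print(f"{a}-{b}: {value:.3f}")
```

Pairs that include population A should come out much higher than the pairs among B-E, since A is the only population where this variant is common.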

Pinter wrote (21 months ago):

Do you think a chi-squared test would be appropriate here, if I use allele counts instead of allele frequencies?
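
For reference, a chi-squared test of homogeneity on allele counts could be sketched as below. The counts are hypothetical: the question gives only frequencies, so this assumes 100 sampled chromosomes per population and converts the A->T frequencies to counts. The helper `chi_square_homogeneity` is an illustrative name, not a library function.

```python
# Hypothetical allele counts: assumes 100 sampled chromosomes per population
# (A-E) and uses the A->T frequencies from the question.
n_chrom = 100
variant_counts = [94, 15, 10, 5, 10]
# Rows are populations; columns are (variant allele, any other allele).
table = [[k, n_chrom - k] for k in variant_counts]

def chi_square_homogeneity(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if all populations shared one allele frequency.
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

stat, dof = chi_square_homogeneity(table)
print(f"chi2 = {stat:.1f}, dof = {dof}")
```

With 4 degrees of freedom the 5% critical value is about 9.49, so a statistic far above that indicates the populations do not share one frequency. If SciPy is available, `scipy.stats.chi2_contingency(table)` returns the same statistic together with a p-value. When testing many loci, some multiple-testing correction would be needed.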


I'm not sure what you mean. What is your ultimate goal, considering my reply above? Just to clarify: you mentioned ethnicity, so I guess this is human data. What type of data is it: mtDNA, Y, autosomal? P.S. If you need to clarify or ask something further, you should add a comment instead of posting a new answer with a question :)

written 21 months ago by grant.hovhannisyan

I am trying to find loci with significant variation between populations. For example, in the data I posted in the question, I would say there is significant variation at this locus because only population A has a high frequency of the A->T variation and only population C has a high frequency of the A->G variation. I am trying to find an appropriate statistical test for my data (in the form given in the question above), and I thought Fst (or a chi-squared test) would be good (maybe I am wrong). There are many R packages that calculate Fst, but they take alignments as input (which I don't have). Maybe there is another statistical test I should use that you guys know of?
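
A per-variation Fst can be computed straight from a frequency table like the one in the question, without alignments. The sketch below is one simple way to do it, under stated assumptions: each variation is treated as biallelic (variation vs. everything else), all populations are weighted equally, no sample-size correction is applied, and Fst is taken as (Ht - Hs) / Ht. The name `global_fst` is illustrative only.

```python
# Allele frequencies copied from the table in the question (populations A-E).
freqs = {
    "A->G":  [0.00, 0.07, 0.80, 0.10, 0.03],
    "A->T":  [0.94, 0.15, 0.10, 0.05, 0.10],
    "A->AC": [0.05, 0.10, 0.04, 0.20, 0.15],
}

def global_fst(p):
    """Fst = (Ht - Hs) / Ht across all populations for one biallelic variation.

    Assumption: equal population weights, no sample-size correction.
    """
    k = len(p)
    p_bar = sum(p) / k                            # mean frequency over populations
    ht = 2 * p_bar * (1 - p_bar)                  # heterozygosity of pooled populations
    hs = sum(2 * x * (1 - x) for x in p) / k      # mean within-population heterozygosity
    return 0.0 if ht == 0 else (ht - hs) / ht

fst_by_variant = {name: global_fst(p) for name, p in freqs.items()}

for name, value in fst_by_variant.items():
    print(f"{name}: Fst = {value:.3f}")
```

On these numbers A->G and A->T come out at roughly 0.57 and 0.58, while A->AC is about 0.04. Variations concentrated in one population score high, so a filter for population-specific variations would keep high Fst values rather than low ones.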

Thanks for your help and time :)

written 21 months ago by Pinter