Calculating Fst between different populations at a specific locus
6.8 years ago
Pinter • 0

Hello everyone,

I have the allele frequencies at many loci for 26 populations. I was wondering if anybody knows how to calculate the Fst for each variation. I am trying to find variations that are characteristic of a certain population/ethnicity (by selecting those variations with Fst < 0.05).

So for example, if I have 5 populations (A, B, C, D, E) and the allele frequencies at one locus (see below), how can I find the Fst for each variation at that position?

        A->G    A->T    A->AC
A       0       0.94    0.05
B       0.07    0.15    0.1
C       0.8     0.1     0.04
D       0.1     0.05    0.2
E       0.03    0.1     0.15
------------------------------
Fst     ??      ??      ??

I hope my understanding of Fst is correct. If not, please correct me.

Thanks

statistics • 7.3k views
6.8 years ago

Fst is a measure of genetic differentiation (often used as a genetic distance) that is usually calculated between pairs of populations, so with many populations you get a pairwise matrix of Fst values. You can calculate the Fst matrix with the Arlequin software: http://cmpg.unibe.ch/software/arlequin35/Arlequin35.html. That said, I am not sure that Fst based on a single locus can give biologically relevant results. Nevertheless, the Arlequin manual describes locus-by-locus AMOVA, which might be close to what you need. Additionally, for the whole dataset you can do a correspondence analysis (using SPSS, for example, and I am sure there are R packages for it), which is similar to PCA but also shows the contribution of each locus to the separation of the populations.
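If you only have allele frequencies, a rough per-allele Fst can be computed by hand. The sketch below uses the textbook definition Fst = (H_T - H_S) / H_T, and it rests on several assumptions that are not in the original data: each allele is treated as biallelic (that allele versus everything else), all populations get equal weight, and there is no sample-size correction, so it will not match the estimators in dedicated packages exactly. The function names `wright_fst` and `pairwise_fst` are made up for this illustration.

```python
# Allele frequencies from the question; list order is populations A, B, C, D, E.
pops = ["A", "B", "C", "D", "E"]
freqs = {
    "A->G":  [0.0, 0.07, 0.8, 0.1, 0.03],
    "A->T":  [0.94, 0.15, 0.1, 0.05, 0.1],
    "A->AC": [0.05, 0.1, 0.04, 0.2, 0.15],
}


def wright_fst(p):
    """Fst = (H_T - H_S) / H_T for one allele across the populations in p.

    Assumptions (illustrative only): allele vs. rest, equal population
    weights, no correction for sample size.
    """
    n = len(p)
    p_bar = sum(p) / n
    h_t = 2 * p_bar * (1 - p_bar)              # expected heterozygosity, pooled
    if h_t == 0:
        return 0.0                             # allele absent or fixed everywhere
    h_s = sum(2 * pi * (1 - pi) for pi in p) / n   # mean within-population value
    return (h_t - h_s) / h_t


def pairwise_fst(p):
    """Pairwise Fst matrix for one allele, as a dict keyed by index pairs."""
    return {
        (i, j): wright_fst([p[i], p[j]])
        for i in range(len(p))
        for j in range(i + 1, len(p))
    }


for allele, p in freqs.items():
    print(allele, "global Fst:", round(wright_fst(p), 3))

for (i, j), value in pairwise_fst(freqs["A->T"]).items():
    print(f"A->T  {pops[i]} vs {pops[j]}: {value:.3f}")
```

Under this definition, an allele that is common in one population and rare in the others (A->G or A->T in the example) gives a high Fst, while an allele with similar frequencies everywhere (A->AC) gives a value near zero. So a filter for population-specific variations would keep high values rather than values below 0.05.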

6.8 years ago
Pinter • 0

Do you think a chi-squared test would be appropriate here, if I use allele counts instead of allele frequencies?
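To make the idea concrete, here is a minimal sketch of a chi-squared test of homogeneity on a populations-by-allele table of counts. The sample size is an assumption: the thread gives no counts, so the code pretends 100 chromosomes were sampled per population (`N_CHROMOSOMES`, a hypothetical value) and converts the A->T frequencies from the question into counts. The helper name `chi_squared` is also made up here.

```python
# HYPOTHETICAL sample size: the thread does not say how many chromosomes
# were sampled per population, so 100 is assumed purely for illustration.
N_CHROMOSOMES = 100

# A->T frequencies from the question, populations A, B, C, D, E.
freq_a_to_t = [0.94, 0.15, 0.1, 0.05, 0.1]

# 5 x 2 contingency table: [chromosomes with A->T, chromosomes without].
table = []
for f in freq_a_to_t:
    carriers = round(f * N_CHROMOSOMES)
    table.append([carriers, N_CHROMOSOMES - carriers])


def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table.

    Assumes every row and column total is non-zero (otherwise an expected
    count is zero and the statistic is undefined).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


stat, dof = chi_squared(table)
print(f"chi-squared = {stat:.1f} with {dof} degrees of freedom")
```

The statistic is then compared with the chi-squared distribution for the given degrees of freedom (the 5% critical value for 4 degrees of freedom is about 9.49). If SciPy is available, `scipy.stats.chi2_contingency` performs the same calculation and also returns a p-value. With many loci tested, some multiple-testing correction would be needed as well.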


I'm not sure what you mean. What is your ultimate goal, considering my reply above? Just to clarify: you mentioned ethnicity, so I guess this is human data. What type of data is it: mtDNA, Y, or autosomal? P.S. If you need to clarify something or ask a follow-up, you should add a comment instead of posting a new answer with a question :)


I am trying to find loci with significant variation between populations. For example, in the data I posted in the question, I would say there is significant variation at this locus because only population A has a high frequency of the A->T variation and only population C has a high frequency of the A->G variation. I am trying to find an appropriate statistical test to analyze my data (in the form given in the question above), and I thought Fst (or a chi-squared test) would be good (maybe I am wrong). There are many R packages that calculate Fst, but they take alignments as input (which I don't have). Maybe there is another statistical test I should use that you know of?

Thanks for your help and time :)
