Question: Why remove SNPs with MAF<5% for Fst calculation?
Mr Locuace (Chile) wrote 17 months ago:

I have a very basic question. Let's say SNP X has an allele A with a frequency of 0.52 in population 1 and 0.002 in population 2. In some papers I have read that people remove SNPs with MAF<5% in either of the populations when calculating Fst. These frequencies suggest that A is highly differentiated between pop1 and pop2. Indeed, I calculated Fst for SNP X and got a value of ~0.9. But if I apply the MAF>5% criterion, I would remove this strong signal of population differentiation. This does not make much sense to me. I would very much appreciate some feedback. Thanks!
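To make the scenario concrete, here is a minimal sketch in Python that applies the "MAF >= 5% in both populations" rule to SNP X and computes a single-SNP Fst from the two allele frequencies. It assumes Wright's heterozygosity-based definition, Fst = (H_T - H_S) / H_T, with two equally weighted populations and no sample-size correction; the function names are illustrative only. Under this particular definition the value comes out at roughly 0.35, and other estimators give different numbers, so it should not be expected to reproduce the ~0.9 quoted above.

```python
def maf(p):
    """Minor allele frequency for an allele frequency p in [0, 1]."""
    return min(p, 1.0 - p)


def wright_fst(p1, p2):
    """Single-SNP Fst = (H_T - H_S) / H_T for two equally weighted populations.

    Assumption: population allele frequencies are known exactly,
    so no sample-size correction is applied.
    """
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity, pooled
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0  # mean within-population
    if h_t == 0.0:
        return float("nan")  # SNP is monomorphic overall
    return (h_t - h_s) / h_t


def passes_maf_filter(p1, p2, threshold=0.05):
    """Keep the SNP only if its MAF reaches the threshold in BOTH populations."""
    return maf(p1) >= threshold and maf(p2) >= threshold


# SNP X from the question: allele A at 0.52 in pop1 and 0.002 in pop2
p1, p2 = 0.52, 0.002
fst = wright_fst(p1, p2)
keep = passes_maf_filter(p1, p2)
print(f"Fst = {fst:.3f}, kept by MAF filter: {keep}")
```

The SNP is dropped because its MAF in population 2 (0.002) is below 0.05, even though its MAF in population 1 is 0.48.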

snp fst maf • 1.0k views
Kevin Blighe (Republic of Ireland) wrote 17 months ago:

The authors of this paper, published in Genome Research, addressed exactly this issue of allele frequency when calculating Fst: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3759727/

Their results show what you have implied, i.e., that Fst depends on allele frequency; in addition, they indicate that sample size is important. On that note, rare variants, being rare, are naturally encountered less often in any sample, and it is only in recent years that we have accumulated sequencing data on thousands of individuals, so that we can actually begin to analyse rare variants with various metrics, including Fst.
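As an illustration of the sample-size point, the sketch below uses Hudson's Fst estimator with its usual finite-sample correction in the numerator. This estimator is my choice for the example and is not taken from the answer above; `n1` and `n2` are assumed to be the numbers of sampled allele copies (chromosomes) in each population, and the frequencies are made up.

```python
def hudson_fst(p1, n1, p2, n2):
    """Hudson's single-SNP Fst with a finite-sample correction.

    p1, p2: sample allele frequencies in the two populations.
    n1, n2: number of sampled allele copies in each population
            (assumption for this sketch; must be > 1).
    """
    if n1 <= 1 or n2 <= 1:
        raise ValueError("each population needs more than one sampled allele copy")
    # Squared frequency difference, minus the part expected from sampling noise
    num = ((p1 - p2) ** 2
           - p1 * (1.0 - p1) / (n1 - 1)
           - p2 * (1.0 - p2) / (n2 - 1))
    den = p1 * (1.0 - p2) + p2 * (1.0 - p1)
    if den == 0.0:
        return float("nan")  # same allele fixed in both samples
    return num / den


# Same sample frequencies, very different sample sizes
small = hudson_fst(0.5, 10, 0.1, 10)
large = hudson_fst(0.5, 1000, 0.1, 1000)
print(f"n=10: {small:.3f}   n=1000: {large:.3f}")
```

With identical sample frequencies, the corrected estimate is noticeably lower for the small sample, because more of the observed frequency difference can be attributed to sampling noise.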


Thanks very much Kevin

written 17 months ago by Mr Locuace

You're welcome, my friend!

written 17 months ago by Kevin Blighe

Hi Kevin,

Given your response, rare variants cannot be considered for population differentiation because they arose only recently, yes? However, variants with an allele frequency < 5% are not necessarily rare; they are just not common. By removing variants with AF < 5%, we only assay population differentiation in terms of common variants, while these variants cannot have a significant role in the trait of interest, and populations may differ at low-frequency variants rather than at common ones. Could you please correct me wherever I'm wrong and explain a bit more about removing variants with AF < 5% for Fst calculation, which still does not make sense to me?

written 3 months ago by seta

In my answer, I only stated that the authors noted a difference when calculating Fst for 'low frequency' variants (MAF <= 0.05) versus 'most common' variants (0.45 < MAF <= 0.5). The title of this question is misleading because it implies that everybody should filter out variants with MAF <= 0.05 when calculating Fst.
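A rough sketch of what comparing these frequency classes could look like: each SNP is assigned to 'low frequency' (MAF <= 0.05), 'most common' (0.45 < MAF <= 0.5), or an intermediate class, and Fst is computed per class. Several choices here are assumptions for illustration only: the MAF used for binning is taken from the average frequency of the two populations, Fst uses Hudson's estimator without sample-size correction, SNPs within a class are combined as a ratio of sums, and the allele frequencies are made up.

```python
def hudson_terms(p1, p2):
    """Numerator and denominator of Hudson's Fst (no sample-size correction)."""
    num = (p1 - p2) ** 2
    den = p1 * (1.0 - p2) + p2 * (1.0 - p1)
    return num, den


def pooled_maf(p1, p2):
    """MAF of the average allele frequency (assumed binning criterion)."""
    p_bar = (p1 + p2) / 2.0
    return min(p_bar, 1.0 - p_bar)


def frequency_class(maf):
    """Map a MAF to the classes named in the reply above."""
    if maf <= 0.05:
        return "low_frequency"
    if 0.45 < maf <= 0.5:
        return "most_common"
    return "intermediate"


def fst_by_class(snps):
    """Per-class Fst as sum(numerators) / sum(denominators)."""
    sums = {}
    for p1, p2 in snps:
        label = frequency_class(pooled_maf(p1, p2))
        num, den = hudson_terms(p1, p2)
        total = sums.setdefault(label, [0.0, 0.0])
        total[0] += num
        total[1] += den
    return {label: (n / d if d > 0 else float("nan"))
            for label, (n, d) in sums.items()}


# Hypothetical (pop1, pop2) allele frequencies, for illustration only
snps = [(0.01, 0.03), (0.02, 0.06), (0.50, 0.44), (0.55, 0.40), (0.20, 0.30)]
result = fst_by_class(snps)
for label, value in sorted(result.items()):
    print(f"{label}: Fst = {value:.4f}")
```

Reporting Fst per class keeps the low-frequency SNPs in the analysis instead of discarding them, so any frequency dependence stays visible.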

Common variants can have a big role in disease. It is incorrect to assume that only rare variants contribute to complex disease phenotypes.

written 3 months ago by Kevin Blighe

Thanks a lot for your explanation. So, in your opinion, is it better to calculate Fst separately for lower-frequency and common variants, rather than removing some variants?

Agree with you about the common variants and disease, thanks for correcting me.

In this paper, the authors state that Fst analysis is not appropriate for detecting genetic risk differentiation among populations, and that the Genetic Risk Variation (GRV) method they developed can overcome the problems of Fst in this situation; they showed its strength for detecting genetic risk differentiation in type 2 diabetes. However, I couldn't find any script/tool to run the GRV method. Could you please share your thoughts on it?

written 3 months ago by seta

"So, in your opinion, is it better to calculate Fst separately for lower-frequency and common variants, rather than removing some variants?"

I am not in the best position to advise on that. It is more a question for a statistician, or at least for a bioinformatician who has worked in this area for a number of years. I will say that the literature frequently contradicts itself. Also, the authors' method (GRV) will likely not work in other situations / diseases. You may find more information by looking through CrossValidated / StackExchange.

written 3 months ago by Kevin Blighe