Question: Compare allele frequencies between populations
2.4 years ago, sjohn20 wrote:

Hi, I want to compare the minor allele frequencies (MAFs) for a list of SNPs in my population with the MAFs reported by 1000 Genomes across all individuals. My aim is to get an appropriate p-value so that I can identify variants whose allele frequencies differ significantly. How do I get started? Many thanks in advance.

Thank you for the suggestion. Any idea how I can calculate a p-value comparing my population's allele frequencies with the global allele frequencies?

Thanks

2.4 years ago, Santosh Anand wrote:

I'm afraid you may be misunderstanding what a p-value is. The first thing to know is that a p-value applies to samples drawn from populations, not to bare numbers. If all you have is two numbers (the mean AF of your SNP vs. the mean in the 1kg database), then there is no way to compute a p-value at all. Loosely speaking, the p-value is the probability of drawing a sample like yours (of AFs) if it had come from the other population (let's say 1kg). In practice this translates to how different your population (of SNP AFs here) is from the other one (the 1kg AFs, say): the lower the p-value, the stronger the evidence that the populations being compared are different. You may find some useful posts on SE/SO about this, for example: https://stats.stackexchange.com/questions/166323/misunderstanding-a-p-value

To conclude: if you need a p-value for your case, then you must have either 1) the full data for both populations (i.e., the individual AFs in all samples, including 1kg) or 2) at least the means and standard deviations of the AFs.
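The point that two bare frequencies are not enough can be illustrated with a small sketch. It assumes you know how many chromosomes were sampled in your population (two per diploid individual) and, as a simplification, treats the reference frequency as a fixed, known value. The function name and all numbers are hypothetical, and the exact binomial test is just one possible choice here, not necessarily the test meant above.

```python
from math import comb

def binomial_two_sided_p(k, n, p0):
    """Exact two-sided binomial p-value.

    Probability, if the true minor allele frequency were p0, of any outcome
    that is no more likely than seeing k minor alleles among n chromosomes.
    """
    pmf = [comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Sum every outcome at most as probable as the observed one
    # (small relative tolerance guards against floating-point ties).
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))

# Hypothetical numbers: the observed frequency is 0.10 in both cases and the
# reference frequency is 0.05; only the number of sampled chromosomes differs.
reference_af = 0.05
small = binomial_two_sided_p(k=2, n=20, p0=reference_af)    # 10 individuals
large = binomial_two_sided_p(k=40, n=400, p0=reference_af)  # 200 individuals

print(f"n=20  chromosomes: p = {small:.3g}")
print(f"n=400 chromosomes: p = {large:.3g}")
```

The observed frequency is identical in both runs, yet only the larger sample yields a small p-value. That is why allele counts or sample sizes are needed in addition to the frequencies themselves.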

BTW, what do you need the p-value for?

Hi Santosh, thank you for your comment. If I were to rephrase my original question, it would be: "What is the best statistic to determine whether allele frequencies are significantly different between 2 populations?" (I have asked the same question on RG.) I have read about using Fisher's exact test or a chi-squared test, but being a statistics newbie, I am not able to apply them to allele frequencies. The exact scenario is this: I have a list of about 80 SNPs from my population's exome data, annotated as potentially deleterious. I am interested to know how these scale up when compared with the global populations (i.e., if the disease allele frequency is significant when compared with other populations).
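For a single SNP, Fisher's exact test is usually applied to a 2x2 table of allele counts (minor vs. major allele, in each of the two populations) rather than to the frequencies directly. Below is a minimal sketch using only the Python standard library. It assumes diploid samples and that the number of genotyped individuals is known for both populations; the helper names and the example frequencies are hypothetical. In practice, `scipy.stats.fisher_exact` performs the same test.

```python
from math import comb

def allele_counts(maf, n_individuals):
    """Convert a minor allele frequency into (minor, major) allele counts,
    assuming diploid individuals (two chromosomes each)."""
    chromosomes = 2 * n_individuals
    minor = round(maf * chromosomes)
    return minor, chromosomes - minor

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are populations; columns are (minor allele count, major allele count).
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x minor alleles in row 1,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    p_obs = prob(a)
    # Sum the probabilities of all tables at most as likely as the observed one.
    return min(1.0, sum(q for q in probs if q <= p_obs * (1 + 1e-7)))

# Hypothetical example: MAF 0.10 in 100 local individuals
# vs. MAF 0.05 in 2504 reference individuals.
mine = allele_counts(0.10, 100)
ref = allele_counts(0.05, 2504)
p_value = fisher_exact_two_sided(*mine, *ref)
print(f"local counts {mine}, reference counts {ref}, p = {p_value:.3g}")
```

Running this once per SNP gives one p-value per variant. With about 80 SNPs tested, some multiple-testing correction would likely be needed before calling any single variant significant.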

I am interested to know how these scale up when compared with the global populations (i.e., if the disease allele frequency is significant when compared with other populations).

Can you elaborate on this? Are you wondering whether the number/proportion of SNPs (80) is significant when compared with whole-genome data? Or whether the AFs of all these 80 SNPs together are significantly different from the target 1kg data?