Question

Compare allele frequencies between populations

7.0 years ago

sjohn

Hi, I want to compare the minor allele frequencies (MAFs) for a list of SNPs from my population with the MAF values for all individuals in 1000 Genomes. My aim is to get an appropriate p-value statistic to identify variants whose allele frequencies differ significantly. How do I get started? Many thanks in advance.
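Since the goal is to flag individual variants out of a list of SNPs, each SNP gets its own test, and the per-SNP p-values should be corrected for multiple testing before anything is called significant. Below is a minimal pure-Python sketch of the Benjamini-Hochberg false discovery rate adjustment; the function name `benjamini_hochberg` and the demo p-values are hypothetical. If you work in Python with statsmodels, `statsmodels.stats.multitest.multipletests(pvals, method="fdr_bh")` does the same job.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps every adjusted value at 1
    # Walk from the largest p-value down so the q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical per-SNP p-values, for illustration only.
    pvals = [0.001, 0.02, 0.03, 0.04, 0.5]
    for p, q in zip(pvals, benjamini_hochberg(pvals)):
        print(f"p={p:.3f}  q={q:.3f}")
```

SNPs with a q-value below your chosen false discovery rate (e.g. 0.05) would then be reported as significant.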

SNP allele frequency

Written 7.0 years ago by sjohn; updated 7.0 years ago by Santosh Anand.

Hi, if you are working on the human genome, you can get better MAF estimates from the ExAC SNP VCF (http://exac.broadinstitute.org/downloads) or from gnomAD (http://gnomad.broadinstitute.org/downloads). Both databases include the 1000 Genomes data.
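In these VCFs the per-site allele counts and frequencies live in the INFO column, typically under the keys `AC`, `AN` and `AF`. A minimal sketch for pulling the frequency out of one INFO string, assuming those key names; the exact keys (including population-specific ones) vary between releases, so check the `##INFO` lines in the header of the file you download. The function names are hypothetical, and for whole files a proper VCF library such as pysam or cyvcf2 is a better choice than hand parsing.

```python
def parse_info(info):
    """Split a VCF INFO string like 'AC=3;AN=100;AF=0.03;DB' into a dict.

    Flag entries without '=' (e.g. 'DB') map to True.
    """
    fields = {}
    for entry in info.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        fields[key] = value if sep else True
    return fields


def allele_frequency(info):
    """Return the AF of the first ALT allele, falling back to AC/AN if AF is absent."""
    fields = parse_info(info)
    if "AF" in fields:
        # Multi-allelic sites carry comma-separated values; take the first ALT.
        return float(fields["AF"].split(",")[0])
    ac = int(fields["AC"].split(",")[0])
    an = int(fields["AN"])
    return ac / an if an else float("nan")
```

For example, `allele_frequency("AC=5;AN=200")` falls back to AC/AN and returns 0.025.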

Best

Reply by Titus, 7.0 years ago

Thank you for the suggestion. Any idea how I can calculate a p-value comparing my population's allele frequencies with the global allele frequencies?
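One common starting point, assuming you can get allele counts (not just frequencies) for both groups: put each SNP into a 2x2 table of alternate vs. reference allele counts in your population and in the reference panel, and run a Pearson chi-squared test with 1 degree of freedom. A minimal pure-Python sketch with a hypothetical function name and no continuity correction; when any expected count is small (a common rule of thumb is below 5), Fisher's exact test is the safer choice.

```python
import math


def allele_chi2_test(alt1, total1, alt2, total2):
    """Pearson chi-squared test (1 df, no continuity correction) on allele counts.

    alt1/total1: alternate-allele count and total allele count in population 1.
    alt2/total2: the same for population 2 (e.g. the reference panel).
    Returns (chi2_statistic, p_value).
    """
    table = [[alt1, total1 - alt1], [alt2, total2 - alt2]]
    n = total1 + total2
    row_sums = [total1, total2]
    col_sums = [alt1 + alt2, n - alt1 - alt2]
    if 0 in row_sums or 0 in col_sums:
        raise ValueError("a row or column of the 2x2 table is empty; test undefined")
    chi2 = 0.0
    for r in range(2):
        for c in range(2):
            expected = row_sums[r] * col_sums[c] / n
            chi2 += (table[r][c] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X >= x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

With hypothetical counts of 30/100 alternate alleles vs. 10/100, `allele_chi2_test(30, 100, 10, 100)` gives a statistic of 12.5 and p of roughly 4.1e-4.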

Thanks

Reply by sjohn, 7.0 years ago

Answer 1 · 2017-05-09

7.0 years ago

Santosh Anand

I'm afraid there may be a misunderstanding about what a p-value is. A p-value comes from a statistical test on samples drawn from populations; it cannot be computed from two plain numbers alone. If all you have is two numbers (the mean AF of your SNPs vs. the mean AF in the 1000 Genomes (1kg) database), there is no p-value to talk about. Loosely speaking, the p-value is the probability of observing data at least as extreme as your sample (of AFs) if it had been drawn from the same population as the reference (say, the 1kg AFs). A lower p-value is therefore stronger evidence that the two populations being compared differ. You may find useful posts on Stack Exchange about this, for example: https://stats.stackexchange.com/questions/166323/misunderstanding-a-p-value
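To make the definition concrete, suppose, as a simplifying assumption, that the reference AF `p0` is known exactly (reasonable only when the reference panel is much larger than your sample), and that your sample has `k` alternate alleles out of `n` sequenced alleles. The p-value is then the probability, under Binomial(n, p0), of a count at least as unlikely as `k`. A minimal exact two-sided binomial test in pure Python (function names hypothetical); `scipy.stats.binomtest` is the library equivalent.

```python
import math


def binom_pmf(k, n, p):
    """Binomial probability P(X = k), computed in log space to avoid overflow."""
    if p == 0.0:
        return 1.0 if k == 0 else 0.0
    if p == 1.0:
        return 1.0 if k == n else 0.0
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def binomial_test_two_sided(k, n, p0):
    """Exact two-sided test of H0: the true alternate-allele frequency equals p0.

    The p-value sums the probabilities of every outcome that is no more likely
    than the observed count k under H0.
    """
    pmfs = [binom_pmf(i, n, p0) for i in range(n + 1)]
    # Small relative tolerance so outcomes tied with k are not lost to rounding.
    threshold = pmfs[k] * (1 + 1e-7)
    return min(1.0, sum(pr for pr in pmfs if pr <= threshold))
```

For instance, 0 alternate alleles out of 10 when `p0 = 0.5` gives p = 2/1024 (only the outcomes 0 and 10 are that unlikely), while 5 out of 10 gives p = 1.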

In short, if you need a p-value in your case, you must have either 1) the data for the whole populations (i.e., the individual AFs in all samples, including 1kg), or 2) at least the means and standard deviations of the AFs.
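If only option 2 is available (the mean, standard deviation and number of SNPs for each group), Welch's t-test can be run from those summary statistics alone. A minimal sketch of the statistic and its Welch-Satterthwaite degrees of freedom (function name hypothetical); the p-value then comes from a t distribution with `df` degrees of freedom, which `scipy.stats.ttest_ind_from_stats(..., equal_var=False)` computes directly. This assumes the two sets of AFs are independent samples; if both means are taken over the same SNPs, a paired test on per-SNP differences is usually more appropriate.

```python
import math


def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom,
    computed from summary statistics only (means, sample SDs, sample sizes)."""
    v1 = sd1 ** 2 / n1  # squared standard error of mean 1
    v2 = sd2 ** 2 / n2  # squared standard error of mean 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df
```

With equal SDs and equal sample sizes, `df` reduces to `n1 + n2 - 2`, the same as the pooled-variance t-test.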

BTW, what do you need the p-value for?

Answer by Santosh Anand, 7.0 years ago

Hi Santosh, thank you for your comment. If I were to rephrase my original question, it would be: "What is the best statistic to determine whether allele frequencies differ significantly between two populations?" (same question on RG here). I have read about using Fisher's exact test or a chi-squared test, but as a statistics newbie I am not able to apply them to allele frequencies. My exact scenario is this: I have a list of about 80 SNPs from my population's exome data, annotated as potentially deleterious. I would like to know how these compare with the global populations (i.e., whether the disease allele frequency differs significantly from that in other populations).
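For a single SNP, Fisher's exact test works on the 2x2 table of minor vs. major allele counts in the two populations. If you only have a MAF and the number of diploid individuals N, the counts can be approximately recovered as `round(2 * N * MAF)`; using the actual AC/AN values from the reference VCF is better when they are available. A minimal pure-Python sketch (all function names hypothetical; `scipy.stats.fisher_exact` is the library equivalent). With about 80 SNPs, the resulting p-values still need a multiple-testing correction.

```python
import math


def maf_to_counts(maf, n_individuals):
    """Approximate (minor, major) allele counts for n diploid individuals."""
    total_alleles = 2 * n_individuals
    minor = round(maf * total_alleles)
    return minor, total_alleles - minor


def _table_prob(a, row1, row2, col1):
    """Hypergeometric probability of top-left cell `a` given fixed margins."""
    return math.comb(row1, a) * math.comb(row2, col1 - a) / math.comb(row1 + row2, col1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [_table_prob(x, row1, row2, col1) for x in range(lo, hi + 1)]
    # Small relative tolerance so tables tied with the observed one are kept.
    threshold = _table_prob(a, row1, row2, col1) * (1 + 1e-7)
    return min(1.0, sum(p for p in probs if p <= threshold))


if __name__ == "__main__":
    # Hypothetical numbers: 50 individuals with MAF 0.25 vs. 2504 with MAF 0.10.
    mine = maf_to_counts(0.25, 50)
    ref = maf_to_counts(0.10, 2504)
    print(fisher_exact_two_sided(mine[0], mine[1], ref[0], ref[1]))
```

Each row of the table is one population and each column one allele, so the test asks whether the minor-allele proportion differs between the rows.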

Reply by sjohn, 7.0 years ago

I am interested to know how these scale up when compared with the global populations( ie. if the disease allele frequency is significant when compared with other populations)

Can you elaborate on this? Are you asking whether the number (or proportion) of these SNPs (80) is significant compared with whole-genome data? Or whether the AFs of these 80 SNPs, taken together, differ significantly from the 1kg data?
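If the second reading is the intended one (do these ~80 SNPs, taken together, tend to have higher AFs in your population than in 1kg?), one simple option is a paired sign test: count how many SNPs have a higher AF in your population and compare that count with Binomial(n, 0.5). A minimal pure-Python sketch with a hypothetical function name; it ignores the size of each difference, so the Wilcoxon signed-rank test (`scipy.stats.wilcoxon`) is a common, more powerful alternative.

```python
import math


def sign_test(freqs_a, freqs_b):
    """Two-sided exact sign test on paired allele frequencies (one pair per SNP).

    Counts SNPs where population A's AF exceeds B's (ties are dropped) and
    compares that count with Binomial(n, 0.5).
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("both lists must contain one AF per SNP, in the same order")
    diffs = [a - b for a, b in zip(freqs_a, freqs_b) if a != b]
    n = len(diffs)
    if n == 0:
        return 1.0
    k = sum(1 for d in diffs if d > 0)
    k_extreme = min(k, n - k)
    # P(X <= k_extreme) under Binomial(n, 0.5), doubled for the two-sided test.
    tail = sum(math.comb(n, i) for i in range(k_extreme + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

For example, if all 10 of 10 hypothetical SNPs had a higher AF in population A, the two-sided p-value would be 2/1024.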

Reply by Santosh Anand, 6.9 years ago