Question

Comparing Allele Frequency Between 1000 Genomes And Nhlbi


10.5 years ago

User 1933 ▴ 340

I have a set of variants that are also reported in the 1000 Genomes Project (summary table) and by NHLBI (National Heart, Lung, and Blood Institute). I wanted to see whether the allele frequencies of these variants in the two projects agree with each other. For this comparison I used the Mann-Whitney test.

Here are my questions:

For such a comparison I expect to get a non-significant p-value. Is the Mann-Whitney test the right test?
Is my expectation logical?

Thank you,

comparison • 6.6k views

ADD COMMENT • link updated 6.9 years ago by Biostar 20 • written 10.5 years ago by User 1933 ▴ 340


I think it is not the right approach, but it is hard to put a finger on the problem because your post is very unclear. What is "genome 1000"? What is "NHLBI"? How do you generate your count table? Are you looking for a count difference for each allele? What is the question, after all? "I want to see the frequency of these variants in genome 1000 and NHLBI" is not a valid question for a statistical test, because you can easily extract the allele frequencies (I guess that is what you mean by "frequency of these variants").

ADD REPLY • link 10.5 years ago by Michael 54k


Thanks. I have tried to update the question to address your points.

ADD REPLY • link 10.5 years ago by User 1933 ▴ 340

score 1 · Answer 1 · 2013-11-04

I think that either Pearson's chi-squared test for independence or Fisher's exact test will be appropriate. In the chi-squared test, the null hypothesis is that allele and data set are independent, i.e. the allele frequencies in 1000 Genomes and NHLBI are the same, and the alternative hypothesis is that they differ. You will have to check whether you can formulate your research question in terms of null and alternative hypotheses, and format your data to fit the test (e.g. Fisher's test requires counts, not frequencies).
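As a minimal sketch of what this could look like for a single variant, the snippet below runs both tests on a 2x2 table of allele counts (rows: data set, columns: alternate vs. reference allele). The function names and the counts are made up for illustration, and it uses only the Python standard library; with SciPy installed you would normally reach for its own implementations instead.

```python
from math import comb, erfc, sqrt


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (1 df, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the upper tail probability is erfc(sqrt(stat / 2)).
    return stat, erfc(sqrt(stat / 2))


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table (with the same
    margins) that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = {x: comb(row1, x) * comb(row2, col1 - x) / denom
             for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # Small relative tolerance so ties are not lost to floating-point rounding.
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9)))


# Hypothetical counts for one variant (alternate, reference alleles):
kg_alt, kg_ref = 120, 1880          # illustrative "1000 Genomes" counts
nhlbi_alt, nhlbi_ref = 700, 12300   # illustrative "NHLBI" counts

stat, p_chi2 = chi2_2x2(kg_alt, kg_ref, nhlbi_alt, nhlbi_ref)
p_fisher = fisher_exact_2x2(kg_alt, kg_ref, nhlbi_alt, nhlbi_ref)
print(f"chi-squared = {stat:.3f}, p = {p_chi2:.4f}; Fisher p = {p_fisher:.4f}")
```

Note that this tests one variant at a time, so with many variants you would end up with many p-values and a multiple-testing problem to handle.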

Why the Mann-Whitney U test (Wilcoxon rank-sum test) might not be appropriate: the MWU test tests the null hypothesis that two populations are the same against the alternative that they differ, without assuming a particular distribution. The only requirement is that the samples are drawn from two populations, where I interpret a "population" as being generated by the same random process, i.e. by repeated sampling of the same random variable. That is not the case for allele frequencies of different SNPs. (We cannot count repetitions per individual, because each individual sample contributes either 0 or 1 to the MAF.) In other words, you would be comparing apples and oranges. I think that this is also a reasonable assumption for real allele frequencies of SNPs.

An example: imagine our test set consists of two SNPs for which we know the true MAF in the whole population, one with 0.1, the other with 0.4. If you put them together you might get a sample vector of e.g. x = (0.08, 0.45). However, we know a priori that the values in this vector are not sampled from the same random process, because each SNP has its own "allele-generating process" with its own variance.
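To put numbers on the "own variance" point, here is a small sketch that assumes each sampled chromosome carries the minor allele independently with probability equal to the true MAF (a simple binomial model). The model, the function name, and the sample size of 200 chromosomes are my assumptions for illustration, not something from the data in the question.

```python
from math import sqrt


def maf_standard_error(true_maf, n_chromosomes):
    """Standard error of an estimated allele frequency under binomial sampling."""
    return sqrt(true_maf * (1 - true_maf) / n_chromosomes)


# The two SNPs from the example above; 200 chromosomes is an illustrative size.
for maf in (0.1, 0.4):
    se = maf_standard_error(maf, 200)
    print(f"true MAF {maf}: estimates scatter around {maf} with standard error {se:.4f}")
```

Under this model the two SNPs differ in both their centre and their spread, which is why their estimated frequencies are not draws from one common distribution.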

It is a bit harder to argue why a test is not appropriate, so if someone has a better-founded argument for or against, that would be welcome.

score 1 · Answer 2 · 2013-11-04


10.5 years ago

Giovanni M Dall'Olio 28k

The first thing to do is to plot the two distributions (the site frequency spectrum) and compare them:

[Image not preserved: example plots of site frequency spectra]

Then, a Mann-Whitney test is a good option for comparing the two distributions. However, if you have a large number of individuals, it is very likely that the Mann-Whitney test, or any other test, will give you a significant p-value, even if the two means are close.
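The sample-size effect is easy to see in a toy calculation. The sketch below is not a Mann-Whitney test: for simplicity it uses a chi-squared test (1 degree of freedom) on a single variant whose allele frequency is 0.10 in one cohort and 0.11 in the other, and only changes the number of chromosomes per cohort. The frequencies, cohort sizes, and function name are all made up for illustration.

```python
from math import erfc, sqrt


def chi2_pvalue_for_frequencies(freq_a, freq_b, n_chromosomes):
    """Chi-squared p-value (1 df) for two equally sized cohorts with the given allele frequencies."""
    # Expected allele counts in each cohort (kept as floats for simplicity).
    a, b = freq_a * n_chromosomes, (1 - freq_a) * n_chromosomes
    c, d = freq_b * n_chromosomes, (1 - freq_b) * n_chromosomes
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the upper tail probability is erfc(sqrt(stat / 2)).
    return erfc(sqrt(stat / 2))


# Same frequencies every time; only the cohort size grows.
for n in (200, 2_000, 20_000, 200_000):
    print(f"{n:>7} chromosomes per cohort: p = {chi2_pvalue_for_frequencies(0.10, 0.11, n):.3g}")
```

The difference in frequency stays at 0.01 throughout, but the p-value shrinks as the cohorts grow, so a small p-value on its own says little about how large the difference is.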

ADD COMMENT • link 10.5 years ago by Giovanni M Dall'Olio 28k


I have added the attempt to argue why MWU-test is not appropriate in this case, maybe we can discuss this? Appreciate the attempt to plot the distribution of MAF, will it look like the ones you are showing?

ADD REPLY • link 10.5 years ago by Michael 54k


That is the case: the means are close and I see the impact of the number of samples on my p-value. Is there any treatment you can recommend?

ADD REPLY • link 10.4 years ago by User 1933 ▴ 340