Question: Why doesn't my p-value histogram have a uniform distribution?
afli190 wrote:

Hi my friends, I ran a Fisher's exact test in R. Because I think the treatment does not affect the counts, I expected a uniform distribution of p-values, but the histogram shows a U shape, with large numbers at 0 and 1. The code is as follows; could you please tell me why? Thank you very much!

``````
library(ggplot2)

# Read the count table and keep rows with enough counts
test <- read.table("sample_fisher_test.txt")
test <- test[rowSums(test[, 3:4]) > 5, ]

# Fisher's exact test on each row's 2x2 table
for (i in 1:nrow(test)) {
  x <- matrix(c(test[i, 1], test[i, 3], test[i, 2], test[i, 4]), nrow = 2)
  test$pvalue[i] <- fisher.test(x)$p.value
}

ggplot(test, aes(x = pvalue)) +
  geom_histogram(binwidth = 0.05, fill = "lightblue", colour = "black")
``````
modified 19 months ago by chrchang523 • written 19 months ago by afli190

Why do you think it should be uniform?

I've just modified the content. I expect it to be uniform, but maybe it actually isn't. I just cannot understand the U shape.


Your comment does not add any information. I personally have too little of a statistical background to formulate expectations about p-value distributions. You should ask yourself whether your statistical knowledge is sufficient to do so. As this is a pure statistics question, you might consider posting it on StackExchange. If you do, you can improve your chance of a good response by following the guidelines in How To Ask Good Questions On Technical And Scientific Forums, because right now your question lacks any details on the experimental setup.

Thank you ATpoint, I made the post in a hurry just now, sorry for that. I will read the guidelines carefully and do better next time. I will also post this on StackExchange to see if I can get some help.

Aifu.


Hi- See if this blog post helps you: http://varianceexplained.org/statistics/interpreting-pvalue-histogram/ . To get better answers, it would be good to give some background about what you are testing, as the U shape may or may not be anything to worry about.

Thank you dariober, I've already seen that post; it is clear, but the solution it provides does not solve my problem.

chrchang523 wrote:

The large number of p=1 observations is due to p-values being "rounded up". For example, if count=200 and each row/column sums to 100, the central {50, 50} {50, 50} table has a ~11.2% chance of being observed under the null hypothesis. This table corresponds to p-value=1; the adjacent {49, 51} {51, 49} and {51, 49} {49, 51} tables correspond to p-value ~0.888, etc.
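The probabilities chrchang523 quotes can be reproduced directly in R; this sketch (my own, not part of the original answer) conditions on all margins being 100:

``````
# Central 2x2 table with every row/column sum equal to 100
x <- matrix(c(50, 50, 50, 50), nrow = 2)

# Probability of observing exactly this table under the null:
# given fixed margins, the (1,1) cell is hypergeometric
p_obs <- dhyper(x[1, 1], sum(x[1, ]), sum(x[2, ]), sum(x[, 1]))
round(p_obs, 3)          # ~0.112, i.e. the ~11.2% mentioned above

# The two-sided Fisher p-value for this central table is 1
fisher.test(x)$p.value
``````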

To avoid this upward bias, you can use the "mid-p value". In the example above, the most-central table has a mid-p value of ~0.944: the center, instead of the upper end, of the probability interval it corresponds to. The mid-p value has the nice property that, under the null hypothesis, the Q-Q plot should stay near the main diagonal.
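A mid-p value can be obtained from `fisher.test` output by subtracting half the probability of the observed table; this sketch (my own illustration, not chrchang523's code) reproduces the ~0.944 figure for the central table:

``````
x <- matrix(c(50, 50, 50, 50), nrow = 2)
p <- fisher.test(x)$p.value                                   # 1 for this table
p_obs <- dhyper(x[1, 1], sum(x[1, ]), sum(x[2, ]), sum(x[, 1]))

# Mid-p: take the center, not the upper end, of the probability interval
midp <- p - 0.5 * p_obs
round(midp, 3)           # ~0.944
``````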

(The same things are true for the binomial test you asked about earlier.)

Incidentally, I posted JavaScript Fisher's exact test and binomial test calculators up at https://www.cog-genomics.org/software/stats several years ago; the FET includes an option for turning the mid-p adjustment on/off, if you want to see more examples of the difference it makes.

Thank you chrchang523, that sounds good. I did this using your 'fisher_test' function with the mid-p correction; the high bar at 1 is reduced to a low value, and the 0.95 bar is largely increased. Could the 1 and 0.95 bars end up similar in height? (I filtered the counts more strictly, so the overall values are lower than in the original picture.)

This is expected if your sample sizes are such that p-value 1 usually corresponds to mid-p value in the 0.95 bin.