Question: Why doesn't my p-value histogram have a uniform distribution?
19 months ago by afli (China, Beijing, IGDB):

Hi my friends, I ran Fisher's exact test in R. Because I think the treatment should not affect the counts, I expected a uniform distribution of p-values, but the histogram is U-shaped, with large counts at 0 and 1. The code is below; could you please tell me why? Thank you very much!

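library(ggplot2)  # needed for ggplot()
test <- read.table("sample_fisher_test.txt")
# Keep rows where columns 3 and 4 together have more than 5 counts
test <- test[rowSums(test[, 3:4]) > 5, ]
# Run one Fisher's exact test per row on the 2x2 table
# [col1 col2; col3 col4] (matrix() fills column by column)
test$pvalue <- NA_real_
for (i in seq_len(nrow(test))) {
  x <- matrix(c(test[i, 1], test[i, 3], test[i, 2], test[i, 4]), nrow = 2)
  test$pvalue[i] <- fisher.test(x)$p.value
}
ggplot(test, aes(x = pvalue)) +
  geom_histogram(binwidth = 0.05, fill = "lightblue", colour = "black")
dev.off()  # closes the graphics device; only needed if one was opened, e.g. with pdf()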

[Figure: histogram of the p-values, U-shaped]

The data is available at: https://de.cyverse.org/dl/d/D577D93C-F511-41EE-AC74-26E2B5203564/sample_fisher_test.txt


Why do you think it should be uniform?

modified 19 months ago • written 19 months ago by ATpoint

I've just edited the post. I expected it to be uniform, but maybe it actually isn't. I just cannot understand the U shape.

written 19 months ago by afli

Your comment does not add any information. I personally have too little of a statistical background to formulate expectations about p-value distributions. You should ask yourself whether your statistical knowledge is sufficient to do so. As this is a pure statistics question, you might consider posting it on StackExchange. If you do, you can improve your chances of a good response by following the guidelines in How To Ask Good Questions On Technical And Scientific Forums, because right now your question lacks any details about the experimental setup.

written 19 months ago by ATpoint

Thank you ATpoint. I wrote the post in a hurry just now, sorry for that. I will read the guidelines carefully and do better next time. I will also post this on StackExchange to see if I can get some help.

Aifu.

written 19 months ago by afli

Hi, see if this blog post helps you: http://varianceexplained.org/statistics/interpreting-pvalue-histogram/. To get better answers, it would be good to give some background about what you are testing, as the U-shape may or may not be anything to worry about.

written 19 months ago by dariober

Thank you dariober. I've already seen this post; it is clear, but the solution it provides did not solve my problem.

written 19 months ago by afli
19 months ago by chrchang523 (United States):

The large number of p=1 observations is due to p-values being "rounded up": because the test is discrete, each table is assigned the upper end of the probability interval it covers. For example, if the total count is 200 and each row and column sums to 100, the central {50, 50} {50, 50} table has a ~11.2% chance of being observed under the null hypothesis. This table corresponds to a p-value of 1; the adjacent {49, 51} {51, 49} and {51, 49} {49, 51} tables correspond to a p-value of ~0.888, and so on.
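The figures in this example can be checked in R. The following is a minimal sketch using only base R (`dhyper` and `fisher.test`); the variable names are illustrative and not taken from the original post.

```r
# Margins from the example: total count 200, every row and column sums to 100.
# Under the null hypothesis the top-left cell is hypergeometric.
p_central <- dhyper(50, m = 100, n = 100, k = 100)  # P(top-left cell = 50)

central  <- matrix(c(50, 50, 50, 50), nrow = 2)
adjacent <- matrix(c(49, 51, 51, 49), nrow = 2)

# The central table is the most probable one, so every table counts as
# "at least as extreme" and the two-sided p-value is 1.
p_value_central <- fisher.test(central)$p.value

# For the adjacent table, only the central table is excluded from the sum.
p_value_adjacent <- fisher.test(adjacent)$p.value

# Roughly 0.112, 1 and 0.888, the figures quoted in the answer
print(round(c(p_central, p_value_central, p_value_adjacent), 3))
```

Since about 11% of null tables with these margins land exactly on p = 1, a tall bar at 1 is what the exact test produces even when nothing is going on.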

To avoid this upward bias, you can use the "mid-p value". In the example above, the most-central table has a mid-p value of ~0.944: the center, instead of the upper end, of the probability interval it corresponds to. The mid-p value has the nice property that, under the null hypothesis, the Q-Q plot should stay near the main diagonal.
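As a rough sketch of the mid-p idea for a 2x2 table: count the tables that are strictly less probable than the observed one in full, and the tables that are as probable as the observed one at half weight. The helper name `fisher_midp` is hypothetical (it is not a base R function and is not the calculator linked in this answer), and the tie tolerance is an assumption.

```r
# Hypothetical helper: two-sided mid-p value for a 2x2 table of counts.
fisher_midp <- function(tab) {
  tab  <- as.matrix(tab)
  row1 <- sum(tab[1, ])
  row2 <- sum(tab[2, ])
  col1 <- sum(tab[, 1])
  # All values the top-left cell can take with the margins held fixed
  support <- max(0, col1 - row2):min(row1, col1)
  probs   <- dhyper(support, m = row1, n = row2, k = col1)
  p_obs   <- dhyper(tab[1, 1], m = row1, n = row2, k = col1)
  tol     <- 1e-7 * p_obs  # assumed tolerance for floating-point ties
  less    <- sum(probs[probs < p_obs - tol])         # strictly less probable tables
  tied    <- sum(probs[abs(probs - p_obs) <= tol])   # as probable as the observed table
  less + 0.5 * tied
}

central <- matrix(c(50, 50, 50, 50), nrow = 2)
print(round(fisher_midp(central), 3))  # roughly 0.944, as in the example
```

Replacing `fisher.test(x)$p.value` with a function of this kind in the per-row loop is one way to compare the two histograms side by side.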

(The same things are true for the binomial test you asked about earlier.)
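For the binomial case, a small sketch with base R shows the same effect. The numbers (50 successes in 100 trials, null probability 0.5) are chosen only for illustration, and the shortcut below applies to this most probable outcome only, since no other outcome ties with it.

```r
x <- 50
n <- 100

# 50 of 100 is the most probable outcome under p = 0.5, so the exact
# two-sided p-value is 1.
p_exact <- binom.test(x, n, p = 0.5)$p.value

# Mid-p: give the observed outcome half weight instead of full weight.
p_mid <- p_exact - 0.5 * dbinom(x, n, 0.5)

print(round(c(p_exact, p_mid), 3))
```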

Incidentally, I posted JavaScript Fisher's exact test and binomial test calculators up at https://www.cog-genomics.org/software/stats several years ago; the FET includes an option for turning the mid-p adjustment on/off, if you want to see more examples of the difference it makes.


Thank you chrchang523, that sounds good. I did this using your 'fisher_test' function with the mid-p correction: the tall bar at 1 dropped to a low value, and the 0.95 bar increased a lot. Is it possible for the bars at 1 and 0.95 to be similar? (I filtered the counts more strictly, so the overall counts are lower than in the original picture.)

[Figure: histogram of the mid-p values after stricter count filtering]

modified 19 months ago • written 19 months ago by afli

This is expected if your sample sizes are such that a p-value of 1 usually corresponds to a mid-p value in the 0.95 bin.

written 19 months ago by chrchang523
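To illustrate how sample size decides where a p-value of 1 ends up after the mid-p adjustment, here is a small sketch for balanced tables, where every row and column has the same sum. The helper `central_midp` and the chosen margins are assumptions for illustration, not taken from the poster's data.

```r
# Mid-p value of the most central table when every row and column sums to
# `margin` (total count 2 * margin); `margin` is assumed to be even.
central_midp <- function(margin) {
  # The central table has exact p-value 1; mid-p gives it half weight.
  1 - 0.5 * dhyper(margin / 2, m = margin, n = margin, k = margin)
}

margins <- c(20, 100, 200, 1000)
midp    <- sapply(margins, central_midp)
print(data.frame(margin = margins, midp = round(midp, 3)))
```

The larger the counts, the less probable any single table is, so the mid-p value of the central table moves closer to 1. With small counts it falls well below 0.95 and would show up in a lower bin.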