Hello,

Quick question: how do I calculate a P-value for the enrichment of a certain feature in my dataset?

Using bedtools, I calculated that 5% of my dataset "A" intersects with a genomic feature of interest, and that for a random subset of genomic regions of the same size the intersection would be 11%. So my dataset seems to be strongly depleted of this feature compared with the genome average. How do I calculate a P-value for this?

Thanks!

I guess I should also multiply all the values you listed by the absolute number of reads in the dataset (~10000)? After I do this, the P-value is very small. Is that expected for these values?

Please don't do this. Or if you do, could you do some sampling as well, to show how far off your Fisher test p-values are?
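
A minimal sketch of what such a sampling check could look like, assuming you have already computed the overlap fraction for many shuffled region sets (for example, one value per shuffled copy of dataset "A"). The function name `empirical_p_value` is made up for illustration, and the null values below are simulated placeholders, not real results; the spread of 0.01 is an arbitrary assumption.

```python
import random

def empirical_p_value(observed, null_values, tail="lower"):
    """One-sided empirical p-value from a sampled null distribution.

    The +1 in numerator and denominator counts the observed value itself,
    so the estimate can never be exactly zero.
    """
    if tail == "lower":
        # Depletion: how often is a random set at least as depleted?
        extreme = sum(1 for v in null_values if v <= observed)
    else:
        # Enrichment: how often is a random set at least as enriched?
        extreme = sum(1 for v in null_values if v >= observed)
    return (extreme + 1) / (len(null_values) + 1)

# PLACEHOLDER null distribution (assumption): in practice each value would be
# the overlap fraction measured on one shuffled region set, not a random draw.
rng = random.Random(0)
null_fractions = [rng.gauss(0.11, 0.01) for _ in range(999)]

# Observed overlap fraction from the question (5%).
p = empirical_p_value(0.05, null_fractions, tail="lower")
print(f"empirical one-sided p-value: {p:.4f}")
```

With N shuffles the smallest p-value this can report is 1/(N+1), so a comparison against a Fisher p-value is only informative down to that resolution.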

Hi,

I don't think using the number of reads in the dataset is a good idea. Fisher's test assumes that the observations are independent, whereas the reads form a fixed space from which you sample overlapping or non-overlapping regions, so they would not be independent.

Fisher's exact test works on counts, not on percentages. To compare percentages, you should use the two-proportion z-test, which still needs the sample size behind each percentage.
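
For reference, here is a rough sketch of a pooled two-proportion z-test in plain Python. The counts are assumptions: 5% and 11% are taken from the question, and n = 10000 per group is borrowed from the "~10000" mentioned above purely for illustration. The test treats every region as an independent observation, which is exactly the assumption questioned earlier in this thread, so read the result with that caveat.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# ASSUMED counts: 5% vs 11% overlap with 10000 regions in each group.
z_large, p_large = two_proportion_z_test(500, 10000, 1100, 10000)

# Same percentages with only 100 regions per group, for comparison.
z_small, p_small = two_proportion_z_test(5, 100, 11, 100)

print(f"n=10000: z={z_large:.2f}, p={p_large:.3g}")
print(f"n=100:   z={z_small:.2f}, p={p_small:.3g}")
```

The same pair of percentages gives a tiny p-value at n = 10000 and a non-significant one at n = 100, which is why multiplying by the number of reads changes the answer so drastically.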