Question: How to calculate enrichment P-value?
5.0 years ago by
biostart350
Germany
biostart350 wrote:

Hello,

Quick question: How to calculate the P value for the enrichment of my dataset in a certain feature?

I have calculated (using bedtools) that 5% of my dataset "A" intersects with a genomic feature of interest, and that for a random subset of genomic regions of the same size the intersection would be 11%. Thus, my dataset seems to be strongly depleted of this feature compared with the genome average. How do I calculate a P-value for this?

Thanks!

chip-seq • 5.1k views
5.0 years ago by
Amitm2.1k
UK
Amitm2.1k wrote:

Maybe a Fisher's exact test? I am not sure whether that would violate some statistical assumption, but a 2x2 contingency table seems the straightforward way to go.

Col 1 (dataset A: intersecting, not intersecting) -> 5, 95

Col 2 (random regions: intersecting, not intersecting) -> 11, 89

You get the picture. You can quickly run the calculation online at http://www.quantitativeskills.com/sisa/statistics/fisher.htm, or use your favourite software, e.g. R.
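For anyone who prefers to script it, here is a minimal sketch of a two-sided Fisher's exact test on the table above, in plain Python rather than R. The function name `fisher_exact_two_sided` is made up for this illustration, and the counts simply treat the percentages as "out of 100", which is itself an assumption about the sample size.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)   # smallest feasible top-left count
    hi = min(row1, col1)       # largest feasible top-left count
    p_obs = prob(a)
    # Sum every table that is at most as likely as the observed one
    # (small relative tolerance so floating-point ties are included)
    p_value = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, p_value)

# Assumed counts: 5 of 100 regions overlap in dataset A, 11 of 100 in the random set
p_value = fisher_exact_two_sided(5, 95, 11, 89)
print(f"Fisher's exact test, two-sided p = {p_value:.3f}")
```

The result depends entirely on the counts you put in, so the choice of what counts as one independent observation matters more than the test itself.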

written 5.0 years ago by Amitm2.1k

I guess I should also multiply all the values you listed by the absolute number of reads in the dataset (~10000)? After I do this, the P value is very small. Is that expected for these values?

written 5.0 years ago by biostart350

Please don't do this. Or if you do, could you do some sampling as well, to show how wildly off your Fisher test p-values are?
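One way to read "do some sampling" is an empirical p-value: compare the observed overlap fraction with the fractions obtained from many randomized region sets. The sketch below is only an illustration of that idea. The helper names are hypothetical, and the null values are simulated with independent draws purely as a stand-in; in practice they would come from real shuffled region sets (for example, regions randomized with bedtools and re-intersected with the feature), which is exactly where any dependence between regions would show up.

```python
import random

def empirical_p_lower(observed, null_values):
    """One-sided empirical p-value for depletion.

    Share of null values at or below the observed value, with the usual
    +1 correction so the estimate is never exactly zero.
    """
    hits = sum(1 for v in null_values if v <= observed)
    return (hits + 1) / (len(null_values) + 1)

def simulated_overlap_fraction(n_regions, p_overlap, rng):
    # Placeholder null model (assumption): each region overlaps independently.
    # Replace with overlap fractions measured on real shuffled region sets.
    return sum(rng.random() < p_overlap for _ in range(n_regions)) / n_regions

rng = random.Random(0)  # fixed seed so the toy run is repeatable
null_fractions = [simulated_overlap_fraction(100, 0.11, rng) for _ in range(1000)]

p_emp = empirical_p_lower(0.05, null_fractions)
print(f"empirical one-sided p = {p_emp:.3f}")
```

If the empirical p-value from real shuffles is much larger than the Fisher p-value computed on inflated counts, that is a sign the independence assumption does not hold.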

written 5.0 years ago by russhh5.5k

Hi,

I don't think using the number of reads in the dataset is a good idea. Fisher's test assumes that the observations are independent, whereas the reads form a fixed pool from which you are sampling overlapping or non-overlapping ones, so they would not be independent.

written 5.0 years ago by Amitm2.1k

Fisher's exact test works on counts, not on percentages. To compare percentages, you should use the two-proportion z-test.
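A minimal sketch of the two-proportion z-test with a pooled standard error follows. Note that it still needs the sample sizes behind the percentages. The function name and the two sample sizes tried here (100, and the ~10000 mentioned earlier in the thread) are assumptions for illustration, and whether each read or region may be treated as an independent observation is the open question raised above.

```python
from math import sqrt, erfc

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)          # common proportion under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))        # two-sided normal tail probability
    return z, p_value

# Same percentages (5% vs 11%), two assumed sample sizes
z_small, p_small = two_proportion_z_test(5, 100, 11, 100)
z_large, p_large = two_proportion_z_test(500, 10_000, 1_100, 10_000)

print(f"n = 100:   z = {z_small:.2f}, p = {p_small:.3g}")
print(f"n = 10000: z = {z_large:.2f}, p = {p_large:.3g}")
```

The same 5% versus 11% gives very different p-values at the two sample sizes, which is why the very small p-value reported after multiplying by ~10000 is expected arithmetically, whatever its statistical validity.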

written 5.0 years ago by Jean-Karim Heriche24k