How to calculate enrichment P-value?
8.7 years ago
biostart ▴ 370

Hello,

Quick question: how do I calculate a P-value for the enrichment of my dataset in a certain feature?

Using bedtools, I have calculated that 5% of my dataset "A" intersects with a genomic feature of interest, and that for a random set of genomic regions of the same size the intersection would be 11%. Thus, my dataset seems to be strongly depleted of this feature compared with the genome average. How do I calculate a P-value for this?

Thanks!

ChIP-Seq • 7.6k views
8.7 years ago
Amitm ★ 2.3k

A Fisher's exact test, maybe? I am not sure whether that violates some statistical assumption, but a 2x2 contingency table seems the straightforward way to go.

Col 1 -> 5, 95

Col 2 -> 11, 89

You get the picture. You can quickly run it in an online calculator here: http://www.quantitativeskills.com/sisa/statistics/fisher.htm

Or you could use your favourite software, e.g. R.
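For anyone who would rather script it, here is a minimal sketch of a two-sided Fisher's exact test on the 2x2 table above, written in plain Python with only the standard library. The function name `fisher_exact_two_sided` is made up for this example, and the counts assume 100 regions per group, which is only a placeholder because the question gives percentages rather than counts.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k "overlapping" items in row 1,
        # given that all row and column totals are fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    # The small relative tolerance guards against floating-point ties.
    total = sum(
        pk for pk in (prob(k) for k in range(lo, hi + 1))
        if pk <= p_obs * (1 + 1e-7)
    )
    return min(1.0, total)


# Placeholder counts: 5 of 100 regions overlap in dataset A,
# 11 of 100 overlap in the random set (assumed group sizes).
p_value = fisher_exact_two_sided(5, 95, 11, 89)
print(f"two-sided p = {p_value:.4f}")
```

The result depends heavily on the counts you put in, so the real number of regions in each group should replace the placeholder 100.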


I guess I should also multiply all the values you listed by the absolute number of reads in the dataset (~10000)? After I do this, the P-value is very small. Is that expected for these values?


Please don't do this. Or if you do, could you do some sampling as well, to show how wildly off your Fisher test p-values are?
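One way to read the sampling suggestion is as an empirical (permutation-style) p-value: intersect many randomised region sets with the feature, for example sets produced by `bedtools shuffle`, and count how often the random overlap is at least as low as the observed one. The sketch below assumes that reading. The function name `empirical_p_lower` and the list of null fractions are hypothetical, and the numbers are made-up placeholders, not results.

```python
def empirical_p_lower(observed, null_values):
    """One-sided empirical p-value for depletion.

    observed    -- overlap fraction of the real dataset
    null_values -- overlap fractions from randomised region sets
    """
    as_extreme = sum(1 for v in null_values if v <= observed)
    # Add-one correction: the estimate can never be exactly zero,
    # and its resolution is limited by the number of randomisations.
    return (as_extreme + 1) / (len(null_values) + 1)


# Hypothetical placeholder values; in practice each one would come from
# intersecting one shuffled copy of dataset A with the feature.
null_fractions = [0.10, 0.12, 0.11, 0.09, 0.13, 0.11, 0.10, 0.12, 0.11]
print(empirical_p_lower(0.05, null_fractions))
```

With only a handful of randomisations the smallest reachable p-value is 1 / (n + 1), so a few thousand shuffles are usually needed before the estimate becomes informative.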


Hi,

I don't think using the number of reads in the dataset is a good idea. Fisher's test assumes that the observations are independent, whereas the reads form a fixed pool from which you are sampling overlapping or non-overlapping ones, so they would not be independent.


Fisher's exact test works on counts, not on percentages. To compare percentages, you should use the two-proportion z-test.
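As a rough sketch of the two-proportion z-test mentioned here, the snippet below uses the pooled-proportion form with a normal approximation. Note that it still needs the group sizes behind the percentages. The function name `two_proportion_z_test` is made up for this example, and 100 regions per group is an assumed placeholder.

```python
from math import erfc, sqrt


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    # Standard error under the null hypothesis that both proportions are equal.
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of a standard normal: P(|Z| >= |z|).
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Placeholder counts: 5% of 100 regions versus 11% of 100 regions.
z, p = two_proportion_z_test(5, 100, 11, 100)
print(f"z = {z:.3f}, two-sided p = {p:.4f}")
```

The normal approximation is only reasonable when the expected counts in each cell are not too small, and like Fisher's test it assumes independent observations.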
