Dear Biostars,

This might be one of the most obvious statistics-related questions in high-throughput sequencing data analysis: how can one calculate the enrichment of real versus random region/peak overlaps?

For example: is the overlap between sox2 peaks and oct4 peaks statistically significant or not?

```
My total no. of sox2 peaks = 4000
The no. of sox2 peaks that overlap oct4 = 2500
The no. of random sox2 peaks that overlap oct4 = 20
```

I agree that the example above doesn't even need a statistical test to confirm the enrichment of 2500 over 20. But how can one show the significance of this enrichment statistically, as a p-value?

I was doing something like this. Do you think it is correct? If not, could you please suggest a better way? Many thanks in advance!

```
= log (((The no. of sox2 peaks that overlap oct4 - The no. of random sox2 peaks that overlap oct4) / My total no. of sox2 peaks) * 100)
= log (((2500 - 20) / 4000) * 100)
```
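For reference, the calculation above can be written out as a short Python sketch. The function and variable names are made up for illustration, and the base of the logarithm is an assumption (base 10 here), since the formula does not say which one is meant.

```python
import math

# Counts from the example above.
total_sox2 = 4000        # total sox2 peaks
observed_overlap = 2500  # sox2 peaks that overlap oct4
random_overlap = 20      # random sox2 peaks that overlap oct4


def enrichment_score(observed, random, total, log=math.log10):
    """Log of the background-subtracted overlap, as a percentage of all peaks.

    The log base is an assumption; pass log=math.log for the natural log.
    """
    return log((observed - random) / total * 100)


score = enrichment_score(observed_overlap, random_overlap, total_sox2)
print(score)
```

Note that this gives a descriptive score (the log of a percentage), not a p-value, so on its own it does not say how likely such an overlap would be by chance.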

Look at the KS test: http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test
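For counts like the ones in the question, another possible framing (a one-sided binomial test, which is a different technique from the KS test) is sketched below. It rests on assumptions that are not stated in the question: that the random set also contained 4000 peaks, so the chance overlap rate is 20/4000, and that each real peak overlaps independently of the others. The function names are hypothetical, and the tail probability is computed in log space because it is far too small for a float.

```python
import math


def log_binom_pmf(k, n, p):
    """Natural log of the binomial probability P(X = k), for 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def log10_upper_tail(k, n, p):
    """log10 of P(X >= k) for X ~ Binomial(n, p), via a log-sum-exp."""
    logs = [log_binom_pmf(i, n, p) for i in range(k, n + 1)]
    m = max(logs)
    total = m + math.log(sum(math.exp(x - m) for x in logs))
    return total / math.log(10)


n_peaks = 4000              # total sox2 peaks
observed = 2500             # sox2 peaks that overlap oct4
p_chance = 20 / n_peaks     # assumed chance overlap rate from the random set

log10_p = log10_upper_tail(observed, n_peaks, p_chance)
print(f"log10(p-value) is about {log10_p:.1f}")
```

Because the chance rate comes from a single random set, repeating the randomization many times and averaging the overlap would probably give a more stable estimate of `p_chance`.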