Hi, I was hoping for some more feedback on a test I am trying to perform to check for enrichment at specific sites throughout the genome. My statistics background isn't great.

Here is the current setup:

Given:
- Genomic coordinates of iCLIP binding sites (single-nucleotide positions, each corresponding to the site of a crosslink) from two different proteins: **Sample A** and **Sample B**.

Goal:
- The researcher wants to put a p-value on whether more **Sample A** positions lie near **Sample B** positions than you would expect to observe by chance. 60-nt bins were chosen for a biological reason related to the protein from Sample B.

Setting up the test:

Step 1: split the genome into 60-nt bins (do both strands separately) and count the total number of bins --> total number of **balls** in the **urn**

Step 2: count the number of bins overlapping with one or more Sample A positions --> total number of **white balls** in the **urn**

Step 3: count the number of bins overlapping with one or more Sample B positions --> total number of **balls** **drawn** without replacement from the urn

Step 4: count the number of bins overlapping with at least one Sample A position and at least one Sample B position --> total number of **white balls** **drawn** without replacement from the urn
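
To make the four counts concrete, here is a minimal base-R sketch on made-up data. The chromosome length, the positions, and the helper `bin_of` are all hypothetical and only for illustration; a real analysis would repeat this per chromosome and per strand and sum the counts.

```r
# Toy example: a single hypothetical 600-nt chromosome, one strand only.
bin_size  <- 60
chrom_len <- 600                        # assumed length, for illustration
pos_a <- c(5, 17, 61, 130, 200, 545)    # made-up Sample A positions (1-based)
pos_b <- c(10, 75, 250, 590)            # made-up Sample B positions (1-based)

# Hypothetical helper: index of the 60-nt bin containing a 1-based position
bin_of <- function(pos) (pos - 1) %/% bin_size + 1

# Bins hit at least once by each sample
bins_a <- unique(bin_of(pos_a))
bins_b <- unique(bin_of(pos_b))

n_bins <- ceiling(chrom_len / bin_size)       # Step 1: balls in the urn
n_a    <- length(bins_a)                      # Step 2: white balls in the urn
n_b    <- length(bins_b)                      # Step 3: balls drawn
n_both <- length(intersect(bins_a, bins_b))   # Step 4: white balls drawn
```

Note that several positions falling in the same bin count only once, which is what "one or more" in Steps 2 to 4 implies.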

Does this seem like an acceptable test to do? Or is there a better test for this kind of scenario?

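If anyone is interested, this is how the p-value is generated in R for the test: `phyper(q = step4 - 1, m = step2, n = step1 - step2, k = step3, lower.tail = FALSE)`. Note the `- 1`: `phyper` returns P(X <= q), so `1 - phyper(q = step4, ...)` gives P(X > step 4) and leaves out the observed overlap itself, whereas an enrichment p-value should be P(X >= step 4).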
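
As a sanity check on the tail, here is a small sketch with hypothetical counts standing in for Steps 1 to 4. Because `phyper(q, ...)` returns P(X <= q), the upper tail that includes the observed overlap uses `q - 1`. The same setup can be written as a 2x2 table and passed to a one-sided `fisher.test`, which should give the same p-value.

```r
# Hypothetical counts standing in for Steps 1-4
n_bins <- 10   # Step 1: all bins
n_a    <- 5    # Step 2: bins with >= 1 Sample A position
n_b    <- 4    # Step 3: bins with >= 1 Sample B position
n_both <- 3    # Step 4: bins with both

# P(X >= n_both): subtract 1 because lower.tail = FALSE gives P(X > q)
p_enrich <- phyper(n_both - 1, m = n_a, n = n_bins - n_a, k = n_b,
                   lower.tail = FALSE)

# Equivalent 2x2 table: rows = in B / not in B, columns = in A / not in A
tab <- matrix(c(n_both,       n_a - n_both,
                n_b - n_both, n_bins - n_a - n_b + n_both),
              nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value
```

The hypergeometric model treats every bin as equally likely to be hit, so regions that cannot carry crosslinks at all (for example, untranscribed sequence) may inflate the urn and shrink the p-value; that is an assumption worth checking for the real data.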

Also have a look at the GenometriCorr R package.

