I have a set of large genomic regions (~200 regions, ~50-500 kb each) for which we hypothesize that RNA expression is higher than expected by chance. The expression pertains to small RNAs, but let's make it broader and just think about RNA-seq coverage. A similar experiment would be if one had a set of ChIP-Seq peaks for a chromatin factor and wanted to test whether genes overlapping those peaks have a higher chance of being (more highly) expressed.
My question is how best to test this hypothesis?
What I have done so far
I have estimated the coverage in those regions of interest, and as a control I used shuffleBed to obtain 100 sets of random regions (matched by size and chromosome), for which I also obtained the coverage. I could now average the coverage across the 100 random samplings for each region and perform a statistical test (possibly non-parametric, depending on the distribution of the sampled values). Would this be a good / OK / wrong way to go about it, and are there better ways?
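
To make the comparison concrete, here is a minimal Python sketch of one way the shuffled sets could serve as an empirical null distribution. It is only an illustration under assumptions: the function name `empirical_p_value` is hypothetical, the summary statistic (mean coverage per set) is my choice, and the numbers in the demo are simulated, not real coverage values.

```python
import random
from statistics import mean


def empirical_p_value(observed, shuffled_sets):
    """One-sided empirical p-value for 'observed mean coverage is higher'.

    observed:      coverage values, one per region of interest
    shuffled_sets: list of lists, one list of coverage values per shuffled set
    """
    obs_stat = mean(observed)
    # One summary statistic per shuffled set forms the null distribution.
    null_stats = [mean(s) for s in shuffled_sets]
    # Count shuffled sets whose statistic is at least as large as the observed one.
    n_ge = sum(1 for s in null_stats if s >= obs_stat)
    # Add-one correction so the p-value is never exactly zero.
    p = (n_ge + 1) / (len(null_stats) + 1)
    return obs_stat, null_stats, p


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated stand-ins: 200 regions of interest and 100 shuffled sets.
    observed = [rng.gauss(12.0, 2.0) for _ in range(200)]
    shuffled_sets = [[rng.gauss(10.0, 2.0) for _ in range(200)]
                     for _ in range(100)]
    obs_stat, null_stats, p = empirical_p_value(observed, shuffled_sets)
    print(f"observed mean: {obs_stat:.2f}")
    print(f"null mean of means: {mean(null_stats):.2f}")
    print(f"empirical p-value: {p:.4f}")
```

With 100 shuffled sets the smallest p-value this can return is 1/101, so more shuffles would be needed to resolve smaller values.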