I am unsure about my approach to evaluating enrichment and was wondering if you could help me understand whether what I am doing makes sense.
I have counts of regions that overlap histone markers in a dataset, and I would like to know whether histone markers are enriched in my dataset compared to random sets, using a randomization approach.
I have created 1000 similar datasets and counted the number of regions overlapping histone markers in each of these null datasets. In some cases the resulting distribution is normal, but in others it is not.
In cases where the distribution of the proportions from the null datasets is normal:
I can use these data to estimate the mean, standard deviation, and degrees of freedom of the null distribution, and then compare my observed value against it. Is this correct?
# "sim.null" is a normal distribution of overlap proportions from 1000 simulations
# (each simulated dataset is matched to my original dataset on some features)
sim.null <- rnorm(n = 1000, mean = 0.01, sd = 0.001)
# I would like to compare it with the value I get from my dataset
observed <- 0.0125
# Parentheses are needed so the whole difference is divided by the standard error;
# named t.stat to avoid masking base R's t() function
t.stat <- (mean(sim.null) - observed) / (sd(sim.null) / sqrt(length(sim.null)))
# Is this the same as doing this?
t.test(sim.null, mu = observed, alternative = "two.sided")$p.value
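To make the comparison concrete, here is a minimal sketch of how I could check whether the hand-computed statistic matches what `t.test` reports. It assumes the difference is divided as a whole by the standard error and that the two-sided p-value uses n - 1 degrees of freedom. The seed and the names `t.manual`, `p.manual`, and `tt` are my own choices for illustration, not part of any required workflow.

```r
# Assumed seed, only so that the sketch is reproducible
set.seed(1)

# Same toy null distribution and observed value as above
sim.null <- rnorm(n = 1000, mean = 0.01, sd = 0.001)
observed <- 0.0125

# One-sample t statistic computed by hand
n <- length(sim.null)
t.manual <- (mean(sim.null) - observed) / (sd(sim.null) / sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p.manual <- 2 * pt(-abs(t.manual), df = n - 1)

# The same test via t.test()
tt <- t.test(sim.null, mu = observed, alternative = "two.sided")

# Compare the hand-computed values with those reported by t.test()
all.equal(t.manual, unname(tt$statistic))
all.equal(p.manual, tt$p.value)
```

If both `all.equal` calls return `TRUE`, the manual formula and `t.test` agree on these data. Note that this only checks that the two calculations are equivalent, not whether a one-sample t-test is the right way to compare an observed value with a null distribution.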
In cases where the null distribution is not normal: would a Fisher's exact test perhaps be appropriate?
Thank you very much, any suggestions are much appreciated!