I'm hoping for some logic checks on a set of simulations I've been working on. I know the theory and equations for a two-sided p-value derived from a normal distribution and from Monte Carlo simulations. My issue is more with my test statistic.
To construct my Monte Carlo matrix, I took ~13,000 genes and randomly selected 402 genes, 10,000 times. The reason is that I have a list of 402 genes of interest that I am mapping the miRs to, and I want to see whether those miRs occur more often in this set than in 402 random genes. So after the 10,000 simulations, I have 10,000 columns of 402 genes each. Those genes are then mapped to the miRs that target them, and I count how many times each miR occurs across those 402 genes.
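A minimal Python sketch of the simulation step, for checking the logic. It assumes a hypothetical `gene_to_mirs` dictionary mapping each gene to the miRs that target it, counts each miR at most once per gene, and samples without replacement within each draw; the function names and the toy data are illustrative only.

```python
import random
from collections import Counter

def count_mirs(genes, gene_to_mirs):
    """For each miR, count how many genes in `genes` it targets."""
    counts = Counter()
    for gene in genes:
        # set() so a miR listed twice for the same gene is counted once for that gene
        counts.update(set(gene_to_mirs.get(gene, ())))
    return counts

def simulate_null(all_genes, gene_to_mirs, set_size, n_sims, seed=0):
    """Draw `n_sims` random gene sets of size `set_size` (no repeats within a
    draw) and return one Counter of miR counts per simulated set."""
    rng = random.Random(seed)
    return [count_mirs(rng.sample(all_genes, set_size), gene_to_mirs)
            for _ in range(n_sims)]

# Hypothetical toy mapping; the real one would cover ~13,000 genes.
gene_to_mirs = {
    "g1": ["mir1", "mir2"],
    "g2": ["mir1"],
    "g3": ["mir3"],
    "g4": ["mir1", "mir3"],
    "g5": [],
}
all_genes = sorted(gene_to_mirs)
sims = simulate_null(all_genes, gene_to_mirs, set_size=2, n_sims=1000)
```

At the real scale, the call would be `simulate_null(all_genes, gene_to_mirs, set_size=402, n_sims=10000)`, and `count_mirs(my_genes, gene_to_mirs)` on the 402 genes of interest gives the observed column.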
So my final product is a matrix whose first column is the frequency of each miR in my set of interest and whose remaining columns are its frequency in each simulation. Rows are the individual miRs:
            mygene_freq   sim1   sim2   ...
    mir1    5             3      4      ...
    mir2    2             2      1      ...
    mir3    1             3      3      ...
My question is: since each entry (the number of times a miR occurs) is calculated from the Monte Carlo simulations, is this count my test statistic? For example, I want to test whether the observed frequency of mir1 in my gene set differs significantly from its frequencies across the 10,000 simulations.
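If the per-miR count is used as the test statistic, a Monte Carlo p-value compares the observed count (the `mygene_freq` value) with that miR's counts across the simulation columns. Below is a minimal sketch of one common convention, not the only valid one: the +1 correction treats the observed set as one more draw from the null, and the two-sided value doubles the smaller tail (capped at 1). The function name and the null values are hypothetical.

```python
def empirical_p_values(observed, null_counts):
    """Monte Carlo p-values for a single miR.

    observed    -- the miR's count in the gene set of interest (test statistic)
    null_counts -- the same count in each simulated random gene set
    Returns (one-sided upper-tail p, two-sided p).
    """
    b = len(null_counts)
    # +1 in numerator and denominator: the observed set counts as one null draw,
    # so the estimate can never be exactly 0
    upper = (1 + sum(c >= observed for c in null_counts)) / (b + 1)
    lower = (1 + sum(c <= observed for c in null_counts)) / (b + 1)
    # Two-sided: double the smaller tail, capped at 1
    two_sided = min(1.0, 2 * min(upper, lower))
    return upper, two_sided

# Hypothetical null counts for mir1 from 10 simulations (the real row has 10,000)
null_mir1 = [3, 4, 2, 5, 3, 1, 4, 2, 3, 2]
upper, two_sided = empirical_p_values(5, null_mir1)
```

Since the question is whether the miRs occur *more* often in the genes of interest, the upper-tail value may be the one that matches the hypothesis; the two-sided value answers whether the count differs in either direction.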