Hello,
I'm hoping for some logic checks on a simulation set I've been working on. I know the theory and equations for a two-sided p-value derived from normal distributions and from Monte Carlo simulations. My issue is more with my test statistic.
To construct my Monte Carlo matrix, I took ~13,000 genes and randomly selected 402 of them, 10,000 times. The reason is that I have a list of 402 genes of interest that I am mapping the miRs to, and I want to see whether the miRs occur more often in this set than in 402 random genes. So after the 10,000 simulations I have 10,000 columns of 402 genes each. Those genes are then mapped to the miRs that target them, and I count how many times each miR occurs across those 402 genes.
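A minimal sketch of the simulation described above, assuming the gene-to-miR mapping is available as a Python dict from gene ID to a set of miR names. The names `gene_to_mirs`, `count_mirs` and `simulate_null` are hypothetical; the real mapping would come from whatever target database is being used. Each random set is drawn without replacement, and a miR's count is the number of sampled genes it targets.

```python
import random
from collections import Counter


def count_mirs(genes, gene_to_mirs):
    """For each miR, count how many genes in `genes` it targets.

    Assumes gene_to_mirs maps gene ID -> set of miR names (hypothetical input).
    """
    counts = Counter()
    for g in genes:
        counts.update(gene_to_mirs.get(g, ()))
    return counts


def simulate_null(universe, gene_to_mirs, set_size, n_sims, seed=0):
    """Draw n_sims random gene sets (without replacement) and count miR hits.

    universe must be a list (random.Random.sample needs a sequence).
    Returns a dict mapping each miR to a list of n_sims counts, i.e. one
    row of the miR x simulation matrix.
    """
    rng = random.Random(seed)
    mirs = sorted({m for ms in gene_to_mirs.values() for m in ms})
    sims = {m: [] for m in mirs}
    for _ in range(n_sims):
        sample = rng.sample(universe, set_size)
        c = count_mirs(sample, gene_to_mirs)
        for m in mirs:
            sims[m].append(c[m])  # Counter returns 0 for untargeted miRs
    return sims
```

In the post's setting this would be called roughly as `simulate_null(all_genes, gene_to_mirs, 402, 10_000)`, and `count_mirs(genes_of_interest, gene_to_mirs)` would give the observed first column.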
So my final product is a matrix in which the first column holds the frequency of each miR in the set of interest and the remaining columns hold its frequency in each simulation. Rows are the individual miRs.
e.g.:
     mygene_freq  sim1  sim2  ...
mir1           5     3     4
mir2           2     2     1
mir3           1     3     3
My question is: since the number in each element is a calculated probability from the Monte Carlo, is this my test statistic? For example, I want to test whether the observed frequency of mir1 in my gene set is significantly different from its frequency across the 10,000 simulations.
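One common way to read this setup: for each miR, the test statistic is simply the observed count in the genes of interest, and that miR's row of 10,000 simulated counts is its null distribution. The p-value is then the fraction of simulations at least as extreme as the observation. A minimal sketch follows; the function name is hypothetical, and the `(1 + k) / (1 + N)` form and doubling the smaller tail for a two-sided test are common conventions rather than the only choices.

```python
def empirical_pvalues(observed, simulated):
    """Monte Carlo p-values for a single miR.

    observed:  count of the miR in the genes of interest (the test statistic).
    simulated: counts of the same miR in each random gene set (the null).
    The (1 + k) / (1 + N) form keeps the p-value from ever being exactly 0.
    """
    n = len(simulated)
    n_ge = sum(1 for s in simulated if s >= observed)
    n_le = sum(1 for s in simulated if s <= observed)
    p_upper = (1 + n_ge) / (1 + n)  # enrichment: observed unusually high
    p_lower = (1 + n_le) / (1 + n)  # depletion: observed unusually low
    p_two_sided = min(1.0, 2 * min(p_upper, p_lower))
    return p_upper, p_lower, p_two_sided
```

With the mir1 row from the example (observed 5, simulated 3 and 4) there are only two simulations, so the smallest attainable upper-tail p-value is 1/3; with 10,000 simulations the floor is about 1e-4. Because the test is repeated for every miR, the resulting p-values would normally also need a multiple-testing correction.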
Thank you!
I don't see where you're going with this. What question are you trying to address? It looks to me like you're asking whether your 402 genes are enriched in miRNA targets compared to a random set of genes. Why can't a Fisher's exact test (or equivalent) answer that question?
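To illustrate the suggestion: drawing 402 genes uniformly without replacement means that, for a single miR, the simulated counts approximate a hypergeometric distribution, which is the distribution Fisher's exact test uses. A minimal stdlib sketch of the one-sided (enrichment) p-value for one miR, assuming the universe is the ~13,000 genes and `n_targets` is how many of them that miR targets (the function name is hypothetical):

```python
from math import comb


def hypergeom_upper_tail(k, n_universe, n_targets, n_drawn):
    """P(X >= k) for X ~ Hypergeometric(population=n_universe,
    successes=n_targets, draws=n_drawn).

    This equals the one-sided (enrichment) Fisher's exact p-value for the
    2x2 table with rows in set / not in set and columns targeted / not
    targeted, where k is the number of genes in the set targeted by the miR.
    """
    total = comb(n_universe, n_drawn)
    hi = min(n_targets, n_drawn)
    # Python ints are exact, so the huge binomial coefficients do not overflow.
    tail = sum(comb(n_targets, x) * comb(n_universe - n_targets, n_drawn - x)
               for x in range(k, hi + 1))
    return tail / total
```

If SciPy is available, `scipy.stats.fisher_exact(table, alternative="greater")` on the corresponding 2x2 table should give the same one-sided p-value, without needing the 10,000 simulations.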