Question: Calculation of a two-sided p-value from a Monte Carlo simulation with frequency parameters?
hs.lansdell0 wrote (6 months ago):

Hello,

I'm hoping for some logic checks on a set of simulations I've been working on. I know the theory and equations for a two-sided p-value, both as derived from normal distributions and from Monte Carlo simulations. My issue is more with my test statistic.

To construct my Monte Carlo matrix, I took ~13,000 genes and randomly selected 402 of them, 10,000 times. The reason is that I have a list of 402 genes of interest that I am mapping to the miRs that target them, and I want to see whether those miRs occur more often in this set than in 402 random genes. After the 10,000 simulations I have columns of 402 genes each; those genes are then mapped to the miRs that target them, and I count how many times each miR occurs across those 402 genes.
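A minimal sketch of the sampling scheme described above, using only the Python standard library. The mapping `gene_to_mirs` (gene → set of miRs predicted to target it) and all function names are hypothetical; storing targets as a set assumes each gene is counted at most once per miR, which may differ from how the original counts were made.

```python
import random
from collections import Counter

def count_mirs(genes, gene_to_mirs):
    """For each miR, count how many genes in `genes` it targets."""
    counts = Counter()
    for gene in genes:
        # targets are a set, so a gene adds at most 1 to each miR's count
        counts.update(gene_to_mirs.get(gene, ()))
    return counts

def simulate_null(universe, gene_to_mirs, set_size, n_sims, seed=0):
    """Draw `n_sims` random gene sets (without replacement) and count miRs in each."""
    rng = random.Random(seed)
    return [count_mirs(rng.sample(universe, set_size), gene_to_mirs)
            for _ in range(n_sims)]

def build_matrix(mirs, observed, sims):
    """Rows are miRs: observed count first, then one count per simulation."""
    return {mir: [observed[mir]] + [s[mir] for s in sims] for mir in mirs}

# Toy usage (hypothetical data; in the question: ~13,000 genes, 402 per set, 10,000 sims)
gene_to_mirs = {"g1": {"mir1", "mir2"}, "g2": {"mir1"}, "g3": {"mir3"},
                "g4": set(), "g5": {"mir1", "mir3"}}
universe = sorted(gene_to_mirs)
my_genes = ["g1", "g2", "g5"]
observed = count_mirs(my_genes, gene_to_mirs)
sims = simulate_null(universe, gene_to_mirs, set_size=len(my_genes), n_sims=100)
matrix = build_matrix(["mir1", "mir2", "mir3"], observed, sims)
```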

So my final product is a matrix in which the first column holds the frequency of each miR in my set of interest and each following column holds its frequency in one simulation. Rows are the individual miRs.

For example:

``````
      mygene_freq  sim1  sim2  etc...
mir1  5            3     4
mir2  2            2     1
mir3  1            3     3
``````

My question is: since the number of miRs in each element is a probability calculated from the Monte Carlo simulation, is this my test statistic? For example, I want to test whether the observed frequency of mir1 in my gene set differs significantly from its frequency across the 10,000 simulations.
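One way to read this: the test statistic for each miR is its observed count in the 402-gene set (the first column), and the 10,000 simulated counts in the same row form its null distribution. Below is a sketch of an empirical two-sided p-value under that reading. The `+1` correction and the "double the smaller tail, cap at 1" rule are common conventions assumed here, not the only options, and the function names are hypothetical.

```python
def empirical_two_sided_p(observed, simulated):
    """Two-sided Monte Carlo p-value for one miR's observed count."""
    n = len(simulated)
    n_ge = sum(1 for s in simulated if s >= observed)  # sims at least as high
    n_le = sum(1 for s in simulated if s <= observed)  # sims at least as low
    # +1 treats the observed set as one more draw, so p is never exactly 0
    p_upper = (n_ge + 1) / (n + 1)
    p_lower = (n_le + 1) / (n + 1)
    return min(1.0, 2 * min(p_upper, p_lower))

def pvalues_from_matrix(matrix):
    """matrix: miR -> [observed, sim1, sim2, ...], as in the table above."""
    return {mir: empirical_two_sided_p(row[0], row[1:]) for mir, row in matrix.items()}

# The three example rows from the question (only two simulations, so p is coarse)
example = {"mir1": [5, 3, 4], "mir2": [2, 2, 1], "mir3": [1, 3, 3]}
pvals = pvalues_from_matrix(example)
```

With 10,000 simulations the smallest attainable two-sided p-value under this convention is 2/10,001. Because one test is run per miR, the resulting p-values would still need a multiple-testing correction (e.g. Benjamini-Hochberg).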

Thank you!

Modified 4 months ago by Biostar • written 6 months ago by hs.lansdell0

I don't see where you're going with this. What question are you trying to address? It looks to me like you're asking whether your 402 genes are enriched in miRNA targets compared to a random set of genes. Why wouldn't a Fisher's exact test (or equivalent) answer that question?
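To illustrate the Fisher's exact test suggestion: for each miR, build a 2×2 table of genes in or out of the 402-gene set versus targeted or not targeted by that miR. The sketch below sums hypergeometric probabilities to get the two-sided p-value (the same definition R's `fisher.test` uses) and the one-sided enrichment tail. The function names are hypothetical; if SciPy is available, `scipy.stats.fisher_exact` computes the same test.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(k targeted genes in n genes drawn from N, of which K are targeted)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def fisher_tests(a, b, c, d):
    """Return (two_sided_p, enrichment_p) for the 2x2 table [[a, b], [c, d]].

    a: genes in the set of interest targeted by the miR
    b: genes in the set of interest not targeted
    c: background genes targeted
    d: background genes not targeted
    """
    n = a + b          # size of the set of interest (402 in the question)
    K = a + c          # genes in the universe targeted by this miR
    N = a + b + c + d  # gene universe (~13,000 in the question)
    ks = range(max(0, n - (N - K)), min(n, K) + 1)  # feasible values of a
    probs = {k: hypergeom_pmf(k, N, K, n) for k in ks}
    p_obs = probs[a]
    # two-sided: sum over tables no more likely than the observed one
    # (small relative tolerance for floating-point ties, as in R's fisher.test)
    two_sided = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))
    # one-sided enrichment: P(count >= observed)
    enrichment = sum(p for k, p in probs.items() if k >= a)
    return min(1.0, two_sided), min(1.0, enrichment)

# Classic "lady tasting tea" table as a sanity check
two_sided, enrichment = fisher_tests(3, 1, 1, 3)
```

Assuming the random sets are drawn from the same gene universe and each gene counts at most once per miR, the simulated count for a miR follows exactly this hypergeometric distribution. The Monte Carlo approach therefore approximates this test rather than adding information. The two-sided conventions differ slightly (doubling the smaller tail versus summing less likely tables).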