20 months ago by
Seattle, WA USA
Maybe any two genes picked at random are likely to have close to zero correlation in your dataset, but who knows, really?
One way to know is to use your real data, generate a bunch of correlations from it, and see how things look in aggregate.
When you don't know whether an observed statistic is significant or not, one approach is to use bootstrap sampling.
One advantage of bootstrap sampling is that it is non-parametric. That is, you don't need to make as many assumptions about the underlying distribution of the statistic in your population.
You could sample pairs of genes with replacement, calculate their Spearman rho correlations (or whatever statistic), and use that set of correlations to get summary statistics and build a confidence interval.
For instance, maybe you grab two genes at random 1000 times, calculating 1000 rhos. From those 1000 rhos, you can say something about the mean or median rho you'd expect to see over random combinations of any two genes, within some level of accuracy, i.e., with a confidence interval.
You could also say that the correlation of any two random genes will fall within some interval about 95% of the time, for example the range between the 2.5th and 97.5th percentiles of your sampled rhos. (This is wider than a confidence interval around the mean rho, which only describes how well you have pinned down the mean itself.)
From that, if your two genes of interest have a correlation score outside that interval, you might say the correlation of their signals is "significant", in that a correlation (or anti-correlation) that strong is unlikely to arise by chance. This may or may not be biologically interesting, but that's a separate question.
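Here is a minimal sketch of that procedure in Python, in case it helps. It assumes an expression matrix with genes as rows and samples as columns. The matrix here is synthetic random data standing in for your real dataset, and the names `expr`, `random_pair_rhos`, `gene_a` and `gene_b` are made up for illustration.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Synthetic stand-in for real data: rows are genes, columns are samples.
n_genes, n_samples = 200, 30
expr = rng.normal(size=(n_genes, n_samples))


def random_pair_rhos(expr, n_draws, rng):
    """Spearman rho for n_draws randomly chosen gene pairs.

    Draws are independent, so the same pair can come up more than once,
    but the two genes within a single pair are always distinct.
    """
    rhos = np.empty(n_draws)
    for k in range(n_draws):
        i, j = rng.choice(expr.shape[0], size=2, replace=False)
        rho, _ = spearmanr(expr[i], expr[j])
        rhos[k] = rho
    return rhos


# Background distribution of correlations between random gene pairs.
null_rhos = random_pair_rhos(expr, 1000, rng)

# Central 95% of the sampled rhos (percentile interval).
lo, hi = np.percentile(null_rhos, [2.5, 97.5])

# Hypothetical pair of interest: here just the first two rows.
gene_a, gene_b = 0, 1
rho_obs, _ = spearmanr(expr[gene_a], expr[gene_b])
outside = bool(rho_obs < lo or rho_obs > hi)

print(f"mean rho = {null_rhos.mean():.3f}, median rho = {np.median(null_rhos):.3f}")
print(f"central 95% of random-pair rhos: [{lo:.3f}, {hi:.3f}]")
print(f"observed rho = {rho_obs:.3f}, outside interval: {outside}")
```

With real data you would replace `expr` with your own matrix and set `gene_a` and `gene_b` to the row indices of your genes of interest. The number of draws (1000) and the 95% cutoff are just the values from the example above, not recommendations.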