How Do You Test The Global Null Hypothesis In Expression Analysis?
3
2
11.7 years ago

What is the standard way to test whether two samples exhibit significantly different expression on a global basis?

Most RNA-seq and microarray analysis packages seem to be concerned with identifying differentially expressed genes on a gene-by-gene basis and then correcting for multiple testing.

Is there a single statistic, like an F-test, that is typically used for comparing all the genes of two samples at once? Or do people just aggregate the individual tests?

gene rna microarray • 4.5k views
0

What about a simple correlation? Computing p-values with only two samples in hand is quite difficult. You simply cannot run a parametric test with only one observation from each population, even with expression values for 10000 genes. You need multiple observations to estimate the parameters of your population (mean and variance, in the Gaussian case). I would personally go for a non-parametric randomization test, as David suggested below.
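For what it's worth, the "simple correlation" idea can be sketched in a few lines. This is only an illustration: the function name `pearson` and the toy profiles are made up here, and the result is a descriptive similarity score, not a p-value.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    # Sums of cross-products and squared deviations (the 1/n factors cancel).
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Toy profiles (invented values, one entry per gene).
sample_a = [1.0, 2.0, 3.0, 4.0, 5.0]
sample_b = [2.0, 4.0, 6.0, 8.0, 10.0]
r = pearson(sample_a, sample_b)
```

A value near 1 says the two profiles rise and fall together across genes; it says nothing by itself about significance, which is why the randomization test is still needed.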

2
11.7 years ago

Typically we use a clustering approach to estimate how closely related expression profiles are on a global scale. Hierarchical and k-means clustering are both commonly used. The similarity of the clusters can then be calculated.

So yes, it is indeed performing multiple independent tests and aggregating the results, but that's really the nature of the data.

1
11.7 years ago

I don't know of an existing method for this; I suspect the answer would come from the general statistical literature. One approach that comes to mind is to measure the distance between the two profiles (e.g. Euclidean distance, or your preferred metric), call it D_obs, and then use permutation to put a non-parametric p-value on that distance. In each permutation, you shuffle one array and measure the distance as D_perm. Then:

P = (number of times D_perm > D_obs) / N_perms.
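A minimal sketch of that procedure, assuming Euclidean distance and plain Python lists as profiles. The function names, the default of 1000 permutations, the fixed seed, and the toy data are all assumptions made for illustration.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def permutation_distance_test(profile_a, profile_b, n_perms=1000, seed=0):
    """Return (d_obs, p), where p is the fraction of shuffles with D_perm > D_obs."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    d_obs = euclidean(profile_a, profile_b)
    shuffled = list(profile_b)
    exceed = 0
    for _ in range(n_perms):
        rng.shuffle(shuffled)  # break the gene-to-gene pairing in one array
        if euclidean(profile_a, shuffled) > d_obs:
            exceed += 1
    return d_obs, exceed / n_perms


# Toy data (invented): two nearly identical profiles over eight genes.
sample_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
sample_b = [1.1, 2.1, 2.9, 4.2, 5.0, 5.9, 7.1, 8.0]
d_obs, p = permutation_distance_test(sample_a, sample_b)
```

Note what the shuffle actually tests: it compares the observed distance with the distances obtained when the gene labels of one array are scrambled. With the formula as written, two very similar profiles give P close to 1, because nearly every shuffle is farther apart than the real pairing.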

0
11.7 years ago

I would say that the null hypothesis is that the samples come from the same population - if even one gene is differentially expressed the null hypothesis is rejected with the given p-value.

You just need to come up with the definition of 'global' (how many of the total) and check that there are indeed that many differentially expressed genes. You shouldn't need to aggregate the p-values (as the methods themselves should account for the multiple tests).
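To make the "how many of the total" idea concrete, here is a rough sketch. It assumes you already have one p-value per gene from whatever differential-expression method you used (which needs replicates). The Benjamini-Hochberg procedure is my choice of multiple-testing correction here, not something prescribed above, and `min_fraction` is a hypothetical threshold standing in for your own definition of 'global'.

```python
def benjamini_hochberg_rejections(p_values, alpha=0.05):
    """Count hypotheses rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    n_rejected = 0
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= alpha * rank / m:
            n_rejected = rank  # keep the largest rank that passes its threshold
    return n_rejected


def globally_different(p_values, min_fraction=0.1, alpha=0.05):
    """True if at least min_fraction of genes are called differentially expressed."""
    n_de = benjamini_hochberg_rejections(p_values, alpha)
    return n_de / len(p_values) >= min_fraction


# Toy per-gene p-values (invented for illustration).
gene_p_values = [0.001, 0.008, 0.039, 0.041, 0.042,
                 0.060, 0.074, 0.205, 0.212, 0.216]
n_de = benjamini_hochberg_rejections(gene_p_values)
is_global = globally_different(gene_p_values, min_fraction=0.1)
```

The choice of `min_fraction` is the real question; the code only makes explicit that 'global' has to be defined before it can be checked.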

0

I can't believe that, with the 10000 microarray papers out there, there isn't already a standard way of computing the probability that two samples are from the same population.