test if clustering differs
3.4 years ago

Hi,

I have RNA-seq expression data for a set of 400 genes across 45 tissues. These genes seem to cluster the tissues by degree of proliferation, as indicated by the expression of a number of proliferation markers. I would now like to test whether this gene set clusters the tissues by proliferation more strongly than a random set of genes. What kind of test could I perform? When I cluster the tissues using a random sample of 400 genes from the human genome, they are not as clearly separated by proliferation, so my hypothesis is that my gene set does this better.

Hope someone can help!

permutation test clustering r heatmap RNA-seq
3.4 years ago

So, you have derived a gene signature of 400 genes that can seemingly segregate tissues based on degree of proliferation. You now want to derive some metric that says how well these genes are related to proliferation.

There are many options, but here's just one idea:

1. Test each of your 400 genes independently in a regression model against your degree of proliferation and then choose only those genes that are statistically significant. This can be something like: lm(proliferation ~ gene1), etc.
2. With your final list of statistically significant genes, build a combined model that predicts degree of proliferation, and put it to the test via cross-validation, r-squared shrinkage, and other tests that look at various aspects of the model. If your signature is suitably small at this point, you can also do stepwise regression to refine it into a smaller signature.
3. Generate final test statistics such as AUC (ROC analysis), sensitivity, and specificity on your final model.
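
As a rough illustration of point 1, here is a minimal sketch in Python (the lm() call above is R; Python with only the standard library is used here purely for illustration). It swaps the lm() p-value for a permutation p-value on the Pearson correlation, which for a single predictor tests the same gene-proliferation association. The function names, the dict layout of `expr`, and the toy data are all assumptions, not part of the original analysis.

```python
import random

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no linear signal
    return sxy / (sxx * syy) ** 0.5

def screen_genes(expr, proliferation, n_perm=999, alpha=0.05, seed=0):
    """Keep genes whose association with proliferation has p < alpha.

    expr: dict mapping gene name -> list of expression values, one per tissue
          (hypothetical layout, assumed already normalised)
    proliferation: list with one proliferation score per tissue
    Returns a dict gene -> (correlation, permutation p-value).
    """
    rng = random.Random(seed)
    kept = {}
    for gene, values in expr.items():
        r_obs = pearson(values, proliferation)
        hits = 0
        for _ in range(n_perm):
            shuffled = proliferation[:]
            rng.shuffle(shuffled)  # break any gene-proliferation link
            if abs(pearson(values, shuffled)) >= abs(r_obs):
                hits += 1
        p = (hits + 1) / (n_perm + 1)  # two-sided empirical p-value
        if p < alpha:
            kept[gene] = (r_obs, p)
    return kept

# Toy data: one gene that tracks proliferation and one that is flat.
toy_rng = random.Random(1)
prolif = [float(i) for i in range(20)]
toy_expr = {
    "gene_linked": [2.0 * p + toy_rng.gauss(0, 1) for p in prolif],
    "gene_flat": [5.0] * 20,
}
selected = screen_genes(toy_expr, prolif, n_perm=199)
print(sorted(selected))
```

With 400 genes tested, some form of multiple-testing adjustment would normally be applied to these p-values before picking the final list; that step is left out of the sketch.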

To help with point 2, see here: A: Resources for gene signature creation

Note that if your degree of proliferation variable is categorical, you could do regularised regression: A: How to exclude some of breast cancer subtypes just by looking at gene expressio
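
Another option, closer to the comparison against random gene sets described in the question, is a resampling test. The sketch below (Python, standard library only, for illustration) scores a gene set by how well tissue-to-tissue expression distance tracks the difference in proliferation, then compares the signature's score with scores from random gene sets of the same size. The scoring statistic, the function names, the `expr` layout, and the toy data are all assumptions chosen for the example; other clustering-quality statistics could be substituted.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5

def separation_score(expr, genes, proliferation):
    """Correlate pairwise tissue distance (Euclidean, over `genes`)
    with the pairwise absolute difference in proliferation."""
    n = len(proliferation)
    dist, diff = [], []
    for i in range(n):
        for j in range(i + 1, n):
            d = math.sqrt(sum((expr[g][i] - expr[g][j]) ** 2 for g in genes))
            dist.append(d)
            diff.append(abs(proliferation[i] - proliferation[j]))
    return pearson(dist, diff)

def random_set_test(expr, signature, proliferation, n_sets=500, seed=0):
    """Empirical p-value: how often does a random gene set of the same
    size score at least as high as the signature?"""
    rng = random.Random(seed)
    sig = set(signature)
    # Design choice (assumption): draw random sets from non-signature genes.
    background = [g for g in expr if g not in sig]
    observed = separation_score(expr, signature, proliferation)
    null = [
        separation_score(expr, rng.sample(background, len(signature)), proliferation)
        for _ in range(n_sets)
    ]
    p = (1 + sum(s >= observed for s in null)) / (1 + n_sets)
    return observed, null, p

# Toy data: 12 tissues, 5 signature genes tracking proliferation, 40 noise genes.
toy_rng = random.Random(42)
prolif = [float(i) for i in range(12)]
toy_expr = {}
for k in range(5):
    toy_expr[f"sig{k}"] = [p + toy_rng.gauss(0, 0.5) for p in prolif]
for k in range(40):
    toy_expr[f"bg{k}"] = [toy_rng.gauss(0, 1) for _ in prolif]

signature = [f"sig{k}" for k in range(5)]
observed, null, p_value = random_set_test(toy_expr, signature, prolif, n_sets=200)
print(round(observed, 3), p_value)
```

On real data, `expr` would hold all expressed genes and `signature` the 400 genes of interest. The smallest p-value this can return is 1 / (n_sets + 1), so the number of random sets limits the resolution.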

Kevin