Question: test if clustering differs
frida.danielsson (European Union) wrote, 11 months ago:


I have a dataset of 400 genes with gene expression data (RNA-seq) across 45 tissues. These genes seem to cluster the tissues according to degree of proliferation, as indicated by the expression of a number of proliferation markers. I would now like to investigate whether this set of genes clusters the tissues according to proliferation more strongly than a random set of genes. What kind of test could I perform to show that? When I cluster my data based on a random sample of 400 genes from the human genome, the tissues are not as clearly clustered according to proliferation, so my hypothesis is that my set of genes is better at this.

Hope someone can help!

Kevin Blighe (Guy's Hospital, London) wrote, 11 months ago:

So, you have derived a gene signature of 400 genes that can seemingly segregate your tissues by degree of proliferation. You now want to derive some metric that says how well these genes are related to proliferation.

There are many options, but here's just one idea:

  1. Test each of your 400 genes independently in a regression model against your degree of proliferation, then keep only the genes that are statistically significant. In R, this can be something like lm(proliferation ~ gene1), repeated for each gene.
  2. With your final list of statistically significant genes, build a combined model that predicts degree of proliferation and put it to the test via cross-validation and R-squared shrinkage, along with other tests that look at various aspects of the model. If your signature is suitably small at this point, you can also do stepwise regression to refine it into a smaller signature.
  3. Generate final test statistics such as AUC (ROC analysis), sensitivity, and specificity on your final model.
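
To make point 1 concrete, here is a rough sketch in plain Python rather than R. It fits one gene at a time against proliferation, as lm(proliferation ~ gene1) would, but it swaps the usual t-test p-value for a permutation p-value so that it needs only the standard library. The Bonferroni correction, the function names, and the toy data are all my own assumptions for illustration, not part of the recipe above.

```python
import random
import statistics


def fit_r_squared(x, y):
    """R-squared of the one-predictor least-squares fit y ~ x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Empirical p-value: how often shuffled y fits x at least as well."""
    rng = random.Random(seed)
    observed = fit_r_squared(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if fit_r_squared(x, shuffled) >= observed:
            hits += 1
    # The +1 terms keep the estimate away from an impossible p-value of 0.
    return (hits + 1) / (n_perm + 1)


def screen_genes(expression, proliferation, alpha=0.05):
    """Return {gene: adjusted p} for genes passing a Bonferroni threshold.

    Bonferroni is an assumption here; any multiple-testing correction
    could be substituted.
    """
    n_tests = len(expression)
    selected = {}
    for gene, values in expression.items():
        p_adj = min(1.0, permutation_p_value(values, proliferation) * n_tests)
        if p_adj < alpha:
            selected[gene] = p_adj
    return selected


# Toy data (made up): 12 tissues, one gene that tracks proliferation
# and one gene that only oscillates around a constant.
proliferation = [float(i) for i in range(1, 13)]
wiggle = [0.5 if i % 2 else -0.5 for i in range(12)]
expression = {
    "gene_linked": [2 * p + w for p, w in zip(proliferation, wiggle)],
    "gene_unrelated": [5.0 + w for w in wiggle],
}
selected = screen_genes(expression, proliferation)
print(selected)
```

With 400 genes and a Bonferroni threshold, n_perm would need to be far larger than 2000 for any gene to be able to pass, so in practice a parametric p-value from the regression itself is the more convenient choice.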

To help with point 2, see here: A: Resources for gene signature creation
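
As a small illustration of the cross-validation and R-squared shrinkage idea in point 2, the sketch below makes a strong simplifying assumption: the signature is collapsed into one score per tissue (the mean expression of the selected genes) and proliferation is predicted from that score with a one-predictor linear model. A real combined model would normally be multivariable. The names and the toy data are hypothetical.

```python
import statistics


def fit_line(x, y):
    """Ordinary least squares for y ~ x; returns (intercept, slope)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope


def prediction_r_squared(y, predicted):
    """1 - SS_res / SS_tot for a set of predictions of y."""
    my = statistics.fmean(y)
    ss_res = sum((b - p) ** 2 for b, p in zip(y, predicted))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot


def leave_one_out_predictions(x, y):
    """Predict each tissue from a model fitted on all the other tissues."""
    predictions = []
    for i in range(len(x)):
        intercept, slope = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        predictions.append(intercept + slope * x[i])
    return predictions


# Toy data (made up): 10 tissues and a two-gene signature.
proliferation = [float(i) for i in range(1, 11)]
wiggle = [0.5 if i % 2 else -0.5 for i in range(10)]
signature = {
    "gene_a": [2 * p + w for p, w in zip(proliferation, wiggle)],
    "gene_b": [3 * p + 2 * w for p, w in zip(proliferation, wiggle)],
}

# One score per tissue: the mean expression across the signature genes.
score = [statistics.fmean(values) for values in zip(*signature.values())]

intercept, slope = fit_line(score, proliferation)
apparent_r2 = prediction_r_squared(
    proliferation, [intercept + slope * s for s in score]
)
cv_r2 = prediction_r_squared(
    proliferation, leave_one_out_predictions(score, proliferation)
)
shrinkage = apparent_r2 - cv_r2
print(apparent_r2, cv_r2, shrinkage)
```

A large gap between the apparent and the cross-validated R-squared suggests overfitting. Note that if the genes were selected using all tissues, the cross-validated figure is still optimistic; ideally the selection step is repeated inside each fold.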

Note that if your degree of proliferation variable is categorical, you could do regularised regression: A: How to exclude some of breast cancer subtypes just by looking at gene expressio
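
If proliferation is reduced to two classes (say, high versus low), the AUC, sensitivity, and specificity from point 3 can be computed directly from whatever per-tissue score the final model produces. The sketch below assumes labels coded as 1 (high) and 0 (low) and a hypothetical cut-off of 0.5. It computes the AUC as the fraction of high/low pairs that the score ranks correctly. The scores and labels are made up.

```python
def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    positives = [s for s, label in zip(scores, labels) if label == 1]
    negatives = [s for s, label in zip(scores, labels) if label == 0]
    correct = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return correct / (len(positives) * len(negatives))


def sensitivity_specificity(scores, labels, threshold):
    """Call a tissue 'high' when its score is at or above the threshold."""
    tp = sum(1 for s, label in zip(scores, labels) if label == 1 and s >= threshold)
    fn = sum(1 for s, label in zip(scores, labels) if label == 1 and s < threshold)
    tn = sum(1 for s, label in zip(scores, labels) if label == 0 and s < threshold)
    fp = sum(1 for s, label in zip(scores, labels) if label == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (made up): model scores for six tissues, 1 = high proliferation.
scores = [0.2, 0.4, 0.6, 0.5, 0.8, 0.9]
labels = [0, 0, 0, 1, 1, 1]

model_auc = auc(scores, labels)
sensitivity, specificity = sensitivity_specificity(scores, labels, threshold=0.5)
print(model_auc, sensitivity, specificity)
```

As with R-squared, these figures are most informative when computed on predictions for tissues that were held out of model fitting.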

