Question: Methods for contrasting gene expression profiles between multiple groups
21 months ago by
orzech_mag230 wrote:

Dear Colleagues,

I have RNA-seq expression data for two cancer subtypes, each divided into two smaller groups, so I have 4 groups to compare in total. I would like to compare all 4 groups at once to see which gene profiles are shared and which differ between them. What method would you suggest? My data set is large: it has 20k genes. I have already tried different variants of hierarchical clustering, but that gives me the whole picture of all 20k genes. There are visible patterns, but they are not clearly separated, and I would need to filter the most differentiating genes manually. Is there any other way to contrast all 4 groups at once and filter out the genes that differentiate them well?

I would appreciate your help and advice very much. Thank you in advance.

modified 21 months ago by Kevin Blighe70k • written 21 months ago by orzech_mag230
21 months ago by
Kevin Blighe70k
Republic of Ireland
Kevin Blighe70k wrote:


It seems like you need to apply ANOVA.

If you have raw counts, e.g., from RNA-seq, then process them in DESeq2 and follow the guidelines for the Likelihood Ratio Test (LRT), which is akin to ANOVA.
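As a rough sketch of the DESeq2 route (not taken from this thread), the LRT compares a full model containing the group term against a reduced model without it, giving one p-value per gene across all 4 groups. The counts, the group labels (`A1`, `A2`, `B1`, `B2`) and the 0.05 cut-off below are assumptions for illustration only, and the DESeq2 package from Bioconductor must be installed.

```r
# Sketch only: simulated raw counts stand in for real data.
library(DESeq2)

set.seed(1)
n_genes <- 200
groups <- factor(rep(c("A1", "A2", "B1", "B2"), each = 4))  # hypothetical labels

# Genes in rows, samples in columns, integer counts
counts <- matrix(
  rnbinom(n_genes * length(groups), mu = 100, size = 10),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("sample", seq_along(groups)))
)
# Spike the first 10 genes in group B2 so the test has something to detect
counts[1:10, groups == "B2"] <- counts[1:10, groups == "B2"] * 8L

coldata <- data.frame(group = groups, row.names = colnames(counts))
dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata,
                              design = ~ group)

# LRT: full model (~ group) against a reduced model with no group term (~ 1)
dds <- DESeq(dds, test = "LRT", reduced = ~ 1)
res <- results(dds)

# Genes whose expression differs in at least one group (adjusted p < 0.05)
lrt_genes <- rownames(res)[which(res$padj < 0.05)]
head(lrt_genes)
```

Note that the LRT p-value only says that a gene differs somewhere among the groups; it does not say which groups differ.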

If you have microarray data (already normalised and transformed), or any other type of expression data that has already been normalised and transformed, then use standard tests. In R:

Use the non-parametric Kruskal-Wallis test if your sample size is low and/or your data distribution departs from the 'bell curve' (normal distribution).
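For data that are already normalised and transformed, testing each gene across the 4 groups could look something like the sketch below. It uses only base R (`kruskal.test()`, `aov()`, `p.adjust()`); the simulated matrix, the group labels and the Benjamini-Hochberg correction at 0.05 are my assumptions, not something prescribed in this thread.

```r
set.seed(1)
n_genes <- 100
groups <- factor(rep(c("A1", "A2", "B1", "B2"), each = 10))  # hypothetical labels

# Simulated normalised expression: genes in rows, samples in columns
expr <- matrix(rnorm(n_genes * length(groups)), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0("sample", seq_along(groups))))
# Shift the first 5 genes upwards in group B2 so there is something to find
expr[1:5, groups == "B2"] <- expr[1:5, groups == "B2"] + 3

# One Kruskal-Wallis test per gene (row)
kw_p <- apply(expr, 1, function(x) kruskal.test(x, groups)$p.value)

# One-way ANOVA per gene, for comparison when normality is reasonable
aov_p <- apply(expr, 1, function(x) {
  summary(aov(x ~ groups))[[1]][["Pr(>F)"]][1]
})

# Correct for testing many genes, then pull out the gene names
kw_padj <- p.adjust(kw_p, method = "BH")
sig_genes <- names(kw_padj)[kw_padj < 0.05]
sig_genes
```

With 20k genes the same `apply()` call works unchanged; it just takes longer.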

You can also do post-hoc non-parametric pairwise comparisons between your groups with Dunn's test, as I show here: A: Network/Pathway Analysis from Mass Spec data
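Dunn's test itself lives in add-on packages, so as a base-R stand-in the sketch below uses `pairwise.wilcox.test()` instead. This is a different (though also non-parametric) post-hoc procedure, swapped in purely so the example runs without extra packages; the single simulated gene and the group labels are assumptions.

```r
set.seed(1)
groups <- factor(rep(c("A1", "A2", "B1", "B2"), each = 10))  # hypothetical labels

# Expression of one gene that came out significant in the omnibus test
gene_x <- rnorm(length(groups))
gene_x[groups == "B2"] <- gene_x[groups == "B2"] + 3

# All pairwise group comparisons, with BH-adjusted p-values
pw <- pairwise.wilcox.test(gene_x, groups, p.adjust.method = "BH")

# Lower-triangular matrix of adjusted p-values (rows vs. columns are groups)
pw$p.value
```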

Once you identify statistically significant genes, filter your data matrix for these, and then re-generate your heatmap.
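The filter-then-replot step might be sketched as follows, again in base R. The data are simulated, the per-gene test is a Kruskal-Wallis test with BH correction, and the colours and the 0.05 threshold are arbitrary choices of mine.

```r
set.seed(1)
n_genes <- 100
groups <- factor(rep(c("A1", "A2", "B1", "B2"), each = 10))  # hypothetical labels
expr <- matrix(rnorm(n_genes * length(groups)), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0("sample", seq_along(groups))))
expr[1:8, groups == "B2"] <- expr[1:8, groups == "B2"] + 3

# Adjusted p-value per gene
padj <- p.adjust(
  apply(expr, 1, function(x) kruskal.test(x, groups)$p.value),
  method = "BH"
)

# Keep only the significant genes (rows) of the matrix
sig_expr <- expr[padj < 0.05, , drop = FALSE]

# Re-draw the heatmap on the reduced matrix, with a colour bar per group
group_cols <- c(A1 = "firebrick", A2 = "orange", B1 = "navy", B2 = "skyblue")
heatmap(sig_expr, scale = "row",
        ColSideColors = unname(group_cols[as.character(groups)]))
```

`rownames(sig_expr)` then gives the filtered genes by name.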


modified 21 months ago • written 21 months ago by Kevin Blighe70k

Please correct me if I am wrong, but I don't feel that ANOVA will solve my issue. I now realize that I didn't specify all the relevant details of the data. I have a big cohort study: 800 patients divided into 4 groups (2 cancer subtypes, each divided into 2 smaller groups), and for each patient there is a sequenced and processed (normalized) expression profile of 20k genes. When I read the description of ANOVA that you provided, I couldn't find a way to A) analyze all 20k genes in all 800 patients, divided by the disease type factor, at once, and B) based on the results, filter the genes that are common/distinct between all 4 groups and get them by name.

written 21 months ago by orzech_mag230

Indeed, the methods that I proposed can be used to test each gene independently across your groups. Once each gene is tested, you would still have an understanding of genes that are different across your groups.

Alternatively, you can cluster all samples and genes together and then identify clusters in your data via various metrics, including:

  • ConsensusClustering
  • Gap statistic
  • Elbow method
  • Silhouette Method
  • M3C

...or you can just 'cut' the dendrogram tree with the cutree() function.
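A minimal sketch of the `cutree()` idea, using base R only. The simulated data, the Euclidean distance, Ward linkage (`ward.D2`) and `k = 4` are all assumptions; in practice one of the metrics listed above would help to choose `k`.

```r
set.seed(1)
groups <- factor(rep(c("A1", "A2", "B1", "B2"), each = 10))  # hypothetical labels
n_genes <- 60
expr <- matrix(rnorm(n_genes * length(groups)), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0("sample", seq_along(groups))))

# Give each group its own block of 10 marker genes
for (i in seq_along(levels(groups))) {
  rows <- ((i - 1) * 10 + 1):(i * 10)
  cols <- groups == levels(groups)[i]
  expr[rows, cols] <- expr[rows, cols] + 4
}

# Cluster the samples (columns), then cut the tree into 4 clusters
hc <- hclust(dist(t(expr)), method = "ward.D2")
clusters <- cutree(hc, k = 4)

# Compare the clusters found with the known groups
table(groups, clusters)
```

The same call on `dist(expr)` instead of `dist(t(expr))` would group the genes rather than the samples.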

Another idea would be to perform lasso-penalised regression, which would allow you to analyse all genes together, and across all samples. RandomForest® is another idea.
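One way to try the lasso idea is a multinomial model in the glmnet package (a CRAN package, which needs to be installed). The sketch below is only an illustration under my own assumptions: simulated data, hypothetical group labels, and the penalty picked by cross-validation at `lambda.min`. Genes with non-zero coefficients are the ones the model keeps for each group.

```r
library(glmnet)

set.seed(1)
groups <- factor(rep(c("A1", "A2", "B1", "B2"), each = 20))  # hypothetical labels
n_genes <- 100
expr <- matrix(rnorm(n_genes * length(groups)), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0("sample", seq_along(groups))))

# Give each group its own block of 5 marker genes
for (i in seq_along(levels(groups))) {
  rows <- ((i - 1) * 5 + 1):(i * 5)
  cols <- groups == levels(groups)[i]
  expr[rows, cols] <- expr[rows, cols] + 3
}

# glmnet expects samples in rows and features (genes) in columns
x <- t(expr)

# Lasso-penalised multinomial regression, penalty chosen by cross-validation
cvfit <- cv.glmnet(x, groups, family = "multinomial")

# One coefficient matrix per group; keep the genes with non-zero coefficients
coefs <- coef(cvfit, s = "lambda.min")
selected <- lapply(coefs, function(m) {
  beta <- as.matrix(m)[-1, 1]  # drop the intercept (first row)
  names(beta)[beta != 0]
})
selected
```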

Another idea is to build correlation networks.
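In its simplest form, a correlation network can be sketched in base R by correlating every pair of genes and keeping the pairs above a threshold as edges. The simulated co-expressed module, the Pearson correlation and the `0.7` cut-off are assumptions for illustration; dedicated network packages do considerably more than this.

```r
set.seed(1)
n_genes <- 50
n_samples <- 40
expr <- matrix(rnorm(n_genes * n_samples), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)), NULL))

# Make genes 1-5 a co-expressed module driven by one shared signal
latent <- rnorm(n_samples)
expr[1:5, ] <- matrix(rep(latent, each = 5), nrow = 5) +
  rnorm(5 * n_samples, sd = 0.3)

# Gene-by-gene correlation matrix (cor() correlates columns, hence t())
cors <- cor(t(expr))

# An edge is any gene pair with |r| > 0.7; upper triangle avoids duplicates
adj <- abs(cors) > 0.7 & upper.tri(cors)
idx <- which(adj, arr.ind = TRUE)

edges <- data.frame(
  gene1 = rownames(cors)[idx[, "row"]],
  gene2 = colnames(cors)[idx[, "col"]],
  r = cors[idx]
)
edges
```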

It depends on what, exactly, you are hoping to achieve.

written 21 months ago by Kevin Blighe70k

Thank you Kevin. It seems there is a wide range of options; now I need to work out which one is appropriate for me.

written 20 months ago by orzech_mag230

Thanks @Kevin for your nice explanation. I would appreciate it if you could look at my question here

written 5 months ago by Rahil180