Question: Obtaining Gene Expression patterns via Kmeans - clustering sample means or linear models?
9 weeks ago by
ponganta50 wrote:

Hi there,

I'm a novice in analysing gene expression data and am having some difficulty moving forward, especially because I have no profound knowledge of modelling. Following the identification of differentially expressed genes (DEGs) using DESeq2, I want to classify DEGs according to their expression patterns across tissues. The goal is to sort genes into groups of "interesting" candidates, based on the strength of expression in the tissue of interest. I then want to run GO-term enrichment analyses on the interesting clusters.

Although I was able to extract some patterns, I'm unsure if my procedure has any merit.


Input: a data frame of rlog-transformed counts (transformed using DESeq2::rlog())

Procedure 1:

  1. Per gene, compute tissue-means from biological triplicates
  2. Per gene, scale the mean expression via base::scale()
  3. Cluster these vectors of scaled mean expression levels using stats::kmeans()
  4. Visualise
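
For reference, the steps above can be sketched roughly as follows. This is only a minimal sketch on simulated data, not my actual script: the tissue names, the object names (`rlog_mat`, `tissue_means`, `z`, `km`, `plot_file`), the number of genes and the choice of k = 6 are all assumptions made for illustration. One detail that is easy to get wrong: scale() standardises columns, so the matrix has to be transposed to scale per gene.

```r
# Sketch of Procedure 1 on simulated data; all object names are hypothetical.
set.seed(42)

tissues <- rep(c("leaf", "root", "flower", "stem"), each = 3)  # made-up tissues, triplicates
n_genes <- 200

# Stand-in for the genes x samples matrix of rlog values for the DEGs.
rlog_mat <- matrix(rnorm(n_genes * length(tissues), mean = 8, sd = 1),
                   nrow = n_genes,
                   dimnames = list(paste0("gene", seq_len(n_genes)),
                                   paste0(tissues, "_", 1:3)))

# 1. Per gene, tissue means from the biological triplicates.
tissue_means <- sapply(unique(tissues), function(tis) {
  rowMeans(rlog_mat[, tissues == tis, drop = FALSE])
})

# 2. Per gene z-scores. scale() standardises columns, so transpose first
#    to put genes in columns, then transpose back.
z <- t(scale(t(tissue_means)))
z <- z[complete.cases(z), , drop = FALSE]  # drop genes with zero variance (NaN rows)

# 3. k-means on the scaled profiles; k = 6 is an arbitrary choice here.
km <- kmeans(z, centers = 6, nstart = 25, iter.max = 50)

# 4. Visualise the cluster centres (one line per cluster), written to a temp file.
plot_file <- file.path(tempdir(), "kmeans_centres.pdf")
pdf(plot_file)
matplot(t(km$centers), type = "l", lty = 1, xaxt = "n",
        xlab = "tissue", ylab = "z-score")
axis(1, at = seq_len(ncol(z)), labels = colnames(z))
dev.off()
```

`km$cluster` then gives the cluster assignment per gene, which is what would be passed on to a GO-term enrichment per cluster.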

Here is the result:

[Figure: Gene expression patterns of differentially expressed genes. K-means clustering of z-score-scaled, mean per-tissue, rlog-transformed expression levels.]

Procedure 2:

  1. Per gene, build a linear model based on biological triplicates
  2. Per gene, z-score-scale the models
  3. Cluster the scaled models using k-means
  4. Visualise
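
A rough sketch of Procedure 2, again on simulated data. I am assuming here that "linear model" means a one-factor model of the form expression ~ tissue fitted per gene; the tissue names, object names (`expr`, `fit`, `coefs`, `means`) and k = 4 are hypothetical. Under that assumption, the fitted value for each tissue is simply the mean of its replicates, so the sketch also checks whether the model coefficients match the plain tissue means from Procedure 1.

```r
# Sketch of Procedure 2 on simulated data; all object names are hypothetical.
set.seed(42)

tissue <- factor(rep(c("leaf", "root", "flower", "stem"), each = 3))
n_genes <- 50

# Stand-in for the genes x samples matrix of rlog values.
rlog_mat <- matrix(rnorm(n_genes * length(tissue), mean = 8, sd = 1),
                   nrow = n_genes,
                   dimnames = list(paste0("gene", seq_len(n_genes)), NULL))

# 1. One linear model per gene. lm() accepts a matrix response and fits one
#    model per column, so genes go in columns. Dropping the intercept
#    (0 + tissue) makes each coefficient the fitted value for one tissue.
expr <- t(rlog_mat)
fit <- lm(expr ~ 0 + tissue)
coefs <- t(coef(fit))  # genes x tissues

# Plain per-tissue means for comparison (same tissue order as the factor levels).
means <- t(apply(rlog_mat, 1, function(x) tapply(x, tissue, mean)))
same <- isTRUE(all.equal(unname(coefs), unname(means)))
print(same)

# 2. Per gene z-scores of the fitted tissue values.
z <- t(scale(t(coefs)))

# 3. k-means on the scaled profiles; k = 4 is an arbitrary choice here.
km <- kmeans(z, centers = 4, nstart = 25, iter.max = 50)
table(km$cluster)
```

If `same` is TRUE, the two procedures would cluster identical values for this kind of one-factor design; a linear model would only change the input to k-means if it included further terms (for example a batch effect), which is not something I have set up here.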


  • Would you use any of these techniques, or are more sophisticated methods necessary?
  • If I should be using linear models, is it kosher to build a linear model of the z-score-scaled linear models per cluster (just for the purpose of visualisation)?

Thanks in advance for any advice,

