Obtaining Gene Expression patterns via Kmeans - clustering sample means or linear models?
3.4 years ago
ponganta ▴ 590

Hi there,

I'm a novice in analysing gene expression data and am having difficulty moving forward, mainly because I have no in-depth knowledge of modelling. Following the identification of differentially expressed genes (DEGs) using DESeq2, I want to group the DEGs according to their expression patterns across tissues. The goal is to single out clusters of "interesting" candidates, based on the strength of expression in the tissue of interest. I then want to run GO-term enrichment analyses on the interesting clusters.

Although I was able to extract some patterns, I'm unsure if my procedure has any merit.

Material:

A dataframe with rlog-transformed counts (transformed using DESeq2::rlog())

Procedure 1:

  1. Per gene, compute tissue-means from biological triplicates
  2. Per gene, scale the mean expression via base::scale()
  3. Cluster these vectors of scaled, mean expression levels using stats::kmeans()
  4. Visualise

Here is the result:

[Figure: Gene expression patterns of differentially expressed genes. K-means clustering of z-score-scaled, per-tissue mean, rlog-transformed expression levels.]
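
For concreteness, Procedure 1 can be sketched roughly as below. This is only an illustration on simulated data: the matrix `rlog_mat`, the tissue names, the number of genes and the choice of three clusters are all made up for the example and are not my real data or settings.

```r
# Minimal sketch of Procedure 1 on simulated data (all names are hypothetical).
set.seed(1)

# Stand-in for the rlog matrix: genes in rows, samples in columns,
# three tissues with biological triplicates each.
tissues <- factor(rep(c("leaf", "root", "flower"), each = 3))
n_genes <- 60
rlog_mat <- matrix(
  rnorm(n_genes * length(tissues), mean = 8, sd = 0.3),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0(tissues, "_", rep(1:3, times = 3)))
)

# Give each block of 20 genes a higher level in one tissue,
# so that there is a pattern to recover.
for (i in seq_len(n_genes)) {
  up <- levels(tissues)[(i - 1) %/% 20 + 1]
  rlog_mat[i, tissues == up] <- rlog_mat[i, tissues == up] + 3
}

# Step 1: per gene, mean expression per tissue (genes x tissues matrix).
tissue_means <- sapply(levels(tissues), function(tt) {
  rowMeans(rlog_mat[, tissues == tt, drop = FALSE])
})

# Step 2: z-score per gene. scale() works on columns, hence the transposes.
# Genes with identical means in all tissues would give NaN here.
scaled <- t(scale(t(tissue_means)))

# Step 3: k-means on the scaled profiles; nstart > 1 reduces the
# dependence on the random initialisation.
km <- kmeans(scaled, centers = 3, nstart = 25)

# Step 4: visualise the cluster centres (one line per cluster).
if (interactive()) {
  matplot(t(km$centers), type = "b", xaxt = "n",
          xlab = "tissue", ylab = "z-score")
  axis(1, at = seq_along(levels(tissues)), labels = colnames(km$centers))
}
```

The number of clusters has to be chosen by the user; `centers = 3` only matches how the toy data were simulated.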

Procedure 2:

  1. Per gene, build a linear model based on biological triplicates
  2. Per gene, z-score-scale the model
  3. Cluster the scaled models using k-means
  4. Visualise
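
A rough sketch of Procedure 2 follows, under the assumption that "linear model" means a per-gene `lm()` with tissue as the only (categorical) predictor and that the fitted per-tissue coefficients are what gets scaled and clustered. The data are simulated and all names are hypothetical. Under this particular assumption the coefficients of such a model are simply the per-tissue means, which the sketch checks explicitly; a model with additional covariates would behave differently.

```r
# Minimal sketch of Procedure 2 on simulated data (all names are hypothetical).
set.seed(1)

tissues <- factor(rep(c("leaf", "root", "flower"), each = 3))
n_genes <- 60
rlog_mat <- matrix(
  rnorm(n_genes * length(tissues), mean = 8, sd = 0.3),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), NULL)
)
for (i in seq_len(n_genes)) {
  up <- levels(tissues)[(i - 1) %/% 20 + 1]
  rlog_mat[i, tissues == up] <- rlog_mat[i, tissues == up] + 3
}

# Step 1: per gene, fit a linear model on the replicates.
# "0 + tissues" drops the intercept, giving one coefficient per tissue.
fit_gene <- function(y) coef(lm(y ~ 0 + tissues))
coefs <- t(apply(rlog_mat, 1, fit_gene))
colnames(coefs) <- levels(tissues)

# For this one-factor design the coefficients equal the tissue means.
group_means <- sapply(levels(tissues), function(tt) {
  rowMeans(rlog_mat[, tissues == tt, drop = FALSE])
})
same_as_means <- isTRUE(all.equal(unname(coefs), unname(group_means)))

# Step 2: z-score the per-tissue coefficients per gene.
scaled <- t(scale(t(coefs)))

# Step 3: k-means on the scaled coefficients.
km <- kmeans(scaled, centers = 3, nstart = 25)

# Step 4: visualise the cluster centres.
if (interactive()) {
  matplot(t(km$centers), type = "b", xlab = "tissue", ylab = "z-score")
}
```

If `same_as_means` is `TRUE`, the scaled input to k-means is numerically the same in both procedures for this design.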

Questions:

  • Would you use either of these techniques, or are more sophisticated methods necessary?
  • If I should be using linear models, is it kosher to build a linear model of the z-score-scaled linear models per cluster (just for the purpose of visualisation)?

Thanks in advance for any advice,

Lukas

RNA-Seq expression patterns R modelling • 764 views