Question

Obtaining Gene Expression patterns via Kmeans - clustering sample means or linear models?

4.7 years ago

ponganta ▴ 590

Hi there,

I'm a novice in analysing gene expression data and am having some difficulty moving forward, especially because I have no in-depth knowledge of modelling. Following the identification of differentially expressed genes (DEGs) using DESeq2, I want to classify the DEGs according to their expression patterns across tissues. The goal is to group genes into sets of "interesting" candidates, based on the strength of expression in the tissue of interest. I then want to run GO-term enrichment analyses on the interesting clusters.

Although I was able to extract some patterns, I'm unsure if my procedure has any merit.

Material:

A dataframe with rlog-transformed counts (transformed using DESeq2::rlog())

Procedure 1:

Per gene, compute tissue-means from biological triplicates
Per gene, z-score the tissue means via base::scale()
Cluster these vectors of scaled mean expression levels using stats::kmeans()
Visualise the clusters
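
A minimal sketch of the steps above in R, run on simulated data. The tissue labels, the object names (rlog_mat, tissue_means, z, km), and the choice of four clusters are illustrative assumptions, not values from the actual analysis. Note that scale() standardises columns, so the matrix is transposed to z-score each gene across tissues.

```r
# Sketch of Procedure 1 on simulated data; all names and settings are hypothetical.
set.seed(1)

tissues <- c("leaf", "root", "flower")                        # assumed tissue labels
tissue  <- factor(rep(tissues, each = 3), levels = tissues)   # biological triplicates

# Stand-in for the rlog-transformed matrix (genes in rows, samples in columns).
rlog_mat <- matrix(rnorm(200 * 9, mean = 8), nrow = 200,
                   dimnames = list(paste0("gene", 1:200),
                                   paste0(tissue, "_", rep(1:3, times = 3))))

# 1. Per gene, mean expression within each tissue.
tissue_means <- sapply(levels(tissue), function(tl) {
  rowMeans(rlog_mat[, tissue == tl, drop = FALSE])
})

# 2. Per gene, z-score across tissues (scale() is column-wise, hence the transposes).
z <- t(scale(t(tissue_means)))
z <- z[complete.cases(z), , drop = FALSE]   # drop genes with zero variance across tissues

# 3. k-means on the scaled profiles; several random starts reduce dependence on initialisation.
km <- stats::kmeans(z, centers = 4, nstart = 25)

# 4. Visualise the cluster centres as one expression profile per cluster.
matplot(t(km$centers), type = "b", pch = 1, lty = 1, xaxt = "n",
        xlab = "Tissue", ylab = "Scaled mean expression")
axis(1, at = seq_along(tissues), labels = colnames(z))
```

The number of clusters is a free parameter here; k-means will return exactly as many groups as requested, whether or not the data support them.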

Here is the result:

[Figure: Differentially expressed genes, gene expression patterns. K-means clustering of z-score-scaled, mean per-tissue, rlog-transformed expression levels.]

Procedure 2:

Per gene, build a linear model based on biological triplicates
Per gene, z-score the models
Cluster the scaled models using k-means
Visualise the clusters
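
One possible reading of these steps, sketched in R on simulated data. It assumes that "linear model" means a per-gene cell-means model (expression ~ 0 + tissue) and that "scaling the model" means z-scoring its coefficients; both are assumptions, as are the tissue labels and object names. Under that reading, with tissue as the only predictor, the fitted coefficients are exactly the per-tissue means, so the clustering input matches that of Procedure 1.

```r
# Sketch of one interpretation of Procedure 2; all names and settings are hypothetical.
set.seed(1)

tissues <- c("leaf", "root", "flower")                        # assumed tissue labels
tissue  <- factor(rep(tissues, each = 3), levels = tissues)   # biological triplicates

# Stand-in for the rlog-transformed matrix (genes in rows, samples in columns).
rlog_mat <- matrix(rnorm(200 * 9, mean = 8), nrow = 200,
                   dimnames = list(paste0("gene", 1:200),
                                   paste0(tissue, "_", rep(1:3, times = 3))))

# 1. Fit a cell-means model for every gene at once (matrix response: samples x genes).
fit <- lm(t(rlog_mat) ~ 0 + tissue)
fitted_means <- t(coef(fit))                 # genes x tissues
colnames(fitted_means) <- levels(tissue)

# The coefficients of this model coincide with the plain per-tissue means.
direct_means <- sapply(levels(tissue), function(tl) {
  rowMeans(rlog_mat[, tissue == tl, drop = FALSE])
})
same_as_means <- isTRUE(all.equal(fitted_means, direct_means,
                                  check.attributes = FALSE))

# 2. Per gene, z-score the coefficients across tissues.
z <- t(scale(t(fitted_means)))
z <- z[complete.cases(z), , drop = FALSE]

# 3. Cluster the scaled coefficient profiles.
km <- stats::kmeans(z, centers = 4, nstart = 25)
```

A linear model would only give different clustering input if it included additional terms (for example a batch covariate), which is not part of the procedure as described.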

Questions:

Would you use either of these techniques, or are more sophisticated methods necessary?
If I should be using linear models, is it kosher to build a linear model of the z-score-scaled linear models per cluster (just for the purpose of visualisation)?

Thanks in advance for any advice,

Lukas

RNA-Seq expression patterns R modelling • 969 views
