K-means for RNA seq gene clustering
2.7 years ago

Hi all,

I have used this tutorial (https://2-bitbio.com/2017/10/clustering-rnaseq-data-using-k-means.html) for unsupervised clustering of an RNA-seq time course dataset. The input in this tutorial is the raw count table, so I'm wondering how I could use the DEGs instead. I have 4 different timepoints and I use DESeq2 for the DEG analysis. What would be preferred as input, the raw counts or the DEGs?

Furthermore, how could I make a dotplot of the genes and the clusters, similar to the dotplot in this thread: How to make k-means clustering plot for relative expression?

Thank you!


Try filtering your dataset for DEGs, then use z-score scaled, rlog-transformed counts as input for k-means clustering.
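A minimal sketch of that suggestion in base R, assuming a results table with a `padj` column (as you would get from `as.data.frame(results(dds))`) and a transformed matrix (for example `assay(rlog(dds))`). The toy data, the object names `res` and `mat`, the 0.05 cut-off and k = 3 are illustrative assumptions, not values from this thread.

```r
set.seed(1)

# Toy stand-ins (assumptions). In a real analysis these might come from:
#   res <- as.data.frame(results(dds))   # DESeq2 results table
#   mat <- assay(rlog(dds))              # rlog-transformed counts
genes <- paste0("gene", 1:20)
res <- data.frame(
  log2FoldChange = rnorm(20, sd = 2),
  padj = c(runif(10, 0, 0.01), runif(9, 0.2, 1), NA),
  row.names = genes
)
mat <- matrix(rnorm(20 * 8, mean = 8), nrow = 20,
              dimnames = list(genes, paste0("s", 1:8)))

# Keep genes passing the adjusted p-value cut-off; padj can be NA,
# so NA values are excluded explicitly.
alpha <- 0.05
degs <- rownames(res)[!is.na(res$padj) & res$padj < alpha]

# Subset the transformed matrix to the DEGs and z-score each gene (row).
mat.degs <- mat[degs, , drop = FALSE]
zmat <- t(scale(t(mat.degs)))

# k-means on the scaled DEG matrix; k = 3 is an arbitrary choice here.
km <- kmeans(zmat, centers = 3, nstart = 25)
table(km$cluster)
```

For a time course you may prefer to define the DEGs from whichever test you ran across all timepoints; the filtering step stays the same as long as the table has a `padj` column.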


Thanks for your fast reply! After I run DESeq2, how do I filter the dataset? I'm relatively new to this, so any additional information or guidance would be highly appreciated. Thanks!

2.7 years ago

What would be preferred as input, the raw counts or the DEGs?

The input to that tutorial is raw counts, which then undergo normalisation. All clustering algorithms that are then applied are based on the Z-transformed (by row/gene) CPM+0.25 values, as per these lines:

z <- cpm(y, normalized.lib.size=TRUE)

scaledata <- t(scale(t(z))) # Centers and scales data.

scaledata is then used for clustering
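To make the row-wise Z-transformation concrete, here is a small sketch on a made-up matrix (the values and gene names are assumptions for illustration only). It shows that each gene ends up with mean 0 and standard deviation 1, so genes are clustered by the shape of their profile rather than by their absolute expression level.

```r
# Toy CPM-like matrix: 3 genes (rows) x 4 samples (columns).
z <- matrix(c(10, 20, 30, 40,
               1,  2,  3,  4,
             100, 90, 95, 105),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("g1", "g2", "g3"), paste0("s", 1:4)))

# scale() works on columns, so transpose, scale, and transpose back
# to standardise each gene across samples.
scaledata <- t(scale(t(z)))

round(rowMeans(scaledata), 10)   # every gene now has mean 0
apply(scaledata, 1, sd)          # and standard deviation 1

# g1 and g2 differ 10-fold in magnitude but share the same profile,
# so their scaled rows are identical.
all.equal(scaledata["g1", ], scaledata["g2", ])
```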

If you want to then use the DEGs, please just filter the scaledata object to only comprise the DEGs, and then re-do clustering. For example:

degs <- c('ATM','ERBB2','ERBB3','BRCC3')

scaledata.filt <- scaledata[degs,]
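A slightly more defensive version of the same filtering step, followed by re-clustering, might look like the sketch below. The toy `scaledata` matrix, the helper variable `absent` and k = 2 are assumptions made for the example; with real data you would use your own `scaledata` and choice of k.

```r
set.seed(42)

# Toy scaled matrix standing in for `scaledata` (assumption): 6 genes x 4 samples.
genes <- c("ATM", "ERBB2", "ERBB3", "TP53", "MYC", "EGFR")
raw <- matrix(rnorm(24), nrow = 6, dimnames = list(genes, paste0("t", 1:4)))
scaledata <- t(scale(t(raw)))

degs <- c("ATM", "ERBB2", "ERBB3", "BRCC3")

# Indexing by a name that is absent from the row names raises
# "subscript out of bounds", so keep only the DEGs that are present.
absent <- setdiff(degs, rownames(scaledata))
if (length(absent) > 0) message("Not in matrix: ", paste(absent, collapse = ", "))
scaledata.filt <- scaledata[intersect(degs, rownames(scaledata)), , drop = FALSE]

# Genes with zero variance become NaN after scaling; kmeans() rejects them.
scaledata.filt <- scaledata.filt[complete.cases(scaledata.filt), , drop = FALSE]

# Re-run k-means on the filtered matrix; k = 2 is arbitrary for this toy data.
km <- kmeans(scaledata.filt, centers = 2, nstart = 10)
km$cluster
```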

-----------------

Furthermore, how could I make a dotplot of the genes and the clusters, similar to the dotplot in this thread: How to make k-means clustering plot for relative expression?

It may help if you clarify specifically what you are picturing. While those figures may look colourful and 'nice', what they actually show is what matters to most non-sensationalistic journals. Is it:

  • plot of a single gene's expression per cluster?
  • plot of a summarised 'score' per cluster?
  • plot of a summarised score per gene per cluster (k-means center or PAM medoid)?

...what do you want to show?
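As one possible reading of the 'summarised score per cluster' option, here is a sketch that plots the k-means center of each cluster across the timepoints using base graphics. The simulated matrix, the timepoint labels `T1` to `T4` and k = 2 are assumptions for illustration; it is not necessarily the plot from the linked thread.

```r
set.seed(7)

# Toy z-scored matrix (assumption): 30 genes x 4 time points, two trends.
tp <- paste0("T", 1:4)
up   <- matrix(rep(c(-1, 0, 0.5, 1), each = 15), nrow = 15) + rnorm(60, sd = 0.2)
down <- matrix(rep(c(1, 0.5, 0, -1), each = 15), nrow = 15) + rnorm(60, sd = 0.2)
scaledata <- rbind(up, down)
dimnames(scaledata) <- list(paste0("gene", 1:30), tp)

km <- kmeans(scaledata, centers = 2, nstart = 25)

# km$centers holds the mean profile of each cluster (clusters x time points).
centers <- km$centers

# Long format: one row per cluster per time point, convenient for plotting.
centers.long <- data.frame(
  cluster   = rep(rownames(centers), times = ncol(centers)),
  timepoint = rep(colnames(centers), each = nrow(centers)),
  value     = as.vector(centers)
)

# One line per cluster across the time points (base graphics).
matplot(t(centers), type = "b", pch = 19, lty = 1, xaxt = "n",
        xlab = "Time point", ylab = "Mean scaled expression")
axis(1, at = seq_along(tp), labels = tp)
legend("top", legend = paste("Cluster", rownames(centers)),
       col = seq_len(nrow(centers)), lty = 1, pch = 19)
```

The `centers.long` data frame can be passed to another plotting package if you prefer; per-gene lines could be added from `scaledata` split by `km$cluster`.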

Kevin
