Question: Using RPKM data of RNA-seq for Hierarchical clustering
5.0 years ago by
United States
simonhb1990 wrote:




I only have RPKM data from an RNA-seq experiment, and I want to use it for hierarchical clustering.

My question is whether I need to log-transform the RPKM data before clustering, or whether I can calculate z-scores directly from the RPKM values and cluster those.

I ask because I think the goal of the log transformation is to scale ratios of change, especially for microarray data. Since my data are not ratios, I think the transformation may not be needed.



rna-seq • 2.8k views
5.0 years ago by
United States
matt.newman wrote:

I still think you would need to log-transform the data before clustering. The purpose of taking the log is to reduce the effect of outliers, and that applies here as well. I know that in our software (OncoLand) we do this when looking at heatmaps of RPKM data with automatic clustering.
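
To illustrate the outlier point, here is a minimal Python sketch of a log2 transform. The pseudocount of 1 (used so that zero RPKM values stay finite), the function name `log2_rpkm`, and the toy values are all assumptions made for this example; they are not taken from OncoLand or from the question.

```python
import math

def log2_rpkm(values, pseudocount=1.0):
    """Return log2(x + pseudocount) for each RPKM value.

    The pseudocount is an assumption of this sketch: it keeps
    zero RPKM values finite (log2(0) is undefined).
    """
    return [math.log2(v + pseudocount) for v in values]

# Hypothetical RPKM values for one gene across five samples;
# the last sample is an extreme outlier.
rpkm = [0.0, 3.0, 7.0, 15.0, 1023.0]
logged = log2_rpkm(rpkm)
print(logged)  # [0.0, 2.0, 3.0, 4.0, 10.0]
```

On the raw scale the outlier is roughly 68 times larger than the next value; after the transform it is only 2.5 times larger, so it contributes far less to a distance calculation.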

5.0 years ago by
United States
seidel wrote:

The choice may depend on *why* you are clustering. I find that mean-centering the data and representing it as z-scores (also called scaling the data), as you mentioned, is a generally useful way to group genes with common behaviors. There's a function in R called scale() that does this. Beware, though, that it scales columns by default, so to scale each gene (row) you have to transpose the matrix with t() before the call and transpose the result back, as in t(scale(t(x))).
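
For readers not working in R, the same row-wise scaling can be sketched in plain Python. This is only an illustration: the function name `zscore_rows`, the toy matrix, and the handling of zero-variance rows are assumptions of the example. It uses the sample standard deviation (n - 1 denominator), which matches what scale() does by default.

```python
import statistics

def zscore_rows(matrix):
    """Z-score each row (gene) of a genes x samples matrix.

    Uses the sample standard deviation (n - 1 denominator).
    Rows with zero variance are returned as all zeros; that
    handling is a choice made for this sketch.
    """
    scaled = []
    for row in matrix:
        mean = statistics.mean(row)
        sd = statistics.stdev(row)
        if sd == 0:
            scaled.append([0.0 for _ in row])
        else:
            scaled.append([(x - mean) / sd for x in row])
    return scaled

# Hypothetical expression values: two genes with the same pattern
# across three samples, at very different magnitudes.
expr = [
    [1.0, 2.0, 3.0],
    [100.0, 200.0, 300.0],
]
z = zscore_rows(expr)
print(z)  # [[-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0]]
```

After scaling, the two genes have identical profiles, which is why z-scores tend to group genes by behavior rather than by expression level.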

5.0 years ago by
Asaf wrote:

How do you cluster? How do you measure distance? The log transformation might have no effect or it might be crucial depending on the distance function.
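
A small Python sketch can make this concrete. With Euclidean distance, a log transform can change which gene is the nearest neighbor; with a rank-based distance such as 1 minus the Spearman correlation, it changes nothing, because the log is monotonic. The three toy genes and all function names are hypothetical. The values are all nonzero, so no pseudocount is used here.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def ranks(values):
    """Rank values 1..n (assumes no ties, which holds for the toy data)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_distance(x, y):
    """1 - Spearman correlation, computed as Pearson on the ranks."""
    return 1.0 - pearson(ranks(x), ranks(y))

def log2_all(x):
    return [math.log2(v) for v in x]

gene_a = [1.0, 2.0, 4.0, 8.0]    # rising
gene_b = [4.0, 8.0, 16.0, 32.0]  # same fold changes as gene_a, 4x higher
gene_c = [8.0, 4.0, 2.0, 1.0]    # opposite trend, same magnitude as gene_a

def nearest_to_a(a, b, c):
    return "b" if euclidean(a, b) < euclidean(a, c) else "c"

raw_nearest = nearest_to_a(gene_a, gene_b, gene_c)
log_nearest = nearest_to_a(log2_all(gene_a), log2_all(gene_b), log2_all(gene_c))
print(raw_nearest, log_nearest)  # c b

# The rank-based distance is identical before and after the log.
same_spearman = (
    spearman_distance(gene_a, gene_b)
    == spearman_distance(log2_all(gene_a), log2_all(gene_b))
)
print(same_spearman)  # True
```

On the raw scale, Euclidean distance pairs gene_a with the gene of similar magnitude but opposite trend; on the log scale it pairs gene_a with the gene that shares its fold-change pattern.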

After you cluster, try to look for batch effects. I'm curious how the experiment might influence the data.


I used average-linkage hierarchical clustering with Euclidean distance in Cluster 3.0.
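
For reference, the general idea of average linkage with Euclidean distance can be sketched in a few lines of Python. This is a naive illustration of the technique, not a description of how Cluster 3.0 implements it; the function names and the toy profiles (imagined as already z-scored genes) are assumptions.

```python
import math
from itertools import combinations

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def average_linkage(rows, n_clusters):
    """Naive agglomerative clustering with average linkage.

    Repeatedly merges the two clusters whose members have the
    smallest mean pairwise Euclidean distance, until n_clusters
    remain. Returns a sorted list of clusters, each a sorted list
    of row indices.
    """
    clusters = [[i] for i in range(len(rows))]

    def cluster_distance(c1, c2):
        total = sum(euclidean(rows[i], rows[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters under average linkage.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]),
        )
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(clusters)

# Hypothetical scaled profiles for four genes across three samples.
profiles = [
    [-1.0, 0.0, 1.0],
    [-0.9, -0.1, 1.0],
    [1.0, 0.0, -1.0],
    [1.0, 0.1, -1.1],
]
groups = average_linkage(profiles, 2)
print(groups)  # [[0, 1], [2, 3]]
```

The sketch recomputes every pairwise distance at each step, so it is only suitable for small examples; dedicated tools cache or update the distance matrix instead.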

written 5.0 years ago by simonhb1990