Question: Using RPKM data of RNA-seq for Hierarchical clustering
simonhb1990 (United States, 3.8 years ago) wrote:

Hi,

I only have RPKM data from an RNA-seq experiment, and I want to use it for hierarchical clustering.

My question is whether I need to log-transform the RPKM data before clustering, or whether I can calculate z-scores directly from the RPKM values and cluster those.

I ask because I think the goal of the log transformation is to scale ratios of change, especially for microarray data. Since the data I have now are not ratios, I think the transformation may not be needed.

regards,

Simon

matt.newman (United States, 3.8 years ago) wrote:

I still think you would need to log-transform the data before clustering. A key purpose of taking the log is to reduce the effect of outliers, and that applies to RPKM values as well. In our software (OncoLand), we do this when drawing heatmaps of RPKM data with automatic clustering.
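
To make the outlier argument concrete, here is a minimal Python sketch. The RPKM values are invented for illustration, and the pseudocount of 1 (so that zero RPKM stays finite) is an assumption, not something stated in this thread.

```python
import math

# Hypothetical RPKM values for one gene across five samples; the last is an outlier.
rpkm = [4.0, 5.0, 6.0, 5.0, 500.0]

def log2_rpkm(values, pseudocount=1.0):
    """Return log2(value + pseudocount); the pseudocount keeps zero RPKM finite."""
    return [math.log2(v + pseudocount) for v in values]

logged = log2_rpkm(rpkm)

# Ratio of the largest value to the median value, before and after the transform.
raw_spread = max(rpkm) / sorted(rpkm)[len(rpkm) // 2]
log_spread = max(logged) / sorted(logged)[len(logged) // 2]

print(f"raw  max/median: {raw_spread:.1f}")
print(f"log2 max/median: {log_spread:.1f}")
```

On the raw scale the outlier sits two orders of magnitude above the median; after the transform it is only a few-fold above, so it pulls far less on a distance such as Euclidean.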

seidel (United States, 3.8 years ago) wrote:

The choice may depend on *why* you are clustering. I find that mean-centering the data and representing it as z-scores (also called scaling the data), as you mentioned, is a generally useful way to group genes with common behaviors. R has a function, scale(), that does this. Beware that it centers and scales columns by default, so if your genes are in rows you have to transpose before the call and transpose back afterwards, e.g. t(scale(t(x))).
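
For readers not working in R, the same per-gene z-scoring can be sketched in plain Python. The function name `zscore_rows` and the toy matrix are hypothetical, and returning zeros for a constant gene is a convention chosen here (R's scale() would give NaN in that case).

```python
import statistics

def zscore_rows(matrix):
    """Center each row (gene) to mean 0 and scale it to sample standard deviation 1."""
    scaled = []
    for row in matrix:
        mean = statistics.mean(row)
        sd = statistics.stdev(row)  # n - 1 denominator, as R's scale() uses
        if sd == 0:
            # A constant gene has no defined z-score; zeros are an assumption made here.
            scaled.append([0.0 for _ in row])
        else:
            scaled.append([(v - mean) / sd for v in row])
    return scaled

# Hypothetical expression matrix: rows are genes, columns are samples.
expr = [
    [1.0, 2.0, 3.0],
    [100.0, 200.0, 300.0],
    [5.0, 5.0, 5.0],
]
z = zscore_rows(expr)
for row in z:
    print(row)
```

The first two genes differ 100-fold in magnitude but have the same shape across samples, so they end up with identical z-score profiles, which is what lets clustering group genes by behavior rather than by expression level.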

Asaf (Israel, 3.8 years ago) wrote:

How do you cluster? How do you measure distance? The log transformation might have no effect or it might be crucial depending on the distance function.

After you cluster, try to look for batch effects. I'm curious how the experiment might influence the data.
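
The point about the distance function can be illustrated with a small Python sketch. The three gene profiles are invented, log2(x + 1) is an assumed form of the transform, and the rank-based (Spearman-style) distance is just one example of a measure that a monotone transform cannot change.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ranks(values):
    """Rank values 1..n, assuming no ties (true for the toy data below)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def spearman_distance(a, b):
    """1 minus the rank correlation; depends only on the ordering of the values."""
    return 1.0 - pearson(ranks(a), ranks(b))

def log2p1(values):
    return [math.log2(v + 1.0) for v in values]

# Hypothetical RPKM profiles across four samples.
gene_a = [1.0, 10.0, 100.0, 1000.0]
gene_b = [2.0, 30.0, 150.0, 900.0]
gene_c = [3.0, 2.0, 1.0, 1000.0]

for name, f in (("raw", lambda v: v), ("log2", log2p1)):
    a, b, c = f(gene_a), f(gene_b), f(gene_c)
    print(name, "Euclidean a-b:", round(euclidean(a, b), 2),
          "a-c:", round(euclidean(a, c), 2))
    print(name, "Spearman  a-b:", round(spearman_distance(a, b), 2),
          "a-c:", round(spearman_distance(a, c), 2))
```

With Euclidean distance the nearest neighbor of gene_a changes once the data are logged, because the raw distance is dominated by the high-expression sample. The rank-based distance is the same before and after, since the log is monotone and leaves the ordering intact.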


simonhb1990 wrote:

I used hierarchical clustering with average linkage and Euclidean distance, in Cluster 3.0.
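
For reference, average-linkage hierarchical clustering with Euclidean distance can be sketched in a few lines of Python. This is a naive illustration of the technique, not Cluster 3.0's implementation; the function name `average_linkage` and the toy profiles (imagined as already log-transformed and z-scored) are hypothetical.

```python
import math
from itertools import combinations

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(points):
    """Agglomerative clustering; returns merges as (members_a, members_b, distance)."""
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for ci, cj in combinations(range(len(clusters)), 2):
            # Average linkage: mean pairwise distance between members of two clusters.
            pair_dists = [euclidean(points[p], points[q])
                          for p in clusters[ci] for q in clusters[cj]]
            d = sum(pair_dists) / len(pair_dists)
            if best is None or d < best[0]:
                best = (d, ci, cj)
        d, ci, cj = best
        merges.append((clusters[ci], clusters[cj], d))
        merged = clusters[ci] + clusters[cj]
        clusters = [c for k, c in enumerate(clusters) if k not in (ci, cj)] + [merged]
    return merges

# Hypothetical gene profiles: two rising genes and two falling genes.
profiles = [
    [-1.0, 0.0, 1.0],
    [-0.9, -0.1, 1.0],
    [1.0, 0.0, -1.0],
    [1.0, 0.2, -1.2],
]
merges = average_linkage(profiles)
for left, right, dist in merges:
    print(left, "+", right, "at distance", round(dist, 3))
```

The merge list is the information a dendrogram draws: which groups were joined and at what height. This version recomputes every pairwise distance at each step, so it is only suitable for small examples.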