Using RPKM data from RNA-seq for hierarchical clustering
8.8 years ago
simonhb1990 ▴ 20

Hi,

I only have RPKM data from RNA-seq at the moment, and I want to use it for hierarchical clustering.

My question is whether I need to log-transform the RPKM data before clustering, or whether I can calculate z-scores directly from the RPKM values and cluster those.

I ask because I think the goal of the log transformation is to rescale ratios of change, especially for microarray data. Since the data I have now is not a ratio, I think the transformation may not be needed.

regards,
Simon

RNA-Seq • 4.1k views

How do you cluster? How do you measure distance? The log transformation might have no effect or it might be crucial depending on the distance function.

After you cluster, try to look for batch effects; I'm curious how the experiment might influence the data.
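To make the point about distance functions concrete, here is a minimal sketch in plain Python. The three RPKM profiles are made up for illustration, and the +1 pseudocount in `log2p1` is an assumption to keep zero values finite. It compares Euclidean distance on mean-centered profiles with a Pearson correlation distance, each with and without the log.

```python
import math

def log2p1(profile):
    # log2 with a pseudocount of 1 (an assumption, so that RPKM = 0 maps to 0)
    return [math.log2(v + 1.0) for v in profile]

def center(profile):
    # subtract the gene's own mean across samples
    m = sum(profile) / len(profile)
    return [v - m for v in profile]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson_distance(a, b):
    # 1 - Pearson correlation; 0 means identical shape
    ca, cb = center(a), center(b)
    num = sum(x * y for x, y in zip(ca, cb))
    den = math.sqrt(sum(x * x for x in ca) * sum(y * y for y in cb))
    return 1.0 - num / den

# Hypothetical RPKM profiles across three samples
low_up = [1.0, 2.0, 4.0]          # low expression, 4-fold increase
high_up = [100.0, 200.0, 400.0]   # high expression, same 4-fold increase
high_down = [400.0, 200.0, 100.0] # high expression, opposite trend

# Euclidean distance on mean-centered profiles, raw RPKM
raw_same = euclidean(center(low_up), center(high_up))
raw_opp = euclidean(center(low_up), center(high_down))

# Same comparison after the log transform
log_same = euclidean(center(log2p1(low_up)), center(log2p1(high_up)))
log_opp = euclidean(center(log2p1(low_up)), center(log2p1(high_down)))

print("raw  opposite/same:", raw_opp / raw_same)
print("log  opposite/same:", log_opp / log_same)
print("Pearson distance raw:", pearson_distance(low_up, high_up))
print("Pearson distance log:", pearson_distance(log2p1(low_up), log2p1(high_up)))
```

On raw values the low-expressed gene sits about equally far from both high-expressed genes, so its trend is invisible to Euclidean distance; after the log it is clearly closer to the gene with the same trend. The correlation distance between the two rising genes is close to zero either way, which is the "no effect" case.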


I used average-linkage hierarchical clustering with Euclidean distance in Cluster 3.0.
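For readers unfamiliar with the method, the following is a naive sketch of what average-linkage clustering with Euclidean distance computes. It is not Cluster 3.0's implementation; the function name `average_linkage` and the toy data are hypothetical, and the quadratic search is only meant for small examples.

```python
import math
from itertools import combinations

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(rows):
    """Naive agglomerative clustering.

    Returns the merges in order as (cluster_id_a, cluster_id_b, distance).
    Original rows have ids 0..n-1; each merge creates the next unused id.
    """
    clusters = {i: [i] for i in range(len(rows))}
    merges = []
    next_id = len(rows)
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            # average linkage: mean of all pairwise distances between members
            total = sum(euclidean(rows[i], rows[j])
                        for i in clusters[a] for j in clusters[b])
            d = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        # replace the two closest clusters with their union
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
        next_id += 1
    return merges

# Toy data: two tight pairs of profiles that are far from each other
toy = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
merges = average_linkage(toy)
for a, b, d in merges:
    print(a, b, round(d, 3))
```

Because every step depends only on the pairwise distances, whatever transformation is applied to the RPKM values beforehand directly shapes the resulting tree.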

8.8 years ago
matt.newman ▴ 170

I still think you would need to log-transform the data before clustering. Taking the log reduces the effect of outliers by compressing the large dynamic range of expression values, and that applies here as well. In our software (OncoLand) we do this when drawing heatmaps of RPKM data and clustering them automatically.
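A minimal sketch of such a transform in plain Python follows. The pseudocount of 1 is an assumption (some value is needed because RPKM can be 0 and log of 0 is undefined), and `log_transform` is a hypothetical helper name, not anything from OncoLand.

```python
import math

def log_transform(matrix, pseudocount=1.0):
    """Return log2(value + pseudocount) for every entry of a genes x samples matrix."""
    return [[math.log2(v + pseudocount) for v in row] for row in matrix]

# Hypothetical gene with one extreme sample
rpkm = [[0.0, 1.0, 1023.0]]
logged = log_transform(rpkm)

print(logged)
# The raw range spans 1023 units; after the transform it spans 10.
print(max(rpkm[0]) - min(rpkm[0]), max(logged[0]) - min(logged[0]))
```

The extreme sample still has the largest value, but it no longer dwarfs the others in any distance computed afterwards.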

8.8 years ago
seidel 11k

The choice may depend on why you are clustering. I find that mean-centering the data and representing it as z-scores (also called scaling the data), as you mentioned, is a generally useful way to group genes with common behaviors. R has a function called scale() that does this. Beware that it scales columns by default, so if your genes are in rows you have to transpose going in and coming out: t(scale(t(x))).
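For anyone working outside R, here is a rough plain-Python equivalent of z-scoring each gene (row) across samples. It uses the sample standard deviation (n - 1 denominator), which is what scale() uses. The helper name `zscore_rows` is hypothetical, and returning zeros for a gene with no variation is a choice made here; R would give NaN for such a row.

```python
import statistics

def zscore_rows(matrix):
    """Z-score each row (gene) of a genes x samples matrix."""
    scaled = []
    for row in matrix:
        mean = statistics.mean(row)
        sd = statistics.stdev(row)  # sample standard deviation (n - 1)
        if sd == 0:
            # flat gene: no variation to scale, so report zeros (an assumption)
            scaled.append([0.0 for _ in row])
        else:
            scaled.append([(v - mean) / sd for v in row])
    return scaled

# Hypothetical values: two genes with the same shape at different levels,
# plus one gene that does not change
expr = [
    [1.0, 2.0, 3.0],
    [100.0, 200.0, 300.0],
    [5.0, 5.0, 5.0],
]
z = zscore_rows(expr)
for row in z:
    print(row)
```

After scaling, the first two genes have identical profiles, which is why z-scores group genes by behavior rather than by expression level.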
