8.6 years ago
edianfranklin
I have an RNA-seq dataset of gene expression in RPKM, with one gene per row and four conditions. I need to cluster the data with k-means and hierarchical clustering.
My question is: do I have to transform the dataset with log(x+1) first, or can I use it directly?
This page gives some pointers for clustering which I found useful: http://www.statmethods.net/advstats/cluster.html
Thank you. What about the RPKM data?
RPKM is already a normalization (for sequencing depth and gene length). Should your clustering weigh highly expressed genes more heavily, or should all genes be taken into account to the same extent? That's the question you have to ask about log transformation. A log transform will squeeze all values closer together, limiting the effect of the most strongly expressed genes...
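To make the point about squeezing values concrete, here is a minimal Python sketch. The three genes and their RPKM values are made up for illustration, and log2(x+1) is just one common choice of transform. It compares Euclidean distances (the kind of distance k-means and many hierarchical methods rely on) before and after the transform.

```python
import math

def log_transform(values):
    """Apply log2(x + 1) to each RPKM value (the +1 avoids log of zero)."""
    return [math.log2(v + 1.0) for v in values]

def euclidean(u, v):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical RPKM values across four conditions (not real data).
genes = {
    "gene_a": [1000.0, 2000.0, 1000.0, 2000.0],  # highly expressed
    "gene_b": [1.0, 2.0, 1.0, 2.0],              # low, same pattern as gene_a
    "gene_c": [2.0, 1.0, 2.0, 1.0],              # low, opposite pattern
}
logged = {name: log_transform(vals) for name, vals in genes.items()}

# Distance from the highly expressed gene to a low one, relative to the
# distance between the two low genes, before and after the transform.
raw_ratio = (euclidean(genes["gene_a"], genes["gene_b"])
             / euclidean(genes["gene_b"], genes["gene_c"]))
log_ratio = (euclidean(logged["gene_a"], logged["gene_b"])
             / euclidean(logged["gene_b"], logged["gene_c"]))

print(f"raw RPKM:    a-b is {raw_ratio:.1f}x the b-c distance")
print(f"log2(x + 1): a-b is {log_ratio:.1f}x the b-c distance")
```

On raw RPKM the highly expressed gene sits very far from everything else, so it dominates the distances. After the transform the gap shrinks a lot, although expression level still contributes. If you want to cluster purely on the pattern across conditions, scaling each gene (row) after the log transform is another option to consider.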