I would like to use R (where I have done my whole analysis) to perform some kind of clustering (e.g. hierarchical clustering) on two groups of variables describing the same samples. One group is microarray gene expression data (for specific genes) that have been normalized and batch-effect corrected. The other group consists of quantitative clinical parameters measured on the same samples. However, these clinical variables have not been normalized or transformed in any way (i.e. they are raw continuous values).
For example, one of these variables ranges from ~0.002518 to ~27.3, whereas another ranges from 1.69 to 1.82, and another from 0.03 to 0.87.
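To show how different these scales are, here is a minimal sketch on simulated values. Only the ranges come from my data; the uniform draws, the sample count and the names `clinA`, `clinB`, `clinC` are hypothetical stand-ins.

```r
# Simulated clinical variables using only the ranges quoted above
# (uniform draws and the names clinA/clinB/clinC are hypothetical).
set.seed(1)
n_samples <- 20
clinical <- rbind(
  clinA = runif(n_samples, 0.002518, 27.3),
  clinB = runif(n_samples, 1.69, 1.82),
  clinC = runif(n_samples, 0.03, 0.87)
)

# Per-variable standard deviation: variables in rows, samples in columns
sds <- apply(clinical, 1, sd)
print(signif(sds, 3))
```

The spread of the widest variable is orders of magnitude larger than that of the narrowest one, which is why I suspect the raw values cannot simply be merged with the expression data.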
Thus, as my ultimate goal is to run hierarchical clustering on both groups simultaneously (merged in one matrix/data frame), in order to inspect which of these clinical variables cluster with specific genes:
1) Would row scaling (a z-score transformation: for each row variable, subtract the row mean and divide by the row standard deviation) be enough to handle all my continuous variables once merged, so that I can then perform the clustering? This is offered as an option in many R heatmap packages and functions.
2) Or does a z-score, in the sense of the standardization above, require normally distributed data? If so, should I first transform my clinical variables separately (for example with a log2 transformation), then merge, row scale and perform the clustering? My other concern here is that, given the ranges of the clinical parameters above, many negative values could appear after the log2 transformation.
3) For a similar analysis, such as constructing a correlation plot of all the above variables, would simple row scaling be sufficient?
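The workflow behind the three questions above can be sketched roughly as follows. This is a minimal sketch on simulated data, not my real analysis: the gene and clinical names, the sample count, the distributions, and the choice of Euclidean distance with complete linkage are all assumptions made for illustration.

```r
# Hypothetical toy data: 20 samples, 5 genes, 3 clinical variables.
# All names and distributions are invented; only the clinical ranges are real.
set.seed(1)
n_samples <- 20
genes <- matrix(rnorm(5 * n_samples, mean = 8, sd = 1), nrow = 5,
                dimnames = list(paste0("gene", 1:5), paste0("S", 1:n_samples)))
clinical <- rbind(
  clinA = runif(n_samples, 0.002518, 27.3),
  clinB = runif(n_samples, 1.69, 1.82),
  clinC = runif(n_samples, 0.03, 0.87)
)
colnames(clinical) <- colnames(genes)

# Merge both groups: variables in rows, samples in columns
merged <- rbind(genes, clinical)

# Row z-score: subtract the row mean, divide by the row standard deviation.
# scale() operates on columns, hence the double transpose.
row_scaled <- t(scale(t(merged)))

# Question 2: count the negative values a log2 transform would produce
clinical_log <- log2(clinical)
n_negative <- sum(clinical_log < 0)

# Question 1: hierarchical clustering of the variables on the row-scaled matrix
hc <- hclust(dist(row_scaled), method = "complete")

# Question 3: correlation between variables, from raw and from row-scaled data
cor_raw <- cor(t(merged))
cor_scaled <- cor(t(row_scaled))
```

Calling `plot(hc)` draws the dendrogram of genes and clinical variables together. Comparing `cor_raw` with `cor_scaled` via `all.equal()` shows whether row scaling changes the Pearson correlations at all.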
Any suggestions or ideas would be appreciated!