Question

hierarchical cluster of drug treatment after correlation analysis

6.1 years ago

limin201709 ▴ 10

Hi,

I am computing correlations between one drug treatment and several other drug treatments, so I have a data frame in which the column names are drug names, the row names are gene names, and the values are correlation coefficients. I want to perform hierarchical clustering on this data frame to see whether any genes behave the same way across treatments, or whether some genes are specific to one drug treatment. Which methods should I use: Euclidean or another distance, and average or complete linkage?

Thanks a lot, Min

R cluster correlation • 1.1k views

updated 6.1 years ago by Kevin Blighe 88k • written 6.1 years ago by limin201709 ▴ 10

score 1 · Answer 1 · 2018-04-20

Check the distribution of your data first by generating a histogram. If it looks like the typical 'bell' curve (normal distribution), then use Euclidean distance. If not, then you may consider correlation dissimilarities via 1 minus Spearman correlation. You could also transform your data to the Z-scale, which centers the values and puts them on a common scale; if the result then approximates a normal curve, you could use Euclidean distance. If your data is some other type of non-negative and/or ordinal data, then you may consider Manhattan or Canberra distance.
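
As a rough sketch of these options in base R (not a definitive recipe): the matrix `cor_mat` below is simulated stand-in data with genes as rows and drugs as columns, and its name, its dimensions, and the choice to Z-scale per gene (row) are assumptions made only for illustration.

```r
# Hypothetical stand-in data: 50 genes (rows) x 4 drug treatments (columns),
# filled with simulated correlation coefficients (not real results).
set.seed(1)
cor_mat <- matrix(runif(50 * 4, min = -1, max = 1), nrow = 50,
                  dimnames = list(paste0("gene", 1:50), paste0("drug", 1:4)))

# 1. Inspect the distribution of the values with a histogram.
hist(as.vector(cor_mat), breaks = 20, main = "Correlation coefficients")

# 2a. Euclidean distance between genes (dist() compares rows).
d_euc <- dist(cor_mat, method = "euclidean")

# 2b. Correlation dissimilarity: 1 minus Spearman correlation between genes.
#     cor() compares columns, so transpose to compare rows.
d_spear <- as.dist(1 - cor(t(cor_mat), method = "spearman"))

# 2c. Z-scale each gene (assumption: scaling per row), then Euclidean distance.
#     scale() works on columns, hence the double transpose.
z_mat <- t(scale(t(cor_mat)))
d_z <- dist(z_mat, method = "euclidean")

# 2d. Manhattan and Canberra distances as alternatives.
d_man <- dist(cor_mat, method = "manhattan")
d_can <- dist(cor_mat, method = "canberra")
```

To cluster the drugs rather than the genes, the same calls can be applied to `t(cor_mat)`.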

In terms of the linkage method to use, you have more liberty to choose one over another than you do for the distance metric. Ward's linkage (ward.D2) usually makes a tree easier to interpret visually than other methods, because it merges branches based on 'minimal variance'.
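
A minimal sketch of trying different linkage methods with `hclust()` in base R follows. The data are again simulated, and the matrix name `cor_mat` and the cut into `k = 3` clusters are arbitrary assumptions for illustration, not recommendations.

```r
# Hypothetical stand-in data: 50 genes (rows) x 4 drug treatments (columns).
set.seed(1)
cor_mat <- matrix(runif(50 * 4, min = -1, max = 1), nrow = 50,
                  dimnames = list(paste0("gene", 1:50), paste0("drug", 1:4)))

# Euclidean distance between genes, as in the distance discussion above.
d <- dist(cor_mat, method = "euclidean")

# Same distance matrix, three different linkage methods.
hc_ward <- hclust(d, method = "ward.D2")
hc_avg  <- hclust(d, method = "average")
hc_comp <- hclust(d, method = "complete")

# Plot the Ward tree; swap in hc_avg or hc_comp to compare visually.
plot(hc_ward, labels = FALSE, main = "Ward's linkage (ward.D2)")

# Cut the tree into an arbitrary number of clusters (k = 3 is an assumption).
clusters <- cutree(hc_ward, k = 3)
table(clusters)
```

Plotting the three trees side by side is a quick way to judge which linkage gives the most readable grouping for a given data set.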

Kevin