Question: geom_line with hclust data of expression matrix
Assa Yeroslaviz (Munich) wrote, 24 months ago:

I have an expression matrix of intensities (7216 x 100) that I would like to plot with the geom_line() function of ggplot2.

This is what I tried:

pcaHC <- hclust(dist(sample.mat), method = "ward.D2") # Euclidean distances between genes (rows), then Ward clustering
pca_subclusters <- cutree(pcaHC, k=40) # cut the tree into 40 clusters
sample_file_df <- data.frame(sample.mat, "cluster" = factor(pca_subclusters)) # append the cluster labels as a new column
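A minimal sketch of the same clustering steps on a small simulated matrix (sim.mat, hc, cl and sim_df are hypothetical stand-ins for sample.mat, pcaHC, pca_subclusters and sample_file_df). It checks that cutree() returns one label per row, named after the row and in the original row order, which is what makes appending the labels as a column safe.

```r
# Hypothetical stand-in for sample.mat: 20 genes (rows) x 5 bins (columns)
set.seed(1)
sim.mat <- matrix(rexp(20 * 5, rate = 0.02), nrow = 20,
                  dimnames = list(paste0("gene", 1:20), paste0("X", 1:5)))

hc <- hclust(dist(sim.mat), method = "ward.D2")  # Euclidean distances between rows, Ward clustering
cl <- cutree(hc, k = 4)                          # named integer vector, one entry per gene

# cutree() keeps the row names and the original row order
stopifnot(identical(names(cl), rownames(sim.mat)))

sim_df <- data.frame(sim.mat, cluster = factor(cl))
table(sim_df$cluster)                            # number of genes per cluster
```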

The data frame looks like this:

> head(sample_file_df[,c(1:3,100:101)])
                X1        X2        X3  ...      X100 cluster
15S_rRNA  47.00252  52.46925  57.51065  ... 133.99373       1
21S_rRNA  11.61435  13.90566  12.74778  ... 113.34820       1
HRA1      72.86330  71.72579  71.66715  ...  94.78852       2
ICR1      55.72980  62.21363  53.49190  ...  68.34249       3
LSR1     202.86542 221.03463 221.87639  ... 307.33516       4
NME1     289.14436 289.17267 291.15432  ... 367.86647       4

Now I have the intensity matrix with each gene's cluster number as an extra column.

I would like to plot the intensities with ggplot2's geom_line() and use faceting (facet_grid() or facet_wrap()) to split the plot by cluster.

I know how to melt the data into long format without the clusters:

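library(reshape2)
bin <- colnames(sample_file_df[,1:100])
intensities <- t(sample_file_df[,1:100]) # transpose: bins in rows, genes in columns
df <- data.frame(bin = factor(bin, levels = bin), intensities, check.names = FALSE) # keep bins in column order and gene names such as 15S_rRNA unchanged
d.f2 <- melt(df[,1:10], id.vars = "bin") # bin plus the first 9 genes only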
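One caveat, assuming bin is used as a discrete x axis as in this code: with 100 columns, the labels X1 to X100 are ordered alphabetically by default, so the lines would run in the order X1, X10, X100, X11, and so on. A small base-R sketch of the problem and the usual fix (setting the factor levels to the column order):

```r
# Column names as produced for a 100-column matrix
bin <- paste0("X", 1:100)

# A character vector (or a default factor) is ordered alphabetically
head(sort(bin), 4)        # "X1" "X10" "X100" "X11"

# Fixing the level order to the column order keeps the x axis in bin order
bin_f <- factor(bin, levels = bin)
head(levels(bin_f), 4)    # "X1" "X2" "X3" "X4"
```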

But is there a way to keep the cluster information in the melted table so that I can separate the plots by cluster?

My code, on a small reproducible example:

example <- dput(head(sample_file_df[,c(1:3,101)]))
structure(list(X1 = c(47.0025219774636, 11.61435429513, 72.8633017362537, 
55.7297975392345, 202.865415753006, 289.14435756511), X2 = c(52.4692503895184, 
13.9056586769545, 71.7257899110431, 62.2136287826649, 221.034632464551, 
289.17266718698), X3 = c(57.5106531481446, 12.7477809541531, 
71.6671538520602, 53.4918969402706, 221.876393120142, 291.154317537268
), cluster = structure(c(1L, 1L, 2L, 3L, 4L, 4L), .Label = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", 
"25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35", 
"36", "37", "38", "39", "40"), class = "factor")), .Names = c("X1", 
"X2", "X3", "cluster"), row.names = c("15S_rRNA", "21S_rRNA", 
"HRA1", "ICR1", "LSR1", "NME1"), class = "data.frame")

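library(reshape2)
library(ggplot2)
bin <- colnames(example[,1:3])
intensities <- t(example[,1:3]) # bins in rows, genes in columns
df <- data.frame(bin = factor(bin, levels = bin), intensities, check.names = FALSE) # keep bin order and gene names such as 15S_rRNA as-is
d.f2 <- melt(df, id.vars = "bin") # one row per (bin, gene); the gene name ends up in "variable"
ggplot(d.f2, aes(bin, value, group = variable, colour = variable)) + geom_line()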

Any ideas would be appreciated. Thanks!

Tags: clustering, hclust, ggplot

I found that I can merge the two tables on the gene names to add the clusters, but is there a more efficient method?

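# join each (bin, gene) row of the long table to its gene's cluster; by.y = 0 matches on the row names of example
d.f2.1 <- merge(d.f2, example[, "cluster", drop = FALSE], by.x = "variable", by.y = 0, all.x = TRUE)
ggplot(d.f2.1, aes(bin, value, group = variable, colour = variable)) + geom_line() + facet_grid(. ~ cluster)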


written 24 months ago by Assa Yeroslaviz
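If the goal is only to attach a cluster label to each row of the long table, indexing by gene name may be simpler than merge(). A minimal base-R sketch, using hypothetical toy objects clusters and d.long in place of example and d.f2 (values rounded from the example above):

```r
# Hypothetical toy objects: gene-level clusters (row names = genes) and a long table
clusters <- data.frame(cluster = factor(c(1, 1, 2)),
                       row.names = c("15S_rRNA", "21S_rRNA", "HRA1"))
d.long <- data.frame(bin = rep(c("X1", "X2"), times = 3),
                     variable = factor(rep(rownames(clusters), each = 2)),
                     value = c(47.00, 52.47, 11.61, 13.91, 72.86, 71.73))

# One vectorised lookup by row name: no merge, and the row order of d.long is kept
d.long$cluster <- clusters[as.character(d.long$variable), "cluster"]
d.long
```

This only works if the names in variable match the row names exactly. data.frame() turns a name such as 15S_rRNA into X15S_rRNA unless check.names = FALSE is set, and the lookup then returns NA for those genes.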

If you want the clusters in the melted data frame, don't leave them out of the original data frame.

written 24 months ago by Jean-Karim Heriche

As far as I can tell, this doesn't work for me. The clusters are in a column, so when I transpose the data into the structure I need, they become a row of the new matrix and I can't melt them accordingly.

Or am I missing something?

written 24 months ago by Assa Yeroslaviz
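The transpose problem described here is easy to reproduce: t() on a data frame goes through as.matrix(), and a factor column forces the whole matrix to character. A small sketch with a hypothetical toy data frame shaped like sample_file_df:

```r
# Hypothetical toy data: two genes, two bins and a cluster factor
toy <- data.frame(X1 = c(47.0, 11.6), X2 = c(52.5, 13.9),
                  cluster = factor(c(1, 2)),
                  row.names = c("15S_rRNA", "21S_rRNA"))

tt <- t(toy)             # t() calls as.matrix(), which needs a single type
is.character(tt)         # TRUE: the intensities are now strings
tt["cluster", ]          # the clusters have become a row

# Transposing only the numeric columns keeps them numeric
is.numeric(t(toy[, c("X1", "X2")]))   # TRUE
```

This is one reason why melting the un-transposed data frame, with the cluster as an id column, is easier than transposing first.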

Maybe:

   melt(sample_file_df, id.vars = c("bin", "cluster"))
written 24 months ago by Jean-Karim Heriche

bin is a column, and cluster would in this case be a row of the data frame. I don't see how to combine these two pieces of information.

written 24 months ago by Assa Yeroslaviz

cluster is not a row according to your head(sample_file_df[,c(1:3,100:101)]) output above. I didn't check what bin was; replace it with the gene-name column of sample_file_df. The idea is that you can pass more than one column to melt() as id.vars.

written 24 months ago by Jean-Karim Heriche
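Putting this suggestion together on the 3-column example from the question: a sketch assuming reshape2 and ggplot2 are installed, with the values re-typed (rounded) from the printed example. The column names gene, bin and intensity are chosen here for illustration. The gene names are row names, so they first have to become a column before melt() can use them as an id variable; no transpose is needed.

```r
library(reshape2)
library(ggplot2)

# Values rounded from the example in the question: 6 genes x 3 bins + cluster
example <- data.frame(
  X1 = c(47.00, 11.61, 72.86, 55.73, 202.87, 289.14),
  X2 = c(52.47, 13.91, 71.73, 62.21, 221.03, 289.17),
  X3 = c(57.51, 12.75, 71.67, 53.49, 221.88, 291.15),
  cluster = factor(c(1, 1, 2, 3, 4, 4)),
  row.names = c("15S_rRNA", "21S_rRNA", "HRA1", "ICR1", "LSR1", "NME1"))

# Gene names are row names; make them a column so melt() can keep them
example$gene <- rownames(example)

# Melt the wide table directly: gene and cluster are repeated for every bin,
# and bin becomes a factor whose levels follow the column order X1, X2, X3
long <- melt(example, id.vars = c("gene", "cluster"),
             variable.name = "bin", value.name = "intensity")

p <- ggplot(long, aes(bin, intensity, group = gene, colour = gene)) +
  geom_line() +
  facet_wrap(~ cluster)   # one panel per cluster
print(p)
```

For the full 7216-gene matrix a per-gene colour legend would be unreadable, so dropping colour = gene or adding theme(legend.position = "none") is probably more practical; facet_wrap() also lays 40 panels out as a grid rather than a single row.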
Powered by Biostar version 2.3.0