Question

geom_line with hclust data of expression matrix

6.6 years ago

Assa Yeroslaviz ★ 1.8k

I have an expression matrix of intensities (7216 x 100) that I would like to plot using the geom_line() function of ggplot2.

This is what I tried:

pcaHC <- hclust(dist(sample.mat), method = "ward.D2") # calculate the distances and cluster
pca_subclusters <- cutree(pcaHC, k=40) # create 40 different clusters
sample_file_df <- data.frame(sample.mat, "cluster" = factor(pca_subclusters)) # merge the clusters with the intensity matrix

The resulting data frame looks like this:

> head(sample_file_df[,c(1:3,100:101)])
                X1        X2        X3  ...      X100 cluster
15S_rRNA  47.00252  52.46925  57.51065  ... 133.99373       1
21S_rRNA  11.61435  13.90566  12.74778  ... 113.34820       1
HRA1      72.86330  71.72579  71.66715  ...  94.78852       2
ICR1      55.72980  62.21363  53.49190  ...  68.34249       3
LSR1     202.86542 221.03463 221.87639  ... 307.33516       4
NME1     289.14436 289.17267 291.15432  ... 367.86647       4

Now I have the matrix of intensities with the cluster number merged into it.
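Binding the clusters to the matrix this way relies on cutree() returning one cluster id per row, in the original row order and named by the row names. A quick check of that on simulated data (sample.mat here is a random 20 x 5 matrix standing in for the real one, and k = 4 replaces k = 40 only to keep the example small):

```r
# Simulated stand-in for sample.mat: 20 genes x 5 bins of random values (illustration only)
set.seed(1)
sample.mat <- matrix(rnorm(100), nrow = 20,
                     dimnames = list(paste0("gene", 1:20), paste0("X", 1:5)))

# Same clustering steps as in the question, with k = 4 for the small example
pcaHC <- hclust(dist(sample.mat), method = "ward.D2")
pca_subclusters <- cutree(pcaHC, k = 4)

# cutree() output is named and ordered like the rows of sample.mat ...
stopifnot(identical(names(pca_subclusters), rownames(sample.mat)))

# ... so data.frame() pairs every gene with its own cluster
sample_file_df <- data.frame(sample.mat, cluster = factor(pca_subclusters))

# Number of genes per cluster
print(table(sample_file_df$cluster))
```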

I would like to plot the intensities with geom_line() from ggplot2, using faceting (facet_wrap() or facet_grid()) to separate the data by cluster.

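I know how to melt the data into long form without the clusters:

library(reshape2)                                       # provides melt()
bin <- colnames(sample_file_df[,1:100])                 # bin names X1 ... X100
intensities <- t(sample_file_df[,1:100])                # transpose: bins become rows, genes columns
df <- data.frame(bin, intensities, check.names = FALSE) # keep gene names such as "15S_rRNA" unchanged
d.f2 <- melt(df[,1:10], id.vars = "bin")                # long format for the first 9 genes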
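With all 100 bins there is an ordering pitfall: bin holds plain strings (or, in R versions before 4.0, a factor with alphabetically sorted levels), so ggplot2 lays the x axis out alphabetically (X1, X10, X100, X11, ...) rather than in column order. A small base-R illustration, with bin built the same way as above:

```r
# Bin names as produced by colnames(sample_file_df[,1:100])
bin <- paste0("X", 1:100)

# Alphabetical order, which is what a character x axis falls back to
alphabetical <- sort(bin)
print(head(alphabetical, 4))

# A factor whose levels follow the column order keeps X1, X2, X3, ... on the x axis
bin <- factor(bin, levels = bin)
print(head(levels(bin), 4))
```

In the snippet above, that amounts to adding bin <- factor(bin, levels = bin) before the data.frame(bin, intensities, ...) call.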

But is there a way to include the cluster information in the melted table, so that I can separate the lines by cluster?

My code:

# output of dput(head(sample_file_df[,c(1:3,101)])), assigned to example
example <- structure(list(X1 = c(47.0025219774636, 11.61435429513, 72.8633017362537, 
55.7297975392345, 202.865415753006, 289.14435756511), X2 = c(52.4692503895184, 
13.9056586769545, 71.7257899110431, 62.2136287826649, 221.034632464551, 
289.17266718698), X3 = c(57.5106531481446, 12.7477809541531, 
71.6671538520602, 53.4918969402706, 221.876393120142, 291.154317537268
), cluster = structure(c(1L, 1L, 2L, 3L, 4L, 4L), .Label = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", 
"25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35", 
"36", "37", "38", "39", "40"), class = "factor")), .Names = c("X1", 
"X2", "X3", "cluster"), row.names = c("15S_rRNA", "21S_rRNA", 
"HRA1", "ICR1", "LSR1", "NME1"), class = "data.frame")

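library(reshape2)                                       # provides melt()
library(ggplot2)
bin <- colnames(example[,1:3])                          # bin names X1, X2, X3
intensities <- t(example[,1:3])                         # bins as rows, genes as columns
df <- data.frame(bin, intensities, check.names = FALSE) # keep "15S_rRNA" from becoming "X15S_rRNA"
d.f2 <- melt(df, id.vars = "bin")                       # columns: bin, variable (gene), value
ggplot(d.f2, aes(bin, value, group = variable, colour = variable)) + geom_line()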

Any ideas would be appreciated. Thanks!

Tags: ggplot, hclust, clustering • 1.8k views

I found that I can merge the two tables on the gene names to add the clusters, but is there a more efficient method?

# by.x = 2 is the gene column ("variable") of d.f2; by.y = 0 means the row names of example
d.f2.1 <- merge(d.f2, example, by.x = 2, by.y = 0, all.x = TRUE)
ggplot(d.f2.1, aes(bin, value, group = variable, colour = variable)) + geom_line() + facet_grid(. ~ cluster)
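merge() also copies every intensity column of example into the long table. A lighter alternative is to look up each gene's cluster directly with base R's match(). A minimal sketch on a cut-down version of the question's data (three genes, two bins, values taken from the head() output); it assumes the gene names in variable match the row names of example exactly. data.frame() turns a name such as 15S_rRNA into X15S_rRNA unless check.names = FALSE is set, and such genes would then get an NA cluster.

```r
# Cut-down version of the question's example: genes as rows, bins as columns
example <- data.frame(
  X1 = c(47.00252, 11.61435, 72.86330),
  X2 = c(52.46925, 13.90566, 71.72579),
  cluster = factor(c(1, 1, 2)),
  row.names = c("15S_rRNA", "21S_rRNA", "HRA1")
)

# Long table shaped like the melt() output: one row per bin and gene
d.f2 <- data.frame(
  bin = rep(c("X1", "X2"), times = 3),
  variable = rep(rownames(example), each = 2),
  value = c(47.00252, 52.46925, 11.61435, 13.90566, 72.86330, 71.72579)
)

# For each row, find the gene's position among the row names and take its cluster
d.f2$cluster <- example$cluster[match(d.f2$variable, rownames(example))]
print(d.f2)
```

The resulting d.f2 can go straight into the same ggplot(...) + facet_grid(. ~ cluster) call.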

6.6 years ago by Assa Yeroslaviz ★ 1.8k

If you want the clusters in the melted data frame, don't leave them out of the original data frame.

6.6 years ago by Jean-Karim Heriche 27k

This doesn't work for me (AFAIK). The clusters are in a column. When I transpose the data to fit the structure I need, they will also become a row in the new matrix and I won't be able to melt them accordingly.

Or am I missing something?

6.6 years ago by Assa Yeroslaviz ★ 1.8k

Maybe:

   melt(sample_file_df, id.vars = c("bin", "cluster"))

6.6 years ago by Jean-Karim Heriche 27k

bin is a column, and cluster in this case would be a row in the data.frame. I don't see how to combine this information.

6.6 years ago by Assa Yeroslaviz ★ 1.8k

Cluster is not a row according to your example of head(sample_file_df[,c(1:3,100:101)]) above. I didn't check what bin was; replace it with the gene names of sample_file_df (they are row names there, so copy them into a column first). The idea is that you can give melt() more than one id column.

6.6 years ago by Jean-Karim Heriche 27k
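A minimal sketch of that suggestion, using the six-gene, three-bin example from the question (values rounded from the head() output). No transpose is needed: the gene names and the cluster become id columns, and the bin columns X1, X2, X3 are melted into one column. The column names gene and intensity are introduced here for illustration; reshape2 and ggplot2 are assumed to be installed. melt() should return bin as a factor whose levels follow the original column order, which keeps the bins in order on the x axis.

```r
library(reshape2)  # melt()
library(ggplot2)

# The question's example data (first three bins), rounded
example <- data.frame(
  X1 = c(47.00252, 11.61435, 72.86330, 55.72980, 202.86542, 289.14436),
  X2 = c(52.46925, 13.90566, 71.72579, 62.21363, 221.03463, 289.17267),
  X3 = c(57.51065, 12.74778, 71.66715, 53.49190, 221.87639, 291.15432),
  cluster = factor(c(1, 1, 2, 3, 4, 4)),
  row.names = c("15S_rRNA", "21S_rRNA", "HRA1", "ICR1", "LSR1", "NME1")
)

# Copy the gene names out of the row names into a regular column
example$gene <- rownames(example)

# gene and cluster are kept as id columns; every other column (the bins) is melted
long <- melt(example, id.vars = c("gene", "cluster"),
             variable.name = "bin", value.name = "intensity")

# One line per gene, one panel per cluster
p <- ggplot(long, aes(bin, intensity, group = gene, colour = gene)) +
  geom_line() +
  facet_wrap(~ cluster)
print(p)
```

With 40 clusters, facet_wrap(~ cluster) wraps the panels into a grid, whereas facet_grid(. ~ cluster) places all 40 side by side in a single row. For all 7216 genes, dropping colour = gene is probably wise, since the legend would otherwise list every gene.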