I have a 120 by 120 correlation matrix (Correlation_mat) obtained by comparing FPKM gene expression data of 120 genes at different timeseries with cor test of R. I want to visualize the correlation results in histogram and then want to filter the top 1% correlations from the matrix. Then I want to divide the top 1% correlations into cluster based on mutual correlations. Can anybody help me with the R script?
It might be a good idea if you just edit your original post.
To do what you want, the simplest way to do will be
hclust(as.dist(1abs(Correlation_mat)))
which will perform hierarchical clustering for you. The hclust is the R function for the clustering where as.dist means that your matrix is the distance matrix (therefore the 1)
Some other posts of yours suggests that you are trying to do something along the line of WGCNA, you can always follow their tutorial
However, with only 120 genes, I am not sure if you can find any cluster
P.S For your original method, you will very likely only obtain the diagonal of the correlation matrix, which should always be 1 (as you can see from your output).
1) I have generated correlation matrix with the below scripts
#Data import
Rdata < read.table(file.choose(), header=TRUE, sep=",")
#Generating correlation matrix with the 'Rdata'
cor(Rdata, method="pearson")
Correlation_mat < cor(Rdata)
2) Then I have filtered the correlations based on a cutoff correlation coefficient (r) value
#Filtering based on Correlation value
n < ncol(Correlation_mat)
cmat < col(Correlation_mat)
ind < order(cmat, Correlation_mat, decreasing = TRUE)  (n * cmat  n)
dim(ind) < dim(Correlation_mat)
colnames(ind) < colnames(Correlation_mat)
out < cbind(ID = c(col(ind)), ID2 = c(ind))
as.data.frame(cbind(out, cor = Correlation_mat[out]))
Final< as.data.frame(cbind(out, cor = Correlation_mat[out]))
f=Final
for (i in 1:81) {
for (j in 1:2){f[i,j]=row.names(Correlation_mat)[Final[i,j]] }
}
f[f[,3]>.8,]
a=f[f[,3]>.8,]
Therefore, in the file ‘a’ I have filtered correlation value like this
row.names 
ID 
ID2 
cor 


1 
ENSGALG00000001744 
ENSGALG00000001744 
1.0000000 

10 
ENSGALG00000028599 
ENSGALG00000028599 
1.0000000 

19 
ENSGALG00000024138 
ENSGALG00000024138 
1.0000000 

20 
ENSGALG00000024138 
Chr25_Ktn4 
0.9668402 

21 
ENSGALG00000024138 
ENSGALG00000016507 
0.8607992 

28 
ENSGALG00000024029 
ENSGALG00000024029 
1.0000000 

37 
ENSGALG00000016507 
ENSGALG00000016507 
1.0000000 

38 
ENSGALG00000016507 
ENSGALG00000024138 
0.8607992 

46 
ENSGALG00000026609 
ENSGALG00000026609 
1.0000000 

55 
ENSGALG00000022277 
ENSGALG00000022277 
1.0000000 

64 
ENSGALG00000027059 
ENSGALG00000027059 
1.0000000 

73 
Chr25_Ktn4 
Chr25_Ktn4 
1.0000000 

74 
Chr25_Ktn4 
ENSGALG00000024138 
0.96684 

I want to generate cluster with the above correlations. Actually my idea is pairs which are mutually correlated will be in the same cluster. Next I want to do GO enrichment analysis. Is there any suggestion for cluster generalization (visualization) and then GO enrichment analysis on the clusters
Thanks. How to visualize the 'hclust' results...?? It gives values (list of 7)
I have used the below script to generate the plots and heatmap
hclust(as.dist(1abs(Correlation_mat)))
hc < hclust(as.dist(1abs(Correlation_mat)))
plot(hc)
IN GPLOTS
install.packages("gplots")
library(gplots)
heatmap.2(Correlation_mat, main="Hierarchical Cluster", dendrogram="column",trace="none",col=greenred(10))
plot(hc) is definitely based on hierarchical clustering whether heatmap.2 also made hierarchical clustering as default..??
You'll want to use something more like heatmap.2(1abs(Correlation_mat), distfun=as.dist, trace="none")
. That function calls hclust
internally and this should produce the same clustering that you got with hc
.
Thank you very much Devon. It worked like a charm.
Well the correlation result produced is not having Pvalues. Is there any way to incorporate Pvalue. discard insignificant results from the correlation matrix and then go for clustering with the significant correlations..??
You'd need to define significant correlation first and then filter rows and columns accordingly to keep the matrix symmetric (it's not like you can just filter values out). With only a 120x120 matrix, this seems like more trouble than it's worth.
Yes you are right but I am not expert enough in R to do this. I started R two months back and this task is really tricky. I tried the following script below to define significant correlations and made a plot however, I could not do the same for filtering the correlation results in matrix
library("corrplot")
Correlation_mat < cor(Rdata, method="pearson")
cor.mtest < function(mat, conf.level = 0.95) {
mat < as.matrix(mat)
n < ncol(mat)
p.mat < lowCI.mat < uppCI.mat < matrix(NA, n, n)
diag(p.mat) < 0
diag(lowCI.mat) < diag(uppCI.mat) < 1
for (i in 1:(n  1)) {
for (j in (i + 1):n) {
tmp < cor.test(mat[, i], mat[, j], conf.level = conf.level)
p.mat[i, j] < p.mat[j, i] < tmp$p.value
lowCI.mat[i, j] < lowCI.mat[j, i] < tmp$conf.int[1]
uppCI.mat[i, j] < uppCI.mat[j, i] < tmp$conf.int[2]
}
}
return(list(p.mat, lowCI.mat, uppCI.mat))
}
res1 < cor.mtest(Rdata, 0.95)
corrplot(Correlation_mat, p.mat = res1[[1]], sig.level = 0.2)
The general idea is to ask, "are any of the pvalues in a i row and column i significant?". If not, you can remove that row and column. Since the matrix is symmetric, you can just test the rows with apply(p.mat, 1, function(x) any(x > threshold))
, where threshold
is your pvalue cutoff.
It's like mat
in your cor.mtest
function. In this context, it'll hold the vector of values for a given row. BTW, try to avoid for
loops in functional languages like R (I realize this will seem strange if you're used to things like perl/python/C/etc.). They're really slow.
Devon, thanks for your suggestion. Can you elaborate a little more in the form of script, because when I tried it showed p.mat not found. So can you write the full script or else can you say where in my script it should be added...??
I'll see if I have time to slap something together today. I'm a bit busy at the moment, so no promises.
If I read the script correctly, it should be:
res1$p.mat
instead of just p.mat
because it should be the element within the object
Its showing..
Error in apply(res1$p.mat, 1, function(x) any(x > 0.2)) :
dim(X) must have a positive length
Well I found a probable solution of screening significant correlations with pvalues. res[[1]] gives the pvalues of the correlations in a symmetric manner. I used 'cellnote=Pval' (where Pval is the pvalue) function of heatmap.2 to add the pvalues in the cells of heatmap. So in the heatmap cluster if any cell corresponding the genes do not have significant pvalue I will not consider it in the cluster.
Please check ''caret" package in R and use the following function:
findCorrelation(x, cutoff = .90, verbose = FALSE)
Hop this help.
What have you tried? It unlikely that we'll just supply you with the R code if you can't demonstrate having actually made an attempt yourself.