Question: Filtering gene expression correlation matrix in R
0
gravatar for mjoyraj
3.2 years ago by
mjoyraj50
Taiwan
mjoyraj50 wrote:

I have a 120 by 120 correlation matrix (Correlation_mat) obtained by comparing FPKM gene expression data of 120 genes at different time-series with cor test of R. I want to visualize the correlation results in histogram and then want to filter the top 1% correlations from the matrix. Then I want to divide the top 1% correlations into cluster based on mutual correlations. Can anybody help me with the R script?

rna-seq R • 5.3k views
ADD COMMENTlink modified 3.2 years ago by newlife.well.201410 • written 3.2 years ago by mjoyraj50
2

What have you tried? It unlikely that we'll just supply you with the R code if you can't demonstrate having actually made an attempt yourself.

ADD REPLYlink written 3.2 years ago by Devon Ryan74k
1
gravatar for Sam
3.2 years ago by
Sam2.1k
London
Sam2.1k wrote:

It might be a good idea if you just edit your original post.

To do what you want, the simplest way to do will be

hclust(as.dist(1-abs(Correlation_mat))) 

which will perform hierarchical clustering for you. The hclust is the R function for the clustering where as.dist means that your matrix is the distance matrix (therefore the 1-)

Some other posts of yours suggests that you are trying to do something along the line of WGCNA, you can always follow their tutorial

However, with only 120 genes, I am not sure if you can find any cluster

 

P.S For your original method, you will very likely only obtain the diagonal of the correlation matrix, which should always be 1 (as you can see from your output).

ADD COMMENTlink written 3.2 years ago by Sam2.1k
0
gravatar for mjoyraj
3.2 years ago by
mjoyraj50
Taiwan
mjoyraj50 wrote:

1) I have generated correlation matrix with the below scripts

#Data import

Rdata <- read.table(file.choose(), header=TRUE, sep=",")

#Generating correlation matrix with the 'Rdata'

cor(Rdata, method="pearson")

Correlation_mat <- cor(Rdata)

 

2) Then I have filtered the correlations based on a cutoff correlation coefficient (r) value

#Filtering based on Correlation value

n <- ncol(Correlation_mat)

cmat <- col(Correlation_mat)

ind <- order(-cmat, Correlation_mat, decreasing = TRUE) - (n * cmat - n) 

dim(ind) <- dim(Correlation_mat)

colnames(ind) <- colnames(Correlation_mat)

out <- cbind(ID = c(col(ind)), ID2 = c(ind))

as.data.frame(cbind(out, cor = Correlation_mat[out]))

Final<- as.data.frame(cbind(out, cor = Correlation_mat[out]))

 

f=Final

for (i in 1:81) {

  for (j in 1:2){f[i,j]=row.names(Correlation_mat)[Final[i,j]]  }

  }

f[f[,3]>.8,]

a=f[f[,3]>.8,]

 

Therefore, in the file ‘a’ I have filtered correlation value like this

row.names

ID

ID2

cor

1

ENSGALG00000001744

ENSGALG00000001744

1.0000000

 

10

ENSGALG00000028599

ENSGALG00000028599

1.0000000

 

19

ENSGALG00000024138

ENSGALG00000024138

1.0000000

 

20

ENSGALG00000024138

Chr25_Ktn4

0.9668402

 

21

ENSGALG00000024138

ENSGALG00000016507

0.8607992

 

28

ENSGALG00000024029

ENSGALG00000024029

1.0000000

 

37

ENSGALG00000016507

ENSGALG00000016507

1.0000000

 

38

ENSGALG00000016507

ENSGALG00000024138

0.8607992

 

46

ENSGALG00000026609

ENSGALG00000026609

1.0000000

 

55

ENSGALG00000022277

ENSGALG00000022277

1.0000000

 

64

ENSGALG00000027059

ENSGALG00000027059

1.0000000

 

73

Chr25_Ktn4

Chr25_Ktn4

1.0000000

 

74

Chr25_Ktn4

ENSGALG00000024138

0.96684

 

 

I want to generate cluster with the above correlations. Actually my idea is pairs which are mutually correlated will be in the same cluster. Next I want to do GO enrichment analysis. Is there any suggestion for cluster generalization (visualization) and then GO enrichment analysis on the clusters

 

 

ADD COMMENTlink written 3.2 years ago by mjoyraj50
0
gravatar for mjoyraj
3.2 years ago by
mjoyraj50
Taiwan
mjoyraj50 wrote:

Thanks. How to visualize the 'hclust' results...?? It gives values (list of 7) 

ADD COMMENTlink written 3.2 years ago by mjoyraj50

Please use the add comment feature

You can use heatmap.2 from library gplots

ADD REPLYlink written 3.2 years ago by Sam2.1k
0
gravatar for mjoyraj
3.2 years ago by
mjoyraj50
Taiwan
mjoyraj50 wrote:

I have used the below script to generate the plots and heatmap

hclust(as.dist(1-abs(Correlation_mat)))
hc <- hclust(as.dist(1-abs(Correlation_mat)))
plot(hc)

IN GPLOTS

install.packages("gplots")
library(gplots)
heatmap.2(Correlation_mat, main="Hierarchical Cluster", dendrogram="column",trace="none",col=greenred(10))

plot(hc) is definitely based on hierarchical clustering whether heatmap.2 also made hierarchical clustering as default..?? 

ADD COMMENTlink written 3.2 years ago by mjoyraj50
1

You'll want to use something more like heatmap.2(1-abs(Correlation_mat), distfun=as.dist, trace="none"). That function calls hclust internally and this should produce the same clustering that you got with hc.

ADD REPLYlink written 3.2 years ago by Devon Ryan74k

Thank you very much Devon. It worked like a charm. 

Well the correlation result produced is not having P-values. Is there any way to incorporate P-value. discard insignificant results from the correlation matrix and then go for clustering with the significant correlations..??

ADD REPLYlink written 3.2 years ago by mjoyraj50

You'd need to define significant correlation first and then filter rows and columns accordingly to keep the matrix symmetric (it's not like you can just filter values out). With only a 120x120 matrix, this seems like more trouble than it's worth.

ADD REPLYlink written 3.2 years ago by Devon Ryan74k

Yes you are right but I am not expert enough in R to do this. I started R two months back and this task is really tricky. I tried the following script below to define significant correlations and made a plot however, I could not do the same for filtering the correlation results in matrix

library("corrplot")
Correlation_mat <- cor(Rdata, method="pearson")

cor.mtest <- function(mat, conf.level = 0.95) {
  mat <- as.matrix(mat)
  n <- ncol(mat)
  p.mat <- lowCI.mat <- uppCI.mat <- matrix(NA, n, n)
  diag(p.mat) <- 0
  diag(lowCI.mat) <- diag(uppCI.mat) <- 1
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      tmp <- cor.test(mat[, i], mat[, j], conf.level = conf.level)
      p.mat[i, j] <- p.mat[j, i] <- tmp$p.value
      lowCI.mat[i, j] <- lowCI.mat[j, i] <- tmp$conf.int[1]
      uppCI.mat[i, j] <- uppCI.mat[j, i] <- tmp$conf.int[2]
    }
  }
  return(list(p.mat, lowCI.mat, uppCI.mat))
}
res1 <- cor.mtest(Rdata, 0.95)

corrplot(Correlation_mat, p.mat = res1[[1]], sig.level = 0.2)

 

ADD REPLYlink written 3.2 years ago by mjoyraj50

The general idea is to ask, "are any of the p-values in a i row and column i significant?". If not, you can remove that row and column. Since the matrix is symmetric, you can just test the rows with apply(p.mat, 1, function(x) any(x > threshold)), where threshold is your p-value cutoff.

ADD REPLYlink written 3.2 years ago by Devon Ryan74k

What is X here...??

ADD REPLYlink written 3.2 years ago by mjoyraj50

It's like mat in your cor.mtest function. In this context, it'll hold the vector of values for a given row. BTW, try to avoid for loops in functional languages like R (I realize this will seem strange if you're used to things like perl/python/C/etc.). They're really slow.

ADD REPLYlink written 3.2 years ago by Devon Ryan74k

Devon, thanks for your suggestion. Can you elaborate a little more in the form of script, because when I tried it showed p.mat not found. So can you write the full script or else can you say where in my script it should be added...??

ADD REPLYlink written 3.2 years ago by mjoyraj50

I'll see if I have time to slap something together today. I'm a bit busy at the moment, so no promises.

ADD REPLYlink written 3.2 years ago by Devon Ryan74k

If I read the script correctly, it should be:

res1$p.mat instead of just p.mat because it should be the element within the object

ADD REPLYlink modified 3.2 years ago • written 3.2 years ago by Sam2.1k

Its showing..

Error in apply(res1$p.mat, 1, function(x) any(x > 0.2)) : 
  dim(X) must have a positive length

ADD REPLYlink written 3.2 years ago by mjoyraj50

Well I found a probable solution of screening significant correlations with p-values. res[[1]] gives the p-values of the correlations in a symmetric manner. I used 'cellnote=Pval' (where Pval is the p-value) function of heatmap.2 to add the p-values in the cells of heatmap. So in the heatmap cluster if any cell corresponding the genes do not have significant p-value I will not consider it in the cluster.

ADD REPLYlink written 3.2 years ago by mjoyraj50
0
gravatar for newlife.well.2014
3.2 years ago by
United States
newlife.well.201410 wrote:

Please check ''caret" package in R and use the following function:

findCorrelation(x, cutoff = .90, verbose = FALSE)

Hop this help.

 

ADD COMMENTlink written 3.2 years ago by newlife.well.201410

Thanks for your suggestion, whether it will filter correlations considering only significant P-values..??

ADD REPLYlink written 3.2 years ago by mjoyraj50

For this function, x is a correlation matrix. Maybe you can construct your correlation matrix based on P-values. In addition, you can check the source code of caret, which is very helpful.

ADD REPLYlink written 3.2 years ago by newlife.well.201410
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 620 users visited in the last hour