Question: correlation matrix and working on signficant genes
gravatar for adR
3 months ago by
adR50 wrote:

Hi, I have gene expression data from two tissues of the same mouse model(Tissue A and Tissue B). I want to do a correlation test and need to work only on the significantly correlated genes between Tissue A and Tissue B. Finally, I would like to show a heatmap and correlation network graph. To give you an idea of how my data look like, here I stimulated with the following line of codes in R.


    mat = golub 
    rownames(mat) = paste0("G",1:nrow(mat))
        #split the samples up randomly
        m1 = mat[,seq(1,ncol(mat),2)] 
        m2 = mat[,seq(2,ncol(mat),2)] 
        test = rcorr(t(rbind(m1,m2))) 
n = nrow(m1) 
        ## Now to see correlation between the two tissues:
         results = melt(test$r[1:n,(n+1):(2*n)], ="pcc") %>% 
         left_join(melt(test$P[1:n,(n+1):(2*n)], ="p"),by=c("Var1","Var2"))
        ## to get the signficant(FDR) 
        sig = droplevels(results[p.adjust(results$p,"BH")<0.05,])  
        new.max <- results %>% filter(Var1 %in% sig$Var1 & Var2 %in% sig$Var2) %>%
         recast(Var1 ~ Var2,data=.,measure.var="pcc")

Now I have the significant genes which I would like to show by heatmap or network graph. However, the new matrix( new.max) is not anymore square, and I am not sure if I can show co-expressed genes in this form of a matrix. Would it be great to advise me on approaching this big matrix in a more meaningful way? I could show the whole matrix as a heatmap, but it is vast, and my computer is not handling it. That is why I restrict to significant genes only! Best, AD

co-expression corrlation • 249 views
ADD COMMENTlink modified 3 months ago • written 3 months ago by adR50

In my experience gplots::heatmap2 function does not require an advanced computer. By default it will also cluster your genes, highlighting co-expression.

ADD REPLYlink modified 3 months ago • written 3 months ago by Alex Nesmelov190

@Alex, Thank you so much! I will plot the full matrix then using gplots::heatmap2(). However, do you recommend converting the similarity matrix which I am interested in (test$r[1:n,(n+1):(2*n)]) to be converted to the adjacency matrix before plotting as heatmap or in the graph?

If so ...actually I have tried using the function adjacency.fromSimilarity() in the WGCNA package and got error like Error in checkAdjMat(similarity, min, max) : adjacency is not numeric.

here is the line of codes I used.

heatmap_indices <- test$r[1:n,(n+1):(2*n)]
adj_matrix <- adjacency.fromSimilarity(heatmap_indices, power=12, type='signed')
          labRow=NA, labCol=NA, 
          trace='none', dendrogram='row',
          xlab='Gene', ylab='Gene',
          main='Similarity matrix',
'none', revC=TRUE)

Best, Amare

ADD REPLYlink written 3 months ago by adR50

I've seen adjacency heatmaps only for visualization of the whole WGCNA network when it is used with gene dendrograms, so I can't advise here. I can imagine that adjacency heatmap may consist almost solely of two bright colors since adjacency is an exponential function. Concerning the adjacency calculation, similarity should be a symmetric numerical matrix with values from -1 to 1. Of these requirements, heatmap_indices does not match only symmetry, but this is quite a problem for similarity matrix) Please, check heatmap_indices generation. By the way, how did you choose to set power=12?

ADD REPLYlink modified 3 months ago • written 3 months ago by Alex Nesmelov190

@Alex, Thanks again for your feedback! I don't have informatics and R knowledge. I am just learning by doing using my data. I chose power 12 by mistake; the default in WGCNA is 6. The heatmap_indices object(heatmap_indices <- test$r[1:n,(n+1):(2*n)]) represents the top-right block of the similarity matrix(test = rcorr(t(rbind(m1,m2)))). I chose this block to show the possible correlation between tissue A(m1) and tissue(m2). The other blocks, for example, top left is showing the self-correlation of m1. I am interested in the block where I can see the correlation between m1(Tissue A) and m2(Tissue B). My objective is exploring correlated genes of this block and show in a plot. Adjacency calculation seems impossible for this block as the block's max and min value is between -0.9 and + 0.9. I am not sure maybe I need to change the structure of the data from test = rcorr(t(rbind(m1,m2))) to test = rcorr(t(cbind(m1,m2))) Anyways if I am doing nonsense, please excuse me! Best, AD

ADD REPLYlink modified 3 months ago • written 3 months ago by adR50

Use ADD REPLY/ADD COMMENT when responding to existing posts to keep threads logically organized.

ADD REPLYlink written 3 months ago by GenoMax96k

Similarity matrix: I see in heatmap_indices that G1 to G2 correlation is 0.21425986 in first row and 0.11289293 in the first column. I think that the similarity matrix should be symmetrical, i.e. G1 to G2 correlation/similarity in two cells heatmap_indices[G1, G2] and heatmap_indices[G2, G1] should be the same value. WGCNA soft power: as far as I know, one can not choose this value arbitrarily or use a default value. Please check Consensus WGCNA: soft-threshold power selection (see section 6). Also, I think that (test = rcorr(t(rbind(m1,m2)))) does not actually produce the correlation of gene expression values between two tissues A and B, if your actual data are similar to your generated example (genes in rows, samples in columns in two matrices for tissues A and B). In your example, binding two matrices of gene expression just concatenes two vectors of expression values for a given gene in two tissues (i.e., (c(G1_expression in_tissueA, G2_expression in_tissueB)) and these values for all genes are passed to rcorr. I think that it actually should be rcorr(t(m1),t(m2)). Then you can adjust p-values for a half of this matrix (adjusting p-values for the whole matrix will deflate p-values, since it increases the number of comparisons twice) and filter out the non-significant values.

ADD REPLYlink modified 3 months ago • written 3 months ago by Alex Nesmelov190

@Alex, Thank you again for your answers! However, rcorr(t(m1),t(m2)) and rcorr(t(rbind(m1,m2)))) would give you the same result. Value differences you noticed in the heatmap_indices object is TRUE. This is because the heatmap_indices object does not represent the full correlation matrix rather the top right quadrate of the full matrix. Here simulated small size data for you to observe the difference.

m1 <- matrix(rnorm(50), nrow = 10)
row.names(m1) <- paste0("G_", 1:10)
colnames(m1) <- paste0("I_", 1:5)

m2 <- matrix(rnorm(50),nrow = 10)
row.names(m2) <- paste0("m.G_", 1:10) ## The same type and number of genes as in m1 but to make sure that they are from different tissue, I added prefex "m."
colnames(m2) <- paste0("I_", 1:5)

# Code 1
cor1 <- rcorr(t(m1),t(m2))
corR <- cor1$r ## full matrix
topright <- corR[1:10, 11:20]  ## top right block 
head(topright ,3)
# Code 1
cor2 <- rcorr(t(rbind(m1,m2))) ## the same result 
corR2 <- cor2$r ## full matrix
topright2 <- corR2 [1:10, 11:20]  ## top right block
head(topright2 ,3)

I am interested in the TOP RIGHT side of the full matrix because I believe the true correlation between Tissue A (m1) and Tissue B(m2) lies here(or in the BOTTOM LIFT).

ADD REPLYlink modified 3 months ago • written 3 months ago by adR50

Yes, I see, I missed the difference between rcorr and cor behaviors.

ADD REPLYlink written 3 months ago by Alex Nesmelov190

If the issue is just producing a large correlation matrix, then use bigcor:

ADD REPLYlink written 3 months ago by Kevin Blighe70k

Thank you so much Kevin! However, I may ask some tutorial on this correlation? Incase my question wasn't clear I might re-post it again or modifying it.

ADD REPLYlink written 3 months ago by adR50
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2112 users visited in the last hour