Question

correlation matrix and working on signficant genes

0

Entering edit mode

3.5 years ago

adR ▴ 120

Hi, I have gene expression data from two tissues of the same mouse model(Tissue A and Tissue B). I want to do a correlation test and need to work only on the significantly correlated genes between Tissue A and Tissue B. Finally, I would like to show a heatmap and correlation network graph. To give you an idea of how my data look like, here I stimulated with the following line of codes in R.

        llibrary(Hmisc) 
        library(multtest) 
        library(reshape2)
        library(dplyr)
        data(golub)

    mat = golub 
    rownames(mat) = paste0("G",1:nrow(mat))
        #split the samples up randomly
        m1 = mat[,seq(1,ncol(mat),2)] 
        m2 = mat[,seq(2,ncol(mat),2)] 
        test = rcorr(t(rbind(m1,m2))) 
n = nrow(m1) 
        ## Now to see correlation between the two tissues:
         results = melt(test$r[1:n,(n+1):(2*n)],value.name ="pcc") %>% 
         left_join(melt(test$P[1:n,(n+1):(2*n)],value.name ="p"),by=c("Var1","Var2"))
        ## to get the signficant(FDR) 
        sig = droplevels(results[p.adjust(results$p,"BH")<0.05,])  
        new.max <- results %>% filter(Var1 %in% sig$Var1 & Var2 %in% sig$Var2) %>%
         recast(Var1 ~ Var2,data=.,measure.var="pcc")

Now I have the significant genes which I would like to show by heatmap or network graph. However, the new matrix( new.max) is not anymore square, and I am not sure if I can show co-expressed genes in this form of a matrix. Would it be great to advise me on approaching this big matrix in a more meaningful way? I could show the whole matrix as a heatmap, but it is vast, and my computer is not handling it. That is why I restrict to significant genes only! Best, AD

co-expression corrlation • 1.7k views

ADD COMMENT • link 3.5 years ago by adR ▴ 120

0

Entering edit mode

In my experience gplots::heatmap2 function does not require an advanced computer. By default it will also cluster your genes, highlighting co-expression.

ADD REPLY • link 3.5 years ago by Alex Nesmelov ▴ 200

0

Entering edit mode

@Alex, Thank you so much! I will plot the full matrix then using gplots::heatmap2(). However, do you recommend converting the similarity matrix which I am interested in (test$r[1:n,(n+1):(2*n)]) to be converted to the adjacency matrix before plotting as heatmap or in the graph?

If so ...actually I have tried using the function adjacency.fromSimilarity() in the WGCNA package and got error like Error in checkAdjMat(similarity, min, max) : adjacency is not numeric.

here is the line of codes I used.

library(WGCNA)
heatmap_indices <- test$r[1:n,(n+1):(2*n)]
adj_matrix <- adjacency.fromSimilarity(heatmap_indices, power=12, type='signed')
heatmap.2(heatmap_indices,
          col=redgreen(75),
          labRow=NA, labCol=NA, 
          trace='none', dendrogram='row',
          xlab='Gene', ylab='Gene',
          main='Similarity matrix',
          density.info='none', revC=TRUE)

Best, Amare

ADD REPLY • link 3.5 years ago by adR ▴ 120

0

Entering edit mode

I've seen adjacency heatmaps only for visualization of the whole WGCNA network when it is used with gene dendrograms, so I can't advise here. I can imagine that adjacency heatmap may consist almost solely of two bright colors since adjacency is an exponential function. Concerning the adjacency calculation, similarity should be a symmetric numerical matrix with values from -1 to 1. Of these requirements, heatmap_indices does not match only symmetry, but this is quite a problem for similarity matrix) Please, check heatmap_indices generation. By the way, how did you choose to set power=12?

ADD REPLY • link 3.5 years ago by Alex Nesmelov ▴ 200

0

Entering edit mode

@Alex, Thanks again for your feedback! I don't have informatics and R knowledge. I am just learning by doing using my data. I chose power 12 by mistake; the default in WGCNA is 6. The heatmap_indices object(heatmap_indices <- test$r[1:n,(n+1):(2*n)]) represents the top-right block of the similarity matrix(test = rcorr(t(rbind(m1,m2)))). I chose this block to show the possible correlation between tissue A(m1) and tissue(m2). The other blocks, for example, top left is showing the self-correlation of m1. I am interested in the block where I can see the correlation between m1(Tissue A) and m2(Tissue B). My objective is exploring correlated genes of this block and show in a plot. Adjacency calculation seems impossible for this block as the block's max and min value is between -0.9 and + 0.9. I am not sure maybe I need to change the structure of the data from test = rcorr(t(rbind(m1,m2))) to test = rcorr(t(cbind(m1,m2))) Anyways if I am doing nonsense, please excuse me! Best, AD

ADD REPLY • link 3.5 years ago by adR ▴ 120

0

Entering edit mode

Use ADD REPLY/ADD COMMENT when responding to existing posts to keep threads logically organized.

ADD REPLY • link 3.5 years ago by GenoMax 141k

0

Entering edit mode

Similarity matrix: I see in heatmap_indices that G1 to G2 correlation is 0.21425986 in first row and 0.11289293 in the first column. I think that the similarity matrix should be symmetrical, i.e. G1 to G2 correlation/similarity in two cells heatmap_indices[G1, G2] and heatmap_indices[G2, G1] should be the same value. WGCNA soft power: as far as I know, one can not choose this value arbitrarily or use a default value. Please check Consensus WGCNA: soft-threshold power selection https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/faq.html (see section 6). Also, I think that (test = rcorr(t(rbind(m1,m2)))) does not actually produce the correlation of gene expression values between two tissues A and B, if your actual data are similar to your generated example (genes in rows, samples in columns in two matrices for tissues A and B). In your example, binding two matrices of gene expression just concatenes two vectors of expression values for a given gene in two tissues (i.e., (c(G1_expression in_tissueA, G2_expression in_tissueB)) and these values for all genes are passed to rcorr. I think that it actually should be rcorr(t(m1),t(m2)). Then you can adjust p-values for a half of this matrix (adjusting p-values for the whole matrix will deflate p-values, since it increases the number of comparisons twice) and filter out the non-significant values.

ADD REPLY • link 3.5 years ago by Alex Nesmelov ▴ 200

0

Entering edit mode

@Alex, Thank you again for your answers! However, rcorr(t(m1),t(m2)) and rcorr(t(rbind(m1,m2)))) would give you the same result. Value differences you noticed in the heatmap_indices object is TRUE. This is because the heatmap_indices object does not represent the full correlation matrix rather the top right quadrate of the full matrix. Here simulated small size data for you to observe the difference.

set.seed(1)
m1 <- matrix(rnorm(50), nrow = 10)
row.names(m1) <- paste0("G_", 1:10)
colnames(m1) <- paste0("I_", 1:5)

set.seed(2)
m2 <- matrix(rnorm(50),nrow = 10)
row.names(m2) <- paste0("m.G_", 1:10) ## The same type and number of genes as in m1 but to make sure that they are from different tissue, I added prefex "m."
colnames(m2) <- paste0("I_", 1:5)

# Code 1
cor1 <- rcorr(t(m1),t(m2))
corR <- cor1$r ## full matrix
head(corR,3)
topright <- corR[1:10, 11:20]  ## top right block 
head(topright ,3)
# Code 1
cor2 <- rcorr(t(rbind(m1,m2))) ## the same result 
corR2 <- cor2$r ## full matrix
head(corR2,3)
topright2 <- corR2 [1:10, 11:20]  ## top right block
head(topright2 ,3)

I am interested in the TOP RIGHT side of the full matrix because I believe the true correlation between Tissue A (m1) and Tissue B(m2) lies here(or in the BOTTOM LIFT).

ADD REPLY • link 3.5 years ago by adR ▴ 120

0

Entering edit mode

Yes, I see, I missed the difference between rcorr and cor behaviors.

ADD REPLY • link 3.5 years ago by Alex Nesmelov ▴ 200

0

Entering edit mode

If the issue is just producing a large correlation matrix, then use bigcor: https://www.rdocumentation.org/packages/propagate/versions/1.0-6/topics/bigcor

ADD REPLY • link 3.5 years ago by Kevin Blighe 87k

0

Entering edit mode

Thank you so much Kevin! However, I may ask some tutorial on this correlation? Incase my question wasn't clear I might re-post it again or modifying it.

ADD REPLY • link 3.5 years ago by adR ▴ 120