6 months ago

Hi, I have gene expression data from two different tissues of the same mice, say tissue A and tissue B. The number and types of genes are similar in both tissues. I want to run a correlation analysis between the genes of tissue A and the genes of tissue B. Here are some simulated data and my code.

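    set.seed(1)  # seed before any random draws so that both A and B are reproducible
    # Tissue A: 10000 genes (rows) x 5 mice (columns)
    A <- data.frame(rnorm(10000),
                    rnorm(10000),
                    rnorm(10000),
                    rnorm(10000),
                    rnorm(10000))
    row.names(A) <- paste("G_", 1:10000)
    colnames(A) <- paste("M_", 1:5)
    # Tissue B: same genes, same mice
    B <- data.frame(rnorm(10000),
                    rnorm(10000),
                    rnorm(10000),
                    rnorm(10000),
                    rnorm(10000))
    row.names(B) <- paste("G_", 1:10000)
    colnames(B) <- paste("I_", 1:5)
    # Transpose so genes are columns; the result is a 10000 x 10000 matrix (A genes x B genes)
    cor.ge.AB <- cor(t(A), t(B))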


The resulting matrix is large, and I would like to keep only the gene pairs that are highly correlated (for example, r > 0.9). Is my code correct for computing the correlations, and if not, how should I do it? Additionally, how can I extract only the highly correlated genes (r > 0.9)? Best, Amare

A few points:

1. You may want to use paste0 instead of paste. paste inserts a space between its arguments by default (sep = " "), so your names come out as "G_ 1" rather than "G_1".
2. You should also calculate p-values along with the correlation coefficients. You can then narrow the results down to pairs that are both highly correlated and statistically significant.
3. The easiest way to filter your correlation matrix is to flatten it with reshape2::melt or tidyr, then filter the resulting long-form table, which would have gene1, gene2, and corr_coeff as columns.
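
The three points above can be sketched together in base R. This is only an illustration under a few assumptions: it uses 200 simulated genes instead of 10000 to keep the matrix small, it replaces reshape2::melt with base R's which(..., arr.ind = TRUE) so that no extra package is needed, and the names n_genes, n_mice, hits and high_cor are made up for the example. The p-values come from the usual t statistic for a Pearson correlation with n - 2 degrees of freedom.

```r
set.seed(1)      # seed first so both tissues are reproducible
n_genes <- 200   # assumption: small size for illustration (the question uses 10000)
n_mice  <- 5

# Simulated expression, genes in rows and mice in columns.
# paste0() builds names like "G_1" with no embedded space.
A <- matrix(rnorm(n_genes * n_mice), nrow = n_genes,
            dimnames = list(paste0("G_", 1:n_genes), paste0("M_", 1:n_mice)))
B <- matrix(rnorm(n_genes * n_mice), nrow = n_genes,
            dimnames = list(paste0("G_", 1:n_genes), paste0("I_", 1:n_mice)))

# Pearson correlations: rows = genes of tissue A, columns = genes of tissue B
r <- cor(t(A), t(B))

# Two-sided p-values from the t statistic with n - 2 degrees of freedom
df_resid <- n_mice - 2
t_stat   <- r * sqrt(df_resid / (1 - r^2))
p        <- 2 * pt(-abs(t_stat), df = df_resid)

# Pick out the pairs with r > 0.9 and store them in long form
hits <- which(r > 0.9, arr.ind = TRUE)
high_cor <- data.frame(
  gene_A     = rownames(r)[hits[, "row"]],
  gene_B     = colnames(r)[hits[, "col"]],
  corr_coeff = r[hits],
  p_value    = p[hits],
  stringsAsFactors = FALSE
)

# Strongest correlations first
high_cor <- high_cor[order(-high_cor$corr_coeff), ]
head(high_cor)
```

Filtering before building the long table avoids materialising every gene pair, which matters at 10000 x 10000. Note that this random data contains no real signal, yet some pairs still pass r > 0.9: with only 5 mice, strong correlations turn up by chance fairly often, so it is probably worth adjusting the p-values for multiple testing (for example with p.adjust) before trusting any individual pair.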