Question

Correlated/Coexpressed genes , given the name of a gene.

4

Entering edit mode

9.2 years ago

Deepak Tanwar ★ 4.2k

Hi,

Given,I have a gene expression dataset. I want to find out, all the genes that are highly correlated with the given gene/ have the same expression pattern.

Example, if there would be an R package, it should take one input as expression data matrix and another input as gene of interest and should provide an output of a list of genes, that could have same expression pattern or are highly correlated.

I know that I could write a function in R regarding this and filter the genes on the basis of their correlation values and take into account the cor value more than 0.5. But, it takes so much of time, when you have a data with more than 20,000 rows and 1,000 columns.

correlation R • 3.7k views

ADD COMMENT • link updated 2.0 years ago by Ram 43k • written 9.2 years ago by Deepak Tanwar ★ 4.2k

2

Entering edit mode

Could you share your code to see why it is very slow? I gave it a try on my computer with a toy example (20'000 rows and 1'000 columns) and it only took few seconds. You should be able to do this without any problem in R.

ADD REPLY • link updated 2.0 years ago by Ram 43k • written 9.2 years ago by Philippe ★ 1.9k

2

Entering edit mode

high_correlation_pval_table<- NULL
# iterate through all genes
for(i in 1:nrow(log2_BRCA)){
  # correlation
  cor <- cor.test(as.numeric(log2_BRCA[which(substr(rownames(log2_BRCA), start = 1, stop = 6) == "SLC5A5"),]), 
                  as.numeric(log2_BRCA[i,]), method = "spearman", alternative = "two.sided")
  # conditions
  if(abs(as.numeric(cor$estimate)) < 0.5 | abs(as.numeric(cor$estimate)) == 1 | is.na(abs(as.numeric(cor$estimate)))){
    next  
  }
  else{
    high_correlation_pval_table <- rbind(
                                     high_correlation_pval_table,
                                     data.frame(
                                           Gene = "SLC5A5",
                                           Cor_Gene = rownames(log2_BRCA[i,]), 
                                           Correlation_Value = as.numeric(cor$estimate), 
                                           p_value = as.numeric(cor$p.value), 
                                           stringsAsFactors = F,
                                           check.names = F))
  }

}

ADD REPLY • link updated 2.0 years ago by Ram 43k • written 9.2 years ago by Deepak Tanwar ★ 4.2k

0

Entering edit mode

Deepak, may you help me as already

Suppose that I have already downloaded GSE63706 and normalized that and I have a normalized text file now. and I have also a list of probesets (a text file of my interest probesets) from this array, I want to have a heat map showing the expression pattern of my interest probesets in this array, for example in this array I have 4 varieties and different tissues (rind and flesh) and phases (0,10,20,30,40 and 50 days after harvesting).

ADD REPLY • link updated 2.0 years ago by Ram 43k • written 8.8 years ago by zizigolu ★ 4.3k

2

Entering edit mode

Heatmap is not a problem at all. There is a R package called pheatmap. There is a very easy way to show the above mentioned groups with the heatmap. These are called as the annotations of a heatmap. Check this out: How Do I Draw A Heatmap In R With Both A Color Key And Multiple Color Side Bars?

There are codes as well. You could use any heatmap packages, there a many. pheatmap, heatmap, heatmap3, heatmap.3, heatmap.2, whatever you prefer. Read the above mentioned Biostars link properly. Tryout the example codes properly and then edit them according to your data.

ADD REPLY • link updated 2.0 years ago by Ram 43k • written 8.8 years ago by Deepak Tanwar ★ 4.2k

1

Entering edit mode

Thank you very much

ADD REPLY • link updated 2.0 years ago by Ram 43k • written 8.8 years ago by zizigolu ★ 4.3k

Ram · Accepted Answer · 2015-02-17

Using the function below is pretty fast using 1500 columns and 30k rows

mat <- replicate(1500, rnorm(30000)) 
gene <- mat[sample(1:30000,1),]

system.time({
resp <- c()
est <- c()
for (i in 1:30000){
  x <- cor.test(gene, mat[i,])
  if (abs(x$estimate[[1]]) > 0.5){
    resp <- c(resp, x$p.value)
    est <- c(est, x$estimate[[1]])
  }
}
})

 user  system elapsed 
   7.69    0.02    7.71