Question: R script help to take out positive or correlation greater than a threshold
0
3.5 years ago by
krushnach80870
krushnach80870 wrote:

I have matrix file which is basically a spearman correlation matrix between genes across various cell type. So now Im trying to find out which set of genes or group of genes whose correlation value is lets say greater than 0.6 if I set that as my threshold. How can I do that? I'm posting a subset of my data. It's a 502 x 502 matrix.

``````      ACTL6B     ACTR5   ACTR6
ACTL6B  1        0.6        -0.4
ACTR5   0.4        1        -0.3
ACTR6  -0.4      -0.3         1
``````

So I don't want correlation between same set of genes which would be 1. I want other comparison. Like, lets say, ACTL6B and ACTR5 whose correlation is 0.6. I would like to keep those values and genes.

Any help or suggestion would be highly appreciated

R • 4.2k views
modified 3.5 years ago by cpad011214k • written 3.5 years ago by krushnach80870
3

This is not a bioinformatics question but rather an R programming question. Anyway try something along the line of

``````correlations[abs(correlations<0.6] <- NA
``````

but next time post on StackOverflow.

1

Solution provided above by Jean-Karim Heriche is better and shorter :)

+1. Sorry, I missed it in the first instance.

4
3.5 years ago by

in R:

My data:

``````   > test
ACTL6B ACTR5 ACTR6
ACTL6B    1.0   0.6  -0.4
ACTR5     0.4   1.0  -0.3
ACTR6    -0.4  -0.3   1.0
``````

Code :

``````> as.data.frame(apply(test, 2, function(x) ifelse (abs(x) >=0.6,x,"NA")))
ACTL6B ACTR5 ACTR6
ACTL6B        1      0.6    NA
ACTR5          NA     1     NA
ACTR6          NA     NA     1
``````
1

Nice idea, though you would need 2 as margin, else the results will be transposed (as you see in your example)

``````as.data.frame(apply(test, 2, function(x) ifelse (abs(x) >=0.6,x,"NA")))
``````

A simpler solution is to just use sapply

``````as.data.frame(sapply(test, function(x) ifelse (abs(x) >=0.6,x,"NA")))
``````
1

Thanks. Code updated. sapply for above code, doesn't save/print row names.

sapply doesn't save/print row names.

It does!

3
3.5 years ago by
Alex Reynolds31k
Seattle, WA USA
Alex Reynolds31k wrote:

You could use something like Python for this:

``````#!/usr/bin/env python

import sys

ctr = 0
genes = []
map = {}
for line in sys.stdin:
elems = [x for x in line.strip().split()]
if ctr == 0:
genes = elems[1:]
for gene in genes:
map[gene] = {}
else:
paired_gene = elems[0]
scores = [float(x) for x in elems[1:]]
for idx, score in enumerate(scores):
gene = genes[idx]
map[gene][paired_gene] = score
map[paired_gene][gene] = score
ctr += 1
for gene in genes:
for paired_gene in genes:
if gene != paired_gene and map[gene][paired_gene] >= 0.6:
sys.stdout.write("%s\t%s\t%f\n" % (gene, paired_gene, map[gene][paired_gene]))
``````

I did figure it out in R , but this looks better i will try your python code..

3
3.5 years ago by
Santosh Anand5.2k
Santosh Anand5.2k wrote:
``````d = data.frame(matrix(1:15, nrow=3))
# linearize the data
l = unlist(d)
# Put whatever filters you like
l[l>10] = NA
# re-assmeble
d = data.frame(matrix(l, nrow = dim(d)[1]))
``````