Question

R script help to take out positive or correlation greater than a threshold

0

Entering edit mode

6.8 years ago

1769mkc ★ 1.2k

I have matrix file which is basically a spearman correlation matrix between genes across various cell type. So now Im trying to find out which set of genes or group of genes whose correlation value is lets say greater than 0.6 if I set that as my threshold. How can I do that? I'm posting a subset of my data. It's a 502 x 502 matrix.

      ACTL6B     ACTR5   ACTR6
ACTL6B  1        0.6        -0.4
ACTR5   0.4        1        -0.3
ACTR6  -0.4      -0.3         1

So I don't want correlation between same set of genes which would be 1. I want other comparison. Like, lets say, ACTL6B and ACTR5 whose correlation is 0.6. I would like to keep those values and genes.

Any help or suggestion would be highly appreciated

R • 6.9k views

ADD COMMENT • link updated 6.8 years ago by cpad0112 21k • written 6.8 years ago by 1769mkc ★ 1.2k

3

Entering edit mode

This is not a bioinformatics question but rather an R programming question. Anyway try something along the line of

correlations[abs(correlations<0.6] <- NA

but next time post on StackOverflow.

ADD REPLY • link 6.8 years ago by Jean-Karim Heriche 27k

1

Entering edit mode

Solution provided above by Jean-Karim Heriche is better and shorter :)

ADD REPLY • link 6.8 years ago by cpad0112 21k

0

Entering edit mode

+1. Sorry, I missed it in the first instance.

ADD REPLY • link 6.8 years ago by Santosh Anand 5.7k

score 4 · Accepted Answer · 2017-08-03

4

Entering edit mode

6.8 years ago

cpad0112 21k

in R:

My data:

   > test
           ACTL6B ACTR5 ACTR6
    ACTL6B    1.0   0.6  -0.4
    ACTR5     0.4   1.0  -0.3
    ACTR6    -0.4  -0.3   1.0

Code :

> as.data.frame(apply(test, 2, function(x) ifelse (abs(x) >=0.6,x,"NA")))
              ACTL6B ACTR5 ACTR6
ACTL6B        1      0.6    NA
ACTR5          NA     1     NA
ACTR6          NA     NA     1

ADD COMMENT • link 6.8 years ago by cpad0112 21k

1

Entering edit mode

Nice idea, though you would need 2 as margin, else the results will be transposed (as you see in your example)

as.data.frame(apply(test, 2, function(x) ifelse (abs(x) >=0.6,x,"NA")))

A simpler solution is to just use sapply

as.data.frame(sapply(test, function(x) ifelse (abs(x) >=0.6,x,"NA")))

ADD REPLY • link 6.8 years ago by Santosh Anand 5.7k

1

Entering edit mode

Thanks. Code updated. sapply for above code, doesn't save/print row names.

ADD REPLY • link 6.8 years ago by cpad0112 21k

0

Entering edit mode

sapply doesn't save/print row names.

It does!

ADD REPLY • link 6.8 years ago by Santosh Anand 5.7k

score 3 · Accepted Answer · 2017-08-03

You could use something like Python for this:

#!/usr/bin/env python

import sys

ctr = 0
genes = []
map = {}
for line in sys.stdin:
    elems = [x for x in line.strip().split()]
    if ctr == 0:
        genes = elems[1:]
        for gene in genes:
            map[gene] = {}
    else:
        paired_gene = elems[0]
        scores = [float(x) for x in elems[1:]]
        for idx, score in enumerate(scores):
            gene = genes[idx]
            map[gene][paired_gene] = score
            map[paired_gene][gene] = score
    ctr += 1
for gene in genes:
    for paired_gene in genes:
        if gene != paired_gene and map[gene][paired_gene] >= 0.6:
            sys.stdout.write("%s\t%s\t%f\n" % (gene, paired_gene, map[gene][paired_gene]))

score 3 · Accepted Answer · 2017-08-03

3

Entering edit mode

6.8 years ago

Santosh Anand 5.7k

d = data.frame(matrix(1:15, nrow=3))
# linearize the data
l = unlist(d) 
# Put whatever filters you like
l[l>10] = NA 
# re-assmeble
d = data.frame(matrix(l, nrow = dim(d)[1]))

ADD COMMENT • link 6.8 years ago by Santosh Anand 5.7k