R script help to take out positive or correlation greater than a threshold
4
0
Entering edit mode
6.8 years ago
1769mkc ★ 1.2k

I have matrix file which is basically a spearman correlation matrix between genes across various cell type. So now Im trying to find out which set of genes or group of genes whose correlation value is lets say greater than 0.6 if I set that as my threshold. How can I do that? I'm posting a subset of my data. It's a 502 x 502 matrix.

      ACTL6B     ACTR5   ACTR6
ACTL6B  1        0.6        -0.4
ACTR5   0.4        1        -0.3
ACTR6  -0.4      -0.3         1

So I don't want correlation between same set of genes which would be 1. I want other comparison. Like, lets say, ACTL6B and ACTR5 whose correlation is 0.6. I would like to keep those values and genes.

Any help or suggestion would be highly appreciated

R • 6.9k views
ADD COMMENT
3
Entering edit mode

This is not a bioinformatics question but rather an R programming question. Anyway try something along the line of

correlations[abs(correlations<0.6] <- NA

but next time post on StackOverflow.

ADD REPLY
1
Entering edit mode

Solution provided above by Jean-Karim Heriche is better and shorter :)

ADD REPLY
0
Entering edit mode

+1. Sorry, I missed it in the first instance.

ADD REPLY
4
Entering edit mode
6.8 years ago

in R:

My data:

   > test
           ACTL6B ACTR5 ACTR6
    ACTL6B    1.0   0.6  -0.4
    ACTR5     0.4   1.0  -0.3
    ACTR6    -0.4  -0.3   1.0

Code :

> as.data.frame(apply(test, 2, function(x) ifelse (abs(x) >=0.6,x,"NA")))
              ACTL6B ACTR5 ACTR6
ACTL6B        1      0.6    NA
ACTR5          NA     1     NA
ACTR6          NA     NA     1
ADD COMMENT
1
Entering edit mode

Nice idea, though you would need 2 as margin, else the results will be transposed (as you see in your example)

as.data.frame(apply(test, 2, function(x) ifelse (abs(x) >=0.6,x,"NA")))

A simpler solution is to just use sapply

as.data.frame(sapply(test, function(x) ifelse (abs(x) >=0.6,x,"NA")))
ADD REPLY
1
Entering edit mode

Thanks. Code updated. sapply for above code, doesn't save/print row names.

ADD REPLY
0
Entering edit mode

sapply doesn't save/print row names.

It does!

ADD REPLY
3
Entering edit mode
6.8 years ago

You could use something like Python for this:

#!/usr/bin/env python

import sys

ctr = 0
genes = []
map = {}
for line in sys.stdin:
    elems = [x for x in line.strip().split()]
    if ctr == 0:
        genes = elems[1:]
        for gene in genes:
            map[gene] = {}
    else:
        paired_gene = elems[0]
        scores = [float(x) for x in elems[1:]]
        for idx, score in enumerate(scores):
            gene = genes[idx]
            map[gene][paired_gene] = score
            map[paired_gene][gene] = score
    ctr += 1
for gene in genes:
    for paired_gene in genes:
        if gene != paired_gene and map[gene][paired_gene] >= 0.6:
            sys.stdout.write("%s\t%s\t%f\n" % (gene, paired_gene, map[gene][paired_gene]))
ADD COMMENT
0
Entering edit mode

I did figure it out in R , but this looks better i will try your python code..

ADD REPLY
3
Entering edit mode
6.8 years ago
d = data.frame(matrix(1:15, nrow=3))
# linearize the data
l = unlist(d) 
# Put whatever filters you like
l[l>10] = NA 
# re-assmeble
d = data.frame(matrix(l, nrow = dim(d)[1]))
ADD COMMENT

Login before adding your answer.

Traffic: 2650 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6