Making pairwise matrix using ChIP-Seq peak binding matrix in R
2
0
Entering edit mode
8 months ago
Ankit ▴ 500

Hi everyone,

I have a query about R,

How to convert this

     Protein1 Protein2 Protein3
chr1_1564 1 0 0
chr3_9087 0 1 1
chr4_877671 1 1 0
chr9_90988 0 1 1
chr11_87676 1 0 0
chrX_1546 0 1 1

to this

        Protein1 Protein2 Protein3
Protein1 3 1 0
Protein2 1 4 3
Protein3 0 3 3

using R based functions which can do it quickly?

Further explaination, and relevance to biology and bioinformatics:

On rows are genomic sites and columns are protein names. This is a ChIP-seq data for some of my proteins for which I checked the occupancy on some genomic bins and took the start positon and chr name to give an ID to interval. 0 and 1 shows presence or absence of protein on that genomic sites.

I have applied a for-loop but it is taking a long time for large data. Here is my script as of now which is taking long time for a data of 900000 rows and 500 columns (900000 genomic sites and 500 proteins)

mydf <- data.frame(Protein1= c(1,0,1,0,1,0), Protein2=c(0,1,1,1,0,1), Protein3=c(0,1,0,1,0,1))
mydf <- as.matrix(mydf)
# Create empty matrix to store data
converted_mat <- matrix(0, nrow = ncol(mydf), ncol = ncol(mydf))
rownames(converted_mat)  <- colnames(mydf)
colnames(converted_mat)  <- colnames(mydf)

for (i in 1:ncol(mydf)){
  for  (j in 1:ncol(mydf)){
    converted_mat[i,j] <- sum(ifelse(mydf[,i] == 1 & mydf[,j] == 1, 1,0))
  }
}

Any suggestions?

R matrix binding Chip-seq • 1.2k views
ADD COMMENT
0
Entering edit mode

How is this related to bioinformatics? If it is related, please add necessary context. If not, please delete this question and consult StackOverflow.

ADD REPLY
0
Entering edit mode

It is a bioinformatics query about binding pattern of proteins across genomic regions.

ADD REPLY
1
Entering edit mode

If you add more information to your question, we may be able to provide better contextual advice for your actual end goal.

ADD REPLY
0
Entering edit mode

Please edit your post and add as much biological context as it takes for your post to make sense. If not, the post will be removed as off-topic.

ADD REPLY
0
Entering edit mode

The detailed explanation has been added to the post along with biological context. If I am still missing things let me know,

ADD REPLY
4
Entering edit mode
8 months ago

Use matrix multiplication:

input_matrix <- matrix(c(1, 0, 0,
                         0, 1, 1,
                         1, 1, 0,
                         0, 1, 1,
                         1, 0, 0,
                         0, 1, 1), nrow = 6, byrow = TRUE)

colnames(input_matrix) <- c("a", "b", "c")

# Compute the co-occurrence matrix
cooccurrence_matrix <- t(input_matrix) %*% input_matrix

# Assign column and row names
colnames(cooccurrence_matrix) <- c("a", "b", "c")
rownames(cooccurrence_matrix) <- c("a", "b", "c")

print(cooccurrence_matrix)
  a b c
a 3 1 0
b 1 4 3
c 0 3 3

Didn't benchmark, but should be much faster.

ADD COMMENT
1
Entering edit mode
8 months ago
bk11 ★ 2.4k

How about using crossprod function. I have not tested how fast it will be though.

mydf <- data.frame(a= c(1,0,1,0,1,0), b=c(0,1,1,1,0,1), c=c(0,1,0,1,0,1))
mydf <- as.matrix(mydf)
out <- crossprod(mydf)
out
a b c
a 3 1 0
b 1 4 3
c 0 3 3
ADD COMMENT
0
Entering edit mode

Both are good suggestions as they are much faster. Thank you.

ADD REPLY
0
Entering edit mode

^ This is the downside to answering questions where OP has not invested sufficient effort - they take the answer and run and in all probability will come back with a different XY problem where X remains the same.

Apologies to both bk11 and jared.andrews07, but I'll delete this entire post unless OP adds context.

ADD REPLY
0
Entering edit mode

The detailed explanation has been added to the post along with biological context. If I am still missing things let me know,

ADD REPLY

Login before adding your answer.

Traffic: 1965 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6