Question

Making pairwise matrix using ChIP-Seq peak binding matrix in R

0

Entering edit mode

8 months ago

Ankit ▴ 500

Hi everyone,

I have a query about R,

How to convert this

     Protein1 Protein2 Protein3
chr1_1564 1 0 0
chr3_9087 0 1 1
chr4_877671 1 1 0
chr9_90988 0 1 1
chr11_87676 1 0 0
chrX_1546 0 1 1

to this

        Protein1 Protein2 Protein3
Protein1 3 1 0
Protein2 1 4 3
Protein3 0 3 3

using R based functions which can do it quickly?

Further explaination, and relevance to biology and bioinformatics:

On rows are genomic sites and columns are protein names. This is a ChIP-seq data for some of my proteins for which I checked the occupancy on some genomic bins and took the start positon and chr name to give an ID to interval. 0 and 1 shows presence or absence of protein on that genomic sites.

I have applied a for-loop but it is taking a long time for large data. Here is my script as of now which is taking long time for a data of 900000 rows and 500 columns (900000 genomic sites and 500 proteins)

mydf <- data.frame(Protein1= c(1,0,1,0,1,0), Protein2=c(0,1,1,1,0,1), Protein3=c(0,1,0,1,0,1))
mydf <- as.matrix(mydf)
# Create empty matrix to store data
converted_mat <- matrix(0, nrow = ncol(mydf), ncol = ncol(mydf))
rownames(converted_mat)  <- colnames(mydf)
colnames(converted_mat)  <- colnames(mydf)

for (i in 1:ncol(mydf)){
  for  (j in 1:ncol(mydf)){
    converted_mat[i,j] <- sum(ifelse(mydf[,i] == 1 & mydf[,j] == 1, 1,0))
  }
}

Any suggestions?

R matrix binding Chip-seq • 1.2k views

ADD COMMENT • link 8 months ago by Ankit ▴ 500

0

Entering edit mode

How is this related to bioinformatics? If it is related, please add necessary context. If not, please delete this question and consult StackOverflow.

ADD REPLY • link 8 months ago by Ram 43k

0

Entering edit mode

It is a bioinformatics query about binding pattern of proteins across genomic regions.

ADD REPLY • link 8 months ago by Ankit ▴ 500

1

Entering edit mode

If you add more information to your question, we may be able to provide better contextual advice for your actual end goal.

ADD REPLY • link 8 months ago by jared.andrews07 ★ 16k

0

Entering edit mode

Please edit your post and add as much biological context as it takes for your post to make sense. If not, the post will be removed as off-topic.

ADD REPLY • link 8 months ago by Ram 43k

0

Entering edit mode

The detailed explanation has been added to the post along with biological context. If I am still missing things let me know,

ADD REPLY • link 8 months ago by Ankit ▴ 500

score 4 · Answer 1 · 2023-09-01

Use matrix multiplication:

input_matrix <- matrix(c(1, 0, 0,
                         0, 1, 1,
                         1, 1, 0,
                         0, 1, 1,
                         1, 0, 0,
                         0, 1, 1), nrow = 6, byrow = TRUE)

colnames(input_matrix) <- c("a", "b", "c")

# Compute the co-occurrence matrix
cooccurrence_matrix <- t(input_matrix) %*% input_matrix

# Assign column and row names
colnames(cooccurrence_matrix) <- c("a", "b", "c")
rownames(cooccurrence_matrix) <- c("a", "b", "c")

print(cooccurrence_matrix)
  a b c
a 3 1 0
b 1 4 3
c 0 3 3

Didn't benchmark, but should be much faster.

score 1 · Answer 2 · 2023-09-01

1

Entering edit mode

8 months ago

bk11 ★ 2.4k

How about using crossprod function. I have not tested how fast it will be though.

mydf <- data.frame(a= c(1,0,1,0,1,0), b=c(0,1,1,1,0,1), c=c(0,1,0,1,0,1))
mydf <- as.matrix(mydf)
out <- crossprod(mydf)
out
a b c
a 3 1 0
b 1 4 3
c 0 3 3

ADD COMMENT • link 8 months ago by bk11 ★ 2.4k

0

Entering edit mode

Both are good suggestions as they are much faster. Thank you.

ADD REPLY • link 8 months ago by Ankit ▴ 500

0

Entering edit mode

^ This is the downside to answering questions where OP has not invested sufficient effort - they take the answer and run and in all probability will come back with a different XY problem where X remains the same.

Apologies to both bk11 and jared.andrews07, but I'll delete this entire post unless OP adds context.

ADD REPLY • link 8 months ago by Ram 43k

0

Entering edit mode

The detailed explanation has been added to the post along with biological context. If I am still missing things let me know,

ADD REPLY • link 8 months ago by Ankit ▴ 500