A function for having a multiway intersection
5.8 years ago

Hi,

I have a series of time points, and I also have the differentially expressed genes for each possible combination of these time points. How can I write code to extract the common genes from each possible pairwise combination of these lists?

R Venn diagram intersection • 1.8k views

Hello jivarajivaraj!

It appears that your post has been cross-posted to another site: https://bioinformatics.stackexchange.com/questions/4533

This is typically not recommended as it runs the risk of annoying people in both communities.

5.8 years ago

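In R, just use the intersect() function, as in

intersect(h2, c(h4, h6, ...))

where each hx is replaced by the corresponding gene set (R object names cannot start with a digit, hence h2 rather than 2h). Note that c() pools the other sets, so this returns the genes of h2 that appear in at least one of them.

Edit: I realized that you may actually mean h2 vs h4, then h2 vs h6, and so on, in which case just iterate intersect() over all relevant combinations.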
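
Iterating intersect() over every pair could look something like the sketch below, which uses combn() to enumerate the pairs. It assumes the gene sets are stored in a named list of character vectors; the names h2, h4, h6 and the gene IDs are made-up placeholders, not the poster's data.

```r
# Toy gene sets; names and gene IDs are hypothetical placeholders
sets <- list(
  h2 = c("geneA", "geneB", "geneC", "geneD"),
  h4 = c("geneB", "geneC", "geneE"),
  h6 = c("geneC", "geneD", "geneE", "geneF")
)

# For every unordered pair of set names, intersect the two gene vectors.
# simplify = FALSE keeps the result as a list with one element per pair.
common <- combn(
  names(sets), 2,
  FUN = function(p) intersect(sets[[p[1]]], sets[[p[2]]]),
  simplify = FALSE
)

# Label each element with the pair it came from, e.g. "h2_vs_h4"
names(common) <- combn(names(sets), 2, FUN = paste, collapse = "_vs_")

print(common)
```

Each element of common then holds the genes shared by one pair, so lengths(common) gives the number of shared genes per pair.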


Thank you. The number of genes in each of these lists is not equal. For example, there are 630 common genes between h2 and h4, but I have 1143 genes in the h2 list and 768 genes in the h4 list, so h2 vs h4 would be 630/1143 and h4 vs h2 would be 630/768. It would be great to have a matrix or plot of all possible pairwise combinations of my 8 time points.

intersect(2h, c(4h, 6h, ...))

list()
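
The asymmetric ratios described above (shared genes divided by the size of each list) can be collected into a matrix. The following is only a sketch under assumptions: the helper overlap_fraction() is a hypothetical name, and the toy sets stand in for the real gene lists.

```r
# Toy gene sets; names and gene IDs are hypothetical placeholders
sets <- list(
  h2 = c("geneA", "geneB", "geneC", "geneD"),
  h4 = c("geneB", "geneC", "geneE"),
  h6 = c("geneC", "geneD", "geneE", "geneF")
)

# Fraction of the (unique) genes in x that are also present in y
overlap_fraction <- function(x, y) {
  x <- unique(x)
  length(intersect(x, y)) / length(x)
}

# Row i, column j: genes shared by i and j, divided by the size of set i.
# The outer sapply() builds the columns, the inner one builds the rows.
overlap <- sapply(sets, function(y) {
  sapply(sets, function(x) overlap_fraction(x, y))
})

print(round(overlap, 3))
```

With the real lists, the h2 row and h4 column would hold 630/1143 and the h4 row and h2 column 630/768. The matrix is not symmetric, and base R's heatmap(overlap, Rowv = NA, Colv = NA, scale = "none") is one way to plot it.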

What you're looking for is essentially a matrix of similarity between sets. A typical measure is the Jaccard index which can be easily computed like this:

jaccard_index <- function(x, y) {
    # intersect() and union() both drop duplicates, so repeated gene IDs count once
    n_shared <- length(intersect(x, y))
    similarity <- n_shared / length(union(x, y))
    return(similarity)
}

You can easily compute the similarity matrix with for loops:

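# h2, h4, ... are the gene vectors (names such as 2h are not valid R identifiers)
sets <- list(h2, h4, h6, h8, h10, h12, h14, h16)
S <- matrix(NA, nrow = length(sets), ncol = length(sets))
for (i in seq_along(sets)) {
    for (j in i:length(sets)) { # matrix is symmetric so only compute the upper triangular part
        S[i,j] <- jaccard_index(sets[[i]], sets[[j]]) # [[ ]] extracts the vector; [ ] would return a list
        S[j,i] <- S[i,j]
    }
}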
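
As a quick sanity check of the loop above, here is a self-contained sketch that runs it on three small made-up sets and labels the rows and columns. The set names and gene IDs are placeholders, and the Jaccard index is written here with union() in the denominator, which is equivalent for vectors without duplicates.

```r
# Toy gene sets; names and gene IDs are hypothetical placeholders
sets <- list(
  h2 = c("geneA", "geneB", "geneC", "geneD"),
  h4 = c("geneB", "geneC", "geneE"),
  h6 = c("geneC", "geneD", "geneE", "geneF")
)

# Jaccard index: size of the intersection over size of the union
jaccard_index <- function(x, y) {
  length(intersect(x, y)) / length(union(x, y))
}

# Pitfall: sets[1] is a list of length 1, sets[[1]] is the gene vector itself
stopifnot(length(sets[1]) == 1, length(sets[[1]]) == 4)

n <- length(sets)
S <- matrix(NA_real_, nrow = n, ncol = n,
            dimnames = list(names(sets), names(sets)))
for (i in seq_len(n)) {
  for (j in i:n) {  # symmetric, so fill the upper triangle and mirror it
    S[i, j] <- jaccard_index(sets[[i]], sets[[j]])
    S[j, i] <- S[i, j]
  }
}

print(round(S, 3))
```

The diagonal is 1 by construction, and every off-diagonal entry lies between 0 (no shared genes) and 1 (identical sets).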

Sorry, it says:

Error: object 'i' not found

I have a character vector for each time point; I put them in a list and ran your function.

> str(sets)
List of 9

>