Making sublist of 26 gene lists
6.5 years ago
zizigolu ★ 4.3k

Hi, I have 26 lists of genes. I want to extract the overlap of the 26 lists for clustering, something like the matrix below:

> head(inc[,1:4])
     31439_f_at 31440_at 31441_at 31442_at
set1          1        1        1        1
set2          0        0        0        0
set3          0        0        0        0


I tried ‘GSEABase’, but it was too complicated. Could someone help me, please?

Ideally, I would like a matrix with the gene lists as the columns (i.e., gene list 1 in column 1, gene list 2 in column 2, etc.) and the union of all genes as the rows, with a "1" in each cell where the gene is present in that gene list and a "0" elsewhere.

6.5 years ago
russhh 5.7k

A related question was asked the other day.

gene_lists = list(G1 = letters[1:3], G2 = letters[3:7], G3 = letters[6:8])


There are lots of ways to do this.

The solution that @lessismore generated in the comments was effectively:

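library(magrittr)  # provides %>%
make_bipartite_adjacency_from_sets <- function(list_of_sets){
  universe <- sort(unique(unlist(list_of_sets)))
  # one 0/1 indicator column per set, one row per gene in the universe
  lapply(list_of_sets, function(x) as.numeric(universe %in% x)) %>%
    as.data.frame(row.names = universe)
}
make_bipartite_adjacency_from_sets(gene_lists)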

   G1 G2 G3
a  1  0  0
b  1  0  0
c  1  1  0
d  0  1  0
e  0  1  0
f  0  1  1
g  0  1  1
h  0  0  1


You could also do a tidyverse version (tibbles do not support row names, so the gene IDs end up in a column instead):

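make_bipartite_adjacency_from_sets2 <- function(list_of_sets){
  list_of_sets %>%
    # one long table of (gene_id, adj = 1) rows per set
    purrr::map(function(x) tibble::tibble(gene_id = x, adj = 1)) %>%
    dplyr::bind_rows(.id = "set_id") %>%
    # one column per set; genes absent from a set are filled with 0
    tidyr::spread(key = set_id, value = adj, fill = 0)
}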

# A tibble: 8 x 4
  gene_id    G1    G2    G3
*   <chr> <dbl> <dbl> <dbl>
1       a     1     0     0
2       b     1     0     0
3       c     1     1     0
4       d     0     1     0
5       e     0     1     0
6       f     0     1     1
7       g     0     1     1
8       h     0     0     1
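
Since the question asks for the overlap, here is one more base R variant as a rough sketch. It assumes the list is named and that no gene is repeated within a single list (otherwise the counts would exceed 1). The variable names `membership`, `in_all` and `in_two_plus` are just illustrative.

```r
gene_lists <- list(G1 = letters[1:3], G2 = letters[3:7], G3 = letters[6:8])

# stack() turns the named list into a two-column data frame (values, ind);
# table() then cross-tabulates gene against list
membership <- unclass(table(stack(gene_lists)))

# Genes present in every list: row sum equals the number of lists
in_all <- rownames(membership)[rowSums(membership) == length(gene_lists)]

# Genes present in at least two lists
in_two_plus <- rownames(membership)[rowSums(membership) >= 2]
```

With the toy lists above no gene is in all three sets, so `in_all` is empty; lowering the threshold as in `in_two_plus` may be more useful when intersecting many lists.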

Thank you. The problem is that I can't figure out how to make a gene list. For instance, how do I make a gene list (say, gene set 1) from a column of genes?

I made GS1 as shown in this screenshot:

https://ibb.co/khxeKn

but it says:

> make_bipartite_adjacency_from_sets(gene_lists)
Error in attributes(.Data) <- c(attributes(.Data), attrib) :
'names' attribute [1704] must be the same length as the vector [2]
Called from: structure(res, levels = lv, names = nm, class = "factor")
Browse[1]>

Can't you just put all your GS* vectors into a named list?
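
A minimal sketch of what that could look like, assuming each GS* object is a one-column data frame of probe IDs. The contents of `GS1` and `GS2` below are made up for illustration, and `membership` is just a name chosen here. The key step is pulling the column out as a plain character vector before it goes into the list.

```r
# Hypothetical stand-ins for the one-column data frames of gene IDs
GS1 <- data.frame(gene = c("31439_f_at", "31440_at", "31441_at"))
GS2 <- data.frame(gene = c("31440_at", "31442_at"))

# Extract the first column of each as a character vector and name each entry
gene_lists <- list(
  GS1 = as.character(GS1[[1]]),
  GS2 = as.character(GS2[[1]])
)

# Same idea as the answer above: one row per gene, one 0/1 column per list
universe <- sort(unique(unlist(gene_lists)))
membership <- sapply(gene_lists, function(x) as.numeric(universe %in% x))
rownames(membership) <- universe
```

With 26 sets, collecting the data frames in a list first and running `lapply()` over it with the same extraction step would avoid typing each name by hand.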

Thanks a lot. Without your help I definitely could not have figured this out for at least 2 weeks...

I did this:

library(igraph)  # igraph also re-exports the %>% pipe

# flatten each one-column table of gene IDs into a plain vector
GS1 = c(t(GS1))
GS2 = c(t(GS2))

gene_lists = list(GS1 = GS1, GS2 = GS2)

universe <- sort(unique(unlist(gene_lists)))
adjacency_df <- lapply(gene_lists, function(x) as.numeric(universe %in% x)) %>% as.data.frame()