Merge multiple lists of IDs and summarize with a matrix
4.9 years ago
lessismore ★ 1.3k

Dear all,

I have 30 different, partially overlapping lists of gene IDs, and I want to create a matrix with all the IDs as row names and one column per list, indicating whether each gene ID is present in that list (1) or not (0).

Here are the inputs: several .txt files

list1.txt

GO:0004743  
GO:0004497  
GO:0004222  
GO:0003723  
GO:0003690

list2.txt

GO:0008757  
GO:0008289  
GO:0005509  
GO:0005488

list3.txt

GO:0005509  
GO:0005488  
GO:0004222  
GO:0003723

list4.txt

GO:0004497

list5.txt

GO:0003723  
GO:0003690

 

What I want:

GO-ID       list1  list2  list3  list4  list5
GO:0008757  0      1      0      0      0
GO:0008289  0      1      0      0      0
GO:0005509  0      1      1      0      0
GO:0005488  0      1      1      0      0
GO:0004743  1      0      0      0      0
GO:0004497  1      0      0      1      0
GO:0004222  1      0      1      0      0
GO:0003723  1      0      1      0      1
GO:0003690  1      0      0      0      1

Sorry for the trivial question; I would appreciate any help.

4.9 years ago
Asaf 10k

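You can start by making a long-format tibble with one row per (GO ID, list) pair, like this:

GO-ID        list    value
GO:0004497   list1   1
GO:0004497   list2   1
GO:0003690   list1   1
...

Then call spread(tbl, list, value, fill=0) to turn each list into its own column, filling the combinations that are absent with 0.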

I don't know what your input looks like, but this should be easy with a loop around split()
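
As a minimal sketch of the long-then-wide idea, the example lists from the question can be held in memory rather than read from files. The object names `lists`, `long` and `wide` are made up for illustration, and the sketch assumes the tibble and tidyr packages are installed.

```r
library(tibble)
library(tidyr)

# Example lists copied from the question, held in memory instead of files
lists <- list(
  list1 = c("GO:0004743", "GO:0004497", "GO:0004222", "GO:0003723", "GO:0003690"),
  list2 = c("GO:0008757", "GO:0008289", "GO:0005509", "GO:0005488"),
  list3 = c("GO:0005509", "GO:0005488", "GO:0004222", "GO:0003723"),
  list4 = c("GO:0004497"),
  list5 = c("GO:0003723", "GO:0003690")
)

# Long format: one row per (GO ID, list) pair, value = 1 meaning "present"
long <- tibble(
  GO    = unlist(lists, use.names = FALSE),
  list  = rep(names(lists), lengths(lists)),
  value = 1
)

# Wide format: one column per list, 0 where the GO ID is absent from that list
wide <- spread(long, list, value, fill = 0)
print(wide)
```

Each GO ID ends up on a single row, with a 1 in every list column where it occurs.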

Hi, I updated my inputs. Could you please explain more clearly how to do this? Thanks!

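Not tested, but something like:

library(tidyverse)

# Read every list file into one long table: one row per (GO ID, list) pair
dtbl <- tibble()
for (f in Sys.glob("list*.txt")) {
  gos <- read_delim(f, "\t", col_names = c("GO"), trim_ws = TRUE)
  gos$list <- sub("\\.txt$", "", basename(f))  # column name without the extension
  gos$value <- 1
  dtbl <- bind_rows(dtbl, gos)
}
# One column per list; GO IDs absent from a list get 0
dtbl2 <- spread(distinct(dtbl), list, value, fill = 0)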
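
If you would rather avoid the tidyverse, a base R version along the same lines might look like the sketch below. It is only an illustration under stated assumptions: the temporary directory and the two cut-down list files are made up so the example runs anywhere, and `lists`, `long` and `mat` are hypothetical names. In practice you would point `list.files()` at the folder holding your real files.

```r
# Write cut-down versions of list1 and list5 to a temporary directory
# (illustrative data only; trailing spaces mimic the original files)
dir <- tempfile("lists")
dir.create(dir)
writeLines(c("GO:0004497  ", "GO:0003723  ", "GO:0003690"), file.path(dir, "list1.txt"))
writeLines(c("GO:0003723  ", "GO:0003690"), file.path(dir, "list5.txt"))

files <- list.files(dir, pattern = "^list.*\\.txt$", full.names = TRUE)
# Name each list after its file, without the .txt extension
names(files) <- sub("\\.txt$", "", basename(files))

# Read each file as a character vector, trimming whitespace,
# dropping blank lines and removing duplicate IDs
lists <- lapply(files, function(f) {
  ids <- trimws(readLines(f))
  unique(ids[nzchar(ids)])
})

# stack() gives a data frame with columns 'values' (GO ID) and 'ind' (list name);
# table() then counts each (GO ID, list) pair, which is 0 or 1 after unique()
long <- stack(lists)
mat <- unclass(table(long$values, long$ind))
print(mat)
```

The result is a plain integer matrix with GO IDs as row names and list names as column names, which can be written out with `write.table(mat, sep = "\t", quote = FALSE)`.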