Question

Command to find the entries common to three files

2.6 years ago

harry ▴ 30

I have three text files, and I want to find the entries common to all three as well as the differences between them. The files look like this:

1st file:

hsa_circ_0072810
hsa_circ_0072811
hsa_circ_0072813
hsa_circ_0098750
hsa_circ_0125807
hsa_circ_0000295
hsa_circ_0134603
hsa_circ_0001196
hsa_circ_0097585
hsa_circ_0097586
hsa_circ_0006118
hsa_circ_0080950
hsa_circ_0102355
hsa_circ_0000175
hsa_circ_0000934
hsa_circ_0125807

2nd file:

hsa_circ_0072810
hsa_circ_0072811
hsa_circ_0072813
hsa_circ_0098750
hsa_circ_0017672
hsa_circ_0040452
hsa_circ_0098687
hsa_circ_0000400
hsa_circ_0004055
hsa_circ_0006620
hsa_circ_0006118
hsa_circ_0080950
hsa_circ_0102355
hsa_circ_0000175
hsa_circ_0000934
hsa_circ_0125807

3rd file:

hsa_circ_0072810
hsa_circ_0072811
hsa_circ_0072813
hsa_circ_0098750
hsa_circ_0110890
hsa_circ_0001611
hsa_circ_0001675
hsa_circ_0002937
hsa_circ_0004932
hsa_circ_0002393
hsa_circ_0116839
hsa_circ_0072850
hsa_circ_0072848
hsa_circ_0131605
hsa_circ_0001826
hsa_circ_0080696

So can you please tell me how I can extract the entries common to the 3 files and the differences between them?

Thanks in advance

grep • 1.3k views

Updated 2.6 years ago by cpad0112 21k • written 2.6 years ago by harry ▴ 30

score 3 · Answer 1 · 2021-09-03

All combinations among the 3 files: http://www.interactivenn.net/. The output image and data can be downloaded.

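In R, for all combinations of comparisons:

library(gplots)
# "\\.txt$" is a regular expression; "*.txt" is a shell glob, which list.files() does not use
files <- list.files(pattern = "\\.txt$", full.names = TRUE)
# read each file as a character vector, one ID per line
sets <- lapply(files, readLines)
names(sets) <- basename(files)
# draw the Venn diagram and print the members of every combination
print(venn(sets))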

PS: https://bioinformatics.psb.ugent.be/webtools/Venn/ is an expired link and no longer works.
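
For a command-line counterpart to the Venn tools above, here is a minimal sketch that lists, for every ID, the files it occurs in, which covers every combination at once. The file names `1st`, `2nd`, `3rd` and the output name `membership.tsv` are assumptions for illustration, and the sample data is a small subset of the IDs from the question.

```shell
# Hypothetical sample files (a subset of the IDs from the question).
printf '%s\n' hsa_circ_0072810 hsa_circ_0125807 hsa_circ_0000295 > 1st
printf '%s\n' hsa_circ_0072810 hsa_circ_0125807 hsa_circ_0017672 > 2nd
printf '%s\n' hsa_circ_0072810 hsa_circ_0110890 > 3rd

# For every non-empty line, record each file it occurs in (once per file),
# then print "ID <tab> comma-separated list of files".
awk '
  NF && !seen[$0, FILENAME]++ {
      if ($0 in where) where[$0] = where[$0] "," FILENAME
      else             where[$0] = FILENAME
  }
  END { for (id in where) print id "\t" where[id] }
' 1st 2nd 3rd | sort > membership.tsv

cat membership.tsv
```

Filtering the second column then gives any region of the Venn diagram, for example the IDs listed with all three file names are the common ones.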

score 2 · Answer 2 · 2021-09-03

2.6 years ago

Mensur Dlakic ★ 27k

grep -w -f 1st 2nd | grep -w -f - 3rd > common

You will have to define better what you mean by the difference between 3 files.
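
To try the idea on a small case, here is a sketch on made-up files that hold a subset of the IDs from the question. The `-F` (literal strings) and `-x` (whole-line match) options and the final `sort -u` are my additions rather than part of the command above; they assume one ID per line, and like the command above they rely on grep accepting `-f -` to read patterns from standard input.

```shell
# Hypothetical sample files: a small subset of the IDs from the question.
printf '%s\n' hsa_circ_0072810 hsa_circ_0098750 hsa_circ_0125807 hsa_circ_0000295 > 1st
printf '%s\n' hsa_circ_0072810 hsa_circ_0098750 hsa_circ_0125807 hsa_circ_0017672 > 2nd
printf '%s\n' hsa_circ_0072810 hsa_circ_0098750 hsa_circ_0110890 > 3rd

# -F: treat each pattern as a literal string, -x: match the whole line.
# First keep the lines of 2nd that also occur in 1st, then use those as
# patterns against 3rd. sort -u drops IDs listed more than once in 3rd.
grep -Fx -f 1st 2nd | grep -Fx -f - 3rd | sort -u > common

cat common
```

Here `hsa_circ_0125807` is dropped because it is missing from the third file, so `common` ends up with `hsa_circ_0072810` and `hsa_circ_0098750`.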

or in R:

Reduce(
  intersect, list(vect1, vect2, vect3)
)

Reply • 2.6 years ago by ponganta ▴ 590

# -w matches whole words only, so an ID is never excluded by a partial match
grep -v -w -f 2nd 1st | grep -v -w -f 3rd - > 1st_unique
grep -v -w -f 1st 2nd | grep -v -w -f 3rd - > 2nd_unique
grep -v -w -f 1st 3rd | grep -v -w -f 2nd - > 3rd_unique

Reply • 2.6 years ago by Mensur Dlakic ★ 27k

Thanks for replying. By "difference" I mean the entries that are not present in any other text file, i.e. those that are unique to one particular file.
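
With "difference" defined as unique to one file, one possible sketch uses `sort` and `comm` instead of grep. The sample files and the `*.sorted` helper names are assumptions for illustration; `comm` needs sorted input, so each file is sorted first under a fixed locale.

```shell
# Use one collation order for both sort and comm.
export LC_ALL=C

# Hypothetical sample files: a small subset of the IDs from the question.
printf '%s\n' hsa_circ_0072810 hsa_circ_0098750 hsa_circ_0125807 hsa_circ_0000295 > 1st
printf '%s\n' hsa_circ_0072810 hsa_circ_0098750 hsa_circ_0125807 hsa_circ_0017672 > 2nd
printf '%s\n' hsa_circ_0072810 hsa_circ_0098750 hsa_circ_0110890 hsa_circ_0001611 > 3rd

# comm needs sorted input; sort -u also removes repeated IDs.
sort -u 1st > 1st.sorted
sort -u 2nd > 2nd.sorted
sort -u 3rd > 3rd.sorted

# comm -23 A B keeps only the lines found in A but not in B.
# B (read from stdin via "-") is the sorted union of the other two files.
sort -u 2nd 3rd | comm -23 1st.sorted - > 1st_unique
sort -u 1st 3rd | comm -23 2nd.sorted - > 2nd_unique
sort -u 1st 2nd | comm -23 3rd.sorted - > 3rd_unique

cat 1st_unique 2nd_unique 3rd_unique
```

Note that `hsa_circ_0125807` appears in none of the outputs: it is shared by the first two files, so it is neither common to all three nor unique to one.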

Reply • 2.6 years ago by harry ▴ 30

score 1 · Answer 3 · 2021-09-03

2.6 years ago

ponganta ▴ 590

In R, you could use this function to obtain the unique values for each file:


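#' Find distinct entries per list element
#'
#' @param ls A `list` of vectors of the same data type
#' @return A `list` holding, for each vector, the elements found in no other vector
elements_distinct = function(ls){
  # seq_along() is safe for an empty list, unlike 1:length(ls)
  out = lapply(seq_along(ls), function(i)
    ls[[i]][!ls[[i]] %in% unlist(ls[-i])]
  )
  # keep the input names so each result can be traced back to its file
  setNames(out, names(ls))
}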

The advantage of this approach would be scalability. It doesn't matter whether you have 3, 9, or 9000 sets to compare.
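
The same "any number of sets" idea can be sketched on the command line with a loop. This is only an illustration under assumptions: the file names, the sample data, and the temporary file `others.tmp` are made up, and each file is assumed to hold one ID per line.

```shell
# Hypothetical sample files: a small subset of the IDs from the question.
printf '%s\n' hsa_circ_0072810 hsa_circ_0098750 hsa_circ_0125807 hsa_circ_0000295 > 1st
printf '%s\n' hsa_circ_0072810 hsa_circ_0098750 hsa_circ_0125807 hsa_circ_0017672 > 2nd
printf '%s\n' hsa_circ_0072810 hsa_circ_0098750 hsa_circ_0110890 hsa_circ_0001611 > 3rd

# List the input files once; add more names here to compare more sets.
set -- 1st 2nd 3rd

for f in "$@"; do
    : > others.tmp                      # start with an empty pattern file
    for g in "$@"; do
        # collect the contents of every file except the current one
        [ "$g" = "$f" ] || cat "$g" >> others.tmp
    done
    # keep the lines of $f that equal no whole line of any other file
    grep -Fxv -f others.tmp "$f" > "${f}_unique"
done
rm -f others.tmp

cat 1st_unique 2nd_unique 3rd_unique
```

Each output keeps the order of its input file. The loop rereads every file once per set, so for thousands of large files a single-pass tool would likely be a better fit.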

Follow-up: example usage (in an R-project, which I would highly recommend for any type of data munging!). In this project, put your files into a folder called "data".


# PACKAGES ----------------------
library(magrittr) # for the pipe
library(tools)    # for file_path_sans_ext

# FILES -------------------------
## list all filepaths
myfiles = list.files("data", full.names = TRUE) 

## get filenames (no path, no extension)
mynames = list.files("data")  %>% 
  file_path_sans_ext()

# ANALYSES -----------------------
## load your files into a list. Each list element will be named after the file.
mylist = lapply(myfiles, readLines) %>% setNames(mynames)

## find distinct elements
mysnowflakes = elements_distinct(mylist)

Reply • 2.6 years ago by ponganta ▴ 590