Question

How can I find which files in a directory contain the patterns listed in another file?

3.9 years ago

newbie ▴ 120

I have a directory named Analysis that contains the following files:

Analysis
  |_____file1.csv
  |_____file2.csv
  |_____file3.csv
  |_____ReqGenes.csv

file1.csv, file2.csv and file3.csv each contain one gene name per line.

file1.csv looks like this:

LINC01419
AAR2
AC008560.1
ACTRT3
AKAP17A
AL139353.1
ARG2
ATE1
BORA

file2.csv looks like this:

DUSP28
EID2B
ELOVL6
FAM118B
FAM200A
FDXACB1
FKBP1B
FRAT1
FSD1L

file3.csv looks like this:

KDM4D
KLF12
KLLN
LRRC55
LRRIQ3
MBTPS2
MORN2
MRPS17
MRPS6
MTX3

To check whether a single pattern, e.g. LINC01419, occurs in any of the files in the directory, I usually run:

grep -E "LINC01419" *.csv

The output is:

file1.csv: LINC01419

Instead of searching for each gene separately, I have a file named ReqGenes.csv (shown below) that lists all the required genes. With a single command, I would like to know which files contain which of these genes.

Genes
LINC01419
MORN2
MTX3
FSD1L
FAM118B
EID2B
ARG2
KLLN
MRPS6
ATE1

The output I need should look like this:

file1.csv: LINC01419, ARG2, ATE1
file2.csv: FSD1L, FAM118B, EID2B
file3.csv: MORN2, KLLN, MRPS6, MTX3

Tags: linux grep find xargs

written 3.9 years ago by newbie ▴ 120 • updated 3.9 years ago by Pierre Lindenbaum 161k

Try using grep -f, which reads the patterns to search for from a file, one pattern per line.

More examples on: https://www.linuxtechi.com/linux-grep-command-with-14-different-examples/

written 3.9 years ago by Sej Modha 5.3k
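Following up on the grep -f suggestion, here is a minimal sketch of how a single grep call over all files can be grouped into the requested "file: gene, gene" layout. It assumes GNU or BSD grep (for -H), gene files with Unix line endings, exactly one header line in ReqGenes.csv, and file names without ':'. genes_by_file is a hypothetical helper name, not part of any tool.

```shell
# genes_by_file REQ FILE...  (hypothetical helper)
# Prints "file: gene, gene, ..." for every FILE containing at least one gene from REQ.
genes_by_file() {
    req=$1; shift
    patterns=$(mktemp) || return 1
    # Drop the header line ("Genes") and strip Windows carriage returns, if any.
    tail -n +2 "$req" | tr -d '\r' > "$patterns"
    # -F: fixed strings, -x: whole-line match, -f: read patterns from a file,
    # -H: always prefix each match with its file name ("file:gene").
    grep -H -F -x -f "$patterns" "$@" |
        awk -F: '
            # First match in a file: remember the file order and start its list.
            !($1 in genes) { order[++n] = $1; genes[$1] = $2; next }
            # Later matches: append to the list for that file.
            { genes[$1] = genes[$1] ", " $2 }
            END { for (i = 1; i <= n; i++) print order[i] ": " genes[order[i]] }'
    rm -f "$patterns"
}
```

From inside Analysis this would be called as genes_by_file ReqGenes.csv file*.csv. Files without any match are not listed, and genes appear in the order they occur in each file rather than in ReqGenes.csv order. With -x a gene is only reported when the whole line equals it, so a longer name that merely contains it is not picked up.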

Thanks. Yes, I tried this:

grep -f ReqGenes.csv file*.csv

And the output is:

file1.csv: ATE1

I don't know why it only reported the last gene in ReqGenes.csv. Why didn't it report the other genes?

written 3.9 years ago by newbie ▴ 120
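One possible explanation, not confirmed in this thread: ReqGenes.csv may have Windows-style (CRLF) line endings, for example because it was saved on Windows. Every pattern then ends in an invisible carriage return, so only a last line without a trailing line break can still match, which fits the "only the last gene" symptom. A minimal sketch to check for and remove carriage returns; has_crlf and strip_crlf are hypothetical helper names.

```shell
# has_crlf FILE: succeed if FILE contains at least one carriage return (\r).
has_crlf() {
    grep -q "$(printf '\r')" "$1"
}

# strip_crlf IN OUT: write a copy of IN with every carriage return removed.
strip_crlf() {
    tr -d '\r' < "$1" > "$2"
}
```

For example, has_crlf ReqGenes.csv && strip_crlf ReqGenes.csv ReqGenes.unix.csv, then rerun grep -f ReqGenes.unix.csv file*.csv. The file utility also reports "with CRLF line terminators" for such files, and dos2unix, where installed, converts them in place.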

3.9 years ago

Pierre Lindenbaum 161k

for F in file*.csv; do echo -n "${F}:" && grep -F -f ReqGenes.csv -w "${F}" | tr "\n" "," ; echo ; done

written 3.9 years ago by Pierre Lindenbaum 161k
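A variant of the loop above, sketched to print exactly the layout asked for in the question: no trailing comma, the "Genes" header of ReqGenes.csv not used as a pattern, and carriage returns stripped from both the gene list and the gene files. genes_per_file is a hypothetical helper name, and -x (whole-line match) is used instead of -w on the assumption that each file holds exactly one gene per line.

```shell
# genes_per_file REQ FILE...  (hypothetical helper)
# Prints "file: gene, gene, ..." for every FILE, even when nothing matches.
genes_per_file() {
    req=$1; shift
    patterns=$(mktemp) || return 1
    # Drop the header line and strip Windows carriage returns from the gene list.
    tail -n +2 "$req" | tr -d '\r' > "$patterns"
    for f in "$@"; do
        # Strip CRs from the gene file too, keep whole-line fixed-string matches,
        # join them with "," (paste -s) and add a space after each comma (sed).
        hits=$(tr -d '\r' < "$f" | grep -F -x -f "$patterns" | paste -s -d, - | sed 's/,/, /g')
        printf '%s: %s\n' "$f" "$hits"
    done
    rm -f "$patterns"
}
```

Run from inside Analysis as genes_per_file ReqGenes.csv file*.csv. On the sample files in the question it should print the following, with genes in file order rather than ReqGenes.csv order:

file1.csv: LINC01419, ARG2, ATE1
file2.csv: EID2B, FAM118B, FSD1L
file3.csv: KLLN, MORN2, MRPS6, MTX3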

Thanks, but with this command I only get:

file1.csv:
file2.csv:
file3.csv:

Do I need to remove the echo?

written 3.9 years ago by newbie ▴ 120

Play around with the code and see if you can come up with a solution yourself. We aren't here to spoon-feed solutions (and, to be fair, the code provided so far is more than enough for you to figure out the rest).

written 3.9 years ago by gtasource ▴ 60

Thanks for the suggestion. I'm not a programmer yet; I'm still learning and couldn't work it out, mainly because I'm very new to Linux and don't know much about it.

written 3.9 years ago by newbie ▴ 120

score 3 · Accepted Answer · 2020-05-14

3.9 years ago

gtasource ▴ 60

If you don't mind using R, this will do the trick. Install the packages first:

install.packages("tidyverse")
install.packages("magrittr")

And then run the code:

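library(tidyverse)
library(magrittr)

# Gene lists: file1.csv, file2.csv, ... (one gene per line, no header)
genes <- list.files(pattern = "^file\\d+\\.csv$")
genes.read <- lapply(genes, function(x) read.delim(x, header = FALSE))
# Name the single column "Genes" so it matches the header of ReqGenes.csv
genes.read <- lapply(genes.read, function(x) set_colnames(x, "Genes"))
# Reference list of required genes (header "Genes")
ref <- list.files(pattern = "Req")
ref.read <- read.delim(ref)
# dplyr's intersect() keeps the rows present in both data frames
intersect <- lapply(seq_along(genes.read), function(x)
  dplyr::intersect(genes.read[[x]], ref.read))
for (i in seq_along(genes.read)) {
  cat(genes[[i]], ":", intersect[[i]]$Genes, "\n")
}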

Output

file1.csv : LINC01419 ARG2 ATE1 
file2.csv : EID2B FAM118B FSD1L 
file3.csv : KLLN MORN2 MRPS6 MTX3

written 3.9 years ago by gtasource ▴ 60

Thanks for the reply. A small correction to your code:

for(i in 1:length(genes.read)) { 
  cat(genes[[i]],":",intersect[[i]]$gene_name, "\n")
}

written 3.9 years ago by newbie ▴ 120

Thanks for the catch! It should still be

intersect[[i]]$Genes

Because that's the name of the column in the intersect data frame.

written 3.9 years ago by gtasource ▴ 60

Yes. In my original file the column is called gene_name; here it is Genes. My mistake.

written 3.9 years ago by newbie ▴ 120