Question: How to check which files in a directory contain the patterns listed in a specific file?
newbie wrote:

I have a directory named Analysis that contains the following files:

Analysis
  |_____file1.csv
  |_____file2.csv
  |_____file3.csv
  |_____ReqGenes.csv

file1.csv, file2.csv and file3.csv each contain one gene name per line.

file1.csv looks like this:

LINC01419
AAR2
AC008560.1
ACTRT3
AKAP17A
AL139353.1
ARG2
ATE1
BORA

file2.csv looks like this:

DUSP28
EID2B
ELOVL6
FAM118B
FAM200A
FDXACB1
FKBP1B
FRAT1
FSD1L

file3.csv looks like this:

KDM4D
KLF12
KLLN
LRRC55
LRRIQ3
MBTPS2
MORN2
MRPS17
MRPS6
MTX3

To check whether a single gene, e.g. LINC01419, occurs in any of the files in the directory, I usually run:

grep -E "LINC01419" *.csv

The output is:

file1.csv: LINC01419

But instead of searching for each gene separately, I have a file named ReqGenes.csv (shown below) that lists all the required genes. With one command, I would like to find out which files each gene is present in.

Genes
LINC01419
MORN2
MTX3
FSD1L
FAM118B
EID2B
ARG2
KLLN
MRPS6
ATE1

The output I need looks like this:

file1.csv: LINC01419, ARG2, ATE1
file2.csv: FSD1L, FAM118B, EID2B
file3.csv: MORN2, KLLN, MRPS6, MTX3
Tags: xargs, linux, grep, find
written 11 days ago by newbie (modified 10 days ago by Pierre Lindenbaum)

Try using grep -f (it reads the patterns from a file, one per line).

More examples on: https://www.linuxtechi.com/linux-grep-command-with-14-different-examples/

written 11 days ago by Sej Modha
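For reference, here is a sketch of the grep -f approach that avoids two pitfalls of a plain `grep -f ReqGenes.csv`: the header line "Genes" becomes a pattern too, and by default each pattern is a regular expression that can match inside a longer gene name. The helper name match_genes is hypothetical. The sketch assumes bash (for the `<(...)` process substitution) and that the data files themselves have Unix line endings, because `-x` requires the whole line to match.

```shell
#!/usr/bin/env bash
# match_genes is a hypothetical helper; run it from inside Analysis as:
#   match_genes ReqGenes.csv file*.csv
match_genes() {
  local list=$1
  shift
  # tail -n +2 drops the "Genes" header; tr -d '\r' removes Windows
  # carriage returns that would otherwise become part of every pattern.
  # -F: fixed strings, -x: whole-line matches, -H: always print file names.
  grep -H -F -x -f <(tail -n +2 "$list" | tr -d '\r') "$@"
}
```

Each match is printed as `file:gene` on its own line (for example `file1.csv:ARG2`). Grouping the matches per file is what the later answers do.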

Thanks. Yes, I tried this:

grep -f ReqGenes.csv file*.csv

And the output is:

file1.csv: ATE1

I don't know why it only gave output for the last gene in ReqGenes.csv. Why didn't it report the other genes?

written 10 days ago by newbie
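One common cause of this symptom, offered here as an assumption since the thread never confirms it, is that ReqGenes.csv has Windows (CRLF) line endings, for example because it was saved from Excel. Every pattern then ends in an invisible carriage return (`\r`) that does not occur in the data files, so nothing matches. The exception is the last line when the file ends without a final newline: that line has no `\r`, which would explain why only ATE1 was found. The two helpers below (hypothetical names) count and remove the carriage returns; `cat -A ReqGenes.csv` (GNU coreutils) shows the same thing visually as `^M$` at the end of each affected line.

```shell
#!/usr/bin/env bash
# count_crlf and strip_crlf are hypothetical helper names.

# Print how many lines of file $1 end in a carriage return (CRLF endings).
# $'\r$' is bash ANSI-C quoting for "a literal CR at the end of the line".
count_crlf() {
  grep -c $'\r$' "$1"
}

# Write a copy of file $1 to $2 with every carriage return removed.
strip_crlf() {
  tr -d '\r' < "$1" > "$2"
}
```

If `count_crlf ReqGenes.csv` prints a non-zero number, run `strip_crlf ReqGenes.csv ReqGenes.unix.csv` and pass the new file to `grep -f`; the data files may need the same treatment.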
gtasource wrote:

If you don't mind using R, this will do the trick. Install the packages first:

install.packages("tidyverse")
install.packages("magrittr")

And then run the code:

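library(tidyverse)   # dplyr::intersect() works row-wise on data frames
library(magrittr)    # set_colnames()

# Data files file1.csv, file2.csv, ...: one gene per line, no header
genes <- list.files(pattern = "^file[0-9]+\\.csv$")
genes.read <- lapply(genes, function(x) read.delim(x, header = FALSE, stringsAsFactors = FALSE))
genes.read <- lapply(genes.read, function(x) set_colnames(x, "Genes"))

# Gene list; its header line gives the column name "Genes"
ref <- list.files(pattern = "Req")
ref.read <- read.delim(ref, stringsAsFactors = FALSE)

# Genes shared by each data file and the gene list
intersect <- lapply(seq_along(genes.read), function(x) 
  intersect(genes.read[[x]], ref.read))
for (i in seq_along(genes.read)) { 
  cat(genes[[i]], ":", intersect[[i]]$Genes, "\n")
}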

Output

file1.csv : LINC01419 ARG2 ATE1 
file2.csv : EID2B FAM118B FSD1L 
file3.csv : KLLN MORN2 MRPS6 MTX3
written 10 days ago by gtasource (modified 10 days ago)

Thanks for the reply. A small correction to your code:

for(i in 1:length(genes.read)) { 
  cat(genes[[i]],":",intersect[[i]]$gene_name, "\n")
}
written 10 days ago by newbie

Thanks for the catch! It should still be

intersect[[i]]$Genes

Because that's the name of the column in the intersect data frame.

written 10 days ago by gtasource (modified 10 days ago)

Yes. In my original file the column is called gene_name; here it is Genes. My mistake.

written 10 days ago by newbie
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:
for F in file*.csv; do echo -n "${F}:" && grep -F -f ReqGenes.csv -w "${F}" | tr "\n" "," ; echo ; done
written 10 days ago by Pierre Lindenbaum (modified 10 days ago)
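A variant of the loop above, offered as a sketch rather than as the answerer's code: it skips the "Genes" header, strips carriage returns from both the gene list and the matched lines, and joins the matches with `paste -s` so that no trailing comma is left. genes_by_file is a hypothetical function name, and bash is assumed for the `<(...)` process substitution.

```shell
#!/usr/bin/env bash
# genes_by_file is a hypothetical function name.
# Usage: genes_by_file GENE_LIST DATA_FILE...
genes_by_file() {
  local list=$1 f
  shift
  for f in "$@"; do
    # -F fixed strings, -w whole words; tail skips the header line,
    # tr removes CRs, paste -sd, joins all matches with commas.
    printf '%s: %s\n' "$f" \
      "$(grep -F -w -f <(tail -n +2 "$list" | tr -d '\r') "$f" | tr -d '\r' | paste -sd, -)"
  done
}
```

Running `genes_by_file ReqGenes.csv file*.csv` inside Analysis should print one line per file, such as `file1.csv: LINC01419,ARG2,ATE1` for the example data.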

Thanks, but with this I only get the following output:

file1.csv:
file2.csv:
file3.csv:

Do I need to remove the echo?

written 10 days ago by newbie

Play around with the code and see if you can come up with a solution yourself. We aren't here to spoon-feed solutions (and to be fair, the code provided here is more than enough for you to figure out the rest).

written 10 days ago by gtasource

Thanks for the suggestion. I'm not a programmer yet; I'm still learning and couldn't work it out, mainly because I'm very new to Linux and don't know much about it.

written 10 days ago by newbie