How to find the names of files that contain a specific symbol, i.e. an underscore, in column 2?
3.1 years ago
sidrarafi89 ▴ 10

I have 5000 files (named like tni00001.keg, eco00001.keg, etc.). In many of them, the 2nd column contains an underscore (e.g. W909_00110). The number after the underscore (i.e. 00110) represents the enzyme ID, but some of the 5000 files do not contain this underscore in their IDs. I want to extract the names of those files (like abc00001.keg) that don't contain an underscore _ in the 2nd column.

Example: a .keg file looks like this inside:

D      W909_00110 glk; glucokinase      K00845 glk; glucokinase [EC:2.7.1.2]

D      W909_17905 pgi; glucose-6-phosphate isomerase    K01810 GPI; glucose-6-phosphate isomerase [EC:5.3.1.9]

D      W909_19315 6-phosphofructokinase K00850 pfkA; 6-phosphofructokinase 1 [EC:2.7.1.11]
awk grep R find • 880 views

I want to extract the names of those files (like abc00001.keg) that don't contain an underscore _ in the 2nd column.

Is the absence of _ consistent across all lines in these files, or do only some records lack the _number part?


There are 5000 files in total, and some of them do not have this underscore in the IDs. I want to know how many files do not contain it, and to extract the names of all of those files.


The clarification I was asking for is: do all records in a file of interest lack the _, or only some?
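Since the answer changes which files count as a match, here is a minimal sketch that reports, per file, how many records lack an underscore in field 2. It assumes that every non-empty line is a record like the ones in the example and that fields are separated by whitespace. The function name `underscore_summary` and the demo files are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each .keg file in a directory print
#   <file name> <lines lacking "_" in field 2> <total non-empty lines>
# Lines with fewer than two fields count as lacking the underscore.
underscore_summary() {
  local dir=$1 f
  for f in "$dir"/*.keg; do
    [ -e "$f" ] || continue   # skip if the glob matched nothing
    awk -v name="$(basename "$f")" '
      NF { total++; if ($2 !~ /_/) missing++ }
      END { printf "%s\t%d\t%d\n", name, missing, total }
    ' "$f"
  done
}

# Demo on made-up data (names and contents are illustrative only)
demo=$(mktemp -d)
printf 'D W909_00110 glk\nD W909_17905 pgi\n' > "$demo/all.keg"
printf 'D W909_00110 glk\nD W90917905 pgi\n'  > "$demo/mixed.keg"
printf 'D W90900110 glk\n'                    > "$demo/none.keg"
underscore_summary "$demo"
```

A file where the second number equals the third has no underscores at all; a file where it is greater than zero but smaller than the third is a mixed case.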


Are these fields all tab separated?
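One way to check this is to look for lines that contain no tab character at all. The sketch below is only an illustration: the function name `is_tab_separated` and the demo files are hypothetical, and a file passes the check as soon as every non-empty line has at least one tab.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if every non-empty line of the file contains a tab.
is_tab_separated() {
  local tab
  tab=$(printf '\t')
  # grep -v prints the lines WITHOUT a tab; the second grep succeeds if any
  # such non-empty line exists, and the leading "!" inverts that result.
  ! grep -v "$tab" "$1" | grep -q .
}

# Demo on made-up data (names and contents are illustrative only)
demo=$(mktemp -d)
printf 'D\tW909_00110\tglk; glucokinase\n'    > "$demo/tabs.keg"
printf 'D      W909_00110 glk; glucokinase\n' > "$demo/spaces.keg"
for f in "$demo"/*.keg; do
  if is_tab_separated "$f"; then
    echo "$(basename "$f"): tab separated"
  else
    echo "$(basename "$f"): not tab separated"
  fi
done
```

This matters for tools such as `cut -f`, which split on tabs by default and pass through unchanged any line that has no tab.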


May I ask what the ultimate goal is? I.e., why are you specifically interested in files where the underscore is missing?

3.1 years ago
bruce.moran ▴ 910

So now I want to extract the names of those files (like abc00001.keg) that don't contain an underscore _ in the 2nd column.

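for x in *.keg; do
  # cut -f 2 takes the 2nd tab-separated column; grep -q only sets the exit status.
  # Print the file name when no line has an underscore in that column.
  if ! cut -f 2 "$x" | grep -q _; then
    echo "$x"
  fi
done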
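If the columns turn out to be separated by spaces rather than tabs, `cut -f 2` returns the whole line, so the loop above would effectively test the entire line. A possible variant, sketched here with awk, splits on any whitespace instead. The function name `files_without_underscore` and the demo files are made up for illustration, and the sketch assumes every line is a record like the ones in the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list .keg files in a directory in which NO line has
# an underscore in its second whitespace-separated field.
files_without_underscore() {
  local dir=$1 f
  for f in "$dir"/*.keg; do
    [ -e "$f" ] || continue   # skip if the glob matched nothing
    # awk stops at the first line whose field 2 contains "_" and exits with 1;
    # an exit status of 0 therefore means no underscore was found.
    if awk '$2 ~ /_/ { found = 1; exit } END { exit found + 0 }' "$f"; then
      basename "$f"
    fi
  done
}

# Demo on made-up data (names and contents are illustrative only)
demo=$(mktemp -d)
printf 'D      W909_00110 glk; glucokinase\n' > "$demo/tni00001.keg"
printf 'D      W90900110 glk; glucokinase\n'  > "$demo/abc00001.keg"
files_without_underscore "$demo"
```

Piping the output through `wc -l` gives the number of such files, which is the count asked for in the comments.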
3.1 years ago
zx8754 11k

Something like this (not tested):

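# pattern is a regular expression, so match the ".keg" extension explicitly
files <- list.files("path/to/files", pattern = "\\.keg$", full.names = TRUE)
# Keep files where at least one line lacks "_" in its 2nd whitespace-separated field
hits <- Filter(function(i) {
  # readLines + strsplit avoids read.table failing on rows with unequal field counts
  col2 <- sapply(strsplit(trimws(readLines(i)), "[[:space:]]+"), `[`, 2)
  !all(grepl("_", col2))
}, files)
hits
# maybe use basename(hits) to return just the filenames without path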

It gives me the following error:

./script.sh: line 1: syntax error near unexpected token `list.files'
./script.sh: line 1: `sapply(list.files("/home/reaz/Documents/dump/bacteria/keggdatabase-uniq", "*.keg", full.names = TRUE), '


The answer by @zx8754 is in R, not shell. You will need to save it as a .R file and execute it with R (for example with Rscript) rather than as a shell script.


It still shows me an error:

Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 2 did not have 3 elements
Calls: sapply -> lapply -> FUN -> read.table -> scan
Execution halted


You need to break it down, run the example code row by row, and see where it is failing. My guess is list.files is not finding any files.
