Question: find unpaired files
9 months ago, arraychip wrote:

In a folder, there are txt files, all with unique names. After an analysis, one new file is generated per input file, named in the form "filename_analyzed.txt". For some unknown reason, some input files don't get an "_analyzed.txt" file.

So, it looks like:

file1.txt file1_analyzed.txt
file2.txt file2_analyzed.txt
**file3.txt**
file4.txt file4_analyzed.txt
..
fileN.txt fileN_analyzed.txt

How can I list all the files like "file3" that have no accompanying "_analyzed.txt" file? Typically, there are over 40,000 files in a folder. Is there a command line that solves this problem? Thanks

9 months ago, genomax (United States) wrote:

One way.

$ ls -1
file1.txt
file1_analyzed.txt
file2.txt
file3.txt
file3_analyzed.txt
file4_analyzed.txt

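# extglob enables the !(pattern) glob used below (bash only)
$ shopt -s extglob

# dots are escaped and the patterns anchored with $ so sed strips only the real suffix
$ comm -3 <(ls -1 *_analyzed.txt | sed 's/_analyzed\.txt$//' | sort) <(ls -1 !(*analyzed.txt) | sed 's/\.txt$//' | sort)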
    file2
file4

In output column 1: file4_analyzed.txt has no corresponding plain file
In output column 2: file2 has no corresponding _analyzed.txt file

If the only problem is missing _analyzed.txt files, then all of the output should be in a single column: column 2, the indented one.
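
As an alternative, here is a minimal sketch of a plain shell loop that prints only the input files lacking an "_analyzed.txt" partner. It is not part of the answer above. The function name `list_unpaired` is made up for illustration, and the sketch assumes every input ends in ".txt" and every output is named "<input name>_analyzed.txt". Because the glob is expanded inside the shell and never passed to `ls`, this form may also be more comfortable in folders with tens of thousands of files, where a long argument list can hit the system limit.

```shell
# Hypothetical helper: print each input file in a folder that has no
# matching "<name>_analyzed.txt" file next to it.
# Usage: list_unpaired [folder]   (defaults to the current folder)
list_unpaired() (
    dir=${1:-.}
    for f in "$dir"/*.txt; do
        [ -e "$f" ] || continue            # the glob matched nothing
        case $f in
            *_analyzed.txt) continue ;;    # skip the analysis outputs themselves
        esac
        # ${f%.txt} strips the trailing ".txt"; ${f##*/} strips the folder part
        [ -e "${f%.txt}_analyzed.txt" ] || printf '%s\n' "${f##*/}"
    done
)
```

Calling `list_unpaired` inside the folder (or `list_unpaired some_folder`) prints one unpaired input name per line. Unlike the `comm` version, it does not report "_analyzed.txt" files that have no plain input file.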


9 months ago, arraychip replied:

It worked perfectly. Greatly appreciate it.