Question: find unpaired files
arraychip30 wrote:

In a folder, there are txt files, all with unique names. After an analysis, new files are generated, named "<file name>_analyzed.txt", one "_analyzed.txt" file per input file. For some unknown reason, some input files never produce an "_analyzed.txt" file.

So, it looks like:

file1.txt file1_analyzed.txt
file2.txt file2_analyzed.txt
**file3.txt**
file4.txt file4_analyzed.txt
..
fileN.txt fileN_analyzed.txt

How can I list all the files like "file3.txt" that have no accompanying "_analyzed.txt" file? A folder typically holds over 40,000 files. Is there a command line that solves this? Thanks.

sequence • 299 views
modified 14 months ago by Joe18k • written 14 months ago by arraychip30
genomax91k wrote:

One way.

$ ls -1
file1.txt
file1_analyzed.txt
file2.txt
file3.txt
file3_analyzed.txt
file4_analyzed.txt

# following is needed for bash 
$ shopt -s extglob 

$ comm -3 <(ls -1 *_analyzed.txt | sed 's/_analyzed\.txt$//' | sort) <(ls -1 !(*analyzed.txt) | sed 's/\.txt$//' | sort)
    file2
file4

In output column 1: file4_analyzed.txt has no corresponding plain file
In output column 2: file2 has no corresponding _analyzed.txt file

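If the only problem is missing _analyzed.txt files, all of the output lands in column 2 (the indented one). Using comm -13 instead of comm -3 prints just that column, without the indentation.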
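
If you only care about inputs that are missing their _analyzed.txt file (and not the reverse case), a plain bash loop is another option. This is just a sketch, assuming every input ends in .txt and its output is named <name>_analyzed.txt in the same folder. The function name find_unpaired is made up for this example. Because the glob is expanded by the shell loop itself rather than passed to ls, it should not run into argument-length limits with tens of thousands of files.

```shell
#!/usr/bin/env bash
# find_unpaired is a hypothetical helper name, not something from the thread.
# It prints each input file that has no "<name>_analyzed.txt" next to it.
find_unpaired() {
    local dir=${1:-.} f
    for f in "$dir"/*.txt; do
        # If the glob matched nothing, bash leaves the pattern as-is; skip it.
        [ -e "$f" ] || continue
        # Skip the analysis outputs themselves.
        case $f in *_analyzed.txt) continue ;; esac
        # Strip ".txt", append "_analyzed.txt", and report if that file is absent.
        [ -e "${f%.txt}_analyzed.txt" ] || printf '%s\n' "${f##*/}"
    done
}
```

After sourcing this, find_unpaired /path/to/folder lists the unpaired inputs one per line; with no argument it checks the current folder. Unlike the comm approach, it does not report an _analyzed.txt file whose input is missing.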

modified 14 months ago • written 14 months ago by genomax91k

It worked perfectly. Greatly appreciate it.

written 14 months ago by arraychip30