How to find entries in one file that have no match in another file
6.1 years ago
Kumar ▴ 170

I have two tab-delimited files: one contains a list of nodes (file 1), and the other contains a list of nodes with additional details (file 2). I need to find the nodes in file 1 that have no match in file 2. Please see the example below:

File 1:
NODE_35_length_25224_cov_45.741
NODE_42_length_28456_cov_53.6579
NODE_43_length_25224_cov_33.7544
NODE_226_length_737_cov_1.98094

File 2:
NODE_35_length_25224_cov_45.741    Prodigal:2.6    CDS    58    777    .    -    0    ID=LGE0470
NODE_42_length_28456_cov_53.6579    Prodigal:2.6    CDS    58    777    .    -    0    ID=LGE0445

OUTPUT:
NODE_43_length_25224_cov_33.7544
NODE_226_length_737_cov_1.98094
alignment sequence • 1.2k views

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

Thank you!


With tsv-utils:

$ tsv-join --e --f file2.txt --k 1 file1.txt 
NODE_43_length_25224_cov_33.7544
NODE_226_length_737_cov_1.98094
6.1 years ago
ATpoint 89k
grep -vf <(awk -F "\t" '{print $1}' file2) file1

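Try to understand what the command does. The core part is grep -v -f, and there are many grep tutorials out there: -f reads the patterns from a file (here, the first column of file2, extracted by awk), and -v prints only the lines of file1 that match none of them. Note that the patterns are treated as regular expressions and can match anywhere in a line; adding -F (fixed strings) together with -w or -x (whole word or whole line) makes the matching exact and is often much faster on long pattern lists.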
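To illustrate the difference between the default matching and exact matching, here is a minimal sketch. It assumes file1 holds only node names, one per line; the short IDs (NODE_3, NODE_35, NODE_43) and the temporary file names are made up for the demonstration and are not from the original data.

```shell
# Scratch directory so no real files are touched.
tmp=$(mktemp -d)

# Hypothetical sample data: NODE_3 is a prefix of NODE_35.
printf 'NODE_3\nNODE_35\nNODE_43\n' > "$tmp/file1"
printf 'NODE_3\tProdigal:2.6\tCDS\n' > "$tmp/file2"

# First column of file2 becomes the pattern list.
cut -f 1 "$tmp/file2" > "$tmp/ids"

# Default: patterns are regular expressions matched anywhere in the line,
# so the pattern NODE_3 also removes NODE_35. Only NODE_43 is left.
loose=$(grep -v -f "$tmp/ids" "$tmp/file1")

# -F: fixed strings, -x: the pattern must match the whole line.
# Only NODE_3 is removed; NODE_35 and NODE_43 are left.
strict=$(grep -v -F -x -f "$tmp/ids" "$tmp/file1")

printf 'regex/substring match:\n%s\n' "$loose"
printf 'fixed whole-line match:\n%s\n' "$strict"

rm -r "$tmp"
```

If file1 has more than one column, -x would never match; -w may be the better choice in that case.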


This command is taking too long and produces no output. My files are tab-delimited.


"My files are tab-delimited."

Instead of -F " ", try -F "\t".


Changed the delimiter in my comment.


Try to understand why; otherwise debugging becomes impossible, especially as commands get more complex.

6.1 years ago

You can use a simpler one:

$ cut -f 1 file2.txt  | grep -vf - file1.txt 
NODE_43_length_25224_cov_33.7544
NODE_226_length_737_cov_1.98094

Or, using only awk:

$ awk -F "\t" 'NR==FNR{a[$1];next} !($1 in a)' file2.txt file1.txt
NODE_43_length_25224_cov_33.7544
NODE_226_length_737_cov_1.98094
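
For anyone unfamiliar with the NR==FNR idiom in the awk command above, here is the same logic spelled out with comments and run on the sample data from the question. This is only a sketch: missing_ids is a hypothetical helper name chosen for illustration, and the files are written to a temporary directory with real tab separators.

```shell
# missing_ids is a hypothetical helper name, not part of any tool.
# Usage: missing_ids ANNOTATION_FILE ID_FILE
missing_ids() {
    awk -F '\t' '
        # NR == FNR is true only while the first file is being read:
        # store each first-column ID as an array key, then skip to the next line.
        NR == FNR { seen[$1]; next }
        # Second file: print lines whose first column was never stored.
        !($1 in seen)
    ' "$1" "$2"
}

# Sample data from the question.
tmp=$(mktemp -d)
printf '%s\n' \
    NODE_35_length_25224_cov_45.741 \
    NODE_42_length_28456_cov_53.6579 \
    NODE_43_length_25224_cov_33.7544 \
    NODE_226_length_737_cov_1.98094 > "$tmp/file1.txt"
printf 'NODE_35_length_25224_cov_45.741\tProdigal:2.6\tCDS\t58\t777\t.\t-\t0\tID=LGE0470\n' > "$tmp/file2.txt"
printf 'NODE_42_length_28456_cov_53.6579\tProdigal:2.6\tCDS\t58\t777\t.\t-\t0\tID=LGE0445\n' >> "$tmp/file2.txt"

result=$(missing_ids "$tmp/file2.txt" "$tmp/file1.txt")
printf '%s\n' "$result"

rm -r "$tmp"
```

Because the lookup is an exact comparison of the first field, this approach does not suffer from partial matches. One caveat: if the first file is empty, NR == FNR also holds for the second file, so nothing is printed.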