Question: How to find entries in one file that have no match in another file
16 days ago by
Manoj30
Canada
Manoj30 wrote:

I have two tab-delimited files: file 1 contains a list of nodes, and file 2 contains nodes along with additional details. I need to find the nodes in file 1 that have no match in file 2. Please see the example below:

File 1:
NODE_35_length_25224_cov_45.741
NODE_42_length_28456_cov_53.6579
NODE_43_length_25224_cov_33.7544
NODE_226_length_737_cov_1.98094

File 2:
NODE_35_length_25224_cov_45.741    Prodigal:2.6    CDS 58  777 .   -   0   ID=LGE0470
NODE_42_length_28456_cov_53.6579    Prodigal:2.6    CDS 58  777 .   -   0   ID=LGE0445

OUTPUT:
NODE_43_length_25224_cov_33.7544
NODE_226_length_737_cov_1.98094
sequence alignment • 105 views
ADD COMMENTlink modified 16 days ago • written 16 days ago by Manoj30

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

Thank you!

ADD REPLYlink written 16 days ago by genomax70k

with tsv-utils:

$ tsv-join --e --f file2.txt --k 1 file1.txt 
NODE_43_length_25224_cov_33.7544
NODE_226_length_737_cov_1.98094
ADD REPLYlink written 16 days ago by cpad011211k
16 days ago by
ATpoint21k
Germany
ATpoint21k wrote:
grep -vf <(awk -F "\t" '{print $1}' file2) file1

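Try to understand what the command does. The core part is grep -v -f: -f reads the patterns from a file (here, the first column of file2), and -v prints only the lines of file1 that match none of them. There are many tutorials on grep out there.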
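A possible refinement, offered as a sketch rather than a fix for any particular problem: by default grep treats each pattern as a regular expression and matches substrings, so the dots in the node names act as wildcards and a shorter ID could match inside a longer one. Adding -F (fixed strings) and -x (whole-line match) avoids both. The snippet below rebuilds the example from the question in a scratch directory; the names workdir and missing are only for illustration, and it assumes that each line of file 1 is nothing but the node ID.

```shell
# Recreate the example inputs from the question in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' \
  'NODE_35_length_25224_cov_45.741' \
  'NODE_42_length_28456_cov_53.6579' \
  'NODE_43_length_25224_cov_33.7544' \
  'NODE_226_length_737_cov_1.98094' > "$workdir/file1.txt"
printf 'NODE_35_length_25224_cov_45.741\tProdigal:2.6\tCDS\t58\t777\t.\t-\t0\tID=LGE0470\n' > "$workdir/file2.txt"
printf 'NODE_42_length_28456_cov_53.6579\tProdigal:2.6\tCDS\t58\t777\t.\t-\t0\tID=LGE0445\n' >> "$workdir/file2.txt"

# cut -f 1 : take the first tab-separated column of file 2 (the node IDs)
# -f -     : read the patterns from standard input
# -F       : treat patterns as fixed strings, not regular expressions
# -x       : a pattern must match the whole line
# -v       : print the lines of file 1 that match no pattern
missing=$(cut -f 1 "$workdir/file2.txt" | grep -vxFf - "$workdir/file1.txt")
printf '%s\n' "$missing"
```

On the example data this prints the same two nodes as the expected output in the question.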

ADD COMMENTlink modified 16 days ago • written 16 days ago by ATpoint21k

This command is taking a very long time and produces no output. My files are tab-delimited.

ADD REPLYlink written 16 days ago by Manoj30

My files are tab delimited..

Instead of -F " ", try -F "\t".

ADD REPLYlink written 16 days ago by Mensur Dlakic630

Changed delimiter in my comment.

ADD REPLYlink modified 16 days ago • written 16 days ago by ATpoint21k

Try to understand why, otherwise it is impossible to debug, especially when commands get more complex.

ADD REPLYlink written 16 days ago by ATpoint21k
16 days ago by
cpad011211k
India
cpad011211k wrote:

You can use a simpler one:

$ cut -f 1 file2.txt  | grep -vf - file1.txt 
NODE_43_length_25224_cov_33.7544
NODE_226_length_737_cov_1.98094

Or, using only awk:

$ awk -F "\t" 'NR==FNR{a[$1];next} !($1 in a)' file2.txt file1.txt
NODE_43_length_25224_cov_33.7544
NODE_226_length_737_cov_1.98094
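
For readers new to the NR==FNR idiom, here is the same awk logic spread over several lines with comments. It is only an annotated sketch of the one-liner above, run on the example data from the question; the names workdir, seen and unmatched are chosen for illustration.

```shell
# Recreate the example inputs from the question in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' \
  'NODE_35_length_25224_cov_45.741' \
  'NODE_42_length_28456_cov_53.6579' \
  'NODE_43_length_25224_cov_33.7544' \
  'NODE_226_length_737_cov_1.98094' > "$workdir/file1.txt"
printf 'NODE_35_length_25224_cov_45.741\tProdigal:2.6\tCDS\t58\t777\t.\t-\t0\tID=LGE0470\n' > "$workdir/file2.txt"
printf 'NODE_42_length_28456_cov_53.6579\tProdigal:2.6\tCDS\t58\t777\t.\t-\t0\tID=LGE0445\n' >> "$workdir/file2.txt"

unmatched=$(awk -F '\t' '
  # NR counts lines over all inputs, FNR restarts for each file, so
  # NR == FNR holds only while reading the first file named (file 2).
  # Store its first column as an array key, then skip to the next line.
  NR == FNR { seen[$1]; next }
  # For the second file (file 1): print lines whose first column
  # was never stored, i.e. nodes that are absent from file 2.
  !($1 in seen)
' "$workdir/file2.txt" "$workdir/file1.txt")
printf '%s\n' "$unmatched"
```

Because the lookup is an exact comparison on the first column, dots in the node names are taken literally and partial matches are not an issue. One caveat: if file 2 is empty, NR == FNR also holds for file 1, so nothing is printed.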
ADD COMMENTlink modified 16 days ago • written 16 days ago by cpad011211k