Question: how to find mismatch pattern from one file to another
Manoj40 wrote:

I have two tab-delimited files: file 1 contains a list of nodes, and file 2 contains a list of nodes with additional details. I need to find the nodes in file 1 that have no match in file 2. Please see the example below:

File 1:
NODE_35_length_25224_cov_45.741
NODE_42_length_28456_cov_53.6579
NODE_43_length_25224_cov_33.7544
NODE_226_length_737_cov_1.98094

File 2:
NODE_35_length_25224_cov_45.741 Prodigal:2.6    CDS 58  777 .   -   0   ID=LGE0470
NODE_42_length_28456_cov_53.6579    Prodigal:2.6    CDS 58  777 .   -   0   ID=LGE0445

OUTPUT:
NODE_43_length_25224_cov_33.7544
NODE_226_length_737_cov_1.98094
sequence alignment • 222 views
modified 11 months ago • written 11 months ago by Manoj40

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

Thank you!

written 11 months ago by genomax85k

with tsv-utils:

$ tsv-join --e --f file2.txt --k 1 file1.txt 
NODE_43_length_25224_cov_33.7544
NODE_226_length_737_cov_1.98094
written 10 months ago by cpad011213k
ATpoint36k wrote:
grep -vf <(awk -F "\t" '{print $1}' file2) file1

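Try to understand what the command does. The core part is grep -v -f: -f reads the patterns from a file (here, the first column of file2), and -v prints only the lines of file1 that match none of them. There are many tutorials on grep out there.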
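
One caveat worth knowing: with plain grep -f, every line of the pattern file is treated as a regular expression and may match anywhere in a line, so an ID that is a prefix of another ID hides both. A minimal sketch, assuming each line of file1 holds exactly one ID and nothing else, adds -F (fixed strings) and -x (whole-line match). The demo files and IDs below are made up for illustration. Fixed-string matching also tends to be faster when the pattern list is long.

```shell
# Hypothetical demo files in a temporary directory (names and IDs are illustrative).
tmp=$(mktemp -d)
printf 'NODE_1_length_10_cov_1.5\nNODE_1_length_10_cov_1.55\n' > "$tmp/file1"
printf 'NODE_1_length_10_cov_1.5\tProdigal:2.6\tCDS\n' > "$tmp/file2"

# Use the first column of file2 as the list of known IDs.
cut -f 1 "$tmp/file2" > "$tmp/ids"

# Regex/substring matching: "..._cov_1.5" also matches "..._cov_1.55",
# so both lines are dropped and nothing is reported as missing.
loose=$(grep -vf "$tmp/ids" "$tmp/file1" || true)

# -F: patterns are literal strings; -x: a pattern must match the whole line.
strict=$(grep -vFxf "$tmp/ids" "$tmp/file1" || true)

echo "loose:  [$loose]"
echo "strict: [$strict]"
rm -r "$tmp"
```

Here the strict variant reports NODE_1_length_10_cov_1.55 as missing, while the loose variant reports nothing.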

modified 11 months ago • written 11 months ago by ATpoint36k

This command is taking too long and produces no output. My files are tab-delimited.

written 11 months ago by Manoj40

My files are tab delimited..

Instead of -F " ", try -F "\t".
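
To see what the two separators do, here is a small sketch on a single made-up line whose first field contains a space. In awk, -F " " is the default mode and splits on any run of spaces or tabs, while -F "\t" splits on tabs only. For IDs without embedded spaces, as in the question, both settings return the same first column.

```shell
# A single tab-delimited line whose first field contains a space (made-up data).
line=$(printf 'node A\tProdigal:2.6\tCDS')

# -F " " is awk's default behaviour: fields are separated by runs of blanks (spaces or tabs).
by_blank=$(printf '%s\n' "$line" | awk -F " " '{print $1}')

# -F "\t" splits on tabs only, so the space stays inside the first field.
by_tab=$(printf '%s\n' "$line" | awk -F "\t" '{print $1}')

echo "$by_blank"   # prints: node
echo "$by_tab"     # prints: node A
```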

written 11 months ago by Mensur Dlakic5.8k

Changed delimiter in my comment.

modified 11 months ago • written 11 months ago by ATpoint36k

Try to understand why, otherwise it is impossible to debug, especially when commands get more complex.

written 11 months ago by ATpoint36k
cpad011213k wrote:

You can use a simpler one:

$ cut -f 1 file2.txt  | grep -vf - file1.txt 
NODE_43_length_25224_cov_33.7544
NODE_226_length_737_cov_1.98094

Or using only awk:

$ awk -F "\t" 'NR==FNR{a[$1];next} !($1 in a)' file2.txt file1.txt
NODE_43_length_25224_cov_33.7544
NODE_226_length_737_cov_1.98094
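
To try the awk version on the example from the question, the sketch below recreates both files in a temporary directory (file2.txt is shortened to three columns for brevity). In the first pass (NR==FNR, which holds only while awk is reading file2.txt) the IDs are stored as array keys; in the second pass each line of file1.txt is printed only if its ID was not stored.

```shell
# Recreate the example input in a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' \
  'NODE_35_length_25224_cov_45.741' \
  'NODE_42_length_28456_cov_53.6579' \
  'NODE_43_length_25224_cov_33.7544' \
  'NODE_226_length_737_cov_1.98094' > "$tmp/file1.txt"

# Three tab-separated columns per line; printf reuses the format for each triple.
printf '%s\t%s\t%s\n' \
  'NODE_35_length_25224_cov_45.741' 'Prodigal:2.6' 'CDS' \
  'NODE_42_length_28456_cov_53.6579' 'Prodigal:2.6' 'CDS' > "$tmp/file2.txt"

# Pass 1 (file2.txt): remember every ID. Pass 2 (file1.txt): print IDs never seen.
missing=$(awk -F "\t" 'NR==FNR{a[$1];next} !($1 in a)' "$tmp/file2.txt" "$tmp/file1.txt")

echo "$missing"
rm -r "$tmp"
```

Because the lookup is an exact key match on the first field, this variant is not affected by regex metacharacters or by one ID being a prefix of another.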
modified 10 months ago • written 10 months ago by cpad011213k