Question

awk command for printing all the repeated matching lines without making them unique

0

2.4 years ago

Confused_human ▴ 20

I have two files, file1 and file2, which contain taxonomy details.

For example:

file1 (it contains taxonomy IDs, one number per line):

9

4

file2 (it contains other taxonomy details along with the taxonomy ID):

9 A B C D

4 P Q R S

I want to get output like this:

9 A B C D

4 P Q R S

I tried using this command:

awk -F '\t' 'NR==FNR{a[$1];next} ($1) in a' file1 file2

awk • 1.0k views

ADD COMMENT • link updated 2.4 years ago by Dunois ★ 2.5k • written 2.4 years ago by Confused_human ▴ 20

1

$ cat test1.txt | xargs -i sed -n '/{}/p' test2.txt

9   A B C D
9   A B C D
4   P Q R S
4   P Q R S
4   P Q R S

$ parallel sed -n /{}/p test2.txt :::: test1.txt 
9   A B C D
9   A B C D
4   P Q R S
4   P Q R S
4   P Q R S
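
One caveat with the pattern above: /{}/ matches the ID anywhere on the line, so an ID of 4 would also pick up a line that starts with 14, or one that has a 4 in another column. A minimal sketch that anchors the match to the first field, assuming whitespace-separated columns. The file contents are made-up sample data, not the original poster's files:

```shell
# Hypothetical sample data: the ID 4 is listed twice, and test2.txt also
# contains an unrelated ID (14) that has "4" as a substring.
tmpdir=$(mktemp -d)
printf '4\n4\n' > "$tmpdir/test1.txt"
printf '4 P Q R S\n14 W X Y Z\n' > "$tmpdir/test2.txt"

# ^ pins the ID to the start of the line; [[:space:]] requires a field
# break right after it, so 14 no longer matches the ID 4.
out=$(xargs -I {} sed -n '/^{}[[:space:]]/p' "$tmpdir/test2.txt" < "$tmpdir/test1.txt")
printf '%s\n' "$out"
rm -r "$tmpdir"
```

This still runs sed once per ID, so it is likely to be slow on large ID lists.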

ADD REPLY • link 2.4 years ago by cpad0112 21k

0

Why cat into sed?

ADD REPLY • link 2.4 years ago by Dunois ★ 2.5k

0

How is this related to bioinformatics?

ADD REPLY • link 2.4 years ago by Pierre Lindenbaum 161k

0

I have two taxonomy data files and I am trying to map them by their taxID. I want to get every repeated matching taxID along with its other details.

ADD REPLY • link 2.4 years ago by Confused_human ▴ 20

score 1 · Answer 1 · 2021-12-14

1

2.4 years ago

Dunois ★ 2.5k

You don't need awk for this.

Following the data you shared here, just sort file1 and file2, and use join like so:

$ join -1 1 -2 1 <(sort file1) <(sort file2)
4 P Q R S
4 P Q R S
4 P Q R S
9 A B C D
9 A B C D
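
Note that join prints the rows in sorted order, not in the order of file1. If the original order matters, awk can still do it by reading file2 first and then replaying file1. This is only a sketch under assumptions: the sample data is hypothetical (the repeated IDs in file1 are inferred from the output shown in this thread), and each ID is assumed to appear only once in file2.

```shell
# Hypothetical sample data; repeated IDs in file1 are an assumption.
tmpdir=$(mktemp -d)
printf '9\n9\n4\n4\n4\n' > "$tmpdir/file1"
printf '9 A B C D\n4 P Q R S\n' > "$tmpdir/file2"

# Pass 1 (file2, where NR==FNR): remember each full line under its ID.
# Pass 2 (file1): print the remembered line once per occurrence of the ID,
# so repeats in file1 are kept and file1's order is preserved.
out=$(awk 'NR==FNR { line[$1] = $0; next } $1 in line { print line[$1] }' \
      "$tmpdir/file2" "$tmpdir/file1")
printf '%s\n' "$out"
rm -r "$tmpdir"
```

The command in the question reads the files the other way round (IDs into the array, then one pass over file2), which is why it prints each file2 line only once no matter how often the ID occurs in file1. If an ID occurred more than once in file2, this sketch would keep only the last of those lines.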

ADD COMMENT • link 2.4 years ago by Dunois ★ 2.5k

score 0 · Answer 2 · 2021-12-14

0

2.4 years ago

Pierre Lindenbaum 161k

join -t $'\t' -1 1 -2 1 <(sort -t $'\t' -k1,1 file1) <(sort -t $'\t' -k1,1 file2)
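
The explicit tab delimiter matters when a column itself contains spaces: by default join splits on any blank, so such a column would be broken into several fields. A small sketch of the same command on made-up tab-separated data; it uses temporary files and printf '\t' instead of process substitution and $'\t' only so that it also runs in a plain POSIX shell:

```shell
# Hypothetical tab-separated sample data; fields such as "A B" contain a space.
tmpdir=$(mktemp -d)
tab=$(printf '\t')
printf '9\n9\n4\n' > "$tmpdir/file1"
printf '9\tA B\tC D\n4\tP Q\tR S\n' > "$tmpdir/file2"

# sort and join must agree on collation, so pin the locale for both.
LC_ALL=C sort -t "$tab" -k1,1 "$tmpdir/file1" > "$tmpdir/file1.sorted"
LC_ALL=C sort -t "$tab" -k1,1 "$tmpdir/file2" > "$tmpdir/file2.sorted"

# With -t set to a tab, "A B" stays a single field in the output.
out=$(LC_ALL=C join -t "$tab" -1 1 -2 1 "$tmpdir/file1.sorted" "$tmpdir/file2.sorted")
printf '%s\n' "$out"
rm -r "$tmpdir"
```

Each ID repeated in file1 yields one output row per repeat, with the columns still separated by tabs.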

ADD COMMENT • link 2.4 years ago by Pierre Lindenbaum 161k