Join information from file 2 into file 1
4.0 years ago

I have file 1, which contains only IDs:

BAC0713 
BAC0713 
BAC0713 
BAC0755 
BAC0755
BAC0353
BAC0353
.....

I also have file 2, which contains more information for each ID:

BAC0713 copM A0A0F6QDN6  Synechocystis sp. PCC 6803 Plasmid Copper (Cu)
BAC0755 galE A0A1Z2RUL8 Uncultured bacterium Chromosome Cetyltrimethylammonium bromide (CTAB) [class: Quaternary Ammonium Compounds (QACs)]
BAC0353 smdA A7VN01 Serratia marcescens Chromosome 4,6-diamidino-2-phenylindole (DAPI) [class: Diamidine], Hoechst 33342 [class: Bisbenzimide]
....

I want to add the information from file 2 to each matching line of file 1. How can I do that?

The expected output is:

BAC0713 copM A0A0F6QDN6  Synechocystis sp. PCC 6803 Plasmid Copper (Cu)
BAC0713 copM A0A0F6QDN6  Synechocystis sp. PCC 6803 Plasmid Copper (Cu)
BAC0713 copM A0A0F6QDN6  Synechocystis sp. PCC 6803 Plasmid Copper (Cu)
BAC0755 galE A0A1Z2RUL8 Uncultured bacterium Chromosome Cetyltrimethylammonium bromide (CTAB) [class: Quaternary Ammonium Compounds (QACs)]
BAC0755 galE A0A1Z2RUL8 Uncultured bacterium Chromosome Cetyltrimethylammonium bromide (CTAB) [class: Quaternary Ammonium Compounds (QACs)]
BAC0755 galE A0A1Z2RUL8 Uncultured bacterium Chromosome Cetyltrimethylammonium bromide (CTAB) [class: Quaternary Ammonium Compounds (QACs)]
BAC0353 smdA A7VN01 Serratia marcescens Chromosome 4,6-diamidino-2-phenylindole (DAPI) [class: Diamidine], Hoechst 33342 [class: Bisbenzimide]
BAC0353 smdA A7VN01 Serratia marcescens Chromosome 4,6-diamidino-2-phenylindole (DAPI) [class: Diamidine], Hoechst 33342 [class: Bisbenzimide]
4.0 years ago
bins14

I think this should do the trick in bash:

join -1 1 -2 1 <(sort -k1 file2)  <(uniq file1|sort -k 1)
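Because the command above runs file1 through `uniq`, each ID is joined only once. If you want the repeated rows shown in the expected output, a minimal sketch is to skip `uniq` and sort both files on the ID column only (`-k1,1`) under the same collation (`LC_ALL=C`), since `join` needs both inputs sorted the same way it compares them. The sample files are shortened, hypothetical copies of the rows in the question, and space-separated columns are assumed. It writes sorted copies instead of using bash process substitution so it also runs under plain `sh`.

```shell
#!/bin/sh
# Sketch: join file1 (IDs, possibly repeated) with file2 (ID + annotation),
# keeping one output line per line of file1. Sample data is hypothetical.
set -e
dir=$(mktemp -d)
cd "$dir"

printf '%s\n' BAC0713 BAC0713 BAC0713 BAC0755 BAC0755 BAC0353 BAC0353 > file1
printf '%s\n' \
  'BAC0713 copM A0A0F6QDN6 Synechocystis sp. PCC 6803 Plasmid Copper (Cu)' \
  'BAC0755 galE A0A1Z2RUL8 Uncultured bacterium Chromosome Cetyltrimethylammonium bromide (CTAB)' \
  'BAC0353 smdA A7VN01 Serratia marcescens Chromosome 4,6-diamidino-2-phenylindole (DAPI)' > file2

# join requires both inputs sorted on the join field with the same collation,
# so fix the locale and sort on column 1 only.
export LC_ALL=C
sort -k1,1 file1 > file1.sorted
sort -k1,1 file2 > file2.sorted

# No uniq here: every repeated ID in file1 pairs with its file2 line,
# so the repeats survive in the output.
out=$(join file1.sorted file2.sorted)
printf '%s\n' "$out"
```

The output comes back in sorted ID order (BAC0353 first), with BAC0713 repeated three times and the other IDs twice, matching the counts in file1.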
ADD COMMENT
This assumes that file 1 has repeated IDs and file 2 does not.
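To check that assumption before joining, a small sketch like the one below lists every ID that occurs more than once in the first column of file2. This matters because `join` prints one output line per matching pair, so a duplicated ID in file2 multiplies the rows for that ID. The sample file is hypothetical, with BAC0713 duplicated on purpose; `awk '{ print $1 }'` is used so that any run of spaces or tabs counts as the column separator.

```shell
#!/bin/sh
# Sketch: report IDs that appear more than once in column 1 of file2.
set -e
dir=$(mktemp -d)
cd "$dir"

# Hypothetical sample: BAC0713 appears twice.
printf '%s\n' \
  'BAC0713 copM A0A0F6QDN6 Copper (Cu)' \
  'BAC0713 copM A0A0F6QDN6 Copper (Cu)' \
  'BAC0755 galE A0A1Z2RUL8 CTAB' > file2

# Extract the ID column, sort it (uniq only compares adjacent lines),
# and let uniq -d print each duplicated ID once.
dups=$(awk '{ print $1 }' file2 | sort | uniq -d)
printf '%s\n' "$dups"
```

An empty result means file2 has one line per ID and the join will not multiply rows.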

Yes, file1 has repeated IDs, but file2 doesn't. The command join -1 1 -2 1 <(sort -k1 file2) <(uniq file1|sort -k 1) doesn't do anything.

I tried it myself and it worked fine; the example output I showed is what I get. How did you run it, and what delimiter do your files use?
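On the delimiter question: by default `join` splits fields on blanks and writes a single space between output fields. If the files are tab-separated (an assumption here; the thread doesn't say), passing a literal tab with `-t` to both `sort` and `join` keeps multi-word fields such as species names intact. A minimal sketch with hypothetical sample files:

```shell
#!/bin/sh
# Sketch: the same join for tab-separated files (assumed format).
set -e
dir=$(mktemp -d)
cd "$dir"
tab=$(printf '\t')   # a literal tab character, portable across shells

printf 'BAC0713\nBAC0713\nBAC0353\n' > file1
printf 'BAC0713\tcopM\tSynechocystis sp. PCC 6803\nBAC0353\tsmdA\tSerratia marcescens\n' > file2

export LC_ALL=C
sort -t "$tab" -k1,1 file1 > file1.sorted
sort -t "$tab" -k1,1 file2 > file2.sorted

# -t sets the field separator for both input and output, so the spaces
# inside "Serratia marcescens" stay inside one field.
out=$(join -t "$tab" file1.sorted file2.sorted)
printf '%s\n' "$out"
```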

file2 had some duplicate IDs; I removed them and it worked. Thank you!
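Instead of removing duplicate IDs from file2 by hand, one option is an awk one-liner that keeps only the first line seen for each ID. This is a sketch under the assumption that the first occurrence is the one you want to keep; the sample file is hypothetical.

```shell
#!/bin/sh
# Sketch: keep only the first line for each ID (column 1) in file2.
set -e
dir=$(mktemp -d)
cd "$dir"

# Hypothetical sample: the second BAC0713 row differs on purpose.
printf '%s\n' \
  'BAC0713 copM A0A0F6QDN6 Copper (Cu)' \
  'BAC0713 copM A0A0F6QDN6 Copper (Cu) duplicate' \
  'BAC0353 smdA A7VN01 DAPI' > file2

# seen[$1]++ evaluates to 0 (false) the first time an ID appears,
# so !seen[$1]++ is true only for that first line, which awk prints.
awk '!seen[$1]++' file2 > file2.dedup
cat file2.dedup
```

The deduplicated file keeps the original line order, so it can be sorted and joined exactly like the original file2.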

# Example output
BAC0353 smdA A7VN01 Serratia marcescens Chromosome 4,6-diamidino-2-phenylindole (DAPI) [class: Diamidine], Hoechst 33342 [class: Bisbenzimide]
BAC0713 copM A0A0F6QDN6 Synechocystis sp. PCC 6803 Plasmid Copper (Cu)
BAC0755 galE A0A1Z2RUL8 Uncultured bacterium Chromosome Cetyltrimethylammonium bromide (CTAB) [class: Quaternary Ammonium Compounds (QACs)]
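`join` emits lines in sorted ID order, as in the example output above (BAC0353 first), whereas the expected output in the question follows file1's order. If order matters, one common alternative is an awk hash lookup: it reads file2 into memory keyed by the first column, then prints the matching file2 line for every line of file1, so repeats and order are kept and no sorting is needed. This is a swapped-in technique rather than the answer's method; the sample files are hypothetical and whitespace-separated columns are assumed.

```shell
#!/bin/sh
# Sketch: lookup-style join that preserves file1's order and repeats.
set -e
dir=$(mktemp -d)
cd "$dir"

printf '%s\n' BAC0713 BAC0713 BAC0755 BAC0353 > file1
printf '%s\n' \
  'BAC0353 smdA A7VN01 Serratia marcescens' \
  'BAC0713 copM A0A0F6QDN6 Synechocystis sp. PCC 6803' \
  'BAC0755 galE A0A1Z2RUL8 Uncultured bacterium' > file2

# While reading the first file (NR==FNR), store each file2 line under its ID.
# For file1, print the stored line whenever the ID is known.
# IDs missing from file2 are skipped; if file2 repeats an ID, the last line wins.
out=$(awk 'NR==FNR { row[$1] = $0; next } $1 in row { print row[$1] }' file2 file1)
printf '%s\n' "$out"
```

The trade-off is that file2 is held in memory, which is usually fine for an annotation table of this kind; `join` streams both files but needs them sorted.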