Compare two columns from two files and print the content of both the files
22 months ago
Utkarsha • 0

I have two files. I want to compare the 16th column of file1 with the 1st column of file2 and, while keeping the order of file1, print the content of both files on the same line wherever the two columns match.

file1.txt (some entries do not have any value in column 16)

Paja_peg_0001__lcl|NZ_CP016908.1_prot_WP_075275866.1_1_[gene=dnaN]_[locus_tag=BCY86_RS00005]_[protein=DNA_polymerase_III_subunit_beta]_[protein_id=WP_075275866.1]_[location=209..1363]_[gbkey=CDS]     WP_245776233.1  WP_245776233.1 DNA polymerase III subunit beta [Pajaroellobacter abortibovis]   14      384     1       371     371     384     96.6    100     100     100     5.72e-259       714    1882918
Paja_peg_0001__lcl|NZ_CP016908.1_prot_WP_075275866.1_1_[gene=dnaN]_[locus_tag=BCY86_RS00005]_[protein=DNA_polymerase_III_subunit_beta]_[protein_id=WP_075275866.1]_[location=209..1363]_[gbkey=CDS]     MBV9946623.1    MBV9946623.1 DNA polymerase III subunit beta [Myxococcales bacterium]   1       384     1       381     381     384     100     100     59.3    78.5    2.59e-150       440
Paja_peg_0001__lcl|NZ_CP016908.1_prot_WP_075275866.1_1_[gene=dnaN]_[locus_tag=BCY86_RS00005]_[protein=DNA_polymerase_III_subunit_beta]_[protein_id=WP_075275866.1]_[location=209..1363]_[gbkey=CDS]     WP_146648019.1  WP_146648019.1 DNA polymerase III subunit beta [Labilithrix luteola]<>AKU96781.1 DNA polymerase III beta subunit [Labilithrix luteola]  1       384     1       385     385     384     100     100     57.4    75.4    1.31e-146       431     1391654
Paja_peg_0001__lcl|NZ_CP016908.1_prot_WP_075275866.1_1_[gene=dnaN]_[locus_tag=BCY86_RS00005]_[protein=DNA_polymerase_III_subunit_beta]_[protein_id=WP_075275866.1]_[location=209..1363]_[gbkey=CDS]     WP_136921399.1  WP_136921399.1 DNA polymerase III subunit beta [Polyangium aurulentum]<>TKC78369.1 DNA polymerase III subunit beta [Polyangium aurulentum]      1       384     1       377     377     384     100     100     54.0    71.5    3.13e-128       384     2567896

file2.txt

1391654 Bacteria        Proteobacteria  Deltaproteobacteria     Myxococcales    Labilitrichaceae        Labilithrix     Labilithrix luteola
1882918 Bacteria        Proteobacteria  Deltaproteobacteria     Myxococcales    Polyangiaceae   Pajaroellobacter        Pajaroellobacter abortibovis
2567896 Bacteria        Proteobacteria  Deltaproteobacteria     Myxococcales    Polyangiaceae   Polyangium      Polyangium aurulentum

Desired output:

Paja_peg_0001__lcl|NZ_CP016908.1_prot_WP_075275866.1_1_[gene=dnaN]_[locus_tag=BCY86_RS00005]_[protein=DNA_polymerase_III_subunit_beta]_[protein_id=WP_075275866.1]_[location=209..1363]_[gbkey=CDS]     WP_245776233.1  WP_245776233.1 DNA polymerase III subunit beta [Pajaroellobacter abortibovis]   14      384     1       371     371     384     96.6    100     100     100     5.72e-259       714    1882918   Bacteria        Proteobacteria  Deltaproteobacteria     Myxococcales    Polyangiaceae   Pajaroellobacter        Pajaroellobacter abortibovis
Paja_peg_0001__lcl|NZ_CP016908.1_prot_WP_075275866.1_1_[gene=dnaN]_[locus_tag=BCY86_RS00005]_[protein=DNA_polymerase_III_subunit_beta]_[protein_id=WP_075275866.1]_[location=209..1363]_[gbkey=CDS]     MBV9946623.1    MBV9946623.1 DNA polymerase III subunit beta [Myxococcales bacterium]   1       384     1       381     381     384     100     100     59.3    78.5    2.59e-150       440
Paja_peg_0001__lcl|NZ_CP016908.1_prot_WP_075275866.1_1_[gene=dnaN]_[locus_tag=BCY86_RS00005]_[protein=DNA_polymerase_III_subunit_beta]_[protein_id=WP_075275866.1]_[location=209..1363]_[gbkey=CDS]     WP_146648019.1  WP_146648019.1 DNA polymerase III subunit beta [Labilithrix luteola]<>AKU96781.1 DNA polymerase III beta subunit [Labilithrix luteola]  1       384     1       385     385     384     100     100     57.4    75.4    1.31e-146       431     1391654  Bacteria        Proteobacteria  Deltaproteobacteria     Myxococcales    Labilitrichaceae        Labilithrix     Labilithrix luteola
Paja_peg_0001__lcl|NZ_CP016908.1_prot_WP_075275866.1_1_[gene=dnaN]_[locus_tag=BCY86_RS00005]_[protein=DNA_polymerase_III_subunit_beta]_[protein_id=WP_075275866.1]_[location=209..1363]_[gbkey=CDS]     WP_136921399.1  WP_136921399.1 DNA polymerase III subunit beta [Polyangium aurulentum]<>TKC78369.1 DNA polymerase III subunit beta [Polyangium aurulentum]      1       384     1       377     377     384     100     100     54.0    71.5    3.13e-128       384     2567896  Bacteria        Proteobacteria  Deltaproteobacteria     Myxococcales    Polyangiaceae   Polyangium      Polyangium aurulentum
bash • 411 views
22 months ago

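Add the line number to each line of file1 with awk '{printf("%d\t%s\n",NR,$0);}' (the column to compare then becomes column 17).

Sort both files on the column to compare.

Join them with join (https://linux.die.net/man/1/join), using the tab as the field separator and -a 1 so that the lines of file1 without a match in file2 are kept.

Sort the output on the line number to restore the original order of file1.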
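The steps above can be sketched as a small shell function. This is only an illustration under a few assumptions: both files are tab-separated, each key appears at most once in file2, and the name merge_by_key is hypothetical. As a small variation on the steps, the sketch also puts a copy of column 16 in front of each file1 line, so that join can use column 1 in both files and lines with no value in column 16 simply get an empty key.

```shell
#!/bin/sh
# merge_by_key FILE1 FILE2   (hypothetical helper name)
# Appends columns 2..N of FILE2 to every FILE1 line whose column 16 equals
# column 1 of FILE2. FILE1 order is preserved; unmatched lines pass through.
# Assumption: both files are tab-separated.
merge_by_key() {
    tab=$(printf '\t')
    tmpdir=$(mktemp -d) || return 1

    # Prefix each FILE1 line with its key (empty if column 16 is absent)
    # and its line number, then sort on the key.
    awk -F'\t' -v OFS='\t' '{ print $16, NR, $0 }' "$1" |
        LC_ALL=C sort -t "$tab" -k1,1 > "$tmpdir/left"

    # Sort FILE2 on its key column, using the same collation as above.
    LC_ALL=C sort -t "$tab" -k1,1 "$2" > "$tmpdir/right"

    # Join on the key (-a 1 keeps unmatched FILE1 lines), restore the
    # FILE1 order by line number, then drop the two helper columns.
    LC_ALL=C join -t "$tab" -a 1 "$tmpdir/left" "$tmpdir/right" |
        LC_ALL=C sort -t "$tab" -k2,2n |
        cut -f3-

    rm -rf "$tmpdir"
}
```

It could be called as merge_by_key file1.txt file2.txt > merged.txt. The sort and join commands all run with LC_ALL=C because join expects its inputs to be sorted in the same collation order that it uses itself. If a key occurs more than once in file2, join prints one output line per pairing, so the result would have more lines than file1.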
