Question: comparing two files and adding columns of one file to another file if a fields in a specific column match in both files
0
gravatar for eDNAuRNA
2.1 years ago by
eDNAuRNA20
eDNAuRNA20 wrote:

Hi everyone,

I have two .tsv files. I want to match 3rd column of both files and if the value in each field matches, I want to print the entire row but also add columns 13th, 14th, 15th and 16th of file2.tsv in file1.tsv (append after 10th column of file1.tsv). Also if the value in 3rd column of both files doesn't match, the code should add "-" to the new file where fields from 13th, 14th, 15th and 16th of file2.tsv will be added.

I am trying awk command, but in case of a mismatch, "-" is added to only one column (11th) and rest of the three columns are from the original file1.tsv. Please help

awk -v OFS="\t" 'NR==FNR {a[$3]=$13"\t"$14"\t"$15"\t"$16;next}{$10=$10 "\t" (a[$3]?a[$3]: "-")}1' file1.tsv file2.tsv >
 output.tsv
linux awk compare files • 1.5k views
ADD COMMENTlink modified 2.1 years ago by michael.ante3.6k • written 2.1 years ago by eDNAuRNA20

Hello ehsanullah83,

  • Please show us an example of your input data and desired output.
  • How is this bioinformatic related?
  • Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.
    code_formatting

Thanks!

fin swimmer

ADD REPLYlink written 2.1 years ago by finswimmer13k

Good description of data and required output . Could you post example data and expected output? @ ehsanullah83

ADD REPLYlink modified 2.1 years ago • written 2.1 years ago by cpad011214k
0
gravatar for michael.ante
2.1 years ago by
michael.ante3.6k
Austria/Vienna
michael.ante3.6k wrote:

Hi

You may have a look at this answer. The join command gives you quite nice output handling.

join -t $'\t' -1 3 -2 3 -a 1  -e "-" -o 1.1,1.2,0,1.4,1.5,1.6,1.7,1.8,1.9,1.10,2.13,2.14,2.15,2.16   file1.tsv file2.tsv

The -1 3 -2 3 define the fields on which too join, the -a 1 defines that file 1 is printed even if it's unpairable. The -e controls the output for unmatched fields; in this case if field 3 of file one doesn't have a match in file 2, every field defined in the output by -o is printed as '-'.

The -t $'\t' is a bit cryptic, but that's the way to encode tabs in join

You need to have both files sorted.

Cheers,

Michael

ADD COMMENTlink written 2.1 years ago by michael.ante3.6k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1667 users visited in the last hour