Question

How to compare two columns from two files with a specific condition in awk

4.7 years ago

statamn • 0

I have a data file A.tsv (field separator = \t) :

id  clade   mutation
243 40A titi,xixi,lolo
254 20B titi,toto,jiji,lala
261
267 20B lala,jiji,jojo

and a template file B.tsv (field separator = \t) :

40A titi,toto,lala
40F xaxa,jojo,huhu
40C sasa,sisi,lala

Based on the common column (clade), I want to compare the mutations in A.tsv against the template in B.tsv. When the clade in A.tsv is 20B:

- If the mutations on that line of A.tsv include all the mutations of 40A in B.tsv, print the clade 40A in a new column named Conclusion (added after the last column of A.tsv).
- It is not a problem if the 20B line in A.tsv contains other mutations besides those of 40A in B.tsv.
- If the 20B line in A.tsv does not contain all the mutations of 40A in B.tsv, don't print anything.

The result (stored in a new file C.tsv) should look like this:

id  clade   mutation    Conclusion
243 40A titi,xixi,lolo  
254 20B titi,toto,jiji,lala 40A
261
267 20B lala,jiji,jojo

I started with this:

awk 'BEGIN{ OFS=FS="\t" }
  NR==FNR{ clade[$1]=$2; next }
  FNR==1{ print $0, "Conclusion"; next }
  !($2 in clade){ print; next }
  {
    XXXXXXXXX
  }
' B.tsv A.tsv > C.tsv

but I don't know how to write the rest (the XXXXXXXXX part). Do you have any ideas? Thanks.
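
One possible way to fill in the XXXXXXXXX part, offered as a minimal sketch rather than a definitive answer. It assumes the 20B-to-40A pairing described above is the only rule needed, and passes it in through two hypothetical variables, `src` and `tmpl` (the array names `template`, `have`, `got` and `need` are also just illustrative). Note that it drops the `!($2 in clade)` test: 20B is not a key in B.tsv, so that rule would print every 20B row unchanged before the comparison could run. The first two lines only recreate the sample files from the question.

```shell
# Recreate the sample inputs from the question (tab-separated).
printf 'id\tclade\tmutation\n243\t40A\ttiti,xixi,lolo\n254\t20B\ttiti,toto,jiji,lala\n261\n267\t20B\tlala,jiji,jojo\n' > A.tsv
printf '40A\ttiti,toto,lala\n40F\txaxa,jojo,huhu\n40C\tsasa,sisi,lala\n' > B.tsv

# src and tmpl are assumed names for the clade pair given in the question.
awk -v src="20B" -v tmpl="40A" 'BEGIN { FS = OFS = "\t" }
  # First file (B.tsv): remember the mutation list of every template clade.
  NR == FNR { template[$1] = $2; next }
  # Header of A.tsv: append the new column name.
  FNR == 1 { print $0, "Conclusion"; next }
  # Rows of any other clade, or a missing template: print unchanged.
  $2 != src || !(tmpl in template) { print; next }
  {
    # Index the mutations present on this row.
    split("", have)
    n = split($3, got, ",")
    for (i = 1; i <= n; i++) have[got[i]] = 1
    # The row qualifies only if every template mutation is present.
    m = split(template[tmpl], need, ",")
    ok = 1
    for (i = 1; i <= m; i++) if (!(need[i] in have)) { ok = 0; break }
    if (ok) print $0, tmpl; else print
  }
' B.tsv A.tsv > C.tsv
```

With the sample data, only row 254 gets 40A appended; row 267 is missing titi and toto, so it is printed as is. Rows without a conclusion are printed without a trailing tab; if an empty Conclusion field is wanted on every row, the plain `print` calls could be changed to `print $0, ""`.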

awk text-processing • 668 views

updated 4.7 years ago by cpad0112 21k • written 4.7 years ago by statamn • 0