How to compare two columns from two files with a specific condition in awk
4.6 years ago
statamn • 0

I have a data file A.tsv (field separator = \t):

id  clade   mutation
243 40A titi,xixi,lolo
254 20B titi,toto,jiji,lala
261
267 20B lala,jiji,jojo

and a template file B.tsv (field separator = \t):

40A titi,toto,lala
40F xaxa,jojo,huhu
40C sasa,sisi,lala

Based on their common column (clade), I want to compare the mutations in A.tsv against the template in B.tsv. When the clade in A.tsv is 20B:
- If the mutations on that line of A.tsv include all the mutations of 40A in B.tsv, print the clade 40A in a new column named Conclusion (after the last column of A.tsv).
- It is not a problem if the 20B line in A.tsv contains other mutations than those of 40A in B.tsv.
- If the 20B line in A.tsv does not contain all the mutations of 40A in B.tsv, don't print anything.

The result (stored in a new file C.tsv) will look like this:

id  clade   mutation    Conclusion
243 40A titi,xixi,lolo  
254 20B titi,toto,jiji,lala 40A
261
267 20B lala,jiji,jojo

I started with this:

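awk 'BEGIN{ OFS=FS="\t" }
  NR==FNR{ clade[$1]=$2; next }           # B.tsv: clade -> mutation list
  FNR==1{ print $0, "Conclusion"; next }  # A.tsv header: add the new column
  !($2 in clade){ print; next }           # clade not in B.tsv: print unchanged
  {
    XXXXXXXXX
  }
' B.tsv A.tsv > C.tsv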

but I don't know how to write the rest (the XXXXXXXXX part). Do you have any ideas? Thanks.
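One possible way to fill in the missing part is sketched below. It is only a sketch under a few assumptions: the pairing "20B in A.tsv is checked against 40A in B.tsv" is passed in through two hypothetical variables, src and tmpl, which are not part of the original attempt, and lines that do not match are printed unchanged, without an empty Conclusion field. Note that the guard !($2 in clade) from the attempt above would skip the 20B lines, because 20B is not a key in B.tsv, so the sketch tests $2 against src instead. The printf lines only recreate the sample files from the question.

```shell
# Recreate the sample inputs from the question (tab-separated).
printf 'id\tclade\tmutation\n243\t40A\ttiti,xixi,lolo\n254\t20B\ttiti,toto,jiji,lala\n261\n267\t20B\tlala,jiji,jojo\n' > A.tsv
printf '40A\ttiti,toto,lala\n40F\txaxa,jojo,huhu\n40C\tsasa,sisi,lala\n' > B.tsv

# src and tmpl are hypothetical parameters: rows of A.tsv with clade src
# are checked against the mutation list of clade tmpl in B.tsv.
awk -v src="20B" -v tmpl="40A" 'BEGIN{ OFS=FS="\t" }
  NR==FNR{ clade[$1]=$2; next }               # B.tsv: clade -> mutation list
  FNR==1{ print $0, "Conclusion"; next }      # A.tsv header: add the new column
  $2!=src || !(tmpl in clade){ print; next }  # not a row to test: print unchanged
  {
    split("", have)                           # portable way to empty the array
    m = split($3, list, ",")                  # mutations present on this row
    for (i = 1; i <= m; i++) have[list[i]] = 1
    n = split(clade[tmpl], need, ",")         # mutations required by the template
    ok = 1
    for (i = 1; i <= n; i++)
      if (!(need[i] in have)) { ok = 0; break }
    if (ok) print $0, tmpl                    # all template mutations found
    else    print                             # at least one is missing
  }
' B.tsv A.tsv > C.tsv
```

The check is a plain subset test: the row's mutations go into a lookup array, and the row gets the Conclusion value only if every template mutation is found in it. Extra mutations on the row are ignored, as requested.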

awk text-processing • 619 views