Hello there, I have two files with the same header and the same first column. I would like to compare them row by row and calculate the percentage of genotypes that match. I tried transposing the files so that I could compare them column by column with the compare function in R, and I also tried doing the comparison in awk, but I couldn't get either approach to work. Could anyone please give me some hints or tips? Thanks.
Set1
barcode SNP000072 SNP000119 SNP000179 SNP001106 SNP001150
165974-1 A:A A:A A:A G:G C:C
165974-2 A:A A:A A:A G:A C:C
165974-3 A:A A:A G:A G:A C:C
165974-4 A:A A:A A:A A:A C:A
165974-5 A:A A:C A:A G:A ?
Set2
barcode SNP000072 SNP000119 SNP000179 SNP001106 SNP001150
165974-1 A:A A:A A:A G:G C:C
165974-2 A:A A:A A:A G:A C:C
165974-3 A:A A:A A:A G:A C:C
165974-4 A:A A:A A:A A:A C:A
165974-5 A:A A:A A:A G:A C:C
Expected output
barcode percentage(%)
165974-1 100
165974-2 100
165974-3 80
165974-4 100
165974-5 60
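For reference, the calculation behind the expected output can be sketched as follows. This is only a minimal illustration in Python (not the awk or R attempts mentioned above); the names `parse_table` and `match_percentages` are hypothetical, and it assumes both files are whitespace-separated, list the same barcodes, and have the same number of SNP columns. Here "?" is treated as an ordinary value, so it counts as a mismatch.

```python
def parse_table(text):
    """Map each barcode to its list of genotype calls, skipping the header line."""
    rows = {}
    for line in text.strip().splitlines()[1:]:
        barcode, *calls = line.split()
        rows[barcode] = calls
    return rows


def match_percentages(text1, text2):
    """Return {barcode: percentage of SNP columns whose calls are identical}."""
    set1, set2 = parse_table(text1), parse_table(text2)
    result = {}
    for barcode, calls1 in set1.items():
        calls2 = set2[barcode]  # assumes the barcode exists in both files
        matches = sum(a == b for a, b in zip(calls1, calls2))
        result[barcode] = 100 * matches / len(calls1)
    return result


SET1 = """\
barcode SNP000072 SNP000119 SNP000179 SNP001106 SNP001150
165974-1 A:A A:A A:A G:G C:C
165974-2 A:A A:A A:A G:A C:C
165974-3 A:A A:A G:A G:A C:C
165974-4 A:A A:A A:A A:A C:A
165974-5 A:A A:C A:A G:A ?
"""

SET2 = """\
barcode SNP000072 SNP000119 SNP000179 SNP001106 SNP001150
165974-1 A:A A:A A:A G:G C:C
165974-2 A:A A:A A:A G:A C:C
165974-3 A:A A:A A:A G:A C:C
165974-4 A:A A:A A:A A:A C:A
165974-5 A:A A:A A:A G:A C:C
"""

print("barcode percentage(%)")
for barcode, pct in match_percentages(SET1, SET2).items():
    print(barcode, round(pct))
```

With the two sample sets above, this prints 100, 100, 80, 100 and 60 for the five barcodes, matching the expected output.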
Thanks, it works. It is a great help.
I was wondering whether I can add one more condition: whenever a field is "?", the command should skip it and not include it in the final count. For example, a row that contains a "?" would be counted as 4 fields instead of its NF of 5. I know I just need to add a counter, but I don't know where I should make the change. Thanks.
This is what I modified, but I know something is missing.
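Separately from the modified command referred to above, the counter idea itself can be sketched like this. It is a hedged Python illustration, not the awk command from this thread; the function name `percent_match` and its `missing` parameter are hypothetical. It assumes a position is skipped when either file has "?" there, and it keeps a separate count of the positions actually compared so that this count, rather than the full field count, becomes the denominator.

```python
def percent_match(calls1, calls2, missing="?"):
    """Percentage of matching calls, ignoring positions where either call is missing."""
    compared = 0  # positions that take part in the comparison
    matches = 0   # positions where both calls are identical
    for a, b in zip(calls1, calls2):
        if a == missing or b == missing:
            continue  # skip this position: it counts neither as a match nor in the total
        compared += 1
        if a == b:
            matches += 1
    # Return None when every position was skipped, to avoid dividing by zero.
    return 100 * matches / compared if compared else None


# Row 165974-5 from Set1 and Set2
row1 = "A:A A:C A:A G:A ?".split()
row2 = "A:A A:A A:A G:A C:C".split()
print(percent_match(row1, row2))  # 3 matches out of 4 compared positions -> 75.0
```

The key point is that the increment of the counter sits after the "?" check, so skipped fields never reach the denominator.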