**20**wrote:

Hello everyone,

I have **File 1** with 6 columns:

```
**geneId** "\t" **scaffold** "\t" **LocusID** "\t" **pfam** "\t" **PfamID** "\t" **function**
1 "\t" aa "\t" XL_012 "\t" pfam "\t" P1234 "\t" .............
2 "\t" bb "\t" XL_013 "\t" pfam "\t" P1324"\t ...........
3 "\t" cc "\t" XL_014 "\t" pfam "\t" P1624"\t ...........
3"\t" cc "\t" XL_014 "\t" pfam "\t" P1326"\t ...........
4"\t" dd "\t" XL_015 "\t" pfam "\t" P1326"\t ...........
4"\t" dd "\t" XL_015 "\t" pfam "\t" P1326"\t ...........
4 "\t" dd "\t" XL_015 "\t" pfam "\t" P1326"\t ...........
5 "\t" ee"\t" XL_016
```

**File 2** with 2 columns

```
**LocusID** "\t" **Expression**
XL_012 "\t" 23
XL_013 "\t" 20
XL_014 "\t" 15
XL_015 "\t" 10
XL_016 "\t" 9
```

**Desired output** : I want to merge column **expression** to file1 based on locus ID:

```
**geneId** "\t" **scaffold** "\t" **LocusID** "\t" **Expression** "\t" **pfam** "\t" **PfamID** "\t" **function**
1 "\t" aa "\t" XL_012 "\t" 23 "\t" pfam "\t" P1234 "\t" .............
2 "\t" bb "\t" XL_013 "\t" 20 "\t" pfam "\t" P1324"\t ...........
3 "\t" cc "\t" XL_014 "\t" 15 "\t" pfam "\t" P1624"\t ...........
3"\t" cc "\t" XL_014 "\t" 15 "\t"pfam "\t" P1326"\t ...........
4"\t" dd "\t" XL_015 "\t" 10 "\t"pfam "\t" P1326"\t ...........
4"\t" dd "\t" XL_015 "\t" 10 "\t"pfam "\t" P1326"\t ...........
4 "\t" dd "\t" XL_015 "\t" 10 "\t" pfam "\t" P1326"\t ...........
5 "\t" ee"\t" XL_016 "\t"9
```

I could do this easily if I had same number of rows for both files but here since I have different number of rows and some repetitions of locus ID due to many pfam domains for single gene I am super confused. Please help me figure out this one using awk, perl or bash. Many thanks in Advance!!!!

Thank you,

Ambika

Man! Where does this come from? I am eager to know ;)

2.1k