Joining two files with multiple columns
1
1
Entering edit mode
8.5 years ago

I have two files with large data with matching id at different column. I want to merge those identical columns with other values along the line. I want to merge file 1 (column 10) and file 2 (column 8)

File 1

chr0    385308    T    A    228    hom    17    17    .    BCL026842    384745    386336    +
chr0    589920    C    T    73    het    16    5    .    BCL026857    589920    590284    -
chr0    589925    T    C    203    hom    15    15    .    BCL026858    589920    590284    -
chr0    590091    C    T    140    hom    6    6    .    BCL026759    589920    590284    -
chr0    590131    A    C    74    hom    4    4    .    BCL026660    589920    590284    -
chr0    590142    A    C    159    hom    7    7    .    BCL025261    589920    590284    -
chr0    590161    G    A    228    hom    10    10    .    BCL024262    589920    590284    -
chr0    590193    A    G    228    hom    15    15    .    BCL023163    589920    590284    -
chr0    590281    G    A    228    hom    20    20    .    BCL026864    589920    590284    -

File 2

g111    scaffold00001    52496    52496    G    C    exonic    BCL026842    nonsynonymous SNV    "BCL001919:BCL001919T1:exon3:c.427C>G:p.P143A,"
g112    scaffold00001    52501    52501    G    T    exonic    BCL026857    nonsynonymous SNV    "BCL001919:BCL001919T1:exon3:c.422C>A:p.T141N,"
g122    scaffold00001    60197    60197    G    A    exonic    BCL026858    synonymous SNV    "BCL001920:BCL001920T1:exon2:c.276C>T:p.D92D,"
g156    scaffold00001    80052    80052    C    T    exonic    BCL026859    synonymous SNV    "BCL001921:BCL001921T2:exon1:c.240G>A:p.P80P,BCL001921:BCL001921T3:exon1:c.240G>A:p.P80P,"
g328    scaffold00001    166481    166481    C    T    exonic    BCL026860    synonymous SNV    "BCL001929:BCL001929T1:exon3:c.1110G>A:p.T370T,"
g329    scaffold00001    168237    168237    T    A    exonic    BCL026861    nonsynonymous SNV    "BCL001929:BCL001929T1:exon1:c.92A>T:p.N31I,"
g360    scaffold00001    178660    178660    T    C    exonic    BCL026862    synonymous SNV    "BCL001930:BCL001930T1:exon2:c.177A>G:p.G59G,"
g370    scaffold00001    180974    180974    A    G    exonic    BCL026863    synonymous SNV    "BCL001931:BCL001931T1:exon6:c.1521T>C:p.F507F,BCL001931:BCL001931T2:exon6:c.1521T>C:p.F507F,"
g414    scaffold00001    189463    189463    A    G    exonic    BCL026864    nonsynonymous SNV    "BCL001933:BCL001933T1:exon1:c.56T>C:p.V19A,"

Desired output

chr0    385308    T    A    228    hom    17    17    .    BCL026842    384745    386336    +    g111    scaffold00001    52496    52496    G    C    exonic    BCL026842    nonsynonymous SNV    "BCL001919:BCL001919T1:exon3:c.427C>G:p.P143A,"
chr0    589920    C    T    73    het    16    5    .    BCL026857    589920    590284    -   g112    scaffold00001    52501    52501    G    T    exonic    BCL026857    nonsynonymous SNV    "BCL001919:BCL001919T1:exon3:c.422C>A:p.T141N,"
chr0    589925    T    C    203    hom    15    15    .    BCL026858    589920    590284    -   g122    scaffold00001    60197    60197    G    A    exonic    BCL026858    synonymous SNV    "BCL001920:BCL001920T1:exon2:c.276C>T:p.D92D,"
chr0    590281    G    A    228    hom    20    20    .    BCL026864    589920    590284    -   g414    scaffold00001    189463    189463    A    G    exonic    BCL026864    nonsynonymous SNV    "BCL001933:BCL001933T1:exon1:c.56T>C:p.V19A,"

Any response in any command is appreciated. Thank you in advance

join grep sort awk perl • 2.0k views
ADD COMMENT
2
Entering edit mode
8.5 years ago
michael.ante ★ 3.8k

Using join it's quite easy. Of course, you need to sort the files according to the field to join them:

join -1 10 -2 8 <(sort -k10 FILE1) <(sort -k8 FILE2) > joined_file.txt

If you want to select certain fields you can use either the -o option in the join command (-o 1.1,1.2...0,2.1,2.2...) or the cut tool on the joined file (cut -f 1,2,3.....)

ADD COMMENT
0
Entering edit mode

Thanks a lot!!! It worked!!!! :)

ADD REPLY

Login before adding your answer.

Traffic: 2498 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6