Match columns of two different files and paste a value of corrsponding match existed in another column
1
0
Entering edit mode
7.2 years ago
waqasnayab ▴ 250

Hi,

I have two files:

file1:
chr1    873251  0.0620
chr1    930248  0.9210
chr1    930939  0.2110
chr1    931131  0.6050

file2: 
chr1    873251  rs11240779      G       A       4047.9  6       0.75    8       171     23.81   1.071   53.55   ncRNA_intronic  0       FA
chr1    930248  rs41285790      G       A       2545.9  2       0.25    8       535     12.67   4.575   54.53   exonic  nonsynonymous_SNV
chr1    930939  rs9988021       G       A       4783.35 8       1       8       155     31.06   0       56.55   intronic        0       SA
chr1    931131  rs375757231     C       CCCCT   5151.25 6       0.75    8       168     34.13   77.892  54.05   intronic        0       SA

I want to match column 3 of file1 with column 2 of file 2 (search the entire of column 2 of file 2), and if column matches paste the value of column 4 from file 1 into file2 as new column 3. PS: column lengths are different. file 1 has 107011 values in each column, and file 2 has 107113 values in each column.

So, that my expected output would be:

chr1    873251  0.0620 rs11240779      G       A       4047.9  6       0.75    8       171     23.81   1.071   53.55   ncRNA_intronic  0       FA
    chr1    930248  0.9210 rs41285790      G       A       2545.9  2       0.25    8       535     12.67   4.575   54.53   exonic  nonsynonymous_SNV
    chr1    930939  0.2110 rs9988021       G       A       4783.35 8       1       8       155     31.06   0       56.55   intronic        0       SA
    chr1    931131  0.6050 rs375757231     C       CCCCT   5151.25 6       0.75    8       168     34.13   77.892  54.05   intronic        0       SA

I tired my luck a lot with awk but no success.

Best,

Waqas.

genome next-gen sequencing • 3.0k views
ADD COMMENT
0
Entering edit mode

What exactly did you try with awk ?

ADD REPLY
3
Entering edit mode
7.2 years ago

Here's a solution with BEDOPS sort-bed and bedmap, along with a couple Unix tools:

Convert the first file to a sorted BED file with awk and sort-bed:

$ awk '{ print $1"\t"$2"\t"($2+1)"\t"$3; }' file1.txt | sort-bed - > file1.bed

Do the same with the second file:

$ awk '{ print $1"\t"$2"\t"($2+1)"\t"substr($0, index($0,$3)); }' file2.txt | sort-bed - > file2.bed

Map the second file to the first with bedmap, and pipe the results of that to cut to filter out the columns you want:

$ bedmap --delim '\t' --exact --echo --echo-map --skip-unmapped file1.bed file2.bed | cut -f1,2,8- > answer.txt

The file answer.txt should follow the specification of your example's expected result.

ADD COMMENT
0
Entering edit mode

@Alex This really worked for me, although have some discrepancies in lines, but I managed at the end.

Thanks Alex.

Best,

Waqas.

ADD REPLY

Login before adding your answer.

Traffic: 1764 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6