Question: Match columns of two different files and paste a value of corrsponding match existed in another column
0
gravatar for waqasnayab
3.1 years ago by
waqasnayab200
Pakistan
waqasnayab200 wrote:

Hi,

I have two files:

file1:
chr1    873251  0.0620
chr1    930248  0.9210
chr1    930939  0.2110
chr1    931131  0.6050

file2: 
chr1    873251  rs11240779      G       A       4047.9  6       0.75    8       171     23.81   1.071   53.55   ncRNA_intronic  0       FA
chr1    930248  rs41285790      G       A       2545.9  2       0.25    8       535     12.67   4.575   54.53   exonic  nonsynonymous_SNV
chr1    930939  rs9988021       G       A       4783.35 8       1       8       155     31.06   0       56.55   intronic        0       SA
chr1    931131  rs375757231     C       CCCCT   5151.25 6       0.75    8       168     34.13   77.892  54.05   intronic        0       SA

I want to match column 3 of file1 with column 2 of file 2 (search the entire of column 2 of file 2), and if column matches paste the value of column 4 from file 1 into file2 as new column 3. PS: column lengths are different. file 1 has 107011 values in each column, and file 2 has 107113 values in each column.

So, that my expected output would be:

chr1    873251  0.0620 rs11240779      G       A       4047.9  6       0.75    8       171     23.81   1.071   53.55   ncRNA_intronic  0       FA
    chr1    930248  0.9210 rs41285790      G       A       2545.9  2       0.25    8       535     12.67   4.575   54.53   exonic  nonsynonymous_SNV
    chr1    930939  0.2110 rs9988021       G       A       4783.35 8       1       8       155     31.06   0       56.55   intronic        0       SA
    chr1    931131  0.6050 rs375757231     C       CCCCT   5151.25 6       0.75    8       168     34.13   77.892  54.05   intronic        0       SA

I tired my luck a lot with awk but no success.

Best,

Waqas.

sequencing next-gen genome • 917 views
ADD COMMENTlink modified 3.1 years ago by Alex Reynolds29k • written 3.1 years ago by waqasnayab200

What exactly did you try with awk ?

ADD REPLYlink written 3.1 years ago by Nandini840
3
gravatar for Alex Reynolds
3.1 years ago by
Alex Reynolds29k
Seattle, WA USA
Alex Reynolds29k wrote:

Here's a solution with BEDOPS sort-bed and bedmap, along with a couple Unix tools:

Convert the first file to a sorted BED file with awk and sort-bed:

$ awk '{ print $1"\t"$2"\t"($2+1)"\t"$3; }' file1.txt | sort-bed - > file1.bed

Do the same with the second file:

$ awk '{ print $1"\t"$2"\t"($2+1)"\t"substr($0, index($0,$3)); }' file2.txt | sort-bed - > file2.bed

Map the second file to the first with bedmap, and pipe the results of that to cut to filter out the columns you want:

$ bedmap --delim '\t' --exact --echo --echo-map --skip-unmapped file1.bed file2.bed | cut -f1,2,8- > answer.txt

The file answer.txt should follow the specification of your example's expected result.

ADD COMMENTlink modified 3.1 years ago • written 3.1 years ago by Alex Reynolds29k

@Alex This really worked for me, although have some discrepancies in lines, but I managed at the end.

Thanks Alex.

Best,

Waqas.

ADD REPLYlink written 3.1 years ago by waqasnayab200
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1102 users visited in the last hour