Question

Use information from one file to substitute information in another file

0

17 months ago

baijiangshan9726 • 0

Hi, I have a file1:

60000   498177
65000   498178
70000   498179
75000   498180
80000   498181
85000   498182
90000   498183
95000   498184
100000  498185
105000  498186
110000  498187
115000  498188

and a file2:

60000   70000   1
70000   70000   20
70000   85000   1
80000   85000   1
110000  110000  23
115000  115000  3
115000  120000  1
120000  120000  2
80000   125000  1

I want to use the information in file1 to substitute values in file2. Wherever a value in the 1st or 2nd column of file2 matches the 1st column of file1, it should be replaced with the corresponding 2nd-column value from file1. The final result should look like this:

498177  498179  1
498179  498179  20
498179  498182  1
498181  498182  1
498187  498187  23
498188  498188  3
498188  498189  1
498189  498189  2
498181  498190  1
498190  498190  2

I wrote a Python script, but it is very slow (file2 is very big: it has 8 million rows). How can I do this much faster? Thanks!

linux python awk sed • 719 views

ADD COMMENT • link 17 months ago by baijiangshan9726 • 0

1

How is this related to bioinformatics? By the way, you want join.
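A minimal sketch of the same join-style lookup in Python, in case a script is preferred over the join command. The function names build_lookup and substitute are hypothetical, and it assumes whitespace-separated columns and that values with no match in file1 are left unchanged (the question does not say what should happen to them).

```python
def build_lookup(file1_lines):
    """Map column 1 of file1 to column 2 (one dictionary entry per row)."""
    lookup = {}
    for line in file1_lines:
        fields = line.split()
        if len(fields) >= 2:
            lookup[fields[0]] = fields[1]
    return lookup


def substitute(file2_lines, lookup):
    """Yield file2 rows with columns 1 and 2 replaced via the lookup.

    Assumption: a value without a match in file1 is kept as-is.
    """
    for line in file2_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        a, b, count = fields[:3]
        yield f"{lookup.get(a, a)}\t{lookup.get(b, b)}\t{count}"


# Small excerpt of the data from the question.
file1 = ["60000   498177", "70000   498179", "85000   498182"]
file2 = ["60000   70000   1", "70000   85000   1"]

lookup = build_lookup(file1)
result = list(substitute(file2, lookup))
print("\n".join(result))
```

For real files, open file handles can be passed in place of the lists, since a file object yields one line at a time. Each row then costs two dictionary lookups, so file2 is read only once.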

ADD REPLY • link 17 months ago by Pierre Lindenbaum 161k

0

Hi, it's actually a Hi-C file. I am analyzing Hi-C data.

ADD REPLY • link 17 months ago by baijiangshan9726 • 0

0

I still don't see how that is related to bioinformatics.

If your Python script is slow, you can divide the original file into 10 pieces (or 20, if you have that many CPU threads) and run the substitution on each of them in parallel. When that is done, you concatenate the pieces back together. You would use the split and cat commands.
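The split-and-concatenate idea can also be sketched inside a single Python script with multiprocessing.Pool instead of the split and cat commands. This is only an illustration under assumptions: the names init_worker, substitute_chunk, read_chunks, and run_parallel are hypothetical, columns are whitespace-separated, unmatched values are kept unchanged, and the chunk size is arbitrary. For a per-line operation this cheap, the overhead of passing chunks between processes may cancel out the gain, so it is worth timing against a plain single-process dictionary lookup first.

```python
from itertools import islice
from multiprocessing import Pool

LOOKUP = {}  # per-process copy of the file1 mapping, set by init_worker


def init_worker(lookup):
    """Store the file1 mapping once in each worker process."""
    global LOOKUP
    LOOKUP = lookup


def substitute_chunk(lines):
    """Replace columns 1 and 2 in a list of file2 lines.

    Assumption: values missing from the mapping are left unchanged.
    """
    out = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        a, b, count = fields[:3]
        out.append(f"{LOOKUP.get(a, a)}\t{LOOKUP.get(b, b)}\t{count}\n")
    return out


def read_chunks(handle, chunk_size):
    """Yield successive lists of at most chunk_size lines."""
    while True:
        chunk = list(islice(handle, chunk_size))
        if not chunk:
            return
        yield chunk


def run_parallel(file1_path, file2_path, out_path, workers=10, chunk_size=100_000):
    """Process file2 in chunks across worker processes, writing rows in order."""
    with open(file1_path) as f1:
        # Every non-blank file1 row is expected to have at least two columns.
        lookup = dict(line.split()[:2] for line in f1 if line.strip())
    with open(file2_path) as f2, open(out_path, "w") as out, \
            Pool(workers, initializer=init_worker, initargs=(lookup,)) as pool:
        # imap returns results in input order, so no separate "cat" step is needed.
        for result in pool.imap(substitute_chunk, read_chunks(f2, chunk_size)):
            out.writelines(result)
```

A call such as run_parallel("file1", "file2", "file2.substituted") should be placed under an `if __name__ == "__main__":` guard, which multiprocessing needs on platforms that start workers by spawning a fresh interpreter. The output file name here is a placeholder.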

ADD REPLY • link 17 months ago by Mensur Dlakic ★ 27k

0

Hi, thanks so much. join works very well for this.

ADD REPLY • link 17 months ago by baijiangshan9726 • 0