I have two tab separated files, File_1 contains exonic (output by stringtie) information and its structure is like this:
e_id    chr strand  start   end rcount  ucount  mrcount cov cov_sd  mcov    mcov_sd
1   1   +   3631    3913    46  46  46  10.371  10.5056 10.371  10.5056
2   1   +   3996    4276    83  83  83  22.3559 4.7919  22.3559 4.7919
3   1   +   4486    4605    47  47  47  25.2333 7.4294  25.2333 7.4294
4   1   +   4706    5095    120 120 120 23.5718 3.9786  23.5718 3.9786
File_2 contains splice junctions (outout by STAR) information and its structure is like this:
chr start   end strand 
1   3914    3995    1
1   4277    4485    1   
1   4496    4505    1
1   4716    5075    1
* strand (0: undefined, 1: +, 2: -)
I am interested in script which first check chromosome number and then extract those lines in which start and end coordinates ($2 and $3) of file_2 lies within start and end coordinate ($4 and $5) of file_1, so the expected output will be overlapped rows from file_1 + rows from file_2. For example, start and end coordinate in third and fourth row of file_2 lies within third and fourth row of file_1 so the expected output will be:
e_id    chr strand  start   end rcount  ucount  mrcount cov cov_sd  mcov    mcov_sd chr start   end strand
3   1   +   4486    4605    47  47  47  25.2333 7.4294  25.2333 7.4294    1 4496    4505    1
4   1   +   4706    5095    120 120 120 23.5718 3.9786  23.5718 3.9786    1 4716    5075    1
Thanks in advance
Many thanks for your efforts, is it possible to place the matching rows from file_2 exactly in front of matching rows of file_1, currently its pasting all rows from file_1 and matched rows from file_2
Try swapping the files when you
paste.