Question: How do I replace the values in one file with the values in another file?
Asked by giegie, 19 days ago:

I have 2 tab-separated files which look like this:

file1.txt

chr1    710000  715000  143
chr1    715000  720000  144
chr1    720000  725000  145
chr1    725000  730000  146
chr1    730000  735000  147
chr1    735000  740000  148
chr1    740000  745000  149
chr1    745000  750000  150
chr1    750000  755000  151
chr1    755000  760000  152
chr1    760000  765000  153

file2.txt

143 143 84
143 144 26
143 152 32
143 153 15
144 152 11

The expected output:

output.txt

chr1    710000  715000  chr1    710000  715000  84
chr1    710000  715000  chr1    715000  720000  26
chr1    710000  715000  chr1    755000  760000  32
chr1    710000  715000  chr1    760000  765000  15
chr1    715000  720000  chr1    755000  760000  11

I would like to match the unique numbers in file1.txt (column 4) with the numbers in file2.txt (columns 1 and 2) and replace them with the corresponding values from file1.txt (columns 1-3). output.txt should have 7 columns, where the last one holds the corresponding value from file2.txt (column 3).

Tag: hic
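The matching described in the question is a dictionary lookup: build a map from the ID in column 4 of file1.txt to columns 1-3, then replace both IDs in each file2.txt row and keep its third column. Below is a minimal sketch working on in-memory lines rather than files; the function names `load_bins` and `annotate` are hypothetical, and file1 is trimmed to the four bins that file2 actually references.

```python
def load_bins(lines):
    """Map each bin ID (column 4) to its coordinates (columns 1-3)."""
    bins = {}
    for line in lines:
        fields = line.split()  # split on any whitespace: tabs or spaces
        if len(fields) >= 4:
            bins[fields[3]] = fields[:3]
    return bins


def annotate(bins, lines):
    """Replace the two bin IDs in each row with coordinates; keep the value."""
    out = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        id1, id2, value = fields[:3]
        if id1 in bins and id2 in bins:  # drop rows with unknown IDs
            out.append("\t".join(bins[id1] + bins[id2] + [value]))
    return out


# Example rows taken from the question (file1 trimmed to the bins used)
file1 = ["chr1\t710000\t715000\t143", "chr1\t715000\t720000\t144",
         "chr1\t755000\t760000\t152", "chr1\t760000\t765000\t153"]
file2 = ["143 143 84", "143 144 26", "143 152 32", "143 153 15", "144 152 11"]

for row in annotate(load_bins(file1), file2):
    print(row)
```

Splitting with `str.split()` instead of `split("\t")` makes the sketch tolerant of both tab- and space-separated input, since the file2.txt rows in the question are shown with spaces.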

How is this related to bioinformatics, please?

(Kevin Blighe, 19 days ago)

The files are outputs of the HiC-Pro pipeline, which generates intra- and inter-chromosomal contact maps.

(giegie, 19 days ago)

Thank you

(Kevin Blighe, 19 days ago)
Answer by Nitin Narwade (NCCS, Pune), 19 days ago:

Here is a simple Python script that you can use.

By the way, where are you going to use this file? Is it an input for a specific tool or server?

###
##
##  USAGE: python script.py input1.tsv input2.tsv output.tsv
##
###

import sys

# Expect exactly three arguments: file1, file2 and the output file name
if len(sys.argv) != 4:
    print("ERROR: Missing command-line arguments.\n\n USAGE: python " + sys.argv[0] + " input1.tsv input2.tsv output.tsv")
    sys.exit(1)

file1, file2, outputFileName = sys.argv[1:4]

# Map each bin ID (column 4 of file1) to its coordinates (columns 1-3)
file1Dict = {}
try:
    with open(file1, "r") as fr:
        for line in fr:
            tempList = line.split()  # split on any whitespace: tabs or spaces
            if len(tempList) < 4:
                continue  # skip blank or malformed lines
            file1Dict[tempList[3]] = tempList[0:3]  # chr, start, end
except IOError:
    print("ERROR: Cannot open " + file1)
    sys.exit(1)

try:
    fr = open(file2, "r")
except IOError:
    print("ERROR: Cannot open " + file2)
    sys.exit(1)
try:
    fw = open(outputFileName, "w")
except IOError:
    print("ERROR: Cannot create " + outputFileName)
    sys.exit(1)

# Replace both bin IDs in each file2 row with their coordinates; keep column 3
for line in fr:
    tempList = line.split()
    if len(tempList) < 3:
        continue
    if tempList[0] in file1Dict and tempList[1] in file1Dict:
        fw.write("\t".join(file1Dict[tempList[0]] + file1Dict[tempList[1]] + [tempList[2]]) + "\n")

fr.close()
fw.close()

print("[INFO] Output written to " + outputFileName)

Not sure why, but the code creates an empty output :( Thank you for your effort; I would be really glad if you could try to fix it. The files are outputs of the HiC-Pro pipeline; I am using them for downstream analysis of HiChIP data.

(giegie, 19 days ago)
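One plausible cause of an empty output, offered as a hypothesis rather than a confirmed diagnosis: file2.txt as pasted in the question uses single spaces, not tabs. If such a line is split on "\t", it yields a single field, so the ID lookup never matches and nothing is written. The Perl answer splits on /\s+/, which accepts either separator. A small check:

```python
line = "143 143 84"  # a file2.txt row as pasted in the question (spaces)

tab_fields = line.split("\t")  # only tab characters act as separators
ws_fields = line.split()       # any run of whitespace acts as a separator

print(tab_fields)  # one field, so it can never equal a bin ID like "143"
print(ws_fields)
```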

Dear giegie, I have updated the above code; please try it.

Assuming you have saved the code as script.py, run it with:

python script.py input1.tsv input2.tsv output.tsv

The output.tsv file will contain:

chr1    710000  715000  chr1    710000  715000  84
chr1    710000  715000  chr1    715000  720000  26
chr1    710000  715000  chr1    755000  760000  32
chr1    710000  715000  chr1    760000  765000  15
chr1    715000  720000  chr1    755000  760000  11

I have tried it with the example data and it works fine for me.

One more thing to clarify: is the example data (the one you posted in the question) real HiC-Pro output, or did you create it yourself for the sake of the example?

If you created the sample data yourself, could you please post some real data so that I can test the code on my side?

Thank you.

(Nitin Narwade, 19 days ago)
Answer by JC (Mexico), 19 days ago:

Some Perl:

#!/usr/bin/perl
use strict;
use warnings;

my $file1 = "file1.txt";
my $file2 = "file2.txt";
my $outfile = "output.txt";
my %tags = ();

open (my $f1, "<", $file1) or die "cannot read $file1\n";
while (<$f1>) {
    chomp;
    my ($chr, $ini, $end, $tag) = split (/\s+/, $_);
    $tags{$tag} = "$chr\t$ini\t$end";
}
close $f1;

open (my $f2, "<", $file2) or die "cannot read $file2\n";
open (my $out, ">", $outfile) or die "cannot write $outfile\n";
while (<$f2>) {
    chomp;
    my ($tag1, $tag2, $val) = split (/\s+/, $_);
    next unless (defined $tags{$tag1} and defined $tags{$tag2});
    print $out join "\t", $tags{$tag1}, $tags{$tag2}, $val;
    print $out "\n";
}
close $f2;
close $out;
Answer by shenwei356 (China), 19 days ago:

The strategy is straightforward:

  1. read file1 and store it in a map/hash/dict with the 4th column as keys,
  2. then read file2 line by line, replacing the 1st and 2nd columns with the values previously read from file1.

Here's a simple solution using an unreleased version of csvtk, just for fun.

# re-arrange columns
$ csvtk cut -H -t -f 4,1-3 file1.txt > file1.re.txt

$ head -n 3 file1.re.txt
143     chr1    710000  715000
144     chr1    715000  720000
145     chr1    720000  725000

# replace value in column 1 and 2 with corresponding value provided by file1.re.txt
$ csvtk replace -H -t -k file1.re.txt -f 1,2 -p '(.+)' -r '{kv}' file2.txt -A
[INFO] read key-value file: file1.re.txt
[INFO] 11 pairs of key-value loaded
"chr1   710000  715000" "chr1   710000  715000" 84
"chr1   710000  715000" "chr1   715000  720000" 26
"chr1   710000  715000" "chr1   755000  760000" 32
"chr1   710000  715000" "chr1   760000  765000" 15
"chr1   715000  720000" "chr1   755000  760000" 11

# well, we need to remove the double quotes
$ csvtk replace -H -t -k file1.re.txt -f 1,2 -p '(.+)' -r '{kv}' file2.txt -A | sed 's/"//g'
[INFO] read key-value file: file1.re.txt
[INFO] 11 pairs of key-value loaded
chr1    710000  715000  chr1    710000  715000  84
chr1    710000  715000  chr1    715000  720000  26
chr1    710000  715000  chr1    755000  760000  32
chr1    710000  715000  chr1    760000  765000  15
chr1    715000  720000  chr1    755000  760000  11
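The double quotes in the csvtk output are presumably there because each replacement value (e.g. chr1, 710000, 715000 joined by tabs) ends up as a single field that contains the tab delimiter, so the writer quotes it, as CSV/TSV writers generally do. Python's standard `csv` module shows the same behaviour. The sketch below is an illustration with Python, not csvtk itself; the `coords` dict is a hand-made stand-in for the key-value file. A field containing the delimiter gets quoted, while splicing the coordinates in as separate fields needs no quotes (and no `sed` step):

```python
import csv
import io

# Hand-made stand-in for the key-value pairs in file1.re.txt
coords = {"143": ["chr1", "710000", "715000"], "144": ["chr1", "715000", "720000"]}
row = ["143", "144", "26"]  # one file2.txt row, already split into fields


def write_tsv(fields):
    """Write one row as TSV (default minimal quoting) and return the text."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerow(fields)
    return buf.getvalue()


# Replacing each ID with one tab-joined string embeds the delimiter in a field
joined = write_tsv(["\t".join(coords[row[0]]), "\t".join(coords[row[1]]), row[2]])

# Splicing the coordinates in as separate fields keeps every field tab-free
spliced = write_tsv(coords[row[0]] + coords[row[1]] + [row[2]])

print(repr(joined))
print(repr(spliced))
```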