Question: Compare pairs of key/value of Perl hash tables in three input files
7 weeks ago by
aiswaryabioinfo20 wrote:

Does anyone know how to compare key/value pairs in two hash tables against a third file? I'm currently working with three tab-delimited files. The first two files contain lists of proteins with their Pfam domain IDs, and the third file contains the domain-domain interactions. I need to compare all three files and identify the protein pairs in which the domains of one protein interact with all of the corresponding domains of the other protein. The input files look like this:

Input file 1

XP_002372137.1    PF00754
XP_002372137.1    PF09118
XP_002372140.1    PF00202
XP_002372145.1    PF03747

Input file 2

XP_002372172.1    PF03446
XP_002372172.1    PF14833
XP_002372174.1    PF05378
XP_002372174.1    PF01968
XP_002372174.1    PF02538
XP_002372177.1    PF07690

Input file 3

XP_002372137.1    PF00754    PF03446    XP_002372172.1
XP_002372137.1    PF00754    PF14833    XP_002372172.1
XP_002372137.1    PF09118    PF03446    XP_002372172.1
XP_002372137.1    PF09118    PF14833    XP_002372172.1
XP_002372140.1    PF00202    PF05378    XP_002372174.1
XP_002372140.1    PF00202    PF01968    XP_002372174.1
XP_002372140.1    PF00202    PF02538    XP_002372174.1
XP_002372145.1    PF03747    PF07690    XP_002372177.1

The output should list the protein IDs of each pair in which the domains of one protein interact with all of the corresponding domains of the other protein:

XP_002372137.1    XP_002372172.1
XP_002372137.1    XP_002372172.1
XP_002372137.1    XP_002372172.1
XP_002372137.1    XP_002372172.1
XP_002372140.1    XP_002372174.1
XP_002372140.1    XP_002372174.1
XP_002372140.1    XP_002372174.1
XP_002372145.1    XP_002372177.1
modified 7 weeks ago by JC11k • written 7 weeks ago by aiswaryabioinfo20
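
One reading of the criterion (the one the Perl answer further down implements) is that every combination of a domain from the first protein with a domain from the second must appear in input file 3. A minimal Python sketch of that check for a single pair, using the sample rows from the question; the variable names are illustrative and not part of the original post:

```python
from itertools import product

# Domains per protein, copied from the sample input files 1 and 2.
domains_a = {"PF00754", "PF09118"}   # XP_002372137.1 (input file 1)
domains_b = {"PF03446", "PF14833"}   # XP_002372172.1 (input file 2)

# Domain-domain interactions listed for this protein pair in input file 3.
observed = {
    ("PF00754", "PF03446"),
    ("PF00754", "PF14833"),
    ("PF09118", "PF03446"),
    ("PF09118", "PF14833"),
}

# Every cross combination must be present for the pair to qualify.
required = set(product(domains_a, domains_b))
pair_qualifies = required <= observed

# Dropping a single interaction makes the same pair fail the check.
incomplete = observed - {("PF09118", "PF14833")}
pair_qualifies_incomplete = required <= incomplete

print(len(required), pair_qualifies, pair_qualifies_incomplete)
```

Here the pair has 2 x 2 = 4 required combinations, so it qualifies only when all four rows are present in file 3.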

This is a pure programming question. Please search online or, better, switch to Python (pandas) or R; this operation is much easier with those tools.

written 7 weeks ago by RamRS28k
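
For readers who want to try the Python route suggested in the comment, here is a rough sketch using only the standard library (no pandas). The function names `read_domains`, `read_interactions` and `fully_interacting_pairs` are made up for this illustration, and the sketch assumes "interacted with all corresponding domains" means every domain combination of the two proteins is listed in file 3:

```python
from collections import defaultdict
from itertools import product


def read_domains(lines):
    """Map protein ID -> set of Pfam domain IDs (layout of files 1 and 2)."""
    domains = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:          # skip blank or malformed lines
            domains[fields[0]].add(fields[1])
    return domains


def read_interactions(lines):
    """Map (protein1, protein2) -> set of (domain1, domain2) (layout of file 3)."""
    inter = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) >= 4:
            p1, d1, d2, p2 = fields[:4]
            inter[(p1, p2)].add((d1, d2))
    return inter


def fully_interacting_pairs(set1, set2, inter):
    """Return protein pairs for which every domain combination was reported."""
    hits = []
    for (p1, p2), seen in sorted(inter.items()):
        expected = set(product(set1.get(p1, ()), set2.get(p2, ())))
        # A pair with no known domains has nothing to confirm, so skip it.
        if expected and expected <= seen:
            hits.append((p1, p2))
    return hits


# A subset of the sample rows from the question, given as lists of lines.
file1 = ["XP_002372140.1 PF00202", "XP_002372145.1 PF03747"]
file2 = ["XP_002372174.1 PF05378", "XP_002372174.1 PF01968",
         "XP_002372174.1 PF02538", "XP_002372177.1 PF07690"]
file3 = ["XP_002372140.1 PF00202 PF05378 XP_002372174.1",
         "XP_002372140.1 PF00202 PF01968 XP_002372174.1",
         "XP_002372140.1 PF00202 PF02538 XP_002372174.1",
         "XP_002372145.1 PF03747 PF07690 XP_002372177.1"]

pairs = fully_interacting_pairs(
    read_domains(file1), read_domains(file2), read_interactions(file3)
)
for p1, p2 in pairs:
    print(f"{p1}\t{p2}")
```

The reader functions accept any iterable of lines, so an open file handle (for example `open("file1.txt")`) can be passed in place of the lists used here.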
7 weeks ago by
JC11k
Mexico
JC11k wrote:

Not sure if this is what you need:

#!/usr/bin/perl

use strict;
use warnings;

@ARGV == 3 or die "Usage: perl interactions.pl FILE1 FILE2 FILE3 > OUT\n";

my $file1 = shift @ARGV;
my $file2 = shift @ARGV;
my $file3 = shift @ARGV;

my %set1  = ();
my %set2  = ();
my %inter = ();

open (my $f1, "<", $file1) or die "cannot read $file1: $!\n";
while (<$f1>) {
    chomp;
    my ($p, $d) = split (/\s+/, $_);
    $set1{$p}{$d}++;
}
close $f1;

open (my $f2, "<", $file2) or die "cannot read $file2: $!\n";
while (<$f2>) {
    chomp;
    my ($p, $d) = split (/\s+/, $_);
    $set2{$p}{$d}++;
}
close $f2;

open (my $f3, "<", $file3) or die "cannot read $file3: $!\n";
while (<$f3>) {
    chomp;
    my ($p1, $d1, $d2, $p2) = split (/\s+/, $_);
    $inter{"$p1=$p2"}{"$d1=$d2"}++;
}
close $f3;

foreach my $pair (sort keys %inter) { # sorted so the output order is reproducible
    my ($p1, $p2) = split (/=/, $pair);
    my @d1 = keys %{ $set1{$p1} || {} }; # all domains of p1 (from file 1)
    my @d2 = keys %{ $set2{$p2} || {} }; # all domains of p2 (from file 2)
    my $expect = 0; # number of domain combinations expected
    my $total = 0; # number of those combinations reported in file 3
    foreach my $d1 (@d1) {
        foreach my $d2 (@d2) {
            $expect++;
            $total++ if (defined $inter{$pair}{"$d1=$d2"});
        }
    }
    print "$p1\t$p2\n" if ($expect > 0 and $expect == $total); # print only if every expected interaction was detected
}

Testing it:

$ perl interactions.pl file1.txt file2.txt file3.txt
XP_002372137.1  XP_002372172.1
XP_002372140.1  XP_002372174.1
XP_002372145.1  XP_002372177.1
written 7 weeks ago by JC11k
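
Note that the run above prints each qualifying pair once, whereas the expected output in the question repeats the pair once per row of input file 3. If the repeated form is what is needed, one option is to collect the qualifying pairs first and then re-scan the file 3 rows. A hedged Python sketch of that second pass; the helper name `rows_for_qualifying_pairs` is hypothetical, and `qualifying` is assumed to already hold the pairs that passed a completeness check like the one in the script above:

```python
def rows_for_qualifying_pairs(interaction_lines, qualifying):
    """Yield (protein1, protein2) once per file-3 row whose pair qualifies."""
    for line in interaction_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed rows
        pair = (fields[0], fields[3])
        if pair in qualifying:
            yield pair


# Three rows in the layout of input file 3 (protein, domain, domain, protein).
file3_rows = [
    "XP_002372140.1 PF00202 PF05378 XP_002372174.1",
    "XP_002372140.1 PF00202 PF01968 XP_002372174.1",
    "XP_002372145.1 PF03747 PF07690 XP_002372177.1",
]
# Assumed input: pairs that already passed the completeness check.
qualifying = {("XP_002372140.1", "XP_002372174.1")}

repeated = list(rows_for_qualifying_pairs(file3_rows, qualifying))
for p1, p2 in repeated:
    print(f"{p1}\t{p2}")
```

With these inputs the qualifying pair is emitted twice, once for each of its rows, and the non-qualifying row is dropped.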