Question: (Closed) Perl: Help to match two IDs from two tables and print to a .txt file
4.4 years ago, rajal2 (Canada) wrote:

I need help writing a query result to a .txt file. The result comes from matching two tables, stored in File1.txt and File2.txt.

For example, when a `Rev ID` from File1.txt, say `PROD_2_1`, is matched in File2.txt, the output file should list that Rev ID and every later one (here `PROD_2_1` and `PROD_2_2`), each with its corresponding PROD ID and Date Released.

Below are File1.txt, File2.txt, and my code, which does not print all the Rev IDs after that version. Can you please help me find where my code is failing?

**File 1**
  
    Name        PROD ID       Rev ID   tag                version             ProjectName
    fusetop     9903420    PROD_2_5    SERIALPROD2V9    BXTE0PRODRTL2V2  PROJP
    doshrl2top    9903340    PROD_2_3    SERIALPROD1V6    BXTE0PRODRTL1V   PROJP
    c73p1avrpg    99036247   PROD_2_1    SERIALPROD1V1    BXTE0PRODRTL1V1     PROJP
    c73p1        99034236   PROD_2_2    SERIALPROD1V1    BXTE0PRODRTL1V1     PROJP
    150top        99034238   PROD_2_2    SERIALPROD1V1    BXTE0PRODRTL1V1     PROJP
    familyewp    99033482   PROD_2_3    SERIALPROD1V21    BXTE0PRODRTL1V121PROJP

**File 2**

    Type        Name        Rev ID        PROD ID      PROD GROUP    Date Released    PROD Category    Project IDs
    IComponent    c73p1avrpg    PROD_2_2    99036247    SEG         3/3/2015 3:34    Hard    
    IComponent    c73p1avrpg    PROD_2_1    99036247    SEG       11/15/2014 18:41    Hard    
    IComponent    c73p1avrpg    PROD2_0        99036247    SEG        9/22/2014 1:36    Hard    
    IComponent    c73p1avrpg    PROD_1_1    99036247    SEG        6/12/2014 23:51    Hard    
    IComponent    c73p1avrpg    PROD_1_0    99036247    SEG         4/8/2014 11:05    Hard    

**My code:**

    #!/bin/env perl

    use strict;
    use warnings;

    my $file1 = "FILE1.txt";
    my $file2 = "FILE2.txt";
    my $OUTPUT = "OUTPUT.txt";
    my %results = (); 
    open FILE1, "$file1" or die "Could not open $file1 \n";
    while(my $matchLine = <FILE1>)
           {   
             $results{$matchLine} = 1;
           }
    close(FILE1); 
    open FILE2, "$file2" or die "Could not open $file2 \n";
    while(my $matchLine =<FILE2>) 
           {  
    $results{$matchLine} = 2 if $results{$matchLine}; #Only when already found  in file1
           }
    close(FILE2);  
    open (OUTPUT, ">$OUTPUT") or die "Cannot open $OUTPUT \n";
    foreach my $matchLine (keys %results) { 
    print OUTPUT $matchLine if $results{$matchLine} ne 1;
           }
    close OUTPUT;
ADD COMMENTlink modified 4.4 years ago by Pierre Lindenbaum124k • written 4.4 years ago by rajal20

This question is off topic for this site; maybe you will get better results on http://stackoverflow.com/?

ADD REPLYlink written 4.4 years ago by Sam2.5k
I'm not sure I fully understand what you want to achieve. Do you want to print all lines whose ID occurs in both files? Currently, you store and compare the entire line as the hash key, so you will only find duplicate lines, not matching IDs.
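
To illustrate the difference, here is a minimal sketch that keys the hash on the Rev ID column instead of the whole line. It assumes tab-delimited input with the Rev ID in the third column of both tables; `rev_ids_from_lines` is a hypothetical helper name, and the sample rows are inlined rather than read from the real files.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: collect the Rev ID (third tab-separated column)
# of every data line, skipping the header line and empty lines.
sub rev_ids_from_lines {
    my ($header_word, @lines) = @_;
    my %seen;
    for my $line (@lines) {
        chomp(my $copy = $line);
        next if $copy eq '' or $copy =~ /^\Q$header_word\E\t/;
        my @fields = split /\t/, $copy;
        next unless defined $fields[2];
        $seen{ $fields[2] } = 1;
    }
    return \%seen;
}

# Inline sample rows standing in for the two files (assumed tab-delimited).
my @file1 = ("Name\tPROD ID\tRev ID\n", "c73p1avrpg\t99036247\tPROD_2_1\n");
my @file2 = (
    "Type\tName\tRev ID\tPROD ID\n",
    "IComponent\tc73p1avrpg\tPROD_2_2\t99036247\n",
    "IComponent\tc73p1avrpg\tPROD_2_1\t99036247\n",
);

my $ids1 = rev_ids_from_lines('Name', @file1);
my $ids2 = rev_ids_from_lines('Type', @file2);

# Keep only the Rev IDs present in both tables.
my @shared = sort grep { $ids2->{$_} } keys %$ids1;
print "$_\n" for @shared;
```

With whole lines as keys nothing would match here, because the two tables have different columns; with the Rev ID as key, `PROD_2_1` is found in both.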
ADD REPLYlink written 4.4 years ago by thackl2.7k

Oh, I do not want to duplicate the line but I want to only print the matching IDs as per my question.

ADD REPLYlink written 4.4 years ago by rajal20

Hello rajal2!

We believe that this post does not fit the main topic of this site.

off topic

For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree please tell us why in a reply below, we'll be happy to talk about it.

Cheers!

ADD REPLYlink written 4.4 years ago by Michael Dondrup46k

I don't disagree with Michael, but I also already have this solution ready...

NOTE: Your files need to be tab-delimited, not just space-separated, for it to work.

    #!/usr/bin/env perl
    use warnings;
    use strict;

    my %ids;

    open(my $fh1, '<', "file1.tsv") or die "file1.tsv: $!";
    open(my $fh2, '<', "file2.tsv") or die "file2.tsv: $!";

    while (<$fh1>) {
        next if /^Name|^$/; # ignore header and empty lines
        chomp;
        my ($name, $prod_id, $rev_id) = split("\t", $_); # Rev ID is the third column
        my $id = $rev_id;
        $id =~ s/_\d+$//; # just look at first part of rev_id
        $ids{$id}++;
    }
    close $fh1;

    while (<$fh2>) {
        next if /^Type|^$/; # ignore header and empty lines
        chomp;
        my ($type, $name, $rev_id, $prod_id, $group, $date) = split("\t", $_);
        my $id = $rev_id;
        $id =~ s/_\d+$//; # just look at first part of rev_id

        next unless $ids{$id}; # only ids from file one

        print join("\t", $rev_id, $prod_id, $date), "\n";
    }
    close $fh2;
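
The script above keeps every file2 row that shares the `PROD_<major>` prefix with a Rev ID from file1. If only revisions at or after the file1 Rev ID are wanted, as the question seems to ask, the filter could compare the numeric parts instead. This is only a sketch under the assumption that Rev IDs always look like `PROD_<major>_<minor>` and that "after" means the same major version with an equal or higher minor number; `parse_rev` and `at_or_after` are hypothetical helper names.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: split "PROD_2_1" into (2, 1).
# Returns an empty list for IDs that do not fit the assumed pattern.
sub parse_rev {
    my ($rev_id) = @_;
    return $rev_id =~ /^PROD_(\d+)_(\d+)$/ ? ($1, $2) : ();
}

# True if $rev has the same major version as $start and is not older than it.
sub at_or_after {
    my ($rev, $start) = @_;
    my @r = parse_rev($rev)   or return 0;
    my @s = parse_rev($start) or return 0;
    return $r[0] == $s[0] && $r[1] >= $s[1] ? 1 : 0;
}

my $start      = 'PROD_2_1';    # Rev ID taken from file1
my @file2_revs = qw(PROD_2_2 PROD_2_1 PROD2_0 PROD_1_1 PROD_1_0);

my @kept = grep { at_or_after($_, $start) } @file2_revs;
print join("\t", @kept), "\n";
```

For the sample Rev IDs from File 2 this keeps `PROD_2_2` and `PROD_2_1`. Note that `PROD2_0` does not fit the assumed pattern and is skipped, so malformed IDs would need separate handling.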

 

ADD REPLYlink written 4.4 years ago by thackl2.7k

If it's space-delimited, you can split on `\s+` instead; that regex matches both tabs and spaces.

 
ADD REPLYlink modified 4.4 years ago • written 4.4 years ago by dylan.storey60

Won't work because his dates contain spaces ;)

ADD REPLYlink written 4.4 years ago by thackl2.7k

his test cases don't show that. =D 

ADD REPLYlink written 4.4 years ago by dylan.storey60

actually, they do:

3/3/2015 3:34

and also the IDs

Rev ID        PROD ID      PROD GROUP ...

:D
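
For anyone who wants to see the effect, here is a small sketch comparing the two splits on one row from File 2. It assumes the real file has a single tab between columns, which the row below imitates with `join`.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# One row from File 2, rebuilt here with explicit tabs between columns.
my $row = join "\t", 'IComponent', 'c73p1avrpg', 'PROD_2_2',
    '99036247', 'SEG', '3/3/2015 3:34', 'Hard';

my @by_tab   = split /\t/,  $row;   # split on tabs only
my @by_space = split /\s+/, $row;   # split on any run of whitespace

print scalar(@by_tab),   " fields, date = '$by_tab[5]'\n";
print scalar(@by_space), " fields, date = '$by_space[5]'\n";
```

Splitting on tabs gives 7 fields with the date intact (`3/3/2015 3:34`); splitting on `\s+` gives 8 fields and cuts the date down to `3/3/2015`, shifting every later column by one.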

ADD REPLYlink written 4.4 years ago by thackl2.7k
The thread is closed. No new answers may be added.
