9.2 years ago
Deepak Tanwar
I am reading a file and storing the values in a hash.
Now I am reading a second file and storing its values in another hash. This file may contain repeated entries, so for each line I check whether the key is already present. If it is, and the stored entry's value in the 10th column is greater than that of the line I am currently reading, I skip the line; otherwise I replace the stored entry. However, my output is the same as my input, and the repeated lines are not removed. What is wrong with my code?
use strict;
use warnings;

my $file1 = shift;
my $file2 = shift;

# First file: map the identifier in column 6 to the length in column 9.
open(my $gene, '<', $file1) or die "Can't open file $file1: $!";
my %length;
while (my $ge_line = <$gene>) {
    chomp $ge_line;
    next if $ge_line =~ /^\#/;
    my @ge_split = split(/\t/, $ge_line);
    $length{$ge_split[5]} = $ge_split[8];
}
close($gene);

# Second file: keep one line per key, with the looked-up length appended.
my %seen;
open(my $tcga, '<', $file2) or die "Can't open file $file2: $!";
while (my $tc_line = <$tcga>) {
    chomp $tc_line;
    next if $tc_line =~ /^\#/;
    my @tc_split = split(/\t/, $tc_line);
    next unless $tc_split[9] eq "syn";
    my $key = join("-", @tc_split[0 .. 5]);
    my $my_len = $length{$tc_split[5]};
    unless ($my_len) {
        print STDERR "Could not find the length for $tc_split[5]\n";
        next;
    }
    if (exists $seen{$key}) {
        # The length was appended as the last field of the stored line, so
        # read it with [-1]; index 10 is only correct for 10-column input.
        # ($a is avoided as a variable name because sort uses it.)
        my @stored = split(/\t/, $seen{$key});
        next if $stored[-1] > $my_len;
    }
    $seen{$key} = $tc_line . "\t" . $my_len;
}
close($tcga);

foreach my $print (sort keys %seen) {
    print $seen{$print}, "\n";
}
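
The keep-one-line-per-key logic described above can be sketched in isolation as follows. This is only a minimal sketch under stated assumptions, not a confirmed fix: the helper name keep_best_per_key, the choice of key columns and score column, and the toy input are all made up for illustration. One thing it tries to make visible: in the script above the key includes column 6, which is also the column used to look up the length, so two lines with the same key always get the same length. The sketch therefore compares a column that is not part of the key, and stores the compared value next to the line so it never has to be recovered by position.

```perl
use strict;
use warnings;

# Hypothetical helper: keep one line per key, preferring the larger score.
#   $lines     - array ref of tab-separated lines
#   $key_cols  - array ref of column indices that form the key (assumption)
#   $score_col - index of the numeric column to compare (assumption)
sub keep_best_per_key {
    my ($lines, $key_cols, $score_col) = @_;
    my %best;    # key => [score, line]
    for my $line (@$lines) {
        my @f     = split /\t/, $line, -1;    # -1 keeps trailing empty fields
        my $key   = join '-', @f[@$key_cols];
        my $score = $f[$score_col];
        # Replace the stored line only when the new score is strictly larger.
        if (!exists $best{$key} || $score > $best{$key}[0]) {
            $best{$key} = [$score, $line];
        }
    }
    return { map { $_ => $best{$_}[1] } keys %best };
}

# Toy input (invented): the first two lines share the key "chr1-100".
my @lines = (
    "chr1\t100\tgeneA\t1200",
    "chr1\t100\tgeneB\t3400",
    "chr2\t200\tgeneC\t500",
);
my $kept = keep_best_per_key(\@lines, [0, 1], 3);
print "$kept->{$_}\n" for sort keys %$kept;
```

If the key in the real data should not distinguish lines by column 6, dropping that column from the key (here, from the list passed as $key_cols) is what lets lines with different lengths collide and be compared.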