Question: Illegal Division by zero
0
gravatar for sicat.paolo20
10 months ago by
sicat.paolo2030 wrote:

I am trying to work on TEclass from https://academic.oup.com/bioinformatics/article/25/10/1329/269651

When I'm using the TEclassBuild.pl I get an Illegal division by zero error at line 179 message.

Looking at the subroutine that handles that;

#-------------------------------------------------------------------------------
# Purpose   : Builds a feature-vector for each sequence. The features are  
#             oligomer frequencies.
# Usage     : process_sequence($string, $number, *filehandle) 
# Arguments : sequence (string), prefix (number), output (filehandle) 
# Returns   : Nothing, prints.
# Globals   : Uses the global variables of the build_tetramer_hash (or pentamer) 
#             subroutines
#*******************
sub process_sequence {
    my $sequence = uc($_[0]);
    my $prefix = $_[1]; # The the SVM "classifier"
    my $OUT = $_[2];    # Filehandle

    # The vector of SVM features. Initially every element is zero
    my @features; 
    foreach my $x (0..$oligos) {
        $features[$x] = 0;
    }
    my $iterations = length($sequence)-$mer-1;
    # Counts the number of occurences of each x-mer in the sequence
    foreach my $x (0..$iterations) {
        if ( exists( $x_mers{substr($sequence, $x, $mer)} ) ) {
            $features[$x_mers{substr($sequence, $x, $mer)}]++;
            $features[$oligos+1]++;
        }
    }
    # Calculates and prints the frequencies for each oligomer
    print $OUT $prefix, "\t";
    foreach my $x (0..$oligos) {
        $features[$x] /= $features[$oligos+1]; 
        print $OUT "", ($x+1), ':', $features[$x], "\t";
    }
    print $OUT "\n";    
}

Line 179 is

    176: # Calculates and prints the frequencies for each oligomer
    177: print $OUT $prefix, "\t";
    178: foreach my $x (0..$oligos) {
    179:    $features[$x] /= $features[$oligos+1]; 
    180:    print $OUT "", ($x+1), ':', $features[$x], "\t";
    181:}

Line 179 shows that they add a +1 to the denominator. Wouldn't that mean that I am not dividing the $features[$x] with 0? Any suggestions? I ran it a couple of times with different parameters but still I receive the same error message.

ADD COMMENTlink modified 10 months ago by Michael Dondrup46k • written 10 months ago by sicat.paolo2030
1

They add +1 to the index of features.

It seems the if statement some 8 lines back is never true.

ADD REPLYlink modified 10 months ago • written 10 months ago by jomo018480

Thanks for the suggestion. I tried the suggestion below first but if that doesn't work, I'll take a look further into this. Thanks!

ADD REPLYlink written 10 months ago by sicat.paolo2030
1

Debug the values of the array at different stages of the script. Check that the array is initialized to non-zero values, or that a pseudocount is applied correctly.

ADD REPLYlink modified 10 months ago • written 10 months ago by Alex Reynolds28k

Still not sure what's happening. I tried changing

$features[$x] /= $features[$oligos+1];

to

$features[$x] /= $features[$oligos]+1;

Thanks for the tip. The job is now running a bit further. Hopefully this works properly. Thanks

ADD REPLYlink written 10 months ago by sicat.paolo2030
1

That's not a good idea.

$features[$oligo+1]

does have a special meaning (summing all occurances). You should find out why it is zero, not replace it with something else.

ADD REPLYlink written 10 months ago by jomo018480
$features[$oligo+1]

I tried redoing this and ensuring that @features does not contain any zero values. It still gives me an illegal division by zero error message.

ADD REPLYlink written 10 months ago by sicat.paolo2030
3
gravatar for Michael Dondrup
10 months ago by
Bergen, Norway
Michael Dondrup46k wrote:

You are trying to keep the total sum of all occurrences in $features[$oligos+1] (accessing the n+1th element ) with $oligos being some integer, a global variable from out of nowhere.

Note: this part of the array @features has never been initialized as 0. It works anyway in perl, because ($undef_variable)++ == 1 plus a warning (if warnings are on). Note2: The code contains some important variables defined in global scope, therefore one needs to guess what they are meant to do.

$features[$x] /= $features[$oligos+1]; 
<=> $features[$x] = $features[$x] / $features[$oligos+1];

Now, the only way to get a division by zero is that $features[$oligos+1] == 0, which again means if ( exists( $x_mers{substr($sequence, $x, $mer)} ) ) was never True.

If this part of the code is supposed to find your kmers in 'their' kmers $x_mers and further assuming that this is the correct way of pattern matching, that would simply mean that your $sequence does not contain any of the n_mers in $x_mers and the code does not protect against this case.

#-------------------------------------------------------------------------------
# Purpose   : Builds a feature-vector for each sequence. The features are  
#             oligomer frequencies.
# Usage     : process_sequence($string, $number, *filehandle)

# Look at the required arguments, I guess you should have a $number instead


# Arguments : sequence (string), prefix (number), output (filehandle) Not sure what is meant by prefix
# Returns   : Nothing, prints.
# Globals   : Uses the global variables of the build_tetramer_hash (or pentamer) 
#             subroutines
## global %x_mers (a catalogue of some or all k-mers, with the rank of the k-mers among all k-mers?),
## $oligos (possibly the total number of all kmers in %x_mers, not needed), $mer, this is actually a rather bad idea (possibly x_mers is a huge hash)  

#*******************
sub process_sequence {
    my $sequence = uc($_[0]);
    my $prefix = shift; # The the SVM "classifier"
    my $OUT = shift;    # Filehandle
    my $mer = shift; # mer should be a parameter of the fuction
    ## 
    my $total = 0; # total number of found kmers for normalization
    my $n_oligos = scalar(keys %x_mers) ;
    # The vector of SVM features. Initially every element is zero

    my @features = (0) x $n_oligos; # does the same as the foreach

    my $iterations = length($sequence)-$mer-1;

    # just another example of non-defensive programming: what guarantees that 
    # length ($sequence) > $mer-1, what happens if iterations is negative?!?

    die "input sequence must be longer than $mer+1" unless $iterations > 0; # you could argue that the function should simply return silently in this case

    # Counts the number of occurences of each x-mer in the sequence
    # this is probably fine if the sequence is short in comparison to the
    # kmer catalogue, however there should be more efficent ways to count
    # k-mers in perl in general
    foreach my $x (0..$iterations) {
        if ( exists( $x_mers{substr($sequence, $x, $mer)} ) ) {
            $features[$x_mers{substr($sequence, $x, $mer)}]++;
           $total++;
        }
    }
    # Calculates and prints the frequencies for each oligomer
    warn "None of the $mer mers in the input sequence was found in the kmer catalogue, printing all NA frequencies" 
       if ($total < 1);

    print $OUT $prefix, "\t";
    foreach my $x (0..$n_oligos-1) { # note the -1 .. because scalar gives the number entries
        print $OUT "", ($x+1), ':', ($total)? ($features[$x]/$total):'NA', "\t"; # protects against division by zero,  possibly this output should not be used, or print 'NA' or 'inf'
    }
    print $OUT "\n";    
}
ADD COMMENTlink modified 9 months ago • written 10 months ago by Michael Dondrup46k

Hi, thanks for the help. I'll try to apply this next. Apologies if I am not good at this. I am trying to pick up this tool and trying it on my data set and the readme's suggested dataset. Both were not working so I'm trying to check if there is a fix for it. I'll try to apply this changes as soon as I can and try to understand it. Again thank you very much. Cheers

ADD REPLYlink written 10 months ago by sicat.paolo2030
1

Ok, so this is not your code? In this case (aside from flaming the author), just to get a minimal quick fix, just replace line 179 in the original script:

- $features[$x] /= $features[$oligos+1];

+ $features[$x] = ($features[$oligos+1] > 0)? $features[$x] / $features[$oligos+1] : 'NA'; # or 0

But you still should check why your sequence contains no known k-mers.

ADD REPLYlink modified 9 months ago • written 9 months ago by Michael Dondrup46k

Hi thanks to your help and the author's help (He got back to me! :D). You guys helped me what was going on. My stuff had masked sequences and that's what's causing the issue. I didn't get it at first but looking at what you said and what he said helped me dig further into the code.

ADD REPLYlink written 9 months ago by sicat.paolo2030
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1993 users visited in the last hour