Music Path-Scan Pvalue Error
3
2
Entering edit mode
12.1 years ago

Hi guys,

I also found an error in Music Path-Scan

PathScan.pm around line 462

$path_scan_pval goes negative causing PathScan to crash..

Maybe an integer underflow error due to architecture differences???

I have modified the source code:

#__COMPUTE ACTUAL TAILED P-VALUE IF WE'RE WITHIN TRUNCATION RANGE
  if ($k <= $hits) {
    my $var = $path_scan_pval - $obj->p_value_binomial_approx ($k-1);
    if($var > 0)
    {
            $path_scan_pval = $var;
    }
    unshift @{$cdf}, $path_scan_pval;
#__OTHERWISE JUST INSERT A FLAG TO FUNCTION AS A PLACEHOLDER FOR OTHER
#__METHODS THAT EXPECT THE *FORM* OF THE LIST TO BE A FULL CDF
  } else {
     unshift @{$cdf}, -1;
  }
}

Any feed back on this would be very appreciated.

Cheers,

music • 2.5k views
ADD COMMENT
1
Entering edit mode
12.1 years ago

Unfortunately, this is a known error, which we could not fix because it's inherent to Perl's troubles with floating point math. The error only appears with certain combinations of input, and we can't pin down a pattern. Try tweaking the precision of the input BMR, to see if you can avoid the error. For example, if your BMR is 1.32E-6, try running with 1.3E-6. Having fewer pathways to test (in your pathway-file) also reduces the odds for this error... :-|

One solution we tried was to use Math::BigInt to wrap all floating point ops, but that made the runtime impractical. We've tried several other workarounds including the one you use in the OP, where negative p-vals are just skipped. The trouble is that you're skipping values that are probably pretty significant.

Sorry, we can't offer a fix. We want to rewrite the core math of PathScan in R. But it may be a while till we get a chance to work on that.

ADD COMMENT
1
Entering edit mode
11.4 years ago
Christian ★ 3.1k

I came up with the follwing modified code of PopulationPathScan.pm (line 589) to circumvent this problem. This modification looks for the next non-negative p-value down the list (from lower p-values to larger ones) and takes it as a substitute for the negative one. I am using MuSiC version 0.0401 on Ubuntu 12.04 64-bit.

      #__TAIL PVALS FOR THIS HIT NUMBER (SEE PROGRAMMING NOTE ABOVE)
         my $pval_x = $obj->{'cdf'}->[$obj->{'num_genes'} - $hits];

         # 2013-05-23 | CF | perl floating point math bug (see BioStar discussion 'MuSIC path-scan pvalue error'); 
         # note that this quick fix works for now but it decreases the significance of hits; proper fix required
         if ($pval_x < 0) 
         {
             my $old_pval = $pval_x;
            for (my $i = $obj->{'num_genes'} - $hits + 1; $i < @{$obj->{'cdf'}} and $pval_x < 0; $i ++)
            {
                $pval_x = $obj->{'cdf'}->[$i];
            }
            $pval_x = 1 if ($pval_x < 0);
            print STDERR "WARNING: P-value negative: hit number = $hits; number of genes = ",$obj->{'num_genes'},"; pval_x = $old_pval; corrected to pval_x = $pval_x\n";
         }

#        my $pval_x_m_1 = $obj->{'cdf'}->[$obj->{'num_genes'} - $hits - 1];
         my $pval_x_m_1;
         my $x_m_1_index = $obj->{'num_genes'} - $hits - 1;
         if ($x_m_1_index >= 0) {
            $pval_x_m_1 = $obj->{'cdf'}->[$x_m_1_index];
         } else {
            $pval_x_m_1 = 0; # dont allow this to inadvertently loop to list end
         }

         # 2013-05-23 | CF | perl floating point math bug (see BioStar discussion 'MuSIC path-scan pvalue error'); 
         # note that this quick fix works for now but it decreases the significance of hits; proper fix required
         if ($pval_x_m_1 < 0) 
         {
             my $old_pval = $pval_x_m_1;
            for (my $i = $obj->{'num_genes'} - $hits; $i < @{$obj->{'cdf'}} and $pval_x_m_1 < 0; $i ++)
            {
                $pval_x_m_1 = $obj->{'cdf'}->[$i];
            }
            $pval_x_m_1 = 1 if ($pval_x_m_1 < 0);
            print STDERR "WARNING: P-value negative: hit number = $hits; number of genes = ",$obj->{'num_genes'},"; pval_x_m_1 = $old_pval; corrected to pval_x_m_1 = $pval_x_m_1\n";
         }

Note that this solution is not perfect, because it decreases the significance of enriched pathways, but at least the script no longer crashes. It's also worth mentioning that I observed this problem of negative p-values only for very large pathways (>2000 genes).

ADD COMMENT
0
Entering edit mode
12.1 years ago

thank you for the feedback..

ADD COMMENT

Login before adding your answer.

Traffic: 1697 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6