$path_scan_pval goes negative causing PathScan to crash..
Maybe an integer underflow error due to architecture differences???
I have modified the source code:
#__COMPUTE ACTUAL TAILED P-VALUE IF WE'RE WITHIN TRUNCATION RANGE
if ($k <= $hits) {
my $var = $path_scan_pval - $obj->p_value_binomial_approx ($k-1);
if($var > 0)
{
$path_scan_pval = $var;
}
unshift @{$cdf}, $path_scan_pval;
#__OTHERWISE JUST INSERT A FLAG TO FUNCTION AS A PLACEHOLDER FOR OTHER
#__METHODS THAT EXPECT THE *FORM* OF THE LIST TO BE A FULL CDF
} else {
unshift @{$cdf}, -1;
}
}
Unfortunately, this is a known error, which we could not fix because it's inherent to Perl's troubles with floating point math. The error only appears with certain combinations of input, and we can't pin down a pattern. Try tweaking the precision of the input BMR, to see if you can avoid the error. For example, if your BMR is 1.32E-6, try running with 1.3E-6. Having fewer pathways to test (in your pathway-file) also reduces the odds for this error... :-|
One solution we tried was to use Math::BigInt to wrap all floating point ops, but that made the runtime impractical. We've tried several other workarounds including the one you use in the OP, where negative p-vals are just skipped. The trouble is that you're skipping values that are probably pretty significant.
Sorry, we can't offer a fix. We want to rewrite the core math of PathScan in R. But it may be a while till we get a chance to work on that.
I came up with the follwing modified code of PopulationPathScan.pm (line 589) to circumvent this problem. This modification looks for the next non-negative p-value down the list (from lower p-values to larger ones) and takes it as a substitute for the negative one. I am using MuSiC version 0.0401 on Ubuntu 12.04 64-bit.
#__TAIL PVALS FOR THIS HIT NUMBER (SEE PROGRAMMING NOTE ABOVE)
my $pval_x = $obj->{'cdf'}->[$obj->{'num_genes'} - $hits];
# 2013-05-23 | CF | perl floating point math bug (see BioStar discussion 'MuSIC path-scan pvalue error');
# note that this quick fix works for now but it decreases the significance of hits; proper fix required
if ($pval_x < 0)
{
my $old_pval = $pval_x;
for (my $i = $obj->{'num_genes'} - $hits + 1; $i < @{$obj->{'cdf'}} and $pval_x < 0; $i ++)
{
$pval_x = $obj->{'cdf'}->[$i];
}
$pval_x = 1 if ($pval_x < 0);
print STDERR "WARNING: P-value negative: hit number = $hits; number of genes = ",$obj->{'num_genes'},"; pval_x = $old_pval; corrected to pval_x = $pval_x\n";
}
# my $pval_x_m_1 = $obj->{'cdf'}->[$obj->{'num_genes'} - $hits - 1];
my $pval_x_m_1;
my $x_m_1_index = $obj->{'num_genes'} - $hits - 1;
if ($x_m_1_index >= 0) {
$pval_x_m_1 = $obj->{'cdf'}->[$x_m_1_index];
} else {
$pval_x_m_1 = 0; # dont allow this to inadvertently loop to list end
}
# 2013-05-23 | CF | perl floating point math bug (see BioStar discussion 'MuSIC path-scan pvalue error');
# note that this quick fix works for now but it decreases the significance of hits; proper fix required
if ($pval_x_m_1 < 0)
{
my $old_pval = $pval_x_m_1;
for (my $i = $obj->{'num_genes'} - $hits; $i < @{$obj->{'cdf'}} and $pval_x_m_1 < 0; $i ++)
{
$pval_x_m_1 = $obj->{'cdf'}->[$i];
}
$pval_x_m_1 = 1 if ($pval_x_m_1 < 0);
print STDERR "WARNING: P-value negative: hit number = $hits; number of genes = ",$obj->{'num_genes'},"; pval_x_m_1 = $old_pval; corrected to pval_x_m_1 = $pval_x_m_1\n";
}
Note that this solution is not perfect, because it decreases the significance of enriched pathways, but at least the script no longer crashes. It's also worth mentioning that I observed this problem of negative p-values only for very large pathways (>2000 genes).