tl;dr...
This question is a generalisation of this one and the related GitHub issue:
I am looking for suggestions on what I could investigate to find out what causes differences in R function outputs when running identical code on identical input data with identical software versions on two different machines. Specifically, I am looking for general factors that are not specific to the code of the function I ran.
More specifically:
I was running the R function sctransform::vst, which performs a variance-stabilizing transformation and then reports a per-gene residual variance.
I ran it on two machines. The first is a MacBook Pro with Mojave and the relevant R packages installed into a local user library. The second is a Skylake node using an Ubuntu-based Singularity image in which I installed the R packages via the renv lock file created from the MacBook user library. The version of R is the same as well. Afaict input data, software versions and code are all identical.
Still, I get outputs that differ in the decimal places, and this has an impact on the downstream analyses that are based on them.
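To pin down how large the discrepancy really is, it can help to compare the two outputs at full precision rather than at the 7 digits R prints by default. The following is a minimal sketch: the two sums only illustrate that the order of floating-point operations can change the last bits of a result, and `max_diff` is a hypothetical helper, not part of sctransform.

```r
# Two mathematically equal sums evaluated in a different order.
a <- (0.1 + 0.2) + 0.3
b <- 0.1 + (0.2 + 0.3)

identical(a, b)            # exact, bit-level comparison
isTRUE(all.equal(a, b))    # comparison within a numeric tolerance
sprintf("%.17g", c(a, b))  # 17 significant digits distinguish any two doubles
sprintf("%a", c(a, b))     # hexadecimal form of the stored values

# Hypothetical helper: largest absolute and relative difference between two
# numeric vectors, e.g. the residual variances saved with saveRDS() on each
# machine and loaded side by side with readRDS().
max_diff <- function(x, y) {
  stopifnot(length(x) == length(y))
  abs_diff <- abs(x - y)
  scale <- pmax(abs(x), abs(y), .Machine$double.xmin)  # avoid division by zero
  c(abs = max(abs_diff), rel = max(abs_diff / scale))
}
```

If the relative difference is close to `.Machine$double.eps`, the cause is more likely to be the order of arithmetic operations or the math library than a difference in random numbers.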
Please throw buzzwords at me on what I could check and investigate, related to rounding and the handling of decimals, to make the outputs 100% identical.
What I checked:
- I use `set.seed()` before running the function
- `.Machine$double.eps` is identical
- `options()$digits` is 7 on both machines
- I set `options(scipen=999)` on both machines
- I disabled BLAS and OpenMP implicit multithreading on the Linux node via the RhpcBLASctl package
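Beyond the checks listed above, identical package versions do not guarantee identical compiled code: the BLAS/LAPACK library R is linked against, the platform, and long double support can all differ between a macOS install and a Linux container. As a rough sketch, the snippet below collects these details so they can be compared across the two machines. `env_fingerprint` is a hypothetical helper name, and `La_library()` assumes R >= 3.4.0.

```r
# Hypothetical helper: collect numerics-related details of the R installation.
env_fingerprint <- function() {
  list(
    r_version      = R.version.string,
    platform       = R.version$platform,
    lapack_version = La_version(),    # LAPACK version in use
    lapack_library = La_library(),    # path to the LAPACK library (R >= 3.4.0)
    ext_soft       = extSoftVersion(),  # versions of linked libraries, incl. the BLAS path
    long_double    = unname(capabilities("long.double")),
    sizeof_ldouble = .Machine$sizeof.longdouble,
    double_eps     = .Machine$double.eps,
    locale         = Sys.getlocale()
  )
}

fp <- env_fingerprint()
str(fp)

# Save on each machine, then compare the two lists on one of them, e.g.:
# saveRDS(fp, "fingerprint_macbook.rds")   # hypothetical file name
# all.equal(readRDS("fingerprint_macbook.rds"), readRDS("fingerprint_node.rds"))
```

Any field that differs (a different BLAS path, for instance) is a candidate explanation, although it does not by itself prove that this is the cause.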
...and please let's not discuss whether decimal differences are important or not etc., this is not the point here ;-)
Maybe the source of the issue(s) is Singularity? So something like what's described in this thread. I could build the packages outside of the container directly on the remote system to check that; will try.
If you are using R, you have access to the source code of the function being run (and of any functions it calls downstream), so it should be straightforward (with some digging) to check whether any functions related to random sampling are not using the seed value you set manually.
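As a complement to reading the source, one can test empirically whether a call touches R's random number generator by comparing `.Random.seed` before and after it. This is a minimal sketch: `uses_rng` is a hypothetical helper, and it only detects use of R's own RNG, not generators inside compiled code that bypass it.

```r
# Hypothetical helper: TRUE if evaluating `expr` changed R's RNG state.
# Note: it calls set.seed(), so it resets the RNG state of the session.
uses_rng <- function(expr) {
  set.seed(1)
  before <- get(".Random.seed", envir = globalenv())
  force(expr)  # the argument is lazy, so it is evaluated here
  after <- get(".Random.seed", envir = globalenv())
  !identical(before, after)
}

uses_rng(rnorm(1))    # draws a random number
uses_rng(sum(1:10))   # purely deterministic

# The generator settings themselves should also match on both machines.
RNGkind()
```

If a call changes the state but still gives different results under the same seed, comparing the output of `RNGkind()` on both machines is a reasonable next step.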