Hello,
I am working with scanpy to analyze some single-cell RNA-seq data. I was wondering whether I should set random.seed(0) at the beginning of my Jupyter notebook. Would that make the results reproducible? Could it cause any issues?
Thank you
Generally, you need a fixed seed if you want an analysis that has a random element to be reproducible. I cannot speak for Scanpy and Python, but in R (for single-cell work and in general) this includes UMAP, approximate PCA and most other dimensionality reductions, k-means and some other clustering approaches, subsampling procedures, and more. If there is an option to set a fixed seed, I would always use it (in fact, I do). In R, you set the seed before calling the function:
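```r
set.seed(1)                              # set the seed immediately before the random step
km <- kmeans(iris[, 1:4], centers = 3)   # any function with a random element
```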
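The seed only fixes the starting state of the random number generator, and every function with a random element advances that state. Setting the seed once at the top of a script therefore reproduces results only if the whole script is run top to bottom in exactly the same order; re-running a single step, skipping one, or adding a new random step earlier changes every result after it. To make each function reproducible on its own, set the seed immediately before every function that has a random element. Python might be different. I would check whether running the analysis several times gives precisely the same results. If not, it could be a seed problem.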
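Python's standard-library `random` module behaves the same way. A minimal sketch, using a hypothetical `noisy_step` function as a stand-in for any analysis step that draws random numbers, shows that a single seed at the top reproduces a full top-to-bottom run, while re-running only one step (as happens when a single notebook cell is re-executed) does not.

```python
import random

def noisy_step(n=3):
    """Hypothetical stand-in for any analysis step that draws random numbers."""
    return [random.random() for _ in range(n)]

def run_script():
    """Seed once at the top, then run two random steps in order."""
    random.seed(1)
    first = noisy_step()
    second = noisy_step()
    return first, second

# Re-running the whole script top to bottom reproduces every result.
print(run_script() == run_script())  # True

# Re-seeding and then running only the second step (like re-running one
# notebook cell) gives a different answer: the draws start from scratch.
first, second = run_script()
random.seed(1)
second_rerun = noisy_step()
print(second_rerun == second)  # False
print(second_rerun == first)   # True: it used the draws meant for step 1

# Seeding immediately before each step makes each step reproducible on its own.
random.seed(1)
a = noisy_step()
random.seed(1)
b = noisy_step()
print(a == b)  # True
```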
Looks like a random seed is used in scanpy.tl.umap (https://scanpy.readthedocs.io/en/stable/generated/scanpy.tl.umap.html) and scanpy.pp.sample (https://scanpy.readthedocs.io/en/stable/generated/scanpy.pp.sample.html#scanpy.pp.sample). You may need to set the seed for each tool, though, since it seems to be exposed as an option.
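Tools that expose the seed as an option usually build their own private generator from it instead of reading the global one. As far as I know, `scanpy.tl.umap` exposes this as a `random_state` parameter, but the exact parameter name and default should be checked in each tool's documentation. The sketch uses a hypothetical `toy_layout` function (not a scanpy function) and only the standard library to show the pattern: the same `random_state` gives the same output, a different one gives a different output.

```python
import random

def toy_layout(points, random_state=0):
    """Hypothetical tool (not a scanpy function) that takes a `random_state`
    argument and builds its own private generator from it."""
    rng = random.Random(random_state)
    return [(p, rng.uniform(-1.0, 1.0)) for p in points]

cells = ["cell_a", "cell_b", "cell_c"]
run1 = toy_layout(cells, random_state=0)
run2 = toy_layout(cells, random_state=0)
run3 = toy_layout(cells, random_state=42)

print(run1 == run2)  # True: same random_state, same layout
print(run1 == run3)  # False: a different random_state gives a different layout
```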
I realized that I have a few notebooks where I set random.seed(0) at the beginning of the Jupyter notebook, but not in the commands you mentioned. Is there any chance this may have caused issues that would affect the results (besides reproducibility)?
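One relevant detail: `random.seed(0)` seeds only the generator behind Python's standard-library `random` module. NumPy's global generator is a separate object (seeded with `numpy.random.seed`), and a tool that builds its own generator from a `random_state` argument never reads either global state. If the scanpy tools used behave like that (an assumption worth confirming in each tool's documentation), a top-of-notebook `random.seed(0)` should neither help nor hurt their results. The sketch reuses a hypothetical `toy_layout` function to show that the global seed has no effect on such a tool.

```python
import random

def toy_layout(points, random_state=0):
    # Hypothetical tool with its own seeded generator; it never reads the
    # global state that random.seed() controls.
    rng = random.Random(random_state)
    return [rng.uniform(-1.0, 1.0) for _ in points]

cells = ["cell_a", "cell_b", "cell_c"]

# Notebook A: random.seed(0) at the top, as in the question.
random.seed(0)
result_a = toy_layout(cells)

# Notebook B: a different global seed, plus some unrelated global draws.
random.seed(12345)
_ = [random.random() for _ in range(10)]
result_b = toy_layout(cells)

print(result_a == result_b)  # True: the global seed had no effect
```

Running the real pipeline twice, once with and once without the `random.seed(0)` line, and comparing the outputs is a direct way to confirm this for a given notebook.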