Question

Random seed in scanpy

5 months ago

bioinfo ▴ 160

Hello,

I am working with scanpy to analyze some single cell RNA seq data. I was wondering if I should set random.seed(0) at the beginning of my jupyter notebook. Would that keep the results reproducible? Would it cause any issues?

Thank you

scanpy scRNAseq single-cell • 1.1k views

Updated 5 months ago by ATpoint 90k • written 5 months ago by bioinfo ▴ 160

It looks like a random seed is used in scanpy.tl.umap (https://scanpy.readthedocs.io/en/stable/generated/scanpy.tl.umap.html) and scanpy.pp.sample (https://scanpy.readthedocs.io/en/stable/generated/scanpy.pp.sample.html#scanpy.pp.sample). You may need to set the seed for each tool individually, though, since it appears to be a per-function option.
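A minimal sketch of the "seed per tool" pattern, using only the Python standard library. The function `subsample` and its `seed` parameter are hypothetical stand-ins for a tool that accepts its own seed (scanpy's `sc.tl.umap`, for example, takes a `random_state` argument); this is not scanpy code.

```python
import random

def subsample(items, n, seed=0):
    """Hypothetical tool: draw n items without replacement.

    It builds its own generator from `seed`, so the result depends
    only on the arguments, not on the global random state.
    """
    rng = random.Random(seed)
    return rng.sample(items, n)

cells = [f"cell_{i}" for i in range(100)]

first = subsample(cells, 5, seed=0)
random.random()  # unrelated global draws in between do not matter
second = subsample(cells, 5, seed=0)

print(first == second)
```

When a tool exposes a seed like this, passing it explicitly in every call documents the choice in the notebook and keeps each step reproducible on its own.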

5 months ago by GenoMax 154k

I realized that I have a few notebooks where I set random.seed(0) at the beginning of the Jupyter notebook but did not set a seed in the commands you mentioned. Could this have affected the results in any way other than reproducibility?
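One way to reason about this: `random.seed(0)` only seeds the global generator of Python's standard `random` module. Code that uses its own generator is not touched by it. The sketch below shows that with the standard library alone; `own_rng` is a stand-in for a library-internal generator. As an assumption to verify against your installed version, scanpy's stochastic functions are controlled through their own seed arguments (such as `random_state`) rather than through the `random` module.

```python
import random

# Global seed, as set at the top of the notebook.
random.seed(0)
global_a = random.random()

# Stand-in for a library that keeps its own seeded generator.
own_rng = random.Random(123)
own_a = own_rng.random()

# Change the global seed and repeat both draws.
random.seed(999)
global_b = random.random()

own_rng = random.Random(123)
own_b = own_rng.random()

print("global draw unchanged:", global_a == global_b)
print("own generator unchanged:", own_a == own_b)
```

Under that assumption, the `random.seed(0)` call would neither help nor harm the scanpy results; it simply would not reach them.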

5 months ago by bioinfo ▴ 160

score 0 · Answer 1 · 2025-06-04

Generally, you need fixed seeds if you want an analysis that has a random element to be reproducible. I cannot speak for Scanpy and Python, but in R (for single-cell analysis and in general) this applies to UMAP/PCA (and most dimensionality reductions), k-means and some other clustering approaches, subsampling procedures and more. If there is an option to set a fixed seed, I would always use it (in fact, I do). In R you set the seed before calling the function:

set.seed(1)
doSomething()

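...and then the seed is used up: the first function with a random element draws from the generator and advances its state, so later functions no longer start from the seeded state. Setting the seed once at the top of your script is therefore not enough if you rerun, reorder or insert individual steps, because each result then depends on everything that drew random numbers before it. To make every step reproducible on its own, set the seed before every function that has a random element. Python might be different. I would check whether running the analysis several times gives precisely the same results. If not, it could be a seed problem.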
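The behaviour described above can be sketched in Python with the standard `random` module. `noisy_step` is a hypothetical placeholder for any analysis step with a random element; the comparisons at the end are the kind of "run it several times and compare" check suggested above.

```python
import random

def noisy_step():
    """Hypothetical analysis step that draws from the global generator."""
    return [random.random() for _ in range(3)]

# Seed once, call twice: the second call continues from the advanced state.
random.seed(1)
run1_a = noisy_step()
run1_b = noisy_step()

# Re-seed before each call: both calls start from the same state.
random.seed(1)
run2_a = noisy_step()
random.seed(1)
run2_b = noisy_step()

# Rerun the whole sequence from the top with the same seed.
random.seed(1)
rerun_a = noisy_step()
rerun_b = noisy_step()

print("seed once, two calls identical:", run1_a == run1_b)
print("re-seeded calls identical:", run2_a == run2_b)
print("full rerun identical:", (rerun_a, rerun_b) == (run1_a, run1_b))
```

A single seed at the top reproduces the results only when the whole sequence is rerun unchanged. Re-seeding before each step makes that step independent of whatever ran before it.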