Hello,
I have two custom databases: one with a single phage, and one with that same phage plus a bacterium. I ran kraken2 twice on the same data, once against each database, and got a different amount of that phage classified each time: around 200 with the phage-only database versus 21 with the phage + bacterium database.
Why is kraken2 not deterministic? Is this a normal occurrence?
I have seen that setting --minimum-hit-groups 4 and --confidence 0.05 may make the output more consistent. However, I am having a hard time understanding how the algorithm could give inconsistent results. Could someone explain this?
Thanks!
Many (most?) programs for NGS data analysis can produce non-deterministic output unless they explicitly offer a way to make it deterministic, such as an option to set a random seed or a dedicated flag requesting deterministic output. In general this comes from parallel processing, stochastic algorithms, data handling (multiple I/O streams), and differences in hardware/software (not a factor in your case).
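To make two of those causes concrete, here is a small generic sketch in Python. It does not describe how kraken2 itself is implemented; `subsample`, `classify`, and `classify_parallel` are hypothetical names, and the "contains GATC means phage" rule is a toy stand-in for a real classifier. It shows that a stochastic step becomes reproducible once its seed is fixed, and that collecting results from parallel workers in completion order can change the output order between runs, while sorting by input position restores a deterministic order.

```python
import random
from concurrent.futures import ThreadPoolExecutor, as_completed


def subsample(reads, fraction, seed=None):
    """Stochastic step: randomly keep roughly `fraction` of the reads.

    With seed=None the kept reads can change from run to run;
    passing a fixed seed makes the result reproducible.
    """
    rng = random.Random(seed)
    return [r for r in reads if rng.random() < fraction]


def classify(read):
    # Hypothetical per-read classifier (toy rule, not kraken2's algorithm).
    # It is a pure function of its input, so each read always gets the same label.
    return (read, "phage" if "GATC" in read else "unclassified")


def classify_parallel(reads, workers=4):
    """Classify reads on a thread pool.

    as_completed() yields futures in whatever order the workers finish,
    so the raw result order may differ between runs. Sorting by the
    original input index makes the final output order deterministic.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(classify, r): i for i, r in enumerate(reads)}
        unordered = [(futures[f], f.result()) for f in as_completed(futures)]
    return [result for _, result in sorted(unordered)]


if __name__ == "__main__":
    reads = ["ACGTGATC", "TTTTAAAA", "GATCGATC", "CCCCGGGG"] * 25
    # Same seed, same subsample.
    print(subsample(reads, 0.5, seed=42) == subsample(reads, 0.5, seed=42))  # True
    # Parallel output matches sequential output once re-sorted by input index.
    print(classify_parallel(reads) == [classify(r) for r in reads])  # True
```

Note that in this toy, the per-read labels never change between runs; only the order of results does. Whether a given tool varies in labels, in order, or not at all depends on its implementation and options.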
Thank you!