Question

Random seed in genomics analyses

8.4 years ago

orzech_mag ▴ 230

Dear all,

I have some questions about the "random seed" parameter that can be set in Comparative Marker Selection in GenePattern. I understand that this parameter relates to the random number generation used to produce the permutations, but I do not understand how it works.

For example, in the GSEA software the "random seed" is not a numerical value but "timestamp". So the first question is: what does a timestamp random seed mean?

The second thing is that in Comparative Marker Selection the random seed is set by default to an enigmatic value, 779948241. So the second question is: what does this value mean?

I do not understand the difference between timestamp in GSEA and 779948241 in Comparative Marker Selection. Moreover, if I change this value, e.g. to 0, how does that affect my data and the results of the analysis?

Unfortunately, neither manual (GSEA or Comparative Marker Selection) explains this parameter. Please explain it to me, or suggest any literature on the topic that would be understandable for a non-statistician.

Best regards and thanks in advance for any help!

GSEA comparative-marker-selection statistics • 2.8k views

updated 20 months ago by Ram 43k • written 8.4 years ago by orzech_mag ▴ 230

Ram · Answer 1 · 2015-12-15

In short, the random seed is a number that lets you reproduce any analysis involving random choices. The value itself has no particular meaning; it is just a starting point for the random number generator. For example, if you run Comparative Marker Selection with the seed 779948241 on the same data, you will get exactly the same results every time. If you use a different random seed, you will probably get somewhat different output, because the random numbers used to generate the permutations (or any other random component of the algorithm) will be different.
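To make this concrete, here is a minimal Python sketch of seeded label permutation. It is not how Comparative Marker Selection is implemented; the function name `permute_labels` and the sample labels are made up for illustration, and Python's standard `random` module stands in for whatever generator the tool uses.

```python
import random

def permute_labels(labels, seed):
    """Return a shuffled copy of `labels` using a generator seeded with `seed`."""
    rng = random.Random(seed)   # private generator; global random state is untouched
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return shuffled

# Hypothetical class labels for 8 samples
labels = ["tumor"] * 4 + ["normal"] * 4

run_a = permute_labels(labels, seed=779948241)
run_b = permute_labels(labels, seed=779948241)
run_c = permute_labels(labels, seed=12345)

print(run_a == run_b)  # True: same seed, same permutation
print(run_a == run_c)  # a different seed usually gives a different permutation
```

Because every permutation is identical when the seed is identical, any statistic computed from those permutations (such as a permutation p-value) comes out the same on each run.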

The random seed is usually a number. In the case of GSEA, the "timestamp" setting most likely means that the date and time of the run are converted to a number (e.g. to Unix time) and used as the seed. A timestamp is also a more readable way to present a seed: it is easier to remember the date and time at which you ran an analysis than a number like 779948241.
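As a rough illustration of turning a timestamp into a numeric seed, the sketch below converts a date and time to Unix time (seconds since 1970-01-01 UTC). This is an assumption about the general idea, not GSEA's actual code; `seed_from_timestamp` and the example date are hypothetical.

```python
import random
import time
from datetime import datetime, timezone

def seed_from_timestamp(when=None):
    """Convert a date and time to an integer seed (Unix time in seconds).

    With no argument, the current system time is used, so each call made
    at a different second returns a different seed.
    """
    if when is None:
        return int(time.time())
    return int(when.timestamp())

# A recorded run time can be converted back to the same seed later
recorded = datetime(2015, 12, 15, 10, 30, 0, tzinfo=timezone.utc)
seed = seed_from_timestamp(recorded)

first = random.Random(seed).random()
again = random.Random(seed).random()
print(first == again)  # True: the same timestamp gives the same seed and the same draw
```

The practical point is that a time-based seed changes from run to run, so results are only reproducible if the seed that was actually used is written down.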

I wouldn't recommend setting the random seed to 0, because the software may interpret that as meaning you do not want to specify a seed. It would then generate one from the system settings, which would make it impossible to reproduce your analysis.
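The sketch below shows the kind of convention that would cause this. It is purely hypothetical: `make_rng` is an invented function, and I do not know whether GSEA or Comparative Marker Selection treats 0 this way.

```python
import random
import time

def make_rng(seed=0):
    """Hypothetical tool behaviour: a seed of 0 is read as 'no seed given'."""
    if seed == 0:
        # Fall back to the system clock, so the effective seed is not fixed
        seed = time.time_ns()
    return random.Random(seed)

# A non-zero seed is used as given, so the draw is reproducible
print(make_rng(42).random() == make_rng(42).random())  # True

# With 0, the effective seed comes from the clock and is not recorded anywhere
unrepeatable = make_rng(0).random()
```

If a tool follows a convention like this, choosing any fixed non-zero number and noting it down avoids the problem.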

To learn more, you can read about how computers generate pseudo-random sequences. In short, computers cannot make up a random number the way we humans do, so software engineers use deterministic algorithms to generate pseudo-random numbers. These functions take a number as input (the random seed) and generate a series of numbers from it; the same seed always yields the same series.
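For a feel of what such an algorithm looks like, here is a minimal sketch of one of the simplest pseudo-random generators, a linear congruential generator. It is only an illustration: the function name `lcg` is made up, the constants are a commonly published textbook choice, and real tools generally use more sophisticated generators.

```python
def lcg(seed, n, a=1664525, c=1013904223, m=2**32):
    """Return n pseudo-random integers from a linear congruential generator.

    Each value is computed from the previous one as (a * state + c) % m,
    so the whole series is fully determined by the seed.
    """
    state = seed % m
    values = []
    for _ in range(n):
        state = (a * state + c) % m
        values.append(state)
    return values

print(lcg(779948241, 3) == lcg(779948241, 3))  # True: same seed, same series
print(lcg(1, 3) == lcg(2, 3))                  # False: different seed, different series
```

The numbers look random, but nothing random happens: rerunning with the same seed replays the same series, which is exactly what makes a seeded analysis reproducible.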