Question: Random seed in genomics analyses
3.4 years ago, orzech_mag (Poland/Łódź) wrote:

Dear all,

I have some questions about the "random seed" parameter that one can set in Comparative Marker Selection in GenePattern. I understand that this parameter is related to the random number generation used to produce the permutations. However, I do not understand how it works.

For example, in the GSEA software the "random seed" is not a numerical value but "timestamp". So the first question is: what does a timestamp random seed mean?

Second, when one runs Comparative Marker Selection, the random seed is set by default to an enigmatic value such as 779948241. So the second question is: what does this value mean?

I do not understand the difference between timestamp in GSEA and 779948241 in Comparative Marker Selection. Moreover, if I change this value to, say, 0, how does that affect my data and the results of the analysis?

Unfortunately, neither manual (GSEA or Comparative Marker Selection) explains this parameter. Please explain it to me, or suggest any literature on the topic that would be understandable for a non-statistician.

Best regards and thanks in advance for any help!

modified 3.4 years ago by Giovanni M Dall'Olio • written 3.4 years ago by orzech_mag

3.4 years ago, Giovanni M Dall'Olio (London, UK) wrote:

In short, the random seed is a number that can be used to reproduce any analysis involving random choices. For example, if you run the Comparative Marker Selection analysis with the seed 779948241 and the same data, you will get exactly the same results every time. If you use any other random seed, you will probably get a different output, because the random numbers used to generate the permutations (or whatever other random component the algorithm uses) will be different.
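To make this concrete, here is a minimal Python sketch of the idea. It is not the code used by Comparative Marker Selection; the helper `permuted_samples` and the sample names are made up for illustration. It only shows that a generator seeded with the same number produces the same permutation.

```python
import random

def permuted_samples(samples, seed):
    """Return a shuffled copy of `samples`, using a generator seeded with `seed`.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)   # private generator, fully determined by the seed
    shuffled = list(samples)    # copy, so the input is left untouched
    rng.shuffle(shuffled)
    return shuffled

# Made-up sample names standing in for the columns of an expression matrix.
samples = [f"sample_{i}" for i in range(20)]

run_a = permuted_samples(samples, 779948241)
run_b = permuted_samples(samples, 779948241)
run_c = permuted_samples(samples, 12345)

print(run_a == run_b)  # True: same seed and same input give the same permutation
print(run_a == run_c)  # a different seed will, in practice, give a different permutation
```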

The random seed is usually a number. In the case of GSEA, "timestamp" most likely means that the seed is derived from the time at which you run the analysis, e.g. by converting it to Unix time. It may simply be a more readable way to present the seed: the date and time of a run are easier to remember than a number like 779948241.
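As a rough sketch of how a "timestamp" setting could be turned into a numeric seed (this is an assumption about the general technique, not GSEA's actual code, and `make_seed` is a hypothetical helper):

```python
import random
import time

def make_seed(seed_setting):
    """Hypothetical helper: map a seed setting to an integer seed.

    "timestamp" -> current Unix time in milliseconds (differs on every run);
    anything else is interpreted as an explicit integer seed.
    """
    if seed_setting == "timestamp":
        return int(time.time() * 1000)
    return int(seed_setting)

seed = make_seed("timestamp")
print("seed used for this run:", seed)  # write this down to repeat the run later

# Replaying the recorded seed reproduces the same random numbers.
rng = random.Random(seed)
first = [rng.random() for _ in range(3)]
replay = random.Random(seed)
second = [replay.random() for _ in range(3)]
print(first == second)  # True
```

The point of recording the number is that a timestamp-derived seed is only reproducible if you know which value was actually used.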

I wouldn't recommend setting the random seed to 0, because the software may interpret this as meaning that you don't want to specify a seed. It would then use the system settings to generate one, which would make it impossible to reproduce your analysis.

To learn more, you can read about how computers generate pseudo-random sequences. In short, computers cannot make up a random number the way we humans do, so software engineers use algorithms to generate pseudo-random numbers. These functions take a number as input (the random seed) and deterministically generate a number or series of numbers from it.
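One of the simplest such algorithms is a linear congruential generator. The sketch below is only an illustration of the principle, not the generator used by GenePattern or GSEA; the function name `lcg` is made up, and the constants are the widely quoted ones from Numerical Recipes.

```python
def lcg(seed, n, a=1664525, c=1013904223, m=2**32):
    """Return n pseudo-random floats in [0, 1) from a linear congruential generator.

    Each new state is computed from the previous one as (a * state + c) mod m,
    so the whole sequence is fixed once the seed is chosen.
    """
    state = seed % m
    values = []
    for _ in range(n):
        state = (a * state + c) % m
        values.append(state / m)  # scale the integer state into [0, 1)
    return values

print(lcg(779948241, 3) == lcg(779948241, 3))  # True: same seed, same sequence
print(lcg(779948241, 3) == lcg(0, 3))          # False: different seed, different sequence
```

Note that in this toy generator a seed of 0 is just another valid starting state. Whether a given program treats 0 as "no seed given" depends on that program.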

modified 3.4 years ago • written 3.4 years ago by Giovanni M Dall'Olio

That's a good answer. Here are some considerations about using the timestamp as a seed:

Another advantage of the timestamp is that, every time you run the program, the seed (and therefore the output) will be different. In contrast, if you manually set the seed to 779948241 (or 0, or whatever), the program will use the same random numbers and will produce the same results from identical input. Sometimes it is useful to be able to reproduce exactly the same results, but usually it's better to vary the seed to make sure that your result is robust.
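A minimal sketch of such a robustness check, assuming a simple two-group permutation test on one gene (the function `permutation_p_value` and the expression values are made up for illustration, and this is not the statistic used by Comparative Marker Selection or GSEA):

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_perm, seed):
    """Two-sided permutation p-value for the difference in group means.

    Hypothetical helper: sample labels are shuffled n_perm times with a
    generator seeded by `seed`.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of samples to the two groups
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly 0.
    return (hits + 1) / (n_perm + 1)

# Made-up expression values for one gene in two phenotype classes.
class_a = [5.1, 4.8, 5.6, 5.3, 4.9, 5.4]
class_b = [4.2, 4.5, 4.0, 4.7, 4.3, 4.1]

seeds = [779948241, 1, 2, 3, 4]
p_values = [permutation_p_value(class_a, class_b, 1000, s) for s in seeds]
for s, p in zip(seeds, p_values):
    print(s, round(p, 4))
```

If the p-values stay on the same side of your significance threshold for every seed, the conclusion does not depend on one particular set of random permutations. If they jump around, increasing the number of permutations is usually the first thing to try.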

written 3.4 years ago by Carlo Yague

OK, now I get it. As for the default seed value in Comparative Marker Selection (779948241), I asked mainly out of curiosity, but it is very nice to understand the basics. In general, the difficulty with bioinformatic analyses lies in the parameters and, quite simply, in the mathematics.

Thank you very much for your help!

written 3.4 years ago by orzech_mag