Version 18 of Random Number

Updated 2004-12-13 19:42:49

Many applications require random numbers.

This includes -

  • Games of chance: You want simulated dice to behave as unpredictable as their real counterpart.
  • Monte Carlo simulations: Algorithms are tested with random input.
  • Cryptography: Random numbers make "one time pads" perfectly secure.

The deterministic nature of computers makes it impossible for computer software to come up with truly random numbers.

Hardware devices exist that produce random numbers from unpredictable, or unpredictably chaotic, physical effects such as athmospheric noise, radioactive decay, or even the movement of liquids in a common lava lamp. Some operating systems (e.g., Linux) derive reasonably random numbers from user behavior, such as the time between keyboard inputs (i.e., something like "nanoseconds between hitting two keys"), or mouse movement.

True Random Numbers downloads true random numbers from a free internet service.

Without access to such a device, the best a software can do is to use pseudo-random numbers (PRNs) which are produced by mathematical algorithms. The remainder of this page uses random number synonymous with pseudo-random number.

Because of their importance in various fields, there is a large body of literature on the subject. The canonical reference is Donald E. Knuth's The Art of Computer Programming, Volume 2: Seminumerical Algorithms.

Two of the most important properties that sequences of random numbers must, or should, satisfy are unpredictability and unbiased. The sequence of integer numbers {1, 2, 3, 4, ...} is perfectly biased but predictable. The sequence of daily outside temperature readings is unpredictable but biased, as the range of numbers is narrow, and the readings between two days are usually close. Because of such subtleties, the (probabilistic) proof that a sequence of numbers is reasonably random is very hard.

Tcl provides a pseudo-random number generator (called rand) as part of the expr command:

 expr rand()

returns a random number between 0 and 1.

Tcl's random number generator highlights a weakness that applies to all pseudo-random number generators: it has to be initialized, or "seeded." The initial seed value fully determines the sequence of numbers that is generated. If the generator is seeded with the same value over and over, it will keep producing the same random numbers. Also, by design, it never produces the same number twice in a row -- unlike dice, which can come up with the same number many times in a row without being biased.

Tcl's random number generator is as bad as many, and worse than some, in that it only accepts a single 31 bit integer as a seed value. In other words, there are only 2^31 possible random number sequences, and they repeat after a period of 2^31 (in fact, numbers never repeat in less than this period). For applications that only care about the distribution of pseudo-random numbers across the value space, such as simulated dice, that can be considered good enough. In cryptography, such a random number generator is not considered cryptographically secure. An attacker could potentially make a brute-force attempt to recover a random number sequence by checking all 2^31 seed values.

Many applications, and Tcl's random number generator (when not seeded by the application), cheat by using something like the time of day (e.g., [clock seconds]) as a seed value. However, for cryptographical applications, that is about the worst thing that can be done. Because [clock seconds] returns a number in a very narrow range, so an attacker could make a good prediction of the values that need to be tested. Using a higher resolution timer, such as [clock clicks], improves the issue by a cryptographically insigngificant margin only.

Therefore, we have a chicken and egg problem: in order to have unpredictable random numbers, a pseudo-random number generator needs to be initialized with a random seed.

We have two problems to solve, if we want to improve upon Tcl's pseudo-random number generator:

  • First, we need a better algorithm; one that allows for seed values greater than 2^32.
  • Second, we need a means of coming up with unpredictable seed values.

LES Maybe I am very naïve, but I really believe that a programmer can come up with a very good level of randomness. For example: if you prompt the user for six letters, assign an arbitrary (secret) numeric value to each letter, add the three first values and multiply the other three, then add and/or multiply everything by [clock seconds], then add or multiply it by [clock format [clock sec] -format %d], then apply some really arcane operation on the result so that only two or three digits are left, boy, isn't that random enough? Note the extreme idiosyncrasy of my formula. I can come up with ten other similarly bizarre formulas in five or ten minutes. I really don't believe that someone could crack something like that.

FPX Do not mistake obscurity for security. Your code above depends on the secrecy of the algorithm, and that is impossible to achieve. Any attacker can examine the code, so your assumption of "secret numeric values" does not hold. All the remaining transformations are reversible, and the current value of [clock seconds] is not hard to guess. So your numbers are definitely not secure. Whether they provide a reasonable source of randomness (remember, unpredictable and unbiased) should be subject of analysis. Do not assume randomness without in-depth analysis.


Two candidates for a better algorithm are

2004-Dec-10 drh: I recommend rc4. It is well-known and well studied. And it is easy to stir in new randomness obtain (for example) from %x and %y of mouse bindings or from the low-order bits of clock clicks taken after various events.

2004-Dec-13 rwm: A lot of systems just defer to the fips approved prng [L1 ] algorithms. The digital signature standard mentioned uses sha-1 as a mixing function.


KPV In trying to get a truly random number (for a seed, etc.) a good way to think is in terms of random bits. There are many traits of a computer or procedures the user can perform that can get you some degree of good randomness. If you combine enough of these, you have a good start. For example, [clock seconds] is guessable to within a few seconds thus supplying one or two bits of randomness. Other traits, such as process id, parent process id, disk free space, etc., and user tasks, such as typing or moving the mouse, all can supply few bits of good randomness. The trick is combining enough of these random bits can produce a good random number.

The scheme that FPX currently uses to compute a seed value goes as follows:

 set seed [clock clicks]
 # the user has to complete a dialog box here
 append seed [clock clicks]
 append seed [clock seconds] [pid]
 append seed [winfo id .] [winfo geometry .] [winfo pointerxy .]
 set hashedseed [binary format H* [sha1::sha1 $seed]]

The $hashseed value is then used to initialize the PRNG. (Note that the implementations of the Mersenne Twister and ISAAC algorithms mentioned above accept an arbitrary-length binary string as seed value.)

I did not fully analyze the above code for randomness yet. Each contributing value should be examined for its randomness properties. I.e., the lower and upper bounds that an attacker (local vs. remote) can estimate on each value should be analyzed, or in other words, the number of "unpredictably random" bits that each value contributes to the seed. Ideally, the sum of random bits in the seed should be at least 128, a number that is clearly not achieved by the above code.

One idea that I was tossing around was to display a sketchpad canvas, and to ask the user to draw some random figures. The pixels' coordinates could then be used to produce a seed. However, I wanted the above to be non-intrusive.

Discussion? Better ideas?


Why the fuss about cryptographically secure random numbers?

Already mentioned above, random numbers allow for perfectly secure, unbreakable cryptography, using "one-time pads." In use for decades, first on paper, now on computers, a one-time pad contains a sequence of random numbers. Only two copies of a one-time pad exist, one is with the sender of a message, and one with the recipient.

In order to encrypt a message, the sender takes each letter of the message, and transposes it with one random number: The random number, between 0 and 25, is added to the ordinal of the letter, modulo 26, to produce one letter of the encrypted message.

The receiver uses the same pad, and the inverse transformation (by subtracting the random number, modulo 26), to decrypt the message.

This scheme is theoretically unbreakable, if the numbers are perfectly random, and if each one-time pad is used only once. If a pad is used multiple times, an attacker can leverage frequency analysis to determine the contents of the pad and gain access to the messages. If the numbers are not perfectly random, but produced by a pseudo-random number generator, decryption of the message is equivalent to breaking the generator's algorithm, parameters and seed. With the algorithm and parameters usually public, only the secrecy of the seed remains, reinforcing the need for unpredictable seed values.

The problem with one-time pads is the amount of logistics involved, as the pads must be produced, and then distributed to sender and recipient via secure channels. And whoever gets his hand on the pads can produce or read messages -- like the clerk who makes a copy before forwarding the pads.


2004/12/10 sheila My latest issue of Science News has an article on random number generation [L2 ] and a visualization of a good random number generator

http://sciencenews.org/articles/20041204/a5632_2401.jpg

vs a bad one

http://sciencenews.org/articles/20041204/a5632_396.jpg

Good PRNGs are used for other purposes than just cryptography, e.g. in high energy physics simulations.


Initially written by FPX.


[Category Cryptography]