Next: Factoring and Primality Testing Up: Numerical Problems Previous: Linear Programming

Random Number Generation

Input description: Nothing, or perhaps a seed.

Problem description: Generate a sequence of random integers.

Discussion: Random number generation forms the foundation behind such standard algorithmic techniques as simulated annealing and Monte Carlo integration. Discrete event simulations, used to model everything from transportation systems to casino poker, all run on streams of random numbers. Initial passwords and cryptographic keys are typically generated randomly. New developments in randomized algorithms for graph and geometric problems are revolutionizing these fields and establishing randomization as one of the fundamental ideas of computer science.

Unfortunately, generating random numbers is a task that looks a lot easier than it really is, primarily because it is fundamentally impossible to produce truly random numbers on any deterministic device.   Von Neumann [vN63] said it best: ``Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin.''   All we can hope for are pseudorandom numbers, a stream of numbers that appear as if they were generated randomly.

There can be serious consequences to using a bad random number generator. For example, the security of an Internet password scheme was recently invalidated with the discovery that its keys were produced using a random number generator of such small period that brute-force search quickly exhausted all possible passwords. The accuracy of simulations is regularly compromised or invalidated by poor random number generation.   Bottom line: This is an area where people shouldn't mess around, but they do. Issues to think about include:

• Should my program use the same ``random'' numbers each time it runs? - A poker game that deals you the exact same hand each time you play quickly loses interest. One common solution is to use the lower-order bits of the machine clock as a seed or source for random numbers, so that each time the program runs it does something different.

Such methods are perhaps adequate for games, but not for serious simulations. There are liable to be periodicities in the distribution of random numbers whenever calls are made in a loop. Also, debugging is seriously complicated by the fact that the results are not repeatable.   If the program crashes, you cannot go back and discover why. One possible compromise is to use a deterministic pseudorandom number generator, but write the current seed to a file between runs. During debugging, this file can be overwritten with a fixed initial value or seed.

• How good is my compiler's built-in random number generator? - If you need uniformly generated random numbers, and you are not going to bet the farm on the accuracy of your simulation, my recommendation is simply to use what your compiler provides.     Your best opportunity to mess it up is with a bad choice of starting seed, so read the manual for its recommendations.

If you are going to bet the farm on the quality of your simulation, you had better test your random number generator. Be aware that it is very difficult to eyeball the results and decide whether the output is really random. This is because people have very skewed ideas of how random sources should behave and often see patterns that don't really exist. To evaluate a random number generator, several different tests should be used and the statistical significance of the results established. Such tests are implemented in plab and DIEHARD (discussed below) and explained in [Knu81].

• What if I have to implement my own random number generator? - The algorithm of choice is the linear congruential generator.   It is fast, simple, and (if instantiated with the right constants) gives reasonable pseudorandom numbers. The nth random number is a function of the (n-1)st random number:

In theory, linear congruential generators work the same way roulette wheels do.   The long path of the ball around and around the wheel (captured by ) ends in one of a relatively small number of bins, the choice of which is extremely sensitive to the length of the path (captured by the truncation of the ).

A substantial theory has been developed to select the constants a, c, m, and . The period length is largely a function of the modulus m, which is typically constrained by the word length of the machine. A presumably safe choice for a 32-bit machine would be , a = 1366, c=150889, and m=714025. Don't get creative and change any of the constants, unless you use the theory or run tests to determine the quality of the resulting sequence.

• What if I don't want large, uniformly distributed random integers? - The linear congruential generator produces a uniformly distributed sequence of large integers, which can be scaled to produce other uniform distributions.   For uniformly distributed real numbers between 0 and 1, use . Note that 1 cannot be realized this way, although 0 can. If you want uniformly distributed integers between l and h, use .

Generating random numbers according to a given nonuniform distribution can be a tricky business. The most reliable way to do this correctly is the acceptance-rejection method.   Suppose we bound the desired probability distribution function or geometric region to sample from a box and then select a random point p from the box. This point can be selected by p by generating the x and y coordinates independently, at random. If this p is within the distribution, or region, we can return p as selected at random. If p is in the portion of the box outside the region of interest, we throw it away and repeat with another random point. Essentially, we throw darts at random and report those that hit the target.

This method is correct, but it can be slow. If the volume of the region of interest is small relative to that of the box, most of our darts will miss the target.   Efficient generators for Gaussian and other special distributions are described in the references and implementations below.

Be cautious about inventing your own technique, however, since it can be tricky to obtain the right probability distribution. For example, an incorrect way to select points uniformly from a circle of radius r would be to generate polar coordinates and select an angle from 0 to and a displacement between 0 and r, both uniformly at random.   In such a scheme, half the generated points will lie within r/2 of the radius, when only one-fourth of them should be! This is a substantial enough difference to seriously skew the results, while being subtle enough that it might escape detection.

• How long should I run my Monte Carlo simulation to get the best results? - It makes sense that the longer you run a simulation, the more accurately the results will approximate the limiting distribution, thus increasing accuracy. However, this is only true until you exceed the period, or cycle length, of your random number generator. At that point, your sequence of random numbers repeats itself, and further runs generate no additional information. Check the period length of your generator before you jack up the length of your simulation. You are liable to be very surprised by what you learn.

Implementations: An excellent WWW page on random number generation and stochastic simulation is available at http://random.mat.sbg.ac.at/others/. It includes pointers to papers and literally dozens of implementations of random number generators. From there are accessible pLab [Lee94] and DIEHARD, systems for testing the quality of random number generators.

The Stanford Graphbase (see Section ) contains a machine-independent random number generator based on the recurrence . With the proper initialization, this generator has a period of at least .

Algorithm 488 [Bre74], Algorithm 599 [AKD83], and Algorithm 712 [Lev92] of the Collected Algorithms of the ACM are Fortran codes for generating random numbers according to several probability distributions, including normal, exponential, and Poisson distributions. They are available from Netlib (see Section ).

Sim++ is a library of routines for implementing discrete event simulations, built by Robert Cubert and Paul Fishwick, of the University of Florida. It contains random number generators for a variety of different distributions, including uniform, exponential, and normal. Check out http://www.cis.ufl.edu/ fishwick/simpack/simpack.html if you need a random number generator to control a simulation. Fishwick's book [Fis95] describes model design using SimPack.

LEDA (see Section ) provides a comprehensive random source in C++ for generating random bits, integers, and double precision reals.   Sedgewick [Sed92] provides simple implementations of linear and additive congruential generators in C++. See Section for details.

XTango (see Section ) is an algorithm animation system for UNIX and X-windows, which includes an animation illustrating the uniformity of random number generation.

Notes: Knuth [Knu81] has a thorough and interesting discussion of random number generation, which I heartily recommend. He presents the theory behind several methods, including the middle square and shift-register methods we have not described here, as well as a detailed discussion of statistical tests for validating random number generators. Another good source is [PFTV86] - our recommended constants for the linear congruential generator are drawn from here. Comparisons of different random number generators in practice include [PM88].

Tables of random numbers appear in most mathematical handbooks, as relics from the days before there was ready access to computers. Most notable is [RC55], which provides one million random digits.

The deep relationship between randomness, information, and compressibility is explored within the theory of Kolmogorov complexity, which measures the complexity of a string by its compressibility. Truly random strings are incompressible. The string of seemingly random digits of cannot be random under this definition, since the entire sequence is defined by any program implementing a series expansion for . Li and Vitáni [LV93] provide a thorough introduction to the theory of Kolmogorov complexity.

Related Problems: Constrained and unconstrained optimization (see page ), generating permutations (see page ), generating subsets (see page ), generating partitions (see page ).

Next: Factoring and Primality Testing Up: Numerical Problems Previous: Linear Programming

Algorithms
Mon Jun 2 23:33:50 EDT 1997