Input description: An integer n.
Problem description: Is n a prime number, and if not what are the factors of n?
Discussion: The dual problems of factoring integers and testing primality have surprisingly many applications for a problem long suspected of being only of mathematical interest. The security of the RSA public-key cryptography system (see Section ) is based on the computational intractability of factoring large integers. As a more modest application, hash table performance typically improves when the table size is a prime number. To get this benefit, an initialization routine must identify a prime near the desired table size. Finally, prime numbers are just interesting to play with. It is no coincidence that programs to generate large primes often reside in the games directory of UNIX systems.
Although factoring and primality testing are related problems, algorithmically they are quite different. There exist algorithms that can demonstrate that an integer is composite (i.e. not prime) without actually giving the factors. To convince yourself of the plausibility of this, note that you can demonstrate the compositeness of any nontrivial integer whose last digit is 0, 2, 4, 5, 6, or 8 without doing the actual division.
The simplest algorithm for both of these problems is brute-force trial division. To factor n, compute the remainder of n/i for all . The prime factorization of n will contain at least one instance of every i such that , unless n is prime. Make sure you handle the multiplicities correctly and account for any primes larger than .
Such algorithms can be sped up by using a precomputed table of small primes to avoid testing all possible i. Surprisingly large numbers of primes can be represented in surprisingly little space by using bit vectors (see Section ). A bit vector of all odd numbers less than 1,000,000 fits in under 64 kilobytes. Even tighter encodings become possible by eliminating all multiples of three and other small primes.
Considerably faster factoring algorithms exist, whose correctness depends upon more substantial number theory. The fastest known algorithm, the number field sieve, uses randomness to construct a system of congruences, the solution of which usually gives a factor of the integer. Integers with as many at 128 digits have been factored using this method, although such feats require enormous amounts of computation.
Randomized algorithms make it much easier to test whether an integer is prime. Fermat's little theorem states that for all a, when n is prime. Suppose we pick a random value of and compute the residue of . If this residue is not 1, we have just proven that n cannot be prime. Such randomized primality tests are very efficient. PGP (see Section ) finds 300+ digit primes using hundreds of these tests in minutes, for use as cryptographic keys.
Although the primes are scattered in a seemingly random way throughout the integers, there is some regularity to their distribution. The prime number theorem states that the number of primes less than n, commonly denoted by , is approximately . Further, there are never large gaps between primes, so in general, one would expect to examine about integers if one wanted to find the first prime larger than n. This distribution and the fast randomized primality test explain how PGP can find such large primes so quickly.
Implementations: My first choice for factoring or primality testing applications would be PARI, a system capable of handling complex number-theoretic problems on integers with up to 300,000 decimal digits, as well as reals, rationals, complex numbers, polynomials, and matrices. It is written mainly in C, with assembly code for inner loops on major architectures, and includes more than 200 special predefined mathematical functions. PARI can be used as a library, but it also possesses a calculator mode that gives instant access to all the types and functions. The main advantage of PARI is its speed. On a Unix platform, it is between 5 to 100 times faster than Maple or Mathematica, depending on the applications. PARI is available for PC, Amiga, Macintosh, and most Unix platforms by anonymous ftp at ftp://megrez.ceremab.u-bordeaux.fr/pub/pari/.
A Mathematica implementation by Ilan Vardi of Lenstra's elliptic curve method of factorization is available in Packages/NumberTheory/FactorIntegerECM.m of the standard Mathematica distribution and MathSource. It is designed to find prime factors of up to about 18 digits in reasonable time, extending Mathematica's ability to factor all numbers of up to 40 digits. It is faster when factoring the product of small primes.
Notes: Bach and Shallit's book [BS96] is the most comprehensive reference on computational number theory, while Adleman's excellent survey [Adl94a] describes the state of the art, as well as open problems. Good expositions on modern algorithms for factoring and primality testing include [CLR90].
The Miller-Rabin [Mil76, Rab80] randomized primality testing algorithm eliminates problems with Carmichael numbers, which are composite integers that always satisfy Fermat's theorem. The best algorithms for integer factorization include the quadratic-sieve [Pom84] and the elliptic-curve methods [HWL87].
Mechanical sieving devices provided the fastest way to factor integers surprisingly far into the computing era. See [SWM95] for a fascinating account of one such device, built during World War I. Hand-cranked, it proved the primality of in fifteen minutes of sieving time.
An important problem in computational complexity theory is whether P = NP co-NP. The decision problem ``is n a composite number?'' is perhaps the best candidate for a counterexample. By exhibiting the factors of n, it is trivially in NP. It can be shown to be in co-NP, since every prime has a short proof of its primality [Pra75]. However, there is no evidence it is in P. For more information on complexity classes, see [GJ79, Joh90].
A group headed by Arjen Lenstra has regularly broken records for general-purpose integer factoring, using an Internet-distributed implementation of the quadratic sieve factoring method. The June 1993 factorization of RSA-120 took approximately 830 MIP-years of computation. The April 1994 factorization of RSA-129, famous for appearing in the original RSA paper [RSA78], was factored in eight months using over 1,600 computers. This was particularly noteworthy because in [RSA78] they had originally predicted such a factorization would take 40 quadrillion years using 1970s technology.
Related Problems: Cryptography (see page ), high precision arithmetic (see page ).