
## Arbitrary-Precision Arithmetic

Input description: Two very large integers, x and y.

Problem description: What is x+y, x-y, x·y, and x/y?

Discussion: Any programming language whose level rises above basic assembler supports single- and perhaps double-precision integer/real addition, subtraction, multiplication, and division. But what if we wanted to represent the national debt of the United States in pennies? One trillion dollars' worth of pennies requires 15 decimal digits, which is far more than can fit into a 32-bit integer.

In other applications much larger integers are needed. The RSA algorithm for public-key cryptography requires integer keys of at least 100 digits to achieve any level of security, and 1000 digits are recommended. Experimenting with number-theoretic conjectures for fun or research always requires playing with large numbers. I once solved a minor open problem [GKP89] by performing an exact computation on an enormous integer.

What should you do when you need large integers?

• Am I solving a problem instance requiring large integers, or do I have an embedded application? - If you just need the answer to a specific problem with large integers, such as in the number theory application above, you would be well advised to consider using a computer algebra system like Maple or Mathematica. These systems use arbitrary-precision arithmetic by default and provide nice Lisp-like programming languages as front ends, together often reducing your problem to a five- to ten-line program.

If instead you have an embedded application requiring high-precision arithmetic, you would be well advised to use an existing library. In addition to the four basic operations, you are likely to get additional functions for computing things like greatest common divisor in the bargain.   See the implementations below for details.

• Do I need high- or arbitrary-precision arithmetic? - Is there an upper bound on how big your integers can get, or do you really need arbitrary-precision, i.e., unbounded integers? This determines whether you can use a fixed-length array to represent your integers, as opposed to a linked list of digits. The array is likely to be simpler and will not be a constraint in most applications.
• What base should I do arithmetic in? -   It is conceptually simplest to implement your own high-precision arithmetic package in decimal and represent each integer as a string of base-10 digits, at one digit per node.   However, it is far more efficient to use a higher base, ideally the square root of the largest integer supported fully by hardware arithmetic.

Why? The higher the base, the fewer digits we need to represent the number (compare 64 decimal with 1000000 binary). Since hardware addition usually takes one clock cycle independent of the actual numbers, best performance is achieved using the highest base. The reason for limiting the base to the square root of the largest hardware-supported integer is that in performing high-precision multiplication, we will multiply two of these ``digits'' together and need to avoid overflow.

The only complication of using a larger base is that integers must be converted to and from base-10 for input and output, but the conversion is easily performed once all four high-precision arithmetical operations are supported.

• How low-level are you willing to get for fast computation? - Hardware addition is much faster than a subroutine call, so you will take a significant hit on speed whenever your package is used where low-precision arithmetic suffices. High-precision arithmetic is one of the few problems in this book where inner loops in assembly language can be the right idea to speed things up. Finally, using bit-level masking and shift operations instead of arithmetical operations can be a win.
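The representation and base-conversion questions above can be made concrete with a minimal sketch. It is illustrative only (the names BASE, to_digits, and to_decimal are my own, not from the text): a non-negative integer is stored as a little-endian array of base-2^16 digits, with conversion to and from decimal strings done by repeated multiplication and division by 10.

```python
# Illustrative sketch, not a production library.  BASE = 2**16 is
# roughly the square root of a 32-bit word, as the text suggests, so
# two "digits" can be multiplied without overflow on such hardware.
BASE = 1 << 16

def to_digits(decimal_string):
    """Convert a non-negative decimal string to a little-endian
    list of base-BASE digits."""
    digits = [0]
    for ch in decimal_string:
        carry = int(ch)
        for i in range(len(digits)):        # multiply by 10, add digit
            carry += digits[i] * 10
            digits[i] = carry % BASE
            carry //= BASE
        while carry:                        # extend for overflow
            digits.append(carry % BASE)
            carry //= BASE
    return digits

def to_decimal(digits):
    """Convert a base-BASE digit list back to a decimal string."""
    digits = digits[:]
    out = []
    while any(digits):
        rem = 0
        for i in reversed(range(len(digits))):  # long division by 10
            rem = rem * BASE + digits[i]
            digits[i] = rem // 10
            rem %= 10
        out.append(str(rem))
    return "".join(reversed(out)) or "0"
```

Note that once all four high-precision operations exist, these two conversions are the only places where base 10 appears at all.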

The algorithm of choice for each of the five basic arithmetic operations is as follows:

• Addition - The basic schoolhouse method of lining the numbers up by their least-significant digits and then adding digit by digit from right to left with carries works in time linear in the number of digits. More sophisticated carry-look-ahead parallel algorithms are available for low-level hardware implementation. Hopefully they are used on your chip for low-precision addition.
• Subtraction - Depending upon the sign bits of the numbers, subtraction can be a special case of addition: (A - (-B)) = (A+B). The tricky part of subtraction is performing the borrow.   This can be simplified by always subtracting from the number with the larger absolute value and adjusting the signs afterwards, so we can be certain there will always be something to borrow from.
• Multiplication - The simplest method of repeated addition will take exponential time on large integers, so stay away. The digit-by-digit schoolhouse method is reasonable to program and will work much better, presumably well enough for your application.   On very large integers, Karatsuba's divide-and-conquer algorithm (cited in the notes) wins. Dan Grayson, author of Mathematica's arbitrary-precision arithmetic, found that the switch-over happened at well under 100 digits.   Even faster on very large integers is an algorithm based on Fourier transforms. A discussion of such algorithms appears in Section .
• Division - Repeated subtraction will take exponential time, so the easiest reasonable algorithm to use is the long-division method you hated in school.   This is a far more complicated algorithm than needed for the other operations, requiring arbitrary-precision multiplication and subtraction as subroutines, as well as trial and error to determine the correct digit to use at each position of the quotient.

In fact, integer division can be reduced to integer multiplication, although in a nontrivial way, so if you are implementing asymptotically fast multiplication, you can reuse that effort in long division. See the references below for details.

• Exponentiation - We can compute a^b in the obvious manner using b-1 multiplications, but a better way is to exploit the fact that a^b = (a^(b/2))^2 when b is even. By repeatedly squaring the results of our partial products, we can get by with only O(lg b) multiplications, a big win when b is large. See Section for a discussion of this algorithm.
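The schoolhouse methods and repeated squaring above can be sketched on little-endian lists of digits. This is an illustration under my own naming, kept in base 10 for readability even though, as discussed earlier, a larger base is faster:

```python
# Minimal sketches of schoolhouse addition and multiplication, plus
# exponentiation by repeated squaring, on little-endian base-10 digit
# lists.  Illustrative only; base 10 chosen for clarity.

def add(a, b):
    """Digit-by-digit addition with carries: linear in digit count."""
    result, carry = [], 0
    for i in range(max(len(a), len(b))):
        carry += (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
        result.append(carry % 10)
        carry //= 10
    if carry:
        result.append(carry)
    return result

def multiply(a, b):
    """Schoolhouse digit-by-digit multiplication: O(len(a)*len(b))."""
    result = [0] * (len(a) + len(b))
    for i, da in enumerate(a):
        carry = 0
        for j, db in enumerate(b):
            carry += result[i + j] + da * db
            result[i + j] = carry % 10
            carry //= 10
        result[i + len(b)] += carry
    while len(result) > 1 and result[-1] == 0:   # trim leading zeros
        result.pop()
    return result

def power(a, b):
    """Compute a**b using O(lg b) multiplications by squaring the
    partial product whenever the corresponding bit of b is examined."""
    result, square = [1], a
    while b:
        if b & 1:
            result = multiply(result, square)
        square = multiply(square, square)
        b >>= 1
    return result
```

For example, power([2], 10) yields the digit list for 1024. Swapping multiply for a Karatsuba or FFT-based routine would speed up power for free, since it uses multiplication only as a black box.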

High- but not arbitrary-precision arithmetic can be conveniently performed using the Chinese remainder theorem and modular arithmetic. The Chinese remainder theorem states that an integer between 1 and p_1 p_2 ... p_m is uniquely determined by its set of residues mod p_i, where the moduli p_i are pairwise relatively prime integers. Addition, subtraction, and multiplication (but not division) can be supported using such residue systems, with the advantage that large integers can be manipulated without complicated data structures.
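A residue-number sketch makes this concrete. The moduli below are my own illustrative choices of pairwise relatively prime integers; any such set works, with the representable range bounded by their product:

```python
# Illustrative residue-number arithmetic via the Chinese remainder
# theorem.  MODULI are pairwise relatively prime, so integers in
# [0, 7*11*13) = [0, 1001) are uniquely determined by their residues.
MODULI = (7, 11, 13)

def to_residues(x):
    return tuple(x % m for m in MODULI)

def res_add(a, b):
    """Componentwise addition -- no carries propagate between moduli."""
    return tuple((x + y) % m for x, y, m in zip(a, b, MODULI))

def res_mul(a, b):
    """Componentwise multiplication, again with no carry propagation."""
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))

def from_residues(r):
    """Reconstruct the unique integer in [0, prod(MODULI)) by the
    standard CRT formula."""
    prod = 1
    for m in MODULI:
        prod *= m
    total = 0
    for res, m in zip(r, MODULI):
        p = prod // m                       # product of the other moduli
        total += res * p * pow(p, -1, m)    # p * p^{-1} = 1 (mod m)
    return total % prod
```

Note the advertised limitation: results are only correct while they stay below the product of the moduli, which is why this gives high- but not arbitrary-precision arithmetic.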

Many of these algorithms for computations on long integers can be directly applied to computations on polynomials. See the references for more details. A particularly useful algorithm is Horner's rule for fast polynomial evaluation. When P(x) = a_n x^n + ... + a_1 x + a_0 is blindly evaluated term by term, O(n^2) multiplications will be performed. Much better is observing that P(x) = a_0 + x(a_1 + x(a_2 + ... + x(a_{n-1} + x a_n))), the evaluation of which uses only a linear number of operations.
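The contrast between term-by-term evaluation and Horner's rule can be sketched in a few lines (function names are my own; coefficients are listed from a_0 up to a_n):

```python
# Illustrative comparison of naive polynomial evaluation and
# Horner's rule.  coeffs[i] holds a_i, the coefficient of x**i.

def eval_naive(coeffs, x):
    """Term by term: recomputes the power x**i for every term."""
    return sum(a * x**i for i, a in enumerate(coeffs))

def eval_horner(coeffs, x):
    """Horner's rule: a_0 + x(a_1 + x(a_2 + ...)), one multiplication
    and one addition per coefficient."""
    result = 0
    for a in reversed(coeffs):
        result = result * x + a
    return result
```

Both return the same value; only the operation count differs.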

Implementations: All major commercial computer algebra systems incorporate high-precision arithmetic, including Maple, Mathematica, Axiom, and Macsyma.     If you have access to one of these, this is your best option for a quick, nonembedded application. The rest of this section focuses on source code available for embedded applications.

PARI, developed by Henri Cohen and his colleagues in France, is a system capable of handling complex number-theoretic problems on integers with up to 300,000 decimal digits, as well as reals, rationals, complex numbers, polynomials, and matrices. It is probably the most powerful free software available for number theory. Written mainly in C (with assembly-language code for speed-critical routines), it includes more than 200 special predefined mathematical functions. PARI can be used as a library, but it also possesses a powerful calculator mode that gives instant access to all the types and functions. The main advantage of PARI is its speed. On a Unix platform, it runs between 5 and 100 times faster than Maple or Mathematica, depending on the application. PARI is available for PC, Amiga, Macintosh, and most Unix platforms by anonymous ftp at ftp://megrez.ceremab.u-bordeaux.fr/pub/pari/.

Algorithm 693 [Smi91] of the Collected Algorithms of the ACM is a Fortran implementation of floating-point, multiple-precision arithmetic. See Section .

An implementation of arbitrary-precision integer and rational arithmetic in C++ is embedded in LEDA (see Section ), including GCD, square roots, and logarithms as well as the basic four operations.       Sparc assembler code is used for certain time-critical functions.

Implementations in C of a high-precision calculator with all four elementary operations appear in [BR95]. The authors use base 10 for arithmetic and arrays of digits to represent long integers, with short integers as array indices, thus limiting computations to 32,768 digits. The code for these algorithms is printed in the text and available on disk for a modest fee.

Bare bones implementations in C of high-precision multiplication and in Pascal of such special functions as logarithm and arctangent appear in [GBY91].   See Section for further details. Sedgewick [Sed92] provides a bare bones implementation of polynomial arithmetic in C++.   See Section for details.

Notes: Knuth [Knu81] is the primary reference on algorithms for all basic arithmetic operations, including implementations of them in the MIX assembly language.   Bach and Shallit [BS96] provide a more recent treatment of computational number theory.

Expositions on the O(n^1.59)-time divide-and-conquer algorithm for multiplication [KO63] include [AHU74, Man89]. An FFT-based algorithm that multiplies two n-bit numbers in O(n lg n lg lg n) time is due to Schönhage and Strassen [SS71]. Expositions include [AHU74, Knu81]. The reduction between integer division and multiplication is presented in [AHU74, Knu81].

Good expositions of algorithms for modular arithmetic and the Chinese remainder theorem include [AHU74, CLR90]. A good exposition of circuit-level algorithms for elementary arithmetic algorithms is [CLR90].

Euclid's algorithm for computing the greatest common divisor of two numbers is perhaps the oldest interesting algorithm. Expositions include [CLR90, Man89].
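Euclid's algorithm is short enough to sketch in full; this minimal version repeatedly replaces the pair (a, b) with (b, a mod b):

```python
# Illustrative sketch of Euclid's algorithm for the greatest common
# divisor of two non-negative integers.

def gcd(a, b):
    """Replace (a, b) with (b, a mod b) until b reaches zero; the
    surviving a is the greatest common divisor."""
    while b:
        a, b = b, a % b
    return a
```

Since each step only needs division with remainder, the same loop runs unchanged on top of a high-precision arithmetic package.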

Related Problems: Factoring integers (see page ), cryptography (see page ).


Algorithms
Mon Jun 2 23:33:50 EDT 1997