Umu logo Umeå universitet
Teknisk-naturvetenskaplig fakultet
Institutionen för datavetenskap


Laboration 1: Large Numbers

Delivery Date

Monday 02.02.04

Programming Language

By now you should know how to program, and how to do this effciently and decently. If not, then you should do DOA again. In this lecture you are free to choose any programming language that suits the purpose. Suggested languages are C, Java or C++ (in my personal decreasing order of preference). Considering the type of task an object-oriented language may be preferable over plain C.

Part of Larger Project

The ultimate goal of the labs is building an public-key crypto system. This requires finding large primes. An easy way to obtain large primes is to generate a large random number x and to search for the first prime number p >= x (well, this is not very efficient, because only a small fraction of the numbers is prime, but we will address this question later). This requires primality testing, which essentially requires computing powers and modulo. But before we can even start thinking in this direction we must have routines for handling very large numbers. There will be four labs, each of them dealing with a small subtask (i will make the exercises and check that they do not cost me more than 8 hours of programming time): An important issue in all labs is testing the correctness. How can you be sure that your multiplication, division and primality testing routines indeed do what you expect them to do? In order to reveal most bigger bugs, i prescribe some elementary tests. These will not find minor errors that occur only for special cases (such as when dealing with numbers of the form 2^x or 2^x - 1). It appears hard to guarantee correctness without applying formal certificates (which would be a terrific task).

To Do in First Lab

In this first laboration you should write all basic procedures: addition, subtraction, multiplication and what is further needed to handle the numbers. The maximum number B of bits is a parameter. You may assume it is a multiple of b, where b is the number of bits that you are packing in an unsigned int, assuming that your super long integers are stored into an array of unsigned ints. Before you start to program, you should decide which b you are going to use. Be careful with unsigned ints, especially if you are using them as a loop index, strange things happen if they become "negative". Be aware that you are going to use these routines again in later labs, so make it extendable! In our further applications there is no need for negative numbers, so there is no need for a sign bit (though it certainly makes everything more versatile).

"Wise" suggestions from my own experience. I choose to assume that the two arguments of the binary operations have the same length. This is easy and makes sense considering that we want to apply the recursive multiplication algorithm. It is somewhat wasteful though. I also prescribed the size of the result. This is unnecessary. It appears to be better to return an answer of the size required. When multiplying numbers of different size, one could round up to the maximum. Whatever you assume, it is desirable that before you start a multiplication or division (next assignmnet) you determine the actual number of non-zero digits and operate only on those. This is even possible for the recursive multiplication: (u * 2^t + v) * (x * 2^t + y) = u * x * 2^{2 * t} + v * y + ((u + v) * (x + y) - u * x - v * y) * 2^t. t should be chosen so that the length of the longest number is divided by 2.

For all functions (assignment, comparison, arithmetic) you will need versions with one long argument and one unsigened int, and a version with two long arguments (for the exponentiation you will even need 3 versions). It is convenient to have them named xxx_short and xxx_long.

You should be aware that the only operations that matter for the total time consumption are those that have more than linear time complexity. A few extra copying operations at the beginning or end of a routine do not matter. It is much more important to make sure that the routines are correct, than to save linear time. In this context i want to warn you for operations like x = x * y. Here x appears twice, and you should make sure that the argument x is not destroyed before the computation is complete.

Of course you will also need some printing routines. In the first place you must write down some primes in lab 3, but also for debugging you will need these. Input routines may be nice to have, but i did not need them. If you implement them, the most convenient would be to read the digits into an array of char, and then to pack them together, using multiplication by 10 and addition.

Addition

Adding two of these numbers is easy: add corresponding ints, take care of the carry, moving from least significant to most significant. Exit with failure notice upon overflow.

Subtraction

Similar to addition, moving from least significant to most significant. If x_i - y_i < z_i, then subtract 1 from x_{i+1} and add 2^b to z_i. Exit with failure notice when result is negative.

Multiplication

Investigating this is the actual goal of the first laboration. You should program one of the conventional multiplication algorithms and the one with sub-quadratic running time. Exit with failure notice upon overflow.

Other Routines

Generating a random number, increase/decrease/multiplication by a b-bit number, assignment of b-bit number, general assignment, test on zero, ==, >, <, >=, <=. For the random number, the requirements are not too high: it is only required that you generate a number y in the range from 0 <= y <= x, for a specified long number x. If x has k non-zero digits, then this can be done by generating 0 <= y[k - 1] <= x[k - 1], and 0 <= y[j] < 2^bits for all 0 <= j < k - 1.

Requirement

You must exploit that a processor can handle 32 bits (actually this should be a parameter) at the same time (this does not mean that the above mentioned parameter b must be 32, but it must be larger than 1). You must supply a test routine which is doing the following: (1) generate four random super long numbers: x1, x2, y1 and y2. (2) test, for both multiplication routines, whether (x1 + x2) * (y1 + y2) - (x1 * y1 + x1 * y2 + x2 * y1 + x2 * y2) == 0.

Once everything is running and tested, you should optimize the choice of the threshold parameter at which the subquadratic multiplication algorithm is switching to the conventional algorithm.

Deliverables on Paper

Meassure for B = 32, 64, 128, ..., as far as you come in one hour, the time consumption for the product of two numbers of B bits using both multiplication procedures.

Plot all measured results on one page of A4 with a scaling that clearly shows the development. I suggest that you use gnuplot for this (type gnuplot and then help), but even a drawing by hand is acceptable.

Give formulas of the following form

T(B) = a0 + a1 * B + a2 * B^c
that describe the time consumption as good as possible. Use some trivial trial and error method to find the parameters or use regression. For c you should take the theoretical value: c = 2 for the trivial algorithm and c = log_2 3 for the sophisticated algorithm.

Write a few lines of report, containing at least the following: the optimal switching value; the relative deviations of the experimental values from the values given by the formulas and an explanation why you do not find a good match everywhere with a single curve; the values of B for which more than half of the total time is due to the "main" term; a conclusion whether the improved multiplication makes sense in practice or not.


This page was created by Jop Sibeyn.
Last update Tuesday, 12 February 02 - 13:33.
For any comments: send an email.