Umeå universitet
Teknisk-naturvetenskaplig fakultet
Institutionen för datavetenskap
Laboration 1: Large Numbers
Delivery Date
Monday 02.02.04
Programming Language
By now you should know how to program, and how to do this effciently
and decently. If not, then you should do DOA again. In this lecture you
are free to choose any programming language that suits the purpose.
Suggested languages are C, Java or C++ (in my personal decreasing order
of preference). Considering the type of task an object-oriented
language may be preferable over plain C.
Part of Larger Project
The ultimate goal of the labs is building an public-key crypto system.
This requires finding large primes. An easy way to obtain large primes
is to generate a large random number x and to search for the first
prime number p >= x (well, this is not very efficient, because only a
small fraction of the numbers is prime, but we will address this
question later). This requires primality testing, which essentially
requires computing powers and modulo. But before we can even start
thinking in this direction we must have routines for handling very
large numbers. There will be four labs, each of them dealing with a
small subtask (i will make the exercises and check that they do not
cost me more than 8 hours of programming time):
-
Lab 1: the class of large numbers with addition, subtraction
and efficient multiplication and other basic routines. The
underlying purpose of this lab is to give you feeling for the
importance of assymptotic development.
-
Lab 2: adding modulo and power computation to the class. The
underlying puprose of this lab is to familiarize you with the
repeated doubling technique applied in the power computation.
-
Lab 3: generating large prime numbers. The underlying purpose
of this lab is to familiarize you with the concept of witnesses
and randomized testing methods.
-
Lab 4: the crypo system.
An important issue in all labs is testing the correctness. How can
you be sure that your multiplication, division and primality testing
routines indeed do what you expect them to do? In order to reveal most
bigger bugs, i prescribe some elementary tests. These will not find
minor errors that occur only for special cases (such as when dealing
with numbers of the form 2^x or 2^x - 1). It appears hard to guarantee
correctness without applying formal certificates (which would be a
terrific task).
To Do in First Lab
In this first laboration you should write
all basic procedures: addition, subtraction, multiplication and what
is further needed to handle the numbers. The maximum number B of bits
is a parameter. You may assume it is a multiple of b, where b is the
number of bits that you are packing in an unsigned int, assuming that
your super long integers are stored into an array of unsigned ints.
Before you start to program, you should decide which b you are going to
use. Be careful with unsigned ints, especially if you are using them
as a loop index, strange things happen if they become "negative".
Be aware that you are going to use these routines again in later labs,
so make it extendable! In our further applications there is no need
for negative numbers, so there is no need for a sign bit (though it
certainly makes everything more versatile).
"Wise" suggestions from my own experience. I choose to assume that
the two arguments of the binary operations have the same length. This
is easy and makes sense considering that we want to apply the recursive
multiplication algorithm. It is somewhat wasteful though. I also
prescribed the size of the result. This is unnecessary. It appears to
be better to return an answer of the size required. When multiplying
numbers of different size, one could round up to the maximum. Whatever
you assume, it is desirable that before you start a multiplication
or division (next assignmnet) you determine the actual number of
non-zero digits and operate only on those. This is even possible for
the recursive multiplication: (u * 2^t + v) * (x * 2^t + y) = u * x *
2^{2 * t} + v * y + ((u + v) * (x + y) - u * x - v * y) * 2^t. t should
be chosen so that the length of the longest number is divided by 2.
For all functions (assignment, comparison, arithmetic) you will need
versions with one long argument and one unsigened int, and a version
with two long arguments (for the exponentiation you will even need 3
versions). It is convenient to have them named xxx_short and xxx_long.
You should be aware that the only operations that matter for the total
time consumption are those that have more than linear time complexity.
A few extra copying operations at the beginning or end of a routine
do not matter. It is much more important to make sure that the routines
are correct, than to save linear time. In this context i want to warn
you for operations like x = x * y. Here x appears twice, and you should
make sure that the argument x is not destroyed before the computation
is complete.
Of course you will also need some printing routines. In the first place
you must write down some primes in lab 3, but also for debugging you
will need these. Input routines may be nice to have, but i did not need
them. If you implement them, the most convenient would be to read the
digits into an array of char, and then to pack them together, using
multiplication by 10 and addition.
Addition
Adding two of these numbers is easy: add corresponding ints, take care
of the carry, moving from least significant to most significant. Exit
with failure notice upon overflow.
Subtraction
Similar to addition, moving from least significant to most significant.
If x_i - y_i < z_i, then subtract 1 from x_{i+1} and add 2^b to z_i.
Exit with failure notice when result is negative.
Multiplication
Investigating this is the actual goal of the first laboration. You
should program one of the conventional multiplication algorithms and
the one with sub-quadratic running time. Exit with failure notice upon
overflow.
Other Routines
Generating a random number, increase/decrease/multiplication by a
b-bit number, assignment of b-bit number, general assignment, test on
zero, ==, >, <, >=, <=. For the random number, the requirements are not
too high: it is only required that you generate a number y in the range
from 0 <= y <= x, for a specified long number x. If x has k non-zero
digits, then this can be done by generating 0 <= y[k - 1] <= x[k - 1],
and 0 <= y[j] < 2^bits for all 0 <= j < k - 1.
Requirement
You must exploit that a processor can handle 32 bits (actually this
should be a parameter) at the same time (this does not mean that the
above mentioned parameter b must be 32, but it must be larger than 1).
You must supply a test routine which is doing the following: (1)
generate four random super long numbers: x1, x2, y1 and y2. (2) test,
for both multiplication routines, whether (x1 + x2) * (y1 + y2) - (x1
* y1 + x1 * y2 + x2 * y1 + x2 * y2) == 0.
Once everything is running and tested, you should optimize the choice
of the threshold parameter at which the subquadratic multiplication
algorithm is switching to the conventional algorithm.
Deliverables on Paper
Meassure for B = 32, 64, 128, ..., as far as you come in one hour,
the time consumption for the product of two numbers of B bits using
both multiplication procedures.
Plot all measured results on one page of A4 with a scaling that clearly
shows the development. I suggest that you use gnuplot for this (type
gnuplot and then help), but even a drawing by hand is acceptable.
Give formulas of the following form
T(B) = a0 + a1 * B + a2 * B^c
that describe the time consumption as good as possible. Use some
trivial trial and error method to find the parameters or use
regression. For c you should take the theoretical value: c = 2 for the
trivial algorithm and c = log_2 3 for the sophisticated algorithm.
Write a few lines of report, containing at least the following: the
optimal switching value; the relative deviations of the experimental
values from the values given by the formulas and an explanation why you
do not find a good match everywhere with a single curve; the values of
B for which more than half of the total time is due to the "main" term;
a conclusion whether the improved multiplication makes sense in
practice or not.
This page was created by Jop Sibeyn.
Last update Tuesday, 12 February 02 - 13:33.
For any comments:
send an email.