Next: Finite State Machine Minimization Up: Set and String Problems Previous: Text Compression

## Cryptography

Input description: A plaintext message T or encrypted text E, and a key k.

Problem description: Encode T using k giving E, or decode E using k back to T.

Discussion: Cryptography has grown substantially in importance in recent years, as computer networks have made confidential documents more vulnerable to prying eyes.       Cryptography is a way to increase security by making messages difficult to read if they fall into the wrong hands. Although the discipline of cryptography is at least two thousand years old, its algorithmic and mathematical foundations have recently solidified to the point where there can now be talk of provably secure cryptosystems.

There are three classes of cryptosystems everyone should be aware of:

• Caesar shifts -       The oldest ciphers involve mapping each character of the alphabet to a different letter. The weakest such ciphers rotate the alphabet by some fixed number of characters (often 13), and thus have only 26 possible keys. Better is to use an arbitrary permutation of the letters, so there are 26! possible keys. Even so, such systems can be easily attacked by counting the frequency of each symbol and exploiting the fact that `e' occurs more often than `z'. While there are variants that will make this more difficult to break, none will be as secure as DES or RSA.
• Data Encryption Standard (DES) - This algorithm is based on repeatedly       shuffling the bits of your text as governed by the key. The standard key length for DES (56 bits) is now considered too   short for applications requiring the highest level of security. However, a simple variant called triple DES permits an effective key length of 112 bits by using three rounds of DES with two 56-bit keys.   In particular, first encrypt with key1, then decrypt with key2, before finally encrypting with key1. There is a mathematical reason for using three rounds instead of two, and the encrypt-decrypt-encrypt pattern is used so that the scheme is equivalent to single DES when key1 = key2.
• Rivest-Shamir-Adelman (RSA) - RSA is a public key cryptosystem,       meaning that different keys are used to encode and decode messages. Since the encoding key is of no help in decoding, it can be made public at no risk to security. The security of RSA is based on the difference in the computational complexity of factoring and primality testing (see Section ).    Encoding is (relatively) fast because it relies on primality testing to construct the key, while the hardness of decryption follows from that of factoring. Still, RSA is slow relative to other cryptosystems, roughly 100 to 1,000 times slower than DES.

The key issue in selecting a cryptosystem is identifying your paranoia level, i.e. deciding how much security you need.   Who are you trying to stop from reading your stuff: your grandmother, local thieves, the Mafia, or the NSA?     If you can use an accepted implementation of RSA, such as PGP discussed below, you should feel safe against just about anybody.

If there is an implementation of DES on your machine, that will likely be good enough for most applications. For example, I use DES to encrypt my final exam each semester, and it proved more than sufficient the time an ambitious student broke into my office looking for it.   If the NSA had been breaking in, the story might have been different, although even here it is important to understand that the most serious security holes are human, not algorithmic.   Making sure your password is long enough, hard to guess, and not written down is far more important than obsessing about the encryption algorithm.

Simple ciphers like the Caesar shift are fun and easy to program. For this reason, it is perhaps healthy to use them for applications needing only a casual level of security (such as hiding the punchlines of jokes). Since they are easy to break, they should never be used for serious security applications.

One thing that you should never do is mess around with developing your own novel cryptosystem. The security of DES and RSA is accepted largely because these systems have both survived over twenty years of public scrutiny. Over this period, many other cryptosystems have been proposed, proven vulnerable to attack, and then abandoned. This is not a field for amateurs. If you are charged with implementing a cryptosystem, carefully study a respected program such as PGP (discussed below) to see how its author, Philip Zimmermann, skillfully handled such issues as key selection and key distribution.   A cryptosystem is as strong as its weakest link.

There are several problems related to cryptography that arise often in practice:

• How can I validate the integrity of data against corruption? -     In any communications application, there is need to validate that the transmitted data is identical to that which has been received. One solution is for the received to transmit the data back to the source and have the original sender confirm that the two texts are identical. This fails when the exact inverse of an error is made in the retransmission, but a more serious problem is that your available bandwidth is cut in half with such a scheme.

A more efficient if less reliable method is to use a checksum, a   simple mathematical function that hashes a long text down to a simple number or digit, and then transmit the checksum along with the text. The checksum can be recomputed on the receiving end and bells set off if the computed checksum is not identical to what was received. Perhaps the simplest checksum scheme just adds up the byte or character values and takes the sum modulo some constant, say . Unfortunately, an error transposing two or more characters would go undetected under such a scheme, since addition is commutative.

Cyclic-redundancy check (CRC)     provides a more powerful method for computing checksums and is used in most communications systems and internally in computers to validate disk drive transfers.   These codes compute the remainder in the ratio of two polynomials, the numerator of which is a function of the input text. The design of these polynomials involves considerable mathematical sophistication and ensures that all reasonable errors are detected. The details of efficient computation are complicated enough that we recommend that you start from an existing implementation, described below.

• How can I prove that a file has not been changed? - If I send you a contract in electronic form, what is to stop you from editing the file and then claiming that your version was what we had really agreed to? I need a way to prove that any modification to a document is fraudulent.   Digital signatures are a cryptographic way for me to stamp my document as genuine.

Given a file, I can compute a checksum for it, and then encrypt this checksum using my own private key. I send you the file and the encrypted checksum. You can now edit the file, but to fool the judge you must also edit the encrypted checksum such that it can be decrypted to the correct checksum. With a suitably good checksum function, designing a file that yields the same checksum becomes an insurmountable problem.

• How can I prove that I am me? - Authentication is the process of one party convincing another that they are who they say they are. The historical solutions have involved passwords or keys, so I prove that I am who I say I am because I know my credit card number or carry an ID card.   The problem with such schemes is that anyone who eavesdrops on this conversation or who steals my physical key can now successfully impersonate me.

What we need is some way for me to convince you that I know my key, without actually telling you the key. One such method to do so is for you to send me a random number or text, and I use my key to encrypt your text and send it back to you. If you then decrypt this text and compare it to the message you sent me, you gain confidence that I am who I say I am. We can repeat this exercise on several random texts until sufficient authentication has been agreed upon. If we use a secure enough cryptosystem, we can be confident that an eavesdropper will not be able to deduce my key even given several plain and encrypted texts.

Such authentication protocols of back-and-forth messages often involve the use of randomness to frustrate eavesdroppers.       Different protocols satisfy particular needs and constraints about who has to know what. It is important to do some reading before attempting to design your own protocols. References are provided below.

Implementations: The USENET FAQ (frequently asked questions) file on cryptography provides a wealth of information, including pointers   to implementations. Check it out at ftp://rtfm.mit.edu/pub/usenet/news.answers/cryptography-faq/.

Distributing cryptographic software is complicated by United States export restrictions, which make it illegal to export encryption software. PGP (Pretty Good Privacy) is such a good implementation of RSA that its author Philip Zimmerman was charged with export violations by federal authorities.   PGP may be obtained from the Electronic Frontier Foundation (EFF) at http://www.eff.org/pub/Net_info/Tools/Crypto/PGP/.

A good discussion on checksums and cyclic-redundancy codes, with implementations in C, appear in [BR95]. The code for these algorithms is printed in the text and is available on disk for a modest fee.

The Stanford Graphbase (see Section ) uses checksums to ensure that data files remain unmodified from the original distribution.     Algorithm 536 [Kno79] of the Collected Algorithms of the ACM is an encryption function for passwords, written in Fortran. See Section for further information.

Notes: Kahn [Kah67] presents the fascinating history of cryptography from ancient times to 1967 and is particularly noteworthy in light of the secretive nature of the subject.   More recent and more technical works on cryptography include Denning [Den82] and Schneier [Sch94], the latter of which provides a through overview of different cryptographic algorithms, including implementations for sale. Rawlins [Raw92] provides a good introduction to cryptographic algorithms, from Caesar shift to public key to zero-knowledge proofs.   An algorithm for breaking simple substitution ciphers appears in [PR79].

Expositions on the RSA algorithm [RSA78] include [CLR90]. The RSA Laboratories home page http://www.rsa.com/rsalabs/ is very informative. See [Sta95] for an excellent guide to PGP and its underlying algorithms.

The history of DES is well presented in [Sch94]. Particularly controversial was the decision by the NSA to limit key length to 56 bits, presumably short enough to be cracked by special-purpose computers costing on the order of several million dollars.   Despite some theoretical progress in breaking DES analytically [BS93], the most significant threat remains special-purpose hardware.

MD5 [Riv92] is the secure hashing function used by PGP to compute digital signatures.     Expositions include [Sch94, Sta95].

Related Problems: Factoring and primality testing (see page ), Text compression (see page )).

Next: Finite State Machine Minimization Up: Set and String Problems Previous: Text Compression

Algorithms
Mon Jun 2 23:33:50 EDT 1997