Next: Lecture 3 - recurrence Up: No Title Previous: Lecture 1 - analyzing

# Lecture 2 - asymptotic notation

Problem 1.2-6:   How can we modify almost any algorithm to have a good best-case running time?

To improve the best case, all we have to do it to be able to solve one instance of each size efficiently. We could modify our algorithm to first test whether the input is the special instance we know how to solve, and then output the canned answer.

For sorting, we can check if the values are already ordered, and if so output them. For the traveling salesman, we can check if the points lie on a line, and if so output the points in that order.

The supercomputer people pull this trick on the linpack benchmarks!

Because it is so easy to cheat with the best case running time, we usually don't rely too much about it.

Because it is usually very hard to compute the average running time, since we must somehow average over all the instances, we usually strive to analyze the worst case running time.

The worst case is usually fairly easy to analyze and often close to the average or real running time.

Exact Analysis is Hard!

We have agreed that the best, worst, and average case complexity of an algorithm is a numerical function of the size of the instances.

However, it is difficult to work with exactly because it is typically very complicated!

Thus it is usually cleaner and easier to talk about upper and lower bounds of the function.

This is where the dreaded big O notation comes in!

Since running our algorithm on a machine which is twice as fast will effect the running times by a multiplicative constant of 2 - we are going to have to ignore constant factors anyway.

Names of Bounding Functions

Now that we have clearly defined the complexity functions we are talking about, we can talk about upper and lower bounds on it:

• g(n) = O(f(n)) means is an upper bound on g(n).
• means is a lower bound on g(n).
• means is an upper bound on g(n) and is a lower bound on g(n).

Got it? C, , and are all constants independent of n.

All of these definitions imply a constant beyond which they are satisfied. We do not care about small values of n.

O, , and

The value of shown is the minimum possible value; any greater value would also work.

(a) if there exist positive constants , , and such that to the right of , the value of f(n) always lies between and inclusive.

(b) f(n) = O(g(n)) if there are positive constants and c such that to the right of , the value of f(n) always lies on or below .

(c) if there are positive constants and c such that to the right of , the value of f(n) always lies on or above .

Asymptotic notation are as well as we can practically deal with complexity functions.

What does all this mean?

Think of the equality as meaning in the set of functions.

Note that time complexity is every bit as well defined a function as or you bank account as a function of time.

Testing Dominance

f(n) dominates g(n) if , which is the same as saying g(n)=o(f(n)).

Note the little-oh - it means ``grows strictly slower than''.

Knowing the dominance relation between common functions is important because we want algorithms whose time complexity is as low as possible in the hierarchy. If f(n) dominates g(n), f is much larger (ie. slower) than g.

• dominates if a > b since

• doesn't dominate since

 Complexity 10 20 30 40 50 60 n 0.00001 sec 0.00002 sec 0.00003 sec 0.00004 sec 0.00005 sec 0.00006 sec 0.0001 sec 0.0004 sec 0.0009 sec 0.016 sec 0.025 sec 0.036 sec 0.001 sec 0.008 sec 0.027 sec 0.064 sec 0.125 sec 0.216 sec 0.1 sec 3.2 sec 24.3 sec 1.7 min 5.2 min 13.0 min 0.001 sec 1.0 sec 17.9 min 12.7 days 35.7 years 366 cent 0.59 sec 58 min 6.5 years 3855 cent cent cent

Logarithms

It is important to understand deep in your bones what logarithms are and where they come from.

A logarithm is simply an inverse exponential function. Saying is equivalent to saying that .

Exponential functions, like the amount owed on a n year mortgage at an interest rate of per year, are functions which grow distressingly fast, as anyone who has tried to pay off a mortgage knows.

Thus inverse exponential functions, ie. logarithms, grow refreshingly slowly.

Binary search is an example of an algorithm. After each comparison, we can throw away half the possible number of keys. Thus twenty comparisons suffice to find any name in the million-name Manhattan phone book!

If you have an algorithm which runs in time, take it, because this is blindingly fast even on very large instances.

Properties of Logarithms

Recall the definition, .

Asymptotically, the base of the log does not matter:

Thus, , and note that is just a constant.

Asymptotically, any polynomial function of n does not matter:

Note that

since , and .

Any exponential dominates every polynomial. This is why we will seek to avoid exponential time algorithms.

Federal Sentencing Guidelines

2F1.1. Fraud and Deceit; Forgery; Offenses Involving Altered or Counterfeit Instruments other than Counterfeit Bearer Obligations of the United States.

(a) Base offense Level: 6

(b) Specific offense Characteristics

(1) If the loss exceeded \$2,000, increase the offense level as follows:

Listen To Part 2-10

The federal sentencing guidelines are designed to help judges be consistent in assigning punishment. The time-to-serve is a roughly linear function of the total level.

However, notice that the increase in level as a function of the amount of money you steal grows logarithmically in the amount of money stolen.

This very slow growth means it pays to commit one crime stealing a lot of money, rather than many small crimes adding up to the same amount of money, because the time to serve if you get caught is much less.

The Moral: ``if you are gonna do the crime, make it worth the time!''

Working with the Asymptotic Notation

Suppose and .

What do we know about g'(n) = f(n)+g(n)? Adding the bounding constants shows .

What do we know about g''(n) = f(n)-g(n)? Since the bounding constants don't necessary cancel,

We know nothing about the lower bounds on g'+g'' because we know nothing about lower bounds on f, g.

Suppose and .

What do we know about g'(n) = f(n)+g(n)? Adding the lower bounding constants shows .

What do we know about g''(n) = f(n)-g(n)? We know nothing about the lower bound of this!

The Complexity of Songs

Suppose we want to sing a song which lasts for n units of time. Since n can be large, we want to memorize songs which require only a small amount of brain space, i.e. memory.

Let S(n) be the space complexity of a song which lasts for n units of time.

The amount of space we need to store a song can be measured in either the words or characters needed to memorize it. Note that the number of characters is since every word in a song is at most 34 letters long - Supercalifragilisticexpialidocious!

What bounds can we establish on S(n)?

• S(n) = O(n), since in the worst case we must explicitly memorize every word we sing - ``The Star-Spangled Banner''
• , since we must know something about our song to sing it.

The Refrain

Most popular songs have a refrain, which is a block of text which gets repeated after each stanza in the song:

Bye, bye Miss American pie
Drove my chevy to the levy but the levy was dry
Them good old boys were drinking whiskey and rye
Singing this will be the day that I die.

Refrains made a song easier to remember, since you memorize it once yet sing it O(n) times. But do they reduce the space complexity?

Not according to the big oh. If

Then the space complexity is still O(n) since it is only halved (if the verse-size = refrain-size):

The k Days of Christmas

To reduce S(n), we must structure the song differently.

Consider ``The k Days of Christmas''. All one must memorize is:

On the kth Day of Christmas, my true love gave to me,

On the First Day of Christmas, my true love gave to me, a partridge in a pear tree

But the time it takes to sing it is

If , then , so .

100 Bottles of Beer

What do kids sing on really long car trips?

n bottles of beer on the wall,
n bottles of beer.
You take one down and pass it around
n-1 bottles of beer on the ball.

All you must remember in this song is this template of size , and the current value of n. The storage size for n depends on its value, but bits suffice.

This for this song, .

Is there a song which eliminates even the need to count?

That's the way, uh-huh, uh-huh
I like it, uh-huh, huh

Reference: D. Knuth, `The Complexity of Songs', Comm. ACM, April 1984, pp.18-24

Next: Lecture 3 - recurrence Up: No Title Previous: Lecture 1 - analyzing

Algorithms
Mon Jun 2 09:21:39 EDT 1997