Next: Exercises Up: Introduction to Algorithms Previous: About the War Stories

War Story: Psychic Modeling

The call came for me out of the blue as I sat in my office.

``Professor Skiena, I hope you can help me. I'm the President of Lotto Systems Group Inc., and we need an algorithm for a problem arising in our latest product.''

``Sure,'' I replied. After all, the dean of my engineering school is always eager for our faculty to interact with industry.

``At Lotto Systems Group, we market a program designed to improve our customers' psychic ability to predict winning lottery numbers. In a standard lottery, each ticket consists of 6 numbers selected from, say, 1 to 44. Thus any given ticket has only a very small chance of winning. However, after proper training, our clients can visualize a set of, say, 15 numbers out of the 44 and be certain that at least 4 of them will be on the winning ticket. Are you with me so far?''

``Probably not,'' I replied. But then I recalled how my dean wants us to interact with industry.

``Our problem is this. After the psychic has narrowed the choices down to 15 numbers and is certain that at least 4 of them will be on the winning ticket, we must find the most efficient way to exploit this information. Suppose that a cash prize is awarded whenever you pick at least three of the correct numbers on your ticket. We need an algorithm to construct the smallest set of tickets that we must buy in order to guarantee that we win at least one prize.''

``Assuming the psychic is correct?''

``Yes, assuming the psychic is correct. We need a program that prints out a list of all the tickets that the psychic should buy in order to minimize their investment. Can you help us?''

I thought about the problem for a minute. Maybe they did have psychic ability, for they had clearly come to the right place. Identifying the best subset of tickets to buy was very much a combinatorial algorithm problem. It was going to be some type of covering problem, where each ticket we buy was going to ``cover'' some of the possible 4-element subsets of the psychic's set. Finding the absolute smallest set of tickets to cover everything was a special instance of the NP-complete problem set cover (discussed in Section ) and presumably computationally intractable.

Figure: Covering all pairs of with tickets , , ,

It was indeed a special instance of set cover, completely specified by only four numbers: the size n of the candidate set S (typically ), the number of k slots for numbers on each ticket (typically ), the number of psychically promised correct numbers j from S (say j=4), and finally, the number of numbers l matching the winning ticket necessary to win a prize (say l=3). Figure illustrates a covering of a smaller instance, where n=5, k=3, and j=l=2.

``Although it will be hard to find the exact minimum set of tickets to buy, with heuristics I should be able to get you pretty close to the cheapest covering ticket set,'' I told him. ``Will that be good enough?''

``So long as it generates better ticket sets than my competitor's program, that will be fine. His system doesn't always provide enough coverage to guarantee a win. I really appreciate your help on this, Professor Skiena.''

``One last thing. If your program can train people to pick lottery winners, why don't you use it to win the lottery yourself?''

``I look forward to talking to you again real soon, Professor Skiena. Thanks for the help.''

I hung up the phone and got back to thinking. It seemed like a fun problem, and the perfect project to give to a bright undergraduate. After it was modeled in terms of sets and subsets, the basic components of a solution seemed fairly straightforward:

• We needed the ability to generate all subsets of k numbers from the candidate set S. Algorithms for generating and ranking/unranking subsets of sets are presented in Section .
• We needed the right formulation of what it meant to have a covering set of purchased tickets. The obvious criteria would be to pick a small set of tickets such that for each of the l-subsets of S that might pay off with the prize, we have purchased at least one ticket that contains it.
• For each new ticket we decided to buy, we would need to keep track of which prize combinations we have thus far covered. Presumably, we would want to buy tickets to cover as many thus-far-uncovered prize combinations as possible. The currently covered combinations are a subset of all possible combinations, so we would need a data structure to represent the subset. Such data structures are discussed in Section . The best candidate seemed to be a bit vector, which would permit a constant time query of ``is this combination already covered?'' in concert with a way to rank the subset, i.e. hash it to a unique integer.
• We would need a search mechanism to decide which ticket to buy next. For small enough set sizes, we could do an exhaustive search over all possible subsets of tickets and pick the smallest one. For larger sizes, a randomized search process like simulated annealing (see Section ) would select tickets-to-buy to cover as many uncovered combinations as possible. By repeating this randomized procedure several times and picking the best solution, we would be likely to come up with a very good set of tickets.

Excluding the search mechanism, the pseudocode for the bookkeeping looked something like this:

```

LottoTicketSet(n,k,l)

Initialize the   -element bit-vector V to all false

While there exists a false entry of V

Select a k-subset T of    as the next ticket to buy

For each of the l-subsets    of T,

Report the set of tickets bought

```

The bright undergraduate, Fayyaz Younas, rose to the challenge. Based on this framework, he implemented a brute-force search algorithm and found optimal solutions for problems with in a reasonable time. To solve larger problems, he implemented a random search procedure, tweaking it for a while before settling on the best variant. Finally, the day arrived when we could call Lotto Systems Group and announce that we had solved the problem.

``See, our program found that optimal solution for n=15, k=6, j=6, l=3 meant buying 28 tickets.''

``Twenty-eight tickets!'' complained the president. ``You must have a bug. Look, these five tickets will suffice to cover everything twice over: , , , , .''

Figure: Guaranteeing a winning ticket from using only tickets and

We fiddled with this example for a while before admitting that he was right. We hadn't modeled the problem correctly! In fact, we didn't need to explicitly cover all possible winning combinations. Figure illustrates the principle by giving a better solution to the previous small example. As a different example, suppose we needed three numbers on a five element ticket to win, and the psychic had promised us that four numbers would be from . Suppose that we had bought tickets that covered the subsets , , and . There would be no need to buy a ticket that explicitly covered , because if 1 and 2 and two other psychic numbers were on a winning ticket, then at least one of the other two psychic numbers would have to be either 3, 4, or 5. If 1 and 2 were not on the winning ticket, a ticket covering would do us absolutely no good. Thus we were trying to cover too many combinations, and the penny-pinching psychics were unwilling to pay for such extravagance.

Fortunately, this story has a happy ending, as reported in [YS96]. The general outline of our search-based solution described above still holds for the real problem. All that is needed to fix it is to change how many uncovered subsets we credit for covering with a given set of tickets. After this modification, we obtained the kind of results they were hoping for. Lotto Systems Group gratefully accepted our program to incorporate into their products, and we hope they hit the jackpot with it.

The moral of this story is to make sure that you correctly model the problem before trying to solve it. In our case, we came up with a reasonable model for the problem but didn't work hard enough to validate it before we started to program. Our misinterpretation would have become obvious had we worked out a small but non-trivial example by hand and bounced it off our sponsor before beginning work. Our success in recovering from the error is a tribute to the basic correctness of our initial formulation and our use of well-defined routines and abstractions for such tasks as (1) ranking/unranking k-subsets, (2) the set data structure, and (3) combinatorial search.

Next: Exercises Up: Introduction to Algorithms Previous: About the War Stories

Algorithms
Mon Jun 2 23:33:50 EDT 1997