Next: Set Packing Up: Set and String Problems Previous: Set and String Problems

## Set Cover

Input description: A set of subsets of the universal set .

Problem description: What is the smallest subset T of S such that ?

Discussion: Set cover arises when you try to efficiently acquire or represent items that have been packaged in a fixed set of lots. You want to obtain all the items, while buying as few lots as possible. Finding a cover is easy, because you can always buy one of each lot. However, by finding a small set cover you can do the same job for less money.

An interesting application of set cover is Boolean logic minimization. We are given a particular Boolean function of k variables, which for each of the possible input vectors describes whether the desired output is 0 or 1. We seek the simplest circuit that exactly implements this function. One approach is to find a disjunctive normal form (DNF) formula on the variables and their complements, such as . We could build one ``and'' term for each input vector and then ``or'' them all together, but we might save considerably by factoring out common subsets of variables. Given a set of feasible ``and'' terms, each of which covers a subset of the vectors we need, we seek to ``or'' together the smallest number of terms that realize the function. This is exactly the set cover problem.

There are several variations of set cover problems to be aware of:

• Are you allowed to cover any element more than once? - The distinction here is between set covering and set packing, the latter of which is discussed in Section . If we are allowed to cover elements more than once, as in the logic minimization problem above, we should take advantage of this freedom, as it usually results in a smaller covering.
• Are your sets derived from the edges or vertices of a graph? - Set cover is a very general problem, and it includes several useful graph problems as special cases. Suppose instead that you seek the smallest set of edges in a graph that covers each vertex at least once. The solution is to find a maximum matching in the graph (see Section ), and then add arbitrary edges to cover the remaining vertices. Suppose you seek the smallest set of vertices in a graph that covers each edge at least once. This is the vertex cover problem, discussed in Section .

It is instructive to model vertex cover as an instance of set cover. Let the universal set U be the set of edges . Construct n subsets, with consisting of the edges incident on vertex . Although vertex cover is just a set cover problem in disguise, you should take advantage of the fact that better algorithms exist for vertex cover.

• Do your subsets contain only two elements each? - When all of your subsets have at most two elements each, you are in luck. This is about the only special case that you can solve efficiently to optimality, by using the matching technique described above. Unfortunately, as soon as your subsets get to have three elements each, the problem becomes NP-complete.
• Do you want to cover elements with sets, or sets with elements? - In the hitting set problem, we seek a small number of items that together represent an entire population. Formally, we are given a set of subsets of the universal set , and we seek the smallest subset such that each subset contains at least one element of T. Thus for all . Suppose we desire a small Congress with at least one representative of each ethnic group. If each ethnic group is represented as a subset of people, the minimum hitting set is the smallest possible politically correct Congress. Hitting set is illustrated in Figure .

Figure: Hitting set is dual to set cover

Hitting set is, in fact, dual to set cover, meaning it is exactly the same problem in disguise. Replace each element of U by a set of the names of the subsets that contain it. Now S and U have exchanged roles, for we seek a set of subsets from U to cover all the elements of S. This is is exactly the set cover problem. Thus we can use any of the set cover codes below to solve hitting set after performing this simple translation.

Since the vertex cover problem is NP-complete, the set cover problem must be at least as hard. In fact, it is somewhat harder. Approximation algorithms do no worse than twice optimal for vertex cover, but only a times optimal approximation algorithm exists for set cover.

The greedy heuristic is the right approach for set cover. Begin by placing the largest subset in the set cover, and then mark all its elements as covered. We will repeatedly add the subset containing the largest number of uncovered elements, until all elements are covered. This heuristic always gives a set cover using at most times as many sets as optimal, and in practice it usually does a lot better.

The simplest implementation of the greedy heuristic sweeps through the entire input instance of m subsets for each greedy step. However, by using such data structures as linked lists and a bounded-height priority queue (see Section ), the greedy heuristic can be implemented in O(S) time, where is the size of the input representation.

It pays to check whether there are certain elements that exist in only a few subsets, ideally only one. If so, we should select the biggest subsets containing these elements at the very beginning. We will have to take them eventually, and they carry with them extra elements that we might have to pay to cover by waiting until later.

Simulated annealing is likely to produce somewhat better set covers than these simple heuristics, if that is important for your application. Backtracking can be used to guarantee you an optimal solution, but typically it is not worth the computational expense.

Implementations: Pascal implementations of an exhaustive search algorithm for set packing, as well as heuristics for set cover, appear in [SDK83]. See Section .

Notes: Survey articles on set cover include [BP76]. See [CK75] for a computational study of set cover algorithms. An excellent exposition on algorithms and reduction rules for set cover is presented in [SDK83].

Good expositions of the greedy heuristic for set cover include [CLR90]. An example demonstrating that the greedy heuristic for set cover can be as bad as is presented in [Joh74, PS82]. This is not a defect of the heuristic. Indeed, it is provably hard to approximate set cover to within an approximation factor better than [LY93].

Related Problems: Matching (see page ), vertex cover (see page ), set packing (see page ).

Next: Set Packing Up: Set and String Problems Previous: Set and String Problems

Algorithms
Mon Jun 2 23:33:50 EDT 1997