Next: Generating Partitions Up: Combinatorial Problems Previous: Generating Permutations

## Generating Subsets

Input description: An integer n.

Problem description: Generate (1) all or (2) a random or (3) the next subset of the integers .

Discussion: A subset describes a selection of objects, where the order among them does not matter. Many of the algorithmic problems in this catalog seek the best subset of a group of things: vertex cover seeks the smallest subset of vertices to touch each edge in a graph; knapsack seeks the most profitable subset of items of bounded total size; and set packing seeks the smallest subset of subsets that together cover each item exactly once.

There are distinct subsets of an n-element set, including the empty set as well as the set itself. This grows exponentially, but at a considerably smaller rate than the n! permutations of n items. Indeed, since 1,048,576, a brute-force search through all subsets of   20 elements is easily manageable, although by n=30, 1,073,741,824, so you will certainly be pushing things.

By definition, the relative order among the elements does not distinguish different subsets. Thus is the same as .   However, it is a very good idea to maintain your subsets in a sorted or canonical order, in order to speed up such operations as testing whether two subsets are identical or making them look right when printed.

As with permutations (see Section ), the key to subset generation problems is establishing a numerical sequence among all subsets. There are three primary alternatives:

• Lexicographic order -   Lexicographic order is sorted order, and often the most natural order for generating combinatorial objects. The eight subsets of in lexicographic order are , , , , , , , and . Unfortunately, it is surprisingly difficult to generate subsets in lexicographic order. Unless you have a compelling reason to do so, forget about it.
• Gray Code -   A particularly interesting and useful subset sequence is the minimum change order,   wherein adjacent subsets differ by the insertion or deletion of exactly one element. Such an ordering, called a Gray code, appears in the output picture above.

Subset generation in Gray code order can be very fast, because there is a nice recursive construction to sequence subsets. Further, since only one element changes between subsets, exhaustive search algorithms built on Gray codes can be quite efficient. A set cover program would only have to update the change in coverage by the addition or deletion of one subset. See the implementation section below for Gray code subset generation programs.

• Binary counting -   The simplest approach to subset generation problems is based on the observation that any subset S' of S is defined by which of the n=|S| items are in S'. We can represent S' by a binary string of n bits, where bit i is 1 iff the ith element of S is in S'.   This defines a bijection between the binary strings of length n, and the subsets of n items. For n=3, binary counting generates subsets in the following order: {}, {3}, {2}, {2,3}, {1}, {1,3}, {1,2}, {1,2,3}.

This binary representation is the key to solving all subset generation problems. To generate all subsets in order, simply count from 0 to . For each integer, successively mask off each of the bits and compose a subset of exactly the items corresponding to `1' bits. To generate the next or previous subset, increment or decrement the integer by one. Unranking a subset is exactly the masking procedure, while ranking constructs a binary number with 1's corresponding to items in S and then converts this binary number to an integer.

To generate a random subset, you could generate a random integer from 0 to and unrank, although you are probably asking for trouble because any flakiness with how your random number generator rounds things off means that certain subsets can never occur. Therefore, a better approach is simply to flip a coin n times, with the ith flip deciding whether to include element i in the subset.   A coin flip can be robustly simulated by generating a random real or large integer and testing whether it is bigger or smaller than half the range. A Boolean array of n items can thus be used to represent subsets as a sort of premasked integer. The only complication is that you must explicitly handle the carry if you seek to generate all subsets.

Generation problems for two closely related problems arise often in practice:

• k-subsets -   Instead of constructing all subsets, we may only be interested in the subsets containing exactly k elements. There are such subsets, which is substantially less than , particularly for small values of k.

The best way to construct all k-subsets is in lexicographic order.   The ranking function is based on the observation that there are k-subsets whose smallest element is f. Using this, it is possible to determine the smallest element in the mth k-subset of n items. We then proceed recursively for subsequent elements of the subset. See the implementations below for details.

• Strings -   Generating all subsets is equivalent to generating all strings of true and false. To generate all or random strings on alphabets of size , the same basic techniques apply, except there will be strings in total.

Implementations: Nijenhuis and Wilf [NW78] provide efficient Fortran   implementations of algorithms to construct random subsets and to sequence subsets in Gray code and lexicographic order. They also provide routines to construct random k-subsets and sequence them in lexicographic order. See Section for details on ftp-ing these programs. Algorithm 515 [BL77] of the Collected Algorithms of the ACM is another Fortran implementation of lexicographic k-subsets, available from Netlib     (see Section ).

An exciting WWW site developed by Frank Ruskey of the University of Victoria contains a wealth of material on generating combinatorial objects of different types, including permutations, subsets, partitions, and certain graphs. Specifically, it provides an interactive interface that lets you specify which type of objects you would like to construct and then returns the objects to you. Check this out at http://sue.csc.uvic.ca/ cos/.

Combinatorica [Ski90] provides Mathematica implementations     of algorithms to construct random subsets and to sequence subsets in Gray code, binary, and lexicographic order. They also provide routines to construct random k-subsets and strings, and sequence them lexicographically. See Section for further information on Combinatorica.

Notes: The primary expositions on subset generation include [NW78, RND77, Rus97]. Wilf [Wil89] provides an update of [NW78], including a thorough discussion of modern Gray code generation problems.

Gray codes were first developed [Gra53] to transmit digital   information in a robust manner over an analog channel.   By assigning the code words in Gray code order, the ith word differs only slightly from the (i+1)st, so minor fluctuations in analog signal strength corrupts only a few bits. Gray codes have a particularly nice correspondence to Hamiltonian cycles on the hypercube.     See any of the references above for details. An exposition on the more general problem of constructing Gray codes for k items (instead of subsets) appears in [Man89].

The popular puzzle Spinout, manufactured by Binary Arts Corporation,   can be solved using Gray codes.

Related Problems: Generating permutations (see page ), generating partitions (see page ).

Next: Generating Partitions Up: Combinatorial Problems Previous: Generating Permutations

Algorithms
Mon Jun 2 23:33:50 EDT 1997