Input description: An integer n.
Problem description: Generate (1) all or (2) a random or (3) the next subset
of the integers .
Discussion: A subset describes a selection of objects, where the order among them does not matter. Many of the algorithmic problems in this catalog seek the best subset of a group of things: vertex cover seeks the smallest subset of vertices to touch each edge in a graph; knapsack seeks the most profitable subset of items of bounded total size; and set packing seeks the smallest subset of subsets that together cover each item exactly once.
There are distinct subsets of an n-element set, including the empty
set as well as the set itself.
This grows exponentially, but at a considerably smaller rate than the
n! permutations of n items.
Indeed, since
1,048,576, a
brute-force search through all subsets of
20 elements is easily manageable, although
by n=30,
1,073,741,824, so you will certainly be
pushing things.
By definition, the relative order among the elements does not distinguish
different subsets.
Thus is the same as
.
However, it is a very good idea to maintain your subsets in a sorted or
canonical order, in order
to speed up such operations as testing whether two
subsets are identical or making them look right when printed.
As with permutations (see Section ),
the key to subset generation problems is establishing
a numerical sequence among all
subsets.
There are three primary alternatives:
Subset generation in Gray code order can be very fast, because there is a nice recursive construction to sequence subsets. Further, since only one element changes between subsets, exhaustive search algorithms built on Gray codes can be quite efficient. A set cover program would only have to update the change in coverage by the addition or deletion of one subset. See the implementation section below for Gray code subset generation programs.
This binary representation is the key to solving all subset generation
problems.
To generate all subsets in order, simply count from 0 to .
For each integer, successively mask
off each of the bits and compose a subset of exactly the items
corresponding to `1' bits.
To generate the next or previous subset,
increment or decrement the integer by one.
Unranking a subset is exactly the masking procedure, while ranking
constructs a binary number with 1's corresponding to items in S
and then converts this binary number to an integer.
To generate a random subset, you could generate a random integer from 0 to
and unrank, although you are probably asking for trouble because any
flakiness with how your random number generator rounds things off
means that certain subsets can never occur.
Therefore,
a better approach is simply to flip a coin n times,
with the ith flip deciding whether to include element i in the subset.
A coin flip can be robustly simulated by generating a random real or large
integer and testing whether it is bigger or smaller than half the range.
A Boolean array of n items can thus be used to represent subsets
as a sort of premasked integer.
The only complication is that you must
explicitly handle the carry if you seek to generate all subsets.
Generation problems for two closely related problems arise often in practice:
The best way to construct all k-subsets is in lexicographic order.
The ranking function is based on the observation that there are
k-subsets whose smallest element is f.
Using this, it is possible to determine the smallest element in the mth k-subset
of n items.
We then proceed recursively for subsequent elements of the subset.
See the implementations below for details.
Implementations: Nijenhuis and Wilf [NW78] provide efficient Fortran
implementations of algorithms to construct random subsets and to
sequence subsets in Gray code and lexicographic order.
They also provide routines to construct random k-subsets and sequence
them in lexicographic order.
See Section for details on ftp-ing these programs.
Algorithm 515 [BL77] of the Collected Algorithms of the ACM is another Fortran
implementation of lexicographic k-subsets, available from Netlib
(see Section
).
An exciting WWW site developed by Frank Ruskey of the University of
Victoria contains a wealth of material on generating combinatorial objects
of different types, including permutations, subsets, partitions, and certain
graphs.
Specifically, it provides an interactive interface that lets you specify
which type of objects you would like to construct and then returns
the objects to you.
Check this out at
http://sue.csc.uvic.ca/ cos/.
Combinatorica [Ski90] provides Mathematica implementations
of algorithms to construct random subsets and to
sequence subsets in Gray code, binary, and lexicographic order.
They also provide routines to construct random k-subsets and strings,
and sequence them lexicographically.
See Section for further information on Combinatorica.
Notes: The primary expositions on subset generation include [NW78, RND77, Rus97]. Wilf [Wil89] provides an update of [NW78], including a thorough discussion of modern Gray code generation problems.
Gray codes were first developed [Gra53] to transmit digital
information in a robust
manner over an analog channel.
By assigning the code words in Gray code order, the ith word differs
only slightly from the (i+1)st, so minor fluctuations in analog
signal strength corrupts only a few bits.
Gray codes have a particularly nice correspondence to Hamiltonian cycles on
the hypercube.
See any of the references above for details.
An exposition on the more general problem of constructing Gray codes
for k items (instead of subsets) appears in [Man89].
The popular puzzle Spinout, manufactured by Binary Arts Corporation, can be solved using Gray codes.
Related Problems: Generating permutations (see page ),
generating partitions (see page
).