Input description: A graph G=(V,E).
Problem description: What is the largest such that for all , ?
Discussion: When I went to high school, everybody complained about the ``clique'', a group of friends who all hung around together and seemed to dominate everything social. Consider a graph whose vertices represent a set of people, with edges between any pair of people who are friends. Thus the clique in high school was in fact a clique in this friendship graph.
Identifying ``clusters'' of related objects often reduces to finding large cliques in graphs. One example is in a program recently developed by the Internal Revenue Service (IRS) to detect organized tax fraud, where groups of phony tax returns are submitted in the hopes of getting undeserved refunds. The IRS constructs graphs with vertices corresponding to submitted tax forms and with edges between any two tax-forms that appear suspiciously similar. A large clique in this graph points to fraud.
Since any edge in a graph represents a clique of two vertices, the challenge lies not in finding a clique, but in finding a large one. And it is indeed a challenge, for finding a maximum clique is NP-complete. To make matters worse, not only is no good approximation algorithm known, it is provably hard to approximate even to within a factor of . Theoretically, clique is about as hard as a problem in this book can get. So what can we hope to do about it?
A simple linear-time algorithm can be used to find the largest set of vertices whose induced (defined) subgraph has minimum vertex degree , beginning by repeatedly deleting all the vertices whose degree is less than k. This may reduce the degree of other vertices below k if they were adjacent to any deleted low-degree vertices. By repeating this process until all remaining vertices have degree , we eventually construct the largest dense subgraph. This algorithm can be implemented in O(n+m) time by using adjacency lists and the constant-width priority queue of Section . By continuing to delete the lowest-degree vertices, we will eventually end up with a clique, which may or may not be large depending upon the graph.
If you really need to find the largest clique in a graph, an exhaustive search via backtracking provides the only real solution. We search through all k-subsets of the vertices, pruning a subset as soon as it contains a vertex that is not connected to all the rest. We never need consider a subset of size larger than the highest vertex degree in the graph, since the maximum degree gives an upper bound on the size of the largest clique in the graph. Similarly, as soon as we discover a clique of size k, no vertex of degree can help find a larger clique. To speed our search, we should delete all such useless vertices from G.
Heuristics for finding large cliques based on randomized techniques such as simulated annealing are likely to work reasonably well.
Implementations: Programs for the closely related problems of finding cliques and independent sets were sought for the Second DIMACS Implementation Challenge, held in October 1993. Programs and data from the challenge are available by anonymous ftp from dimacs.rutgers.edu. Source codes are available under pub/challenge/graph and test data under pub/djs.
In particular, two C language programs by David S. Johnson and David L. Applegate are available. The dfmax.c implements a simple-minded branch-and-bound algorithm similar to that of [CP90]. The dmclique.c uses a ``semi-exhaustive greedy'' scheme for finding large independent sets described in [JAMS91]. Performance data for both programs is available in files results.dfmax and results.dmclique in directories /pub/challenge/graph/benchmarks/clique and /pub/challenge/graph/benchmarks/volume.
Combinatorica [Ski90] provides (slow) Mathematica implementations of cliques, independent sets, and vertex covers. See Section .
Notes: Good expositions of the proof that clique is NP-complete [Kar72] include [CLR90, GJ79, Man89]. It is also given in Section . This reduction established that clique, vertex cover, and independent set are very closely related problems, so heuristics and programs that solve one of them may also produce reasonable solutions for the other two.
The linear-time algorithm for constructing maximal induced subgraphs is discussed in [Man89]. That clique cannot be approximated to within a factor of is shown in [BGS95].
Related Problems: Independent set (see page ), vertex cover (see page ).