next up previous contents index CD CD Algorithms
Next: Independent Set Up: Graph Problems: Hard Problems Previous: Graph Problems: Hard Problems




Input description: A graph G=(V,E).

Problem description: What is the largest tex2html_wrap_inline29261 such that for all tex2html_wrap_inline29263 , tex2html_wrap_inline29265 ?

Discussion: When I went to high school, everybody complained about the ``clique'', a group of friends who all hung around together and seemed to dominate everything social.   Consider a graph whose vertices represent a set of people, with edges between any pair of people who are friends. Thus the clique in high school was in fact a clique in this friendship graph.  

Identifying ``clusters'' of related objects often reduces to finding large cliques in graphs.   One example is in a program recently developed by the Internal Revenue Service (IRS) to detect organized tax fraud,   where groups of phony tax returns are submitted in the hopes of getting undeserved refunds. The IRS constructs graphs with vertices corresponding to submitted tax forms and with edges between any two tax-forms that appear suspiciously similar.   A large clique in this graph points to fraud.

Since any edge in a graph represents a clique of two vertices, the challenge lies not in finding a clique, but in finding a large one. And it is indeed a challenge, for finding a maximum clique is NP-complete.   To make matters worse, not only is no good approximation algorithm known, it is provably hard to approximate even to within a factor of tex2html_wrap_inline29267 . Theoretically, clique is about as hard as a problem in this book can get. So what can we hope to do about it?

If you really need to find the largest clique in a graph, an exhaustive search via backtracking provides the only real solution.   We search through all k-subsets of the vertices, pruning a subset as soon as it contains a vertex that is not connected to all the rest. We never need consider a subset of size larger than the highest vertex degree in the graph, since the maximum degree gives an upper bound on the size of the largest clique in the graph. Similarly, as soon as we discover a clique of size k, no vertex of degree tex2html_wrap_inline29277 can help find a larger clique. To speed our search, we should delete all such useless vertices from G.

Heuristics for finding large cliques based on randomized techniques   such as simulated annealing are likely to work reasonably well.

Implementations: Programs for the closely related problems of finding cliques and independent sets were sought for the Second DIMACS Implementation Challenge, held in October 1993.   Programs and data from the challenge are available by anonymous ftp from Source codes are available under pub/challenge/graph and test data under pub/djs.

In particular, two C language programs by David S. Johnson and David L. Applegate are available.   The dfmax.c implements a simple-minded branch-and-bound algorithm similar to that of [CP90].   The dmclique.c uses a ``semi-exhaustive greedy'' scheme for finding large independent sets described in [JAMS91]. Performance data for both programs is available in files results.dfmax and results.dmclique in directories /pub/challenge/graph/benchmarks/clique and /pub/challenge/graph/benchmarks/volume.

Combinatorica [Ski90] provides (slow) Mathematica implementations of cliques, independent sets, and vertex covers.     See Section gif.

Notes: Good expositions of the proof that clique is NP-complete [Kar72] include [CLR90, GJ79, Man89]. It is also given in Section gif. This reduction established that clique, vertex cover, and independent set are very closely related problems, so heuristics and programs that solve one of them may also produce reasonable solutions for the other two.

The linear-time algorithm for constructing maximal induced subgraphs is discussed in [Man89]. That clique cannot be approximated to within a factor of tex2html_wrap_inline29279 is shown in [BGS95].

Related Problems: Independent set (see page gif), vertex cover (see page gif).    

next up previous contents index CD CD Algorithms
Next: Independent Set Up: Graph Problems: Hard Problems Previous: Graph Problems: Hard Problems

Mon Jun 2 23:33:50 EDT 1997