        Next: Steiner Tree Up: Graph Problems: Hard Problems Previous: Edge Coloring

## Graph Isomorphism Input description: Two graphs, G and H.

Problem description: Find a (or all) mappings f of the vertices of G to the vertices of H such that G and H are identical; i.e. (x,y) is an edge of G iff (f(x),f(y)) is an edge of H.

Discussion: Isomorphism is the problem of testing whether two graphs are really the same.   Suppose we are given a collection of graphs and must perform some operation on each of them. If we can identify which of the graphs are duplicates, they can be discarded so as to avoid redundant work.

We need some terminology to settle what is meant when we say two graphs are the same. Two labeled graphs and are identical when iff .   The isomorphism problem consists of finding a mapping from the vertices of G to H such that they are identical. Such a mapping is called an isomorphism.

Identifying symmetries is another important application of graph isomorphism.   A mapping of a graph to itself is called an automorphism, and the collection of automorphisms (its automorphism group)   provides a great deal of information about symmetries in the graph. For example, the complete graph has n! automorphisms (any mapping will do), while an arbitrary random graph is likely to have few or perhaps only one, since G is always identical to itself.

Several variations of graph isomorphism arise in practice:

• Is graph G contained in (not identical to) graph H? - Instead of testing equality, we are often interested in knowing whether a small pattern graph G is a subgraph of H. Such problems as clique, independent set, and Hamiltonian cycle   are important special cases of subgraph isomorphism.

There are two distinct notions of ``contained in'' with respect to graphs. Subgraph isomorphism asks whether there is a subset of edges and vertices of G that is isomorphic to a smaller graph H.   Induced subgraph isomorphism asks whether there is a subset of vertices of G whose deletion leaves a subgraph isomorphic to a smaller graph H. For induced subgraph isomorphism, all edges of G must be present in H, but also all nonedges of G must be nonedges of H. Clique happens to be an instance of both subgraph problems, while Hamiltonian cycle is an example of vanilla subgraph isomorphism.

Be aware of this distinction in your application. Subgraph isomorphism problems tend to be harder than graph isomorphism, and induced subgraph problems tend to be even harder than subgraph isomorphism.   Backtracking is your only viable approach.

• Are your graphs labeled or unlabeled? - In many applications, vertices or edges of the graphs are labeled with some attribute that must be respected in determining isomorphisms.     For example, in comparing two bipartite graphs, each with ``worker'' vertices and ``job'' vertices, any isomorphism that equated a job with a worker would make no sense.

Labels and related constraints can be factored into any backtracking algorithm. Further, such constraints can be used to significantly speed up the search by creating more opportunities for pruning whenever two vertex labels do not match up.

• Are you testing whether two trees are isomorphic? -   There are faster algorithms for certain special cases of graph isomorphism, such as trees and planar graphs. Perhaps the most important case is detecting isomorphisms among trees, a problem that arises in language pattern matching and parsing applications.   A parse tree is often used to describe the structure of a text; two parse trees will be isomorphic if the underlying pair of texts have the same structure.

Efficient algorithms for tree isomorphism begin with the leaves of both trees and work inward towards the center. Each vertex in one tree is assigned a label representing the set of vertices in the second tree that might possibly be isomorphic to it, based on the constraints of labels and vertex degrees. For example, all the leaves in tree are initially potentially equivalent to all leaves of . Now, working inward, we can partition the vertices adjacent to leaves in into classes based on how many leaves and nonleaves they are adjacent to. By carefully keeping track of the labels of the subtrees, we can make sure that we have the same distribution of labeled subtrees for and . Any mismatch means , while completing the process partitions the vertices into equivalence classes defining all isomorphisms. See the references below for more details.

No polynomial-time algorithm is known for graph isomorphism, but neither is it known to be NP-complete.   Along with integer factorization (see Section ), it one of the few important algorithmic problems whose rough computational complexity is still not known.   The conventional wisdom is that isomorphism is a problem that lies between P and NP-complete if P NP.

Although no worst-case polynomial-time algorithm is known, testing isomorphism in practice is usually not very hard. The basic algorithm backtracks through all n! possible relabelings of the vertices of graph h with the names of vertices of graph g, and then tests whether the graphs are identical.   Of course, we can prune the search of all permutations with a given prefix as soon as we detect any mismatch between edges both of whose vertices are in the prefix.

However, the real key to efficient isomorphism testing is to preprocess the vertices into ``equivalence classes'', partitioning them   into sets of vertices such that two vertices in different sets cannot possibly be mistaken for each other. All vertices in each equivalence class must share the same value of some invariant that is independent of labeling.   Possibilities include:

• Vertex degree - This simplest way to partition vertices is based on their degree, the number of edges incident on the vertex.   Clearly, two vertices of different degree cannot be identical. This simple partition can often be a big win, but it won't do much for regular graphs, where each vertex has the same degree.
• Shortest path matrix - For each vertex v, the all-pairs shortest path matrix (see Section ) defines a multiset of n-1 distances   (possibly with repeats) representing the distances between v and each of the other vertices. Any two vertices that are identical in isomorphic graphs will define the exact same multiset of distances, so we can partition the vertices into equivalence classes defining identical distance multisets.
• Counting length-k paths - Taking the adjacency matrix of G and raising it to the kth power   gives a matrix where counts the number of (nonsimple) paths from i to j. For each vertex and each k, this matrix defines a multiset of path-counts, which can be used for partitioning as with distances above. You could try all or beyond, and use any single deviation as an excuse to partition.

Using these invariants, you should be able to partition the vertices of each graph into a large number of small equivalence classes. Finishing the job off with backtracking, using the name of each equivalence class as a label, should usually be quick work. If the sizes of the equivalence classes of both graphs are not identical, then the graphs cannot be isomorphic. It is harder to detect isomorphisms between graphs with high degrees of symmetry than it is for arbitrary graphs, because of the effectiveness of these equivalence-class partitioning heuristics.

Implementations: The world's fastest isomorphism testing program is Nauty, by Brendan D. McKay.   Nauty (No AUTomorphisms, Yes?) is a set of very efficient C language procedures for determining the automorphism group of a vertex-colored graph. Nauty is also able to produce a canonically labeled isomorph of the graph, to assist in isomorphism testing.   It was the basis of the first program to generate all the 11-vertex graphs without isomorphs, and can test most graphs of fewer than one hundred vertices in well under a second.   Nauty has been successfully ported to a variety of operating systems and C compilers. It may be obtained from http://cs.anu.edu.au/people/bdm/. It is free for educational and research applications, but for commercial use contact the author at bdm@cs.anu.edu.au.

Combinatorica [Ski90] provides (slow) Mathematica implementations of graph isomorphism and automorphism testing.     See Section for further information on Combinatorica.

Notes: Graph isomorphism is an important problem in complexity theory. Monographs on isomorphism detection include Hoffmann [Hof82].

Polynomial-time algorithms are known for planar graph isomorphism [HW74]   and for graphs where the maximum vertex degree is bounded by a constant [Luk80]. The all-pairs shortest path heuristic is due to [SD76], although there exist nonisomorphic graphs that realize the same set of distances [BH90]. A linear-time tree isomorphism algorithm for both labeled and unlabeled trees is presented in [AHU74].

A problem is said to be isomorphism-complete if it is provably as hard as isomorphism. Testing the isomorphism of bipartite graphs is   isomorphism-complete, since any graph can be made bipartite by replacing each edge by two edges connected with a new vertex. Clearly, the original graphs are isomorphic if and only if the transformed graphs are.

Related Problems: Shortest path (see page ), string matching (see page ).        Next: Steiner Tree Up: Graph Problems: Hard Problems Previous: Edge Coloring

Algorithms
Mon Jun 2 23:33:50 EDT 1997