        Next: Shortest Path Up: Graph Problems: Polynomial-Time Previous: Topological Sorting

## Minimum Spanning Tree Input description: A graph G = (V,E) with weighted edges.

Problem description: The subset of of minimum weight forming a tree on V.

Discussion: The minimum spanning tree (MST) of a graph defines the cheapest subset of edges that keeps the graph in one connected component.     Telephone companies are particularly interested in minimum spanning trees, because the minimum spanning tree of a set of sites defines the wiring scheme that connects the sites using as little wire as possible.   It is the mother of all network design problems.

Minimum spanning trees prove important for several reasons:

• They can be computed quickly and easily, and they create a sparse subgraph that reflects a lot about the original graph.
• They provide a way to identify clusters in sets of points.     Deleting the long edges from a minimum spanning tree leaves connected components that define natural clusters in the data set, as shown in the output figure above.
• They can be used to give approximate solutions to hard problems such as Steiner tree and traveling salesman.
• As an educational tool, minimum spanning tree algorithms provide graphic evidence that greedy algorithms can give provably optimal solutions.

Two classical algorithms efficiently construct minimum spanning trees, namely Prim's and Kruskal's. Brief overviews of both algorithms are given below, with correctness arguments in Section . We refer the reader to the codes for implementation details.

Prim's algorithm starts with an arbitrary vertex v and ``grows'' a tree from it, repeatedly finding the lowest-cost edge that will link some new vertex into this tree.   During execution we will label each vertex as either in the tree, fringe - meaning there exists an edge from a tree vertex, or unseen - meaning the vertex is more than one edge away from the tree.

```

Prim(G)
Select an arbitrary vertex to start

While (there are fringe vertices)

select minimum-weight edge between tree and fringe

add the selected edge and vertex to the tree

```

This creates a spanning tree for any connected graph, since no cycle can be introduced via edges between tree and fringe vertices. That it is in fact a tree of minimum weight can be proven by contradiction, and the proof is in Section . With simple data structures, Prim's algorithm can be implemented in time.

Kruskal's algorithm is also greedy.   It starts with each vertex as a separate tree and merges these trees together by repeatedly adding the lowest cost edge that merges two distinct subtrees (i.e. does not create a cycle).

```

Kruskal(G)
Sort the edges in order of increasing weight

count=0

while (count < n-1) do

get next edge (v,w)

if (component (v) component(w))

component(v) = component(w)

```

The ``which component?'' tests are efficiently implemented using the union-find data structure of Section , to yield an algorithm.

Minimum spanning tree is only one of several spanning tree problems that arise in practice. The following questions will help you sort your way through them:

• Are the weights of all edges of your graph identical? -   Every spanning tree on n points contains exactly n-1 edges. Thus if your graph is unweighted, any spanning tree will be a minimum spanning tree.     Either breadth-first or depth-first search can be used to find a rooted spanning tree in linear time. Depth-first search trees tend to be long and thin, while breadth-first search trees better reflect the distance structure of the graph, as discussed in Section .
• Should I use Prim's or Kruskal's algorithm? - As described, Prim's algorithm runs in , while Kruskal's algorithm takes time. Thus Prim's algorithm is faster on dense graphs, while Kruskal's is faster on sparse graphs. Although Prim's algorithm can be implemented in time using more advanced data structures (in particular, Fibonacci heaps), this will not be worth the trouble unless you have extremely large, fairly sparse graphs.

I personally find Kruskal's algorithm easier to understand and implement than Prim's, but that is just a matter of taste.

• What if my input is points in the plane, instead of a graph? -   Geometric instances, comprising n points in d-dimensions, can be solved by constructing the complete distance graph in and then finding the MST of this complete graph. However, for points in two or even three dimensions, it can be more efficient to solve the geometric version of the problem directly. To find the minimum spanning tree of n points, first construct the Delaunay triangulation of these points (see Sections and ).   In two dimensions, this gives a graph with O(n) edges that contains all the edges of the minimum spanning tree of the point set. Running Kruskal's algorithm on this sparse graphs finishes the job in time.
• How do I find a spanning tree that avoids vertices of high degree? - Another common goal of spanning tree problems is to minimize the maximum degree, typically to minimize the fan out in an interconnection network.   Unfortunately, finding a spanning tree of maximum degree 2 is clearly NP-complete, since this is identical to the Hamiltonian path problem. Efficient algorithms are known, however, that construct spanning trees whose maximum degree at most one more than required. This is likely to suffice in practice. See the references below.

Implementations: Pascal implementations of Prim's, Kruskal's, and the Cheriton-Tarjan algorithm are provided in [MS91], along with extensive empirical analysis that shows that the implementation of Prim's algorithm with the appropriate priority queue is fastest on most graphs.   See Section .

The Stanford GraphBase (see Section ) contains implementations of four different minimum spanning tree algorithms, and the result of timing experiments suggesting that Kruskal's algorithm is best. The results are reported in terms of memory accesses (mems) instead of seconds, to make them independent of processor speed.

A C++ implementation of Kruskal's algorithm is provided in LEDA (see Section ). Alternative implementations of Prim's and Kruskal's algorithms are provided in Pascal [SDK83] and C++ [Sed92]. See Section Section . XTango (see Section )   includes an animation of both Prim's and Kruskal's algorithms.

Algorithm 479 [Pag74] and Algorithm 613 [HJS84] of the Collected Algorithms of the ACM are Fortran codes for minimum spanning tree, the former in an implementation of a point clustering algorithm.     They are available from Netlib (see Section ). A bare bones Fortran implementation is provided in [NW78], including the enumeration of all spanning trees. See Section .

Combinatorica [Ski90] provides Mathematica implementations of Kruskal's minimum spanning tree algorithm and quickly counting the number of spanning trees of a graph.     See Section .

Notes: Good expositions on Prim's [Pri57] and Kruskal's [Kru56] algorithms will appear in any textbook on algorithms, but include [Baa88, CLR90, Man89, Tar83]. The fastest implementations of Prim's and Kruskal's algorithms use Fibonacci heaps [FT87].   Expositions of faster algorithms for geometric instances include [PS85].

A recent breakthrough on the minimum spanning tree problem is the linear-time randomized algorithm of Karger, Klein, and Tarjan [KKT95].   Simplifications will be needed before this becomes the algorithm of choice. The history of the minimum spanning tree problem dates back at least to Boruvka, in 1926, and is presented in [GH85].   Interestingly, it is Boruvka's algorithm that serves as the foundation to the new randomized one.

Fürer and Raghavachari [FR94] give an algorithm that constructs a spanning tree whose maximum degree is almost minimized, indeed is at most one more than the lowest-degree spanning tree. The situation is analogous to Vizing's theorem for edge coloring, which also gives an approximation algorithm to within additive factor one.

Minimum spanning tree algorithms have an interpretation in terms of matroids, which are systems of subsets closed under inclusion, for which the maximum weighted independent set can be found using a greedy algorithm.   The connection between greedy algorithms and matroids was established by Edmonds [Edm71]. Expositions on the theory of matroids include [Law76, PS82].

Algorithms for generating spanning trees in order from minimum to maximum weight are presented in [Gab77]. Good expositions on the matrix-tree theorem, which counts the number of spanning trees of a graph, include [Eve79a].

Related Problems: Steiner tree (see page ), traveling salesman (see page ).        Next: Shortest Path Up: Graph Problems: Polynomial-Time Previous: Topological Sorting

Algorithms
Mon Jun 2 23:33:50 EDT 1997