Next: Calendrical Calculations Up: Combinatorial Problems Previous: Generating Partitions

## Generating Graphs

Input description: Parameters describing the desired graph, such as the number of vertices n, the number of edges m, or the edge probability p.

Problem description: Generate (1) all or (2) a random or (3) the next graph satisfying the parameters.

Discussion: Graph generation typically arises in constructing test data for programs.   Perhaps you have two different programs that solve the same problem, and you want to see which one is faster or make sure that they always give the same answer.   Another application is experimental graph theory, verifying whether a particular property is true for all graphs or how often it is true.   It is much easier to conjecture the four-color theorem once you have demonstrated 4-colorings for all planar graphs on 15 vertices.

A different application of graph generation arises in network design.   Suppose you need to design a network linking ten machines using as few cables as possible, such that the network can survive up to two vertex failures. One approach is to test all the networks with a given number of edges until you find one that will work. For larger graphs, more heuristic approaches, like simulated annealing, will likely be necessary.

Many factors complicate the problem of generating graphs. First, make sure you know what you want to generate:

• Do I want labeled or unlabeled graphs? -     The issue here is whether the names of the vertices matter in deciding whether two graphs are the same. In generating labeled graphs, we seek to construct all possible labelings of all possible graph topologies. In generating unlabeled graphs, we seek only one representative for each topology and ignore labelings. For example, there are only two connected unlabeled graphs on three vertices - a triangle and a simple path. However, there are four connected labeled graphs on three vertices - one triangle and three 3-vertex paths, each distinguished by their central vertex. In general, labeled graphs are much easier to generate. However, there are so many more of them that you quickly get swamped with isomorphic copies of the same few graphs.
• What do I mean by random? -   There are two primary models of random graphs, both of which generate graphs according to different probability distributions. The first model is parameterized by a given edge probability p. Typically, p=0.5, although smaller values can be used to construct sparser random graphs. In this model a coin is flipped for each pair of vertices x and y to decide whether to add an edge (x,y). All labeled graphs will be generated with equal probability when p=1/2.

The second model is parameterized by the desired number of edges m. It selects m distinct edges uniformly at random. One way to do this is by drawing random (x,y)-pairs and creating an edge if that pair is not already in the graph.   An alternative approach to computing the same things constructs the set of possible edges and selects a random m-subset of them, as discussed in Section .

Which of these options best models your application? Probably none of them. Random graphs, by definition, have very little structure. In most applications, graphs are used to model relationships, which are often highly structured. Experiments conducted on random graphs, although interesting and easy to perform, often fail to capture what you are looking for.

An alternative to random graphs is to use ``organic'' graphs, graphs that reflect the relationships among real-world objects.     The Stanford GraphBase, discussed below, is an outstanding source of organic graphs. Further, there are many raw sources of relationships electronically available via the Internet that can be turned into interesting organic graphs with a little programming and imagination.   Consider the graph defined by a set of WWW pages, with any hyperlink between two pages defining an edge.   Or what about the graph implicit in railroad, subway, or airline networks, with vertices being stations and edges between two stations connected by direct service?   As a final example, every large computer program defines a call graph, where the vertices represent subroutines, and there is an edge (x,y) if x calls y.

Two special classes of graphs have generation algorithms that have proven particularly useful in practice:

• Trees -     Prüfer codes provide a simple way to rank and unrank labeled trees and thus solve all the standard generation problems discussed in Section .   There are exactly labeled trees on n vertices, and exactly that many strings of length n-2 on the alphabet .

The key to Prüfer's bijection is the observation that every tree has at least two vertices of degree 1.   Thus in any labeled tree, the vertex v incident on the leaf with lowest label is well-defined. We take v to be , the first character in the code. We then delete the associated leaf and repeat the procedure until only two vertices are left. This defines a unique code S for any given labeled tree that can be used to rank the tree. To go from code to tree, observe that the degree of vertex v in the tree is one more than the number of times v occurs in S. The lowest-labeled leaf will be the smallest integer missing from S, which when paired with determines the first edge of the tree. The entire tree follows by induction.

Algorithms for efficiently generating unlabeled rooted trees are presented in the implementation section below.

• Fixed degree sequence graphs -     The degree sequence of a graph G is an integer partition where is the degree of the ith highest-degree vertex of G.    Since each edge contributes to the degree of two vertices, p is a partition of 2m, where m is the number of edges in G.

Not all partitions correspond to degree sequences of graphs. However, there is a recursive construction that constructs a graph with a given degree sequence if one exists. If a partition is realizable, the highest-degree vertex can be connected to the next highest-degree vertices in G, or the vertices corresponding to parts . Deleting and decrementing yields a smaller partition, which we recur on. If we terminate without ever creating negative numbers, the partition was realizable. Since we always connect the highest-degree vertex to other high-degree vertices, it is important to reorder the parts of the partition by size after each iteration.

Although this construction is deterministic, a semirandom collection of graphs realizing this degree sequence   can be generated from G using edge-flipping operations. Suppose edges (x,y) and (w,z) are in G, but (x,w) and (y,z) are not. Exchanging these pairs of edges creates a different (not necessarily connected) graph without changing the degrees of any vertex.

Implementations: The Stanford GraphBase [Knu94]   is perhaps most useful as an instance generator for constructing a wide variety of graphs to serve as test data for other programs. It incorporates graphs derived from interactions of characters in famous novels, Roget's Thesaurus, the Mona Lisa, expander graphs, and the economy of the United States. It also contains routines for generating binary trees, graph products, line graphs, and other operations on basic graphs.   Finally, because of its machine-independent random number generators, it provides a way to construct random graphs such that they can be reconstructed elsewhere, thus making them perfect for experimental comparisons of algorithms. See Section for additional information.

Combinatorica [Ski90] provides Mathematica generators for basic graphs such as stars, wheels, complete graphs, random graphs and trees, and graphs with a given degree sequence.     Further, it includes operations to construct more interesting graphs from these, including join, product, and line graph.   Graffiti [Faj87], a collection of almost 200 graphs of graph-theoretic interest, are available in Combinatorica format. See Section .

The graph isomorphism testing program nauty (see Section ),     by Brendan D. McKay of the Australian National University, has been used to generate catalogs of all nonisomorphic graphs with up to 11 vertices.   This extension to nauty, named makeg, can be obtained by anonymous ftp from bellatrix.anu.edu.au (150.203.23.14) in the directory pub/nauty19.

Nijenhuis and Wilf [NW78] provide efficient Fortran routines to enumerate all labeled trees via Prüfer codes and to construct random unlabeled rooted trees. See Section .     A collection of generators for standard families of graphs is included with LEDA (see Section ).

Notes: An extensive literature exists on generating graphs uniformly at random. Surveys include [Gol93, Tin90]. Closely related to the problem of generating classes of graphs is counting them.   Harary and Palmer [HP73] survey results in graphical enumeration.

Random graph theory is concerned with the properties of random graphs. Threshold laws in random graph theory define the edge density at which properties such as connectedness become highly likely to occur.   Expositions on random graph theory include [ES74, Pal85].

An integer partition is graphic if there exists a simple graph with that degree sequence.     Erdős and Gallai [EG60] proved that a degree sequence is graphic if and only if the sequence observes the following condition for each integer r < n:

The bijection between n-2 strings and labeled trees is due to Prüfer [Prü18].   Good expositions on this result include [Eve79a, NW78].

Related Problems: Generating permutations (see page ), graph isomorphism (see page ).

Next: Calendrical Calculations Up: Combinatorial Problems Previous: Generating Partitions

Algorithms
Mon Jun 2 23:33:50 EDT 1997