Input description: A deterministic finite automaton M.
Problem description: The smallest deterministic finite automaton M' such that M' behaves identically to M.
Discussion: Problems associated with constructing and minimizing finite state machines arise repeatedly in software and hardware design applications. Finite state machines are best thought of as pattern recognizers, and minimum-size machines correspond to recognizers that require less time and space. Complicated control systems and compilers are often built using finite state machines to encode the current state and associated actions, as well as the set of possible transitions to other states. Minimizing the size of this machine minimizes its cost.
Finite state machines are best thought of as edge-labeled directed graphs, where each vertex represents one of n states and each edge a transition from one state to the other on receipt of the alphabet symbol that labels the edge. The automaton above analyzes a given sequence of coin tosses, with the dark states signifying that an even number of heads have been observed. Automata can be represented using any graph data structure (see Section ), or by using an transition matrix, where is the size of the alphabet.
Finite state machines are often used to specify search patterns in the guise of regular expressions, which are patterns formed by and-ing, or-ing, and looping over smaller regular expressions. For example, the regular expression matches any string on (a,b,c) that begins and ends with an a (including a itself). The best way to test whether a string is described by a given regular expression (especially if many strings will be tested) is to construct the equivalent finite automaton and then simulate the machine on the string. See Section for alternative approaches to string matching.
We consider three different problems on finite automata:
Algorithms for minimizing the number of states in a deterministic finite automaton (DFA) appear in any book on automata theory. The basic approach works by partitioning the states into gross equivalence classes and then refining the partition. Initially, the states are partitioned into accepting, rejecting, and other classes. The transitions from each node branch to a given class on a given symbol. Whenever two states , t in the same class C branch to elements of different classes, the class C must be partitioned into two subclasses, one containing , the other containing t.
This algorithm makes a sweep though all the classes looking for a new partition, and repeating the process from scratch if it finds one. This yields an algorithm, since at most n-1 sweeps need ever be performed. The final equivalence classes correspond to the states in the minimum automaton. In fact, a more efficient, algorithm is known with available implementations discussed below.
In fact, any NFA can be mechanically converted to an equivalent DFA, which can then be minimized as above. However, converting an NFA to a DFA can cause an exponential blowup in the number of states, which perversely might later be eliminated in the minimization. This exponential blowup makes automaton minimization problems NP-hard whenever you do not start with a DFA.
The proofs of equivalence between NFAs, DFAs, and regular expressions are elementary enough to be covered in undergraduate automata theory classes. However, they are surprisingly nasty to implement, for reasons including but not limited to the exponential blowup of states. Implementations are discussed below.
The nondeterministic construction uses -moves, which are optional transitions that require no input to fire. On reaching a state with an -move, we must assume that the machine can be in either state. Using such -moves, it is straightforward to construct an automaton from a depth-first traversal of the parse tree of the regular expression. This machine will have O(m) states, if m is the length of the regular expression. Further, simulating this machine on a string of length n takes O(m n) time, since we need consider each state/prefix pair only once.
The deterministic construction starts with the parse tree for the regular expression, observing that each leaf represents one of the alphabet symbols in the pattern. After recognizing a prefix of the text, we can be left in some subset of these possible positions, which would correspond to a state in the finite automaton. The derivatives method builds up this automaton state by state as it is needed. Even so, some regular expressions of length m require states in any DFA implementing them, such as . There is no way to avoid this exponential blowup in the space required. Note, however, that it takes linear time to simulate an input string on any automaton, regardless of the size of the automaton.
Implementations: FIRE Engine is a finite automaton toolkit, written in C++ by Bruce Watson. It provides production-quality implementations of finite automata and regular expression algorithms. Several finite automaton minimization algorithms have been implemented, including Hopcroft's algorithm. Both deterministic and nondeterministic automata are supported. FIRE Engine has been used for compiler construction, hardware modeling, and computational biology applications. It is strictly a computing engine and does not provide a graphical user interface. FIRE Engine is available by anonymous ftp from ftp.win.tue.nl in the directory /pub/techreports/pi/watson.phd/. A greatly improved commercial version is available from www.RibbitSoft.com.
Grail is a C++ package for symbolic computation with finite automata and regular expressions, from Darrell Raymond and Derrick Wood. Grail enables one to convert between different machine representations and to minimize automata. It can handle machines with 100,000 states and dictionaries of 20,000 words. All code and documentation are accessible from the WWW site http://www.csd.uwo.ca/research/grail, as well as pointers to a variety of other automaton packages. Commercial use of Grail is not allowed without approval, although it is freely available to students and educators.
An implementation in C of a regular-expression matching algorithm appears in [BR95]. The source code for this program is printed in the text and is available on disk for a modest fee. A bare bones implementation in C of a regular-expression pattern matching algorithm appears in [GBY91]. See Section .
XTango (see Section ) includes a simulation of a DFA. Many of the other animations (but not this one) are interesting and quite informative to watch.
FLAP (Formal Languages and Automata Package) is a tool by Susan Rodger for drawing and simulating finite automata, pushdown automata, and Turing machines. Using FLAP, one can draw a graphical representation (transition diagram) of an automaton, edit it, and simulate the automaton on some input. FLAP was developed in C++ for X-Windows. See http://www.cs.duke.edu:80/ rodger/tools/tools.html.
Notes: Aho [Aho90] provides a good survey on algorithms for pattern matching, and a particularly clear exposition for the case where the patterns are regular expressions. The technique for regular expression pattern matching with -moves is due to Thompson [Tho68]. Other expositions on finite automaton pattern matching include [AHU74].
Hopcroft [Hop71] gave an optimal algorithm for minimizing the number of states in DFAs. The derivatives method of constructing a finite state machine from a regular expression is due to Brzozowski [Brz64] and has been expanded upon in [BS86]. Expositions on the derivatives method includes Conway []. Testing the equivalence of two nondeterministic finite state machines is PSPACE-complete [SM73].
Related Problems: Satisfiability (see page ). string matching (see page ).