FiniteState Automata
Nondeterminism versus Determinism in FiniteState Automata
FiniteState Automata and Type 3 Grammars
Type 3 Grammars and Regular Grammars
Regular Languages and Regular Expressions
The computations of programs are driven by their inputs. The outputs are just the results of the computations, and they have no influence on the course that the computations take. Consequently, it seems that much can be studied about finitestate transducers, or equivalently, about finitememory programs even when their outputs are ignored. The advantage of conducting a study of such strippeddown finitestate transducers is in the simplified argumentation that they allow.
A finitestate transducer whose output components are ignored is called a finitestate automaton. Formally, a finitestate automaton M is a tuple <Q, , , q_{0}, F>, where Q, , q_{0}, and F are defined as for finitestate transducers, and the transition table is a relation from Q × ( {}) to Q.
Transition diagrams similar to those used for representing finitestate transducers can also be used to represent finitestate automata. The only difference is that in the case of finitestate automata, an edge that corresponds to a transition rule (p, , p) is labeled by the string .
Example 2.3.1 The finitestate automaton that is induced by the finitestate transducer of Figure 2.2.2 is <Q, , , q_{0}, F>, where Q = {q_{0}, q_{1}}, = {a, b}, = {(q_{0}, a, q_{1}), (q_{0}, b, q_{1}), (q_{1}, b, q_{1}), (q_{1}, a, q_{0})}, and F = {q_{0}}.
The transition diagram in Figure 2.3.1 represents the finitestate automaton.

The finitestate automaton M is said to be deterministic if, for each state q in Q and for each input symbol a in , the union (q, a) (q, ) is a multiset that contains at most one element. The finitestate automaton is said to be nondeterministic if it is not a deterministic finitestate automaton.
A transition rule (q, , p) of the finitestate automaton is said to be an transition rule if = . A finitestate automaton with no transition rules is said to be an free finitestate automaton.
Example 2.3.2 Consider the finitestate automaton M_{1} = <{q_{0}, . . . , q_{6}}, {0, 1}, {(q_{0}, 0, q_{0}), (q_{0}, , q_{1}), (q_{0}, , q_{4}), (q_{1}, 0, q_{2}), (q_{1}, 1, q_{1}), (q_{2}, 0, q_{3}), (q_{2}, 1, q_{2}), (q_{3}, 0, q_{3}), (q_{3}, 1, q_{1}), (q_{4}, 0, q_{4}), (q_{4}, 1, q_{5}), (q_{5}, 0, q_{5}), (q_{5}, 1, q_{6}), (q_{6}, 1, q_{6}), (q_{6}, 0, q_{4})}, q_{0}, {q_{0}, q_{3}, q_{6}}>. The transition diagram of M_{1} is given in Figure 2.3.2.

M_{1} is nondeterministic owing to the transition rules that originate at state q_{0}. One of the transition rules requires that an input value be read, whereas the other two transition rules require that no input value be read. Moreover, M_{1} is also nondeterministic when the transition rule (q_{0}, 0, q_{0}) is ignored, because M_{1} cannot determine locally which of the other transition rules to follow on the moves that originate at state q_{0}.
The finitestate automaton M_{2} in Figure 2.3.3 is a deterministic finitestate automaton.

M_{1} has two transition rules, and M_{2} has one.
A configuration , or an instantaneous description, of the finitestate automaton is a singleton uqv, where q is a state in Q, and uv is a string in *. The configuration is said to be an initial configuration if u = and q is the initial state. The configuration is said to be an accepting , or final, configuration if v = and q is an accepting state. With no loss of generality it is assumed that Q and are mutually disjoint.
Other definitions, like those of _{ M }, , _{ M }*, *, and acceptance, recognition, and decidability of a language by a finitestate automaton, are similar to those given for finitestate transducers.
Nondeterminism versus Determinism in FiniteState Automata
By the following theorem, nondeterminism does not add to the recognition power of finitestate automata, even though it might add to their succinctness. The proof of the theorem provides an algorithm for constructing, from any given nstate finitestate automaton, an equivalent deterministic finitestate automaton of at most 2^{n} states.
Theorem 2.3.1 If a language is accepted by a finitestate automaton, then it is also decided by a deterministic finitestate automaton that has no transition rules.
Proof Consider any finitestate automaton M = <Q, , , q_{0}, F>. Let A_{x} denote the set of all the states that M can reach from its initial state q_{0}, by the sequences of moves that consume the string x, that is, the set { q  q_{0}x * xq }. Then an input w is accepted by M if and only if A_{w} contains an accepting state.
The proof relies on the observation that A_{xa} contains exactly those states that can be reached from the states in A_{x}, by the sequences of transition rules that consume a, that is, A_{xa} = { p  q is in A_{x}, and qa * ap }.
Specifically, if p is a state in A_{xa}, then by definition there is a sequence of transition rules _{1}, . . . , _{t} that takes M from the initial state q_{0} to state p while consuming xa. This sequence must have a prefix _{1}, . . . , _{i} that takes M from q_{0} to some state q while consuming x (see Figure 2.3.4(a)).

On the other hand, if q is in A_{x} and if p is a state that is reachable from state q by a sequence _{1}, . . . , _{s} of transition rules that consumes a, then the state p is in A_{xa}. In such a case, if '_{1}, . . . , '_{r} is a sequence of transition rules that takes M from the initial state q_{0} to state q while consuming x, then M can reach the state p from state q_{0} by the sequence '_{1}, . . . , '_{r}, _{1}, . . . , _{s} of transition rules that consumes xa (see Figure 2.3.4(b)).
As a result, to determine if a_{1} a_{n} is accepted by M, one needs only to follow the sequence A_{}, A_{a1}, A_{a1a2}, . . . , A_{a1 an} of sets of states, where each A_{a1 ai+1} is uniquely determined from A_{a1 ai} and a_{i+1}. Therefore, a deterministic finitestate automaton M' of the following form decides the language that is accepted by M.
The set of states of M' is equal to { A  A is a subset of Q, and A = A_{x} for some x in * }. Since Q is finite, it follows that Q has only a finite number of subsets A, and consequently M' has also only a finite number of states. The initial state of M' is the subset of Q that is equal to A_{}. The accepting states of M' are those states of M' that contain at least one accepting state of M. The transition table of M' is the set { (A, a, A')  A and A' are states of M', a is in , and A' is the set of states that the finitestate automaton M can reach by consuming a from those states that are in A }.
By definition, M' has no transition rules. Moreover, M' is deterministic because, for each x in * and each a in , the set A_{xa} is uniquely defined from the set A_{x} and the symbol a.
Example 2.3.3 Let M be the finitestate automaton whose transition diagram is given in Figure 2.3.2. The transition diagram in Figure 2.3.5

A_{} is the set of all the states that M can reach without reading any input. q_{0} is in A_{} because it is the initial state of M. q_{1} and q_{2} are in A_{} because M has transition rules that leave the initial state q_{0} and enter states q_{1} and q_{2}, respectively.
A_{0} is the set of all the states that M can reach just by reading 0 from those states that are in A_{}. q_{0} is in A_{0} because q_{0} is in A_{} and M has the transition rule (q_{0}, 0, q_{0}). q_{1} is in A_{0} because q_{0} is in A_{} and M can use the pair (q_{0}, 0, q_{0}) and (q_{0}, , q_{1}) of transition rules to reach q_{1} from q_{0} just by reading 0. q_{2} is in A_{0} because q_{0} is in A_{} and M can use the pair (q_{0}, , q_{1}) and (q_{1}, 0, q_{2}) of transition rules to reach q_{2} from q_{0} just by reading 0.
The result of the last theorem cannot be generalized to finitestate transducers, because deterministic finitestate transducers can only compute functions, whereas nondeterministic finitestate transducers can also compute relations which are not functions, for example, the relation {(a, b), (a, c)}. In fact, there are also functions that can be computed by nondeterministic finitestate transducers but that cannot be computed by deterministic finitestate transducers. R = { (x0, 0^{x})  x is a string in {0, 1}* } { (x1, 1^{x})  x is a string in {0, 1}* } is an example of such a function. The function cannot be computed by a deterministic finitestate transducer because each deterministic finitestate transducer M satisfies the following condition, which is not shared by the function: if x_{1} is a prefix of x_{2} and M accepts x_{1} and x_{2}, then the output of M on input x_{1} is a prefix of the output of M on input x_{2} (Exercise 2.2.5).
FiniteState Automata and Type 3 Grammars
The following two results imply that a language is accepted by a finitestate automaton if and only if it is a Type 3 language. The proof of the first result shows how Type 3 grammars can simulate the computations of finitestate automata.
Theorem 2.3.2 Finitestate automata accept only Type 3 languages.
Proof Consider any finitestate automaton M = <Q, , , q_{0}, F>. By Theorem 2.3.1 it can be assumed that M is an free, finitestate automaton. With no loss of generality, it can also be assumed that no transition rule takes M to its initial state when that state is an accepting one. (If such is not the case, then one can add a new state q'_{0} to Q, make the new state q'_{0} both an initial and an accepting state, and add a new transition rule (q'_{0}, , q) to for each transition rule of the form (q_{0}, , q) that is in .)
Let G = <N, , P, [q_{0}]> be a Type 3 grammar, where N has a nonterminal symbol [q] for each state q in Q and P has the following production rules.
By induction on n it follows that a string a_{1}a_{2} a_{n} has a derivation in G of the form [q] a_{1}[q_{1}] a_{1}a_{2}[q_{2}] a_{1}a_{2} a_{n1}[q_{n1}] a_{1}a_{2} a_{n} if and only if M has a sequence of moves of the form qa_{1}a_{2} a_{n} a_{1}q_{1}a_{2} a_{n} a_{1}a_{2}q_{2}a_{3} a_{n} a_{1} a_{n1}q_{n1}a_{n} a_{1}a_{2} a_{n}q_{n} for some accepting state q_{n}. In particular the correspondence above holds for q = q_{0}. Therefore L(G) = L(M).
Example 2.3.4 The finitestate automaton M_{1}, whose transition diagram is given in Figure 2.3.6(b),

M_{1} is equivalent to the finitestate automaton M_{2}, whose transition diagram is given in Figure 2.3.6(a). The Type 3 grammar G = <N, , P, [q'_{0}]> generates the language L(M_{2}), if N = {[q'_{0}], [q_{0}], [q_{1}], [q_{2}]}, = {a, b}, and P consists of the following production rules.
The accepting computation q'_{0}abaa aq_{1}baa abq_{0}aa abaq_{1}a abaaq_{2} of M_{2} on input abaa is simulated by the derivation [q'_{0}] a[q_{1}] ab[q_{0}] aba[q_{1}] abaa of the grammar.
The production rule [q_{1}] a[q_{2}] can be eliminated from the grammar without affecting the generated language.
The next theorem shows that the converse of Theorem 2.3.2 also holds. The proof shows how finitestate automata can trace the derivations of Type 3 grammars.
Theorem 2.3.3 Each Type 3 language is accepted by a finitestate automaton.
Proof Consider any Type 3 grammar G = <N, , P, S>. The finitestate automaton M = <Q, , , q_{S}, F> accepts the language that G generates if Q, , q_{S}, and F are as defined below.
M has a state q_{A} in Q for each nonterminal symbol A in N. In addition, Q also has a distinguished state named q_{f}. The state q_{S} of M, which corresponds to the start symbol S, is designated as the initial state of M. The state q_{f} of M is designated to be the only accepting state of M, that is, F = {q_{f}}.
M has a transition rule in if and only if the transition rule corresponds to a production rule of G. Each transition rule of the form (q_{A}, a, q_{B}) in corresponds to a production rule of the form A aB in G. Each transition rule of the form (q_{A}, a, q_{f}) in corresponds to a production rule of the form A a in G. Each transition rule of the form (q_{S}, , q_{f}) in corresponds to a production rule of the form S in G.
The finitestate automaton M is constructed so as to trace the derivations of the grammar G in its computations. M uses its states to keep track of the nonterminal symbols in use in the sentential forms of G. M uses its transition rules to consume the input symbols that G generates in the direct derivations that use the corresponding production rules.
By induction on n, the constructed finitestate automaton M has a sequence q_{A0}x u_{1}q_{A1}v_{1} u_{2}q_{A2}v_{2} u_{n1}q_{An1}v_{n1} xq_{An} of n moves if and only if the grammar G has a derivation of length n of the form A_{0} u_{1}A_{1} u_{2}A_{2} u_{n1}A_{n1} x. In particular, such correspondence holds for A_{0} = S. Consequently, x is in L(M) if and only if it is in L(G).
Example 2.3.5 Consider the Type 3 grammar G = <{S, A, B}, {a, b}, P, S>, where P consists of the following transition rules.
The transition diagram in Figure 2.3.7

It turns out that finitestate automata and Type 3 grammars are quite similar mathematical systems. The states in the automata play a role similar to the nonterminal symbols in the grammars, and the transition rules in the automata play a role similar to the production rules in the grammars.
Type 3 Grammars and Regular Grammars
Type 3 grammars seem to be minimal in the sense that placing further meaningful restrictions on them results in grammars that cannot generate all the Type 3 languages. On the other hand, some of the restrictions placed on Type 3 grammars can be relaxed without increasing the class of languages that they can generate.
Specifically, a grammar G = <N, , P, S> is said to be a rightlinear grammar if each of its production rules is either of the form A xB or of the form A x, where A and B are nonterminal symbols in N and x is a string of terminal symbols in *.
The grammar is said to be a leftlinear grammar if each of its production rules is either of the form A Bx or of the form A x, where A and B are nonterminal symbols in N and x is a string of terminal symbols in *.
The grammar is said to be a regular grammar if it is either a rightlinear grammar or a leftlinear grammar. A language is said to be a regular language if it is generated by a regular grammar.
By Exercise 2.3.5 a language is a Type 3 language if and only if it is regular.
Regular Languages and Regular Expressions
Regular languages can also be defined, from the empty set and from some finite number of singleton sets, by the operations of union, composition, and Kleene closure. Specifically, consider any alphabet . Then a regular set over is defined in the following way.
Theorem 2.3.4 A set is a regular set if and only if it is accepted by a finitestate automaton.
Regular sets of the form Ø, {}, {a}, L_{} L_{}, L_{}L_{}, and L_{}* are quite often denoted by the expressions Ø, , a, () + (), ()(), and ()*, respectively. and are assumed to be the expressions that denote L_{} and L_{} in a similar manner, respectively. a is assumed to be a symbol from the alphabet. Expressions that denote regular sets in this manner are called regular expressions.
Some parentheses can be omitted from regular expressions, if a precedence relation between the operations of Kleene closure, composition, and union in the given order is assumed. The omission of parentheses in regular expressions is similar to that in arithmetic expressions, where closure, composition, and union in regular expressions play a role similar to exponentiation, multiplication, and addition in arithmetic expressions.
Example 2.3.6 The regular expression 0*(1*01*00*(11*01*00*)* + 0*10*11*(00*10*11*)*) denotes the language that is recognized by the finitestate automaton whose transition diagram is given in Figure 2.3.2. The expression indicates that each string starts with an arbitrary number of 0's. Then the string continues with a string in 1*01*00*(11*01*00*)* or with a string in 10*11*(00*10*11*)*. In the first case, the string continues with an arbitrary number of 1's, followed by 0, followed by an arbitrary number of 1's, followed by one or more 0's, followed by an arbitrary number of strings in 11*01*00*.
By the previous discussion, nondeterministic finitestate automata, deterministic finitestate automata, regular grammars, and regular expressions are all characterizations of the languages that finitememory programs accept. Moreover, there are effective procedures for moving between the different characterizations. These procedures provide the foundation for many systems that produce finitememorybased programs from characterizations of the previous nature. For instance, one of the best known systems, called LEX , gets inputs that are generalizations of regular expressions and provides outputs that are scanners. The advantage of such systems is obviously in the reduced effort they require for obtaining the desired programs.
Figure 2.3.8
