# Kleene's algorithm

In theoretical computer science, in particular in formal language theory, **Kleene's algorithm** transforms a given deterministic finite automaton (DFA) into a regular expression.
Together with other conversion algorithms, it establishes the equivalence of several description formats for regular languages.

## Algorithm description[edit]

According to Gross and Yellen (2004),^{[1]} the algorithm can be traced back to Kleene (1956).^{[2]}

This description follows Hopcroft and Ullman (1979).^{[3]}
Given a deterministic finite automaton *M* = (*Q*, Σ, δ, *q*_{0}, *F*), with *Q* = { *q*_{0},...,*q*_{n} } its set of states, the algorithm computes

- the sets
*R*^{k}_{ij}of all strings that take*M*from state*q*_{i}to*q*_{j}without going through any state numbered higher than*k*.

Here, "going through a state" means entering *and* leaving it, so both *i* and *j* may be higher than *k*, but no intermediate state may.
Each set *R*^{k}_{ij} is represented by a regular expression; the algorithm computes them step by step for *k* = -1, 0, ..., *n*. Since there is no state numbered higher than *n*, the regular expression *R*^{n}_{0j} represents the set of all strings that take *M* from its start state *q*_{0} to *q*_{j}. If *F* = { *q*_{1},...,*q*_{f} } is the set of accept states, the regular expression *R*^{n}_{01} | ... | *R*^{n}_{0f} represents the language accepted by *M*.

The initial regular expressions, for *k* = -1, are computed as

*R*^{−1}_{ij}=*a*_{1}| ... |*a*_{m}if*i*≠*j*, where δ(*q*_{i},*a*_{1}) = ... = δ(*q*_{i},*a*_{m}) =*q*_{j}*R*^{−1}_{ij}=*a*_{1}| ... |*a*_{m}| ε, if*i*=*j*, where δ(*q*_{i},*a*_{1}) = ... = δ(*q*_{i},*a*_{m}) =*q*_{j}

After that, in each step the expressions *R*^{k}_{ij} are computed from the previous ones by

*R*^{k}_{ij}=*R*^{k-1}_{ik}(*R*^{k-1}_{kk})^{*}*R*^{k-1}_{kj}|*R*^{k-1}_{ij}

By induction on *k*, it can be shown that the length^{[4]} of each expression *R*^{k}_{ij} is at most 4^{k+1}(6*s*+7) - 4/3 symbols, where *s* denotes the number of characters in Σ.
Therefore, the length of the regular expression representing the language accepted by *M* is at most 4^{n+1}(6*s*+7)*f* - *f* - 3/3 symbols, where *f* denotes the number of final states.

## Example[edit]

The automaton shown in the picture can be described as *M* = (*Q*, Σ, δ, *q*_{0}, *F*) with

- the set of states
*Q*= {*q*_{0},*q*_{1},*q*_{2}}, - the input alphabet Σ = {
*a*,*b*}, - the transition function δ with δ(
*q*_{0},*a*)=*q*_{0}, δ(*q*_{0},*b*)=*q*_{1}, δ(*q*_{1},*a*)=*q*_{2}, δ(*q*_{1},*b*)=*q*_{1}, δ(*q*_{2},*a*)=*q*_{1}, and δ(*q*_{2},*b*)=*q*_{1}, - the start state
*q*_{0}, and - set of accept states
*F*= {*q*_{1}}.

Kleene's algorithm computes the initial regular expressions as

*R*^{−1}_{00}= *a*| ε*R*^{−1}_{01}= *b**R*^{−1}_{02}= ∅ *R*^{−1}_{10}= ∅ *R*^{−1}_{11}= *b*| ε*R*^{−1}_{12}= *a**R*^{−1}_{20}= ∅ *R*^{−1}_{21}= *a*|*b**R*^{−1}_{22}= ε

After that, the *R*^{k}_{ij} are computed from the *R*^{k-1}_{ij} step by step for *k* = 0, 1, 2.
Kleene algebra equalities are used to simplify the regular expressions as much as possible.

- Step 0

*R*^{0}_{00}= *R*^{−1}_{00}(*R*^{−1}_{00})^{*}*R*^{−1}_{00}|*R*^{−1}_{00}= ( *a*| ε)( *a*| ε)^{*}( *a*| ε)| *a*| ε= *a*^{*}*R*^{0}_{01}= *R*^{−1}_{00}(*R*^{−1}_{00})^{*}*R*^{−1}_{01}|*R*^{−1}_{01}= ( *a*| ε)( *a*| ε)^{*}*b*| *b*= *a*^{*}*b**R*^{0}_{02}= *R*^{−1}_{00}(*R*^{−1}_{00})^{*}*R*^{−1}_{02}|*R*^{−1}_{02}= ( *a*| ε)( *a*| ε)^{*}∅ | ∅ = ∅ *R*^{0}_{10}= *R*^{−1}_{10}(*R*^{−1}_{00})^{*}*R*^{−1}_{00}|*R*^{−1}_{10}= ∅ ( *a*| ε)^{*}( *a*| ε)| ∅ = ∅ *R*^{0}_{11}= *R*^{−1}_{10}(*R*^{−1}_{00})^{*}*R*^{−1}_{01}|*R*^{−1}_{11}= ∅ ( *a*| ε)^{*}*b*| *b*| ε= *b*| ε*R*^{0}_{12}= *R*^{−1}_{10}(*R*^{−1}_{00})^{*}*R*^{−1}_{02}|*R*^{−1}_{12}= ∅ ( *a*| ε)^{*}∅ | *a*= *a**R*^{0}_{20}= *R*^{−1}_{20}(*R*^{−1}_{00})^{*}*R*^{−1}_{00}|*R*^{−1}_{20}= ∅ ( *a*| ε)^{*}( *a*| ε)| ∅ = ∅ *R*^{0}_{21}= *R*^{−1}_{20}(*R*^{−1}_{00})^{*}*R*^{−1}_{01}|*R*^{−1}_{21}= ∅ ( *a*| ε)^{*}*b*| *a*|*b*= *a*|*b**R*^{0}_{22}= *R*^{−1}_{20}(*R*^{−1}_{00})^{*}*R*^{−1}_{02}|*R*^{−1}_{22}= ∅ ( *a*| ε)^{*}∅ | ε = ε

- Step 1

*R*^{1}_{00}= *R*^{0}_{01}(*R*^{0}_{11})^{*}*R*^{0}_{10}|*R*^{0}_{00}= *a*^{*}*b*( *b*| ε)^{*}∅ | *a*^{*}= *a*^{*}*R*^{1}_{01}= *R*^{0}_{01}(*R*^{0}_{11})^{*}*R*^{0}_{11}|*R*^{0}_{01}= *a*^{*}*b*( *b*| ε)^{*}( *b*| ε)| *a*^{*}*b*= *a*^{*}*b*^{*}*b**R*^{1}_{02}= *R*^{0}_{01}(*R*^{0}_{11})^{*}*R*^{0}_{12}|*R*^{0}_{02}= *a*^{*}*b*( *b*| ε)^{*}*a*| ∅ = *a*^{*}*b*^{*}*ba**R*^{1}_{10}= *R*^{0}_{11}(*R*^{0}_{11})^{*}*R*^{0}_{10}|*R*^{0}_{10}= ( *b*| ε)( *b*| ε)^{*}∅ | ∅ = ∅ *R*^{1}_{11}= *R*^{0}_{11}(*R*^{0}_{11})^{*}*R*^{0}_{11}|*R*^{0}_{11}= ( *b*| ε)( *b*| ε)^{*}( *b*| ε)| *b*| ε= *b*^{*}*R*^{1}_{12}= *R*^{0}_{11}(*R*^{0}_{11})^{*}*R*^{0}_{12}|*R*^{0}_{12}= ( *b*| ε)( *b*| ε)^{*}*a*| *a*= *b*^{*}*a**R*^{1}_{20}= *R*^{0}_{21}(*R*^{0}_{11})^{*}*R*^{0}_{10}|*R*^{0}_{20}= ( *a*|*b*)( *b*| ε)^{*}∅ | ∅ = ∅ *R*^{1}_{21}= *R*^{0}_{21}(*R*^{0}_{11})^{*}*R*^{0}_{11}|*R*^{0}_{21}= ( *a*|*b*)( *b*| ε)^{*}( *b*| ε)| *a*|*b*= ( *a*|*b*)*b*^{*}*R*^{1}_{22}= *R*^{0}_{21}(*R*^{0}_{11})^{*}*R*^{0}_{12}|*R*^{0}_{22}= ( *a*|*b*)( *b*| ε)^{*}*a*| ε = ( *a*|*b*)*b*^{*}*a*| ε

- Step 2

*R*^{2}_{00}= *R*^{1}_{02}(*R*^{1}_{22})^{*}*R*^{1}_{20}|*R*^{1}_{00}= *a*^{*}*b*^{*}*ba*(( *a*|*b*)*b*^{*}*a*| ε)^{*}∅ | *a*^{*}= *a*^{*}*R*^{2}_{01}= *R*^{1}_{02}(*R*^{1}_{22})^{*}*R*^{1}_{21}|*R*^{1}_{01}= *a*^{*}*b*^{*}*ba*(( *a*|*b*)*b*^{*}*a*| ε)^{*}( *a*|*b*)*b*^{*}| *a*^{*}*b*^{*}*b*= *a*^{*}*b*(*a*(*a*|*b*) |*b*)^{*}*R*^{2}_{02}= *R*^{1}_{02}(*R*^{1}_{22})^{*}*R*^{1}_{22}|*R*^{1}_{02}= *a*^{*}*b*^{*}*ba*(( *a*|*b*)*b*^{*}*a*| ε)^{*}(( *a*|*b*)*b*^{*}*a*| ε)| *a*^{*}*b*^{*}*ba*= *a*^{*}*b*^{*}*b*(*a*(*a*|*b*)*b*^{*})^{*}*a**R*^{2}_{10}= *R*^{1}_{12}(*R*^{1}_{22})^{*}*R*^{1}_{20}|*R*^{1}_{10}= *b*^{*}*a*(( *a*|*b*)*b*^{*}*a*| ε)^{*}∅ | ∅ = ∅ *R*^{2}_{11}= *R*^{1}_{12}(*R*^{1}_{22})^{*}*R*^{1}_{21}|*R*^{1}_{11}= *b*^{*}*a*(( *a*|*b*)*b*^{*}*a*| ε)^{*}( *a*|*b*)*b*^{*}| *b*^{*}= ( *a*(*a*|*b*) |*b*)^{*}*R*^{2}_{12}= *R*^{1}_{12}(*R*^{1}_{22})^{*}*R*^{1}_{22}|*R*^{1}_{12}= *b*^{*}*a*(( *a*|*b*)*b*^{*}*a*| ε)^{*}(( *a*|*b*)*b*^{*}*a*| ε)| *b*^{*}*a*= ( *a*(*a*|*b*) |*b*)^{*}*a**R*^{2}_{20}= *R*^{1}_{22}(*R*^{1}_{22})^{*}*R*^{1}_{20}|*R*^{1}_{20}= (( *a*|*b*)*b*^{*}*a*| ε)(( *a*|*b*)*b*^{*}*a*| ε)^{*}∅ | ∅ = ∅ *R*^{2}_{21}= *R*^{1}_{22}(*R*^{1}_{22})^{*}*R*^{1}_{21}|*R*^{1}_{21}= (( *a*|*b*)*b*^{*}*a*| ε)(( *a*|*b*)*b*^{*}*a*| ε)^{*}( *a*|*b*)*b*^{*}| ( *a*|*b*)*b*^{*}= ( *a*|*b*) (*a*(*a*|*b*) |*b*)^{*}*R*^{2}_{22}= *R*^{1}_{22}(*R*^{1}_{22})^{*}*R*^{1}_{22}|*R*^{1}_{22}= (( *a*|*b*)*b*^{*}*a*| ε)(( *a*|*b*)*b*^{*}*a*| ε)^{*}(( *a*|*b*)*b*^{*}*a*| ε)| ( *a*|*b*)*b*^{*}*a*| ε= (( *a*|*b*)*b*^{*}*a*)^{*}

Since *q*_{0} is the start state and *q*_{1} is the only accept state, the regular expression *R*^{2}_{01} denotes the set of all strings accepted by the automaton.

## See also[edit]

- Floyd-Warshall algorithm — an algorithm on weighted graphs that can be implemented by Kleene's algorithm using a particular Kleene algebra
- Star height problem — what is the minimum stars' nesting depth of all regular expressions corresponding to a given DFA?
- Generalized star height problem — if a complement operator is allowed additionally in regular expressions, can the stars' nesting depth of Kleene's algorithm's output be limited to a fixed bound?
- Thompson's construction algorithm — transforms a regular expression to a finite automaton

## References[edit]

**^**Jonathan L. Gross and Jay Yellen, ed. (2004).*Handbook of Graph Theory*. Discrete Mathematics and it Applications. CRC Press. ISBN 1-58488-090-2. Here: sect.2.1, remark R13 on p.65**^**Kleene, Stephen C. (1956). "Representation of Events in Nerve Nets and Finite Automate" (PDF).*Automata Studies, Annals of Math. Studies*. Princeton Univ. Press.**34**. Here: sect.9, p.37-40**^**John E. Hopcroft, Jeffrey D. Ullman (1979).*Introduction to Automata Theory, Languages, and Computation*. Addison-Wesley. ISBN 0-201-02988-X. Here: Theorem 2.4, p.33-34**^**More precisely, the number of regular-expression symbols, "*a*_{i}", "ε", "|", "^{*}", "·"; not counting parantheses.