# Kleene's algorithm

In theoretical computer science, in particular in formal language theory, Kleene's algorithm transforms a given deterministic finite automaton (DFA) into a regular expression. Together with other conversion algorithms, it establishes the equivalence of several description formats for regular languages.

## Algorithm description

According to Gross and Yellen (2004), the algorithm can be traced back to Kleene (1956).

This description follows Hopcroft and Ullman (1979). Given a deterministic finite automaton M = (Q, Σ, δ, q0, F), with Q = { q0,...,qn } its set of states, the algorithm computes

the sets Rk
ij
of all strings that take M from state qi to qj without going through any state numbered higher than k.

Here, "going through a state" means entering and leaving it, so both i and j may be higher than k, but no intermediate state may. Each set Rk
ij
is represented by a regular expression; the algorithm computes them step by step for k = -1, 0, ..., n. Since there is no state numbered higher than n, the regular expression Rn
0j
represents the set of all strings that take M from its start state q0 to qj. If F = { q1,...,qf } is the set of accept states, the regular expression Rn
01
| ... | Rn
0f
represents the language accepted by M.

The initial regular expressions, for k = -1, are computed as

R−1
ij
= a1 | ... | am       if ij, where δ(qi,a1) = ... = δ(qi,am) = qj
R−1
ij
= a1 | ... | am | ε, if i=j, where δ(qi,a1) = ... = δ(qi,am) = qj

After that, in each step the expressions Rk
ij
are computed from the previous ones by

Rk
ij
= Rk-1
ik
(Rk-1
kk
)* Rk-1
kj
| Rk-1
ij

By induction on k, it can be shown that the length of each expression Rk
ij
is at most 4k+1(6s+7) - 4/3 symbols, where s denotes the number of characters in Σ. Therefore, the length of the regular expression representing the language accepted by M is at most 4n+1(6s+7)f - f - 3/3 symbols, where f denotes the number of final states.

## Example

The automaton shown in the picture can be described as M = (Q, Σ, δ, q0, F) with

• the set of states Q = { q0, q1, q2 },
• the input alphabet Σ = { a, b },
• the transition function δ with δ(q0,a)=q0,   δ(q0,b)=q1,   δ(q1,a)=q2,   δ(q1,b)=q1,   δ(q2,a)=q1, and δ(q2,b)=q1,
• the start state q0, and
• set of accept states F = { q1 }.

Kleene's algorithm computes the initial regular expressions as

 R−100 = a | ε R−101 = b R−102 = ∅ R−110 = ∅ R−111 = b | ε R−112 = a R−120 = ∅ R−121 = a | b R−122 = ε

After that, the Rk
ij
are computed from the Rk-1
ij
step by step for k = 0, 1, 2. Kleene algebra equalities are used to simplify the regular expressions as much as possible.

Step 0
 R000 = R−100 (R−100)* R−100 | R−100 = (a | ε) (a | ε)* (a | ε) | a | ε = a* R001 = R−100 (R−100)* R−101 | R−101 = (a | ε) (a | ε)* b | b = a* b R002 = R−100 (R−100)* R−102 | R−102 = (a | ε) (a | ε)* ∅ | ∅ = ∅ R010 = R−110 (R−100)* R−100 | R−110 = ∅ (a | ε)* (a | ε) | ∅ = ∅ R011 = R−110 (R−100)* R−101 | R−111 = ∅ (a | ε)* b | b | ε = b | ε R012 = R−110 (R−100)* R−102 | R−112 = ∅ (a | ε)* ∅ | a = a R020 = R−120 (R−100)* R−100 | R−120 = ∅ (a | ε)* (a | ε) | ∅ = ∅ R021 = R−120 (R−100)* R−101 | R−121 = ∅ (a | ε)* b | a | b = a | b R022 = R−120 (R−100)* R−102 | R−122 = ∅ (a | ε)* ∅ | ε = ε
Step 1
 R100 = R001 (R011)* R010 | R000 = a*b (b | ε)* ∅ | a* = a* R101 = R001 (R011)* R011 | R001 = a*b (b | ε)* (b | ε) | a* b = a* b* b R102 = R001 (R011)* R012 | R002 = a*b (b | ε)* a | ∅ = a* b* ba R110 = R011 (R011)* R010 | R010 = (b | ε) (b | ε)* ∅ | ∅ = ∅ R111 = R011 (R011)* R011 | R011 = (b | ε) (b | ε)* (b | ε) | b | ε = b* R112 = R011 (R011)* R012 | R012 = (b | ε) (b | ε)* a | a = b* a R120 = R021 (R011)* R010 | R020 = (a | b) (b | ε)* ∅ | ∅ = ∅ R121 = R021 (R011)* R011 | R021 = (a | b) (b | ε)* (b | ε) | a | b = (a | b) b* R122 = R021 (R011)* R012 | R022 = (a | b) (b | ε)* a | ε = (a | b) b* a | ε
Step 2
 R200 = R102 (R122)* R120 | R100 = a*b*ba ((a|b)b*a | ε)* ∅ | a* = a* R201 = R102 (R122)* R121 | R101 = a*b*ba ((a|b)b*a | ε)* (a|b)b* | a* b* b = a* b (a (a | b) | b)* R202 = R102 (R122)* R122 | R102 = a*b*ba ((a|b)b*a | ε)* ((a|b)b*a | ε) | a* b* ba = a* b* b (a (a | b) b*)* a R210 = R112 (R122)* R120 | R110 = b* a ((a|b)b*a | ε)* ∅ | ∅ = ∅ R211 = R112 (R122)* R121 | R111 = b* a ((a|b)b*a | ε)* (a|b)b* | b* = (a (a | b) | b)* R212 = R112 (R122)* R122 | R112 = b* a ((a|b)b*a | ε)* ((a|b)b*a | ε) | b* a = (a (a | b) | b)* a R220 = R122 (R122)* R120 | R120 = ((a|b)b*a | ε) ((a|b)b*a | ε)* ∅ | ∅ = ∅ R221 = R122 (R122)* R121 | R121 = ((a|b)b*a | ε) ((a|b)b*a | ε)* (a|b)b* | (a | b) b* = (a | b) (a (a | b) | b)* R222 = R122 (R122)* R122 | R122 = ((a|b)b*a | ε) ((a|b)b*a | ε)* ((a|b)b*a | ε) | (a | b) b* a | ε = ((a | b) b* a)*

Since q0 is the start state and q1 is the only accept state, the regular expression R2
01
denotes the set of all strings accepted by the automaton.