Download as pdf or txt
Download as pdf or txt
You are on page 1of 41

Speech Recognition

Lecture 3: Automata and Transducer Algorithms

Mehryar Mohri
Courant Institute of Mathematical Sciences
mohri@cims.nyu.edu
This Lecture
Shortest-distance algorithms
Composition
Epsilon-removal
Determinization
Pushing
Minimization

Mehryar Mohri - Speech Recognition page 2 Courant Institute, NYU


Shortest-Distance Problem
Definition: for any regulated weighted transducer T ,
define the shortest distance from state q to F as
!
d(q, F ) = w[π].
π∈P (q,F )
Problem: compute d(q, F ) for all states q ∈ Q .
Algorithms:

• Generalization of Floyd-Warshall.
• Single-source shortest-distance algorithm.
Mehryar Mohri - Speech Recognition page 3 Courant Institute, NYU
Shortest-Distance Problem
Definition: for any regulated weighted transducer T ,
define the shortest distance from state q to F as
!
d(q, F ) = w[π].
π∈P (q,F )
Problem: compute d(q, F ) for all states q ∈ Q .
Algorithms:

• Generalization of Floyd-Warshall.
• Single-source shortest-distance algorithm.
Mehryar Mohri - Speech Recognition page 4 Courant Institute, NYU
All-Pairs Shortest-Distance Algorithm
(MM, 2002)
Assumption: closed semiring (not necessarily
idempotent).
Idea: generalization of Floyd-Warshall algorithm.
Properties:

• Time complexity: Ω(|Q| (T⊕ + T⊗ + T!)).


3

• Space complexity: Ω(|Q| ) with an in-place


2

implementation.

Mehryar Mohri - Speech Recognition page 5 Courant Institute, NYU


Closed Semirings
(Lehmann, 1977)
Definition: a semiring is closed if the closure is
well defined for all elements and if associativity,
commutativity, and distributivity apply to
countable sums.
Examples:

• Tropical semiring.
• Probability semiring when including infinity or
when restricted to well-defined closures.

Mehryar Mohri - Speech Recognition page 6 Courant Institute, NYU


Pseudocode
Generic-All-Pairs-Shortest-Distance (G)
1 for i ← 1 to |Q|
2 do for j ← 1 to |Q| !
3 do d[i, j] ← w[e]
e∈P (i,j)
4 for k ← 1 to |Q|
5 do for i ← 1 to |Q|
6 do for j ← 1 to |Q|
7 do d[i, j] ← d[i, j] ⊕ (d[i, k] ⊗ d[k, k]∗ ⊗ d[k, j])
8 for k ← 1 to |Q|
9 do d[k, k] ← 1
10 return d

Figure 7: Generic all-pairs shortest-distance algorithm.

Since K is closed for G, dk is well-defined and in view ofpageequations


Mehryar Mohri - Speech Recognition 7 Courant3 andNYU
Institute,
Single-Source Shortest-Distance Algorithm
(MM, 2002)
Assumption: k -closed semiring.
!
k+1 !k
∀x ∈ K, xi = xi .
i=0 i=0
Idea: generalization of relaxation, but must keep
track of weight added to d[q] since the last time q
was enqueued.
Properties:
• works with any queue discipline and any k-
closed semiring.
• Classical algorithms are special instances.
Mehryar Mohri - Speech Recognition page 8 Courant Institute, NYU
seen earlier, a straightforward extension of the relaxation technique would lead to
an algorithm that would not work with non-idempotent semirings. To deal properly
with multiplicities in the case of non-idempotent semirings, we keep track of the
Pseudocode
changes to the tentative shortest distance from s to q after the last extraction of q
from the queue. The following is the pseudocode of the algorithm.

Generic-Single-Source-Shortest-Distance (G, s)
1 for i ← 1 to |Q|
2 do d[i] ← r[i] ← 0
3 d[s] ← r[s] ← 1
4 S ← {s}
5 while S $= ∅
6 do q ← head(S)
7 Dequeue(S)
8 r" ← r[q]
9 r[q] ← 0
10 for each e ∈ E[q]
11 do if d[n[e]] $= d[n[e]] ⊕ (r" ⊗ w[e])
12 then d[n[e]] ← d[n[e]] ⊕ (r" ⊗ w[e])
13 r[n[e]] ← r[n[e]] ⊕ (r" ⊗ w[e])
14 if n[e] $∈ S
15 then Enqueue(S, n[e])
16 d[s] ← 1
Mehryar Mohri - Speech Recognition page 9 Courant Institute, NYU
Notes
Complexity:
Semiring Frameworks and Algorithms for Shortest-Distance Problems 1

• depends on queue discipline used. !


O(|Q| + (T⊕ + T⊗ + C(A))|E| max N (q) + (C(I) + C(E)) N (q))
q∈Q

• coincides with that of Dijkstra and Bellman-Ford


q∈Q

As mentioned
ing Frameworks andbefore, the Generic-Single-Source-Shortest-Distance
Algorithms for Shortest-Distance Problems algo
rithm works for withshortest-first and FIFO
any queue discipline. orders.
Some queue disciplines are better than oth


ers. The appropriate choice depends on the semiring K and the specific restriction
is: imposedlinear on G. With for acyclic
a good choice,graphs using topological
the maximum number of times order.
a vertex is in
serted in S (maxq∈Q N (q)) can be limited.9 The algorithm is then very efficient. I
O(|Q| + (T⊕ classical
the next sections, we will present some
+ T⊗ )|E|)
algorithms using the following queu
xt sections, Approximation:
disciplines: we will topological
examine order,
the - k -closed
!shortest-first
use semiring,
of theorder, first-in
generic e.g.,
first-out
algorithms
case, since the number of simple paths from s to q may be exponential in the size o
for
order. In the wors
just presented
us semirings. graphs
the graph
Thisinwill
(|Q| + |E|), probability
show how semiring.
the same algorithm can be used in diff
the complexity of the algorithm is exponential.
xts by just The modifying
results presented the underlying
in this section algebra.
can be extended to cover the case of non
commutative semirings [28]. Our framework can also be generalized by introducin
10
right
lassical and left
shortest-distance
Mehryar Mohri
semirings
- Speech Recognition
[28]. A right semiring is an algebraic
algorithms page 10
structure similar t
Courant Institute, NYU
This Lecture
Shortest-distance algorithms
Composition
Epsilon-removal
Determinization
Pushing
Minimization

Mehryar Mohri - Speech Recognition page 11 Courant Institute, NYU


Composition
Definition: given two weighted transducer T1 and T2
over a commutative semiring, the composed
transducer T = T1 ◦ T2 is defined by
!
(T1 ◦ T2 )(x, y) = T1 (x, z) ⊗ T2 (z, y).
Algorithm: z

• Epsilon-free case: matching transitions.


• General case: ! -filter transducer.
• Complexity: quadratic, O(|T1||T2|).
• On-demand construction.
Mehryar Mohri - Speech Recognition page 12 Courant Institute, NYU
Epsilon-Free Composition
States Q ⊆ Q1 × Q2 .
Initial states I = I1 × I2 .
Final states F = Q ∩ F1 × F2 .
Transitions
E = {((q1 , q1! ), a, c, w1 ⊗ w2 , (q2 , q2! )) :
(q1 , a, b, w1 , q2 ), (q1! , b, c, w2 , q2! ) ∈ Q}.

Mehryar Mohri - Speech Recognition page 13 Courant Institute, NYU


Illustration
Program: fsmcompose A.fsm B.fsm >C.fsm
fstcompose A.fsm B.fsm >C.fsm

a:a/0.6
b:a/0.2
b:b/0.3 2 a:b/0.5 a:b/0.3 2 b:a/0.5
a:b/0.1 b:b/0.1
0 1 b:b/0.4 3/0.7 0 1 3/0.6
a:b/0.4
a:b/0.2

a:a/.04 (0, 1)

a:a/.02 (3, 2)
a:b/.18
a:b/.01 b:a/.06 a:a/0.1
(0, 0) (1, 1) (2, 1) (3, 1)
a:b/.24

b:a/.08 (3, 3)/.42

Mehryar Mohri - Speech Recognition page 14 Courant Institute, NYU


,$-
!"!' !"!' !"!' !"!' !"!'

A' )
Redundant ε-Paths Problem
!"!
'
#"!(
(
$"!(
*
%"%
+

,%-
!#"! !#"! !#"! !#"! (MM et al. 1996)
T1 0 a:a!"% 1 b:! ! "&2 c:! %"!3 d:d 4 0 a:d 1 !:e 2 d:a 3 T2
!
B' ) ' ( *

!:!1 !:!1 !:!1 !:!1 !:!1 !2:! !2:! !2:! !2:!

T!1 0 a:a 1 b:!2 2 c:!2 3 d:d 4 0 a:d 1 !1: e 2 d:a 3 T!2


!"% !"&
).) ,/"/-
'.' ,!!"!!-
'.( !1:!1
#"! #"& #"! !2:!1 !1:!1
,!#"!#- ,!#"!!$ ,!#"!#- x:x 1
!"& x:x
(.' ,!!"!!-
(.(
0 F
$"! $"!
!2:!2 !2:!2
,!#"!#- ,!#"!#-
x:x 2
*.' !"& *.(
,!!"!!-
%"!
,/"/-

+.*
T = T!1 ◦ F ◦ T!2 .
Mehryar Mohri - Speech Recognition page 15 Courant Institute, NYU
Correctness of Filter
Proposition: filter F allows a unique path between
two states of the following grid.
a
!1 :!1 !1 :!1 !1:!1
(0,0) (1,0) (2,0)

!1:!1
b !2:!2 !2 :!1
!2 :!2
!2 :!1
!2 :!2 !2:!1
x:x
1
c x:x
!1 :!1 !1 :!1
(0,1) (1,1) (2,1) 0
!2:!2 !2:!2
!2 :!1 !2 :!1
!2 :!2 !2 :!2 !2 :!2 x:x 2
!1 :!1 !1 :!1
(0,2) (1,2) (2,2)

Proof: Observe
Fig. 3. (a) Redundantthat a Anecessary
!-paths. straightforwardand sufficient
generalization of the !-free case co
condition
all the pathsisfrom
that the
(0, 0) to (2,following
2) for example,sequences
(b) Filter transducer M allowing a unique !-path.
be
even when composing just two simple

forbidden: ab , ba , ac , and bc .
Mehryar Mohri - Speech Recognition page 16 Courant Institute, NYU
Correctness of Filter
Proof (cont.): Let σ = {a, b, c, x}, then set of
sequences forbidden is exactly
L = σ (ab + ba + ac + bc)σ .
∗ ∗

An automaton representing the complement can be


constructed by determ. and
x a complementation.
x
x
c
a
a x
c b a x c
x x a cb x b
x a ab a x
c x c b {0,1}b a c 1 cb a
x x x a c ab c x c b b
x b c b x x a ax 1 bc
c
c a b c b a c {0,1}
a {0,1}
c c a c 1x a b a
b x c c {0,3} xc
b b b x{0} {0,3} cb 3
a a ab 1 a c b b{0,3} c x0 c
a 1 0c {0} 0 b b3
a 3 b{0} b c x 3
a 1 bc a
b b c
a b0 c bx
c2 a
0 b a 3 2 c
a {0,2} b
0 b a 3 x x
2 c x
{0,2} a 2x a a
2 c {0,2} 2
non-coaccessible

A det(A) det(A)
Fig. 4. (a) Finite automaton A representing the set of disallowed sequences. (b) Auto
a)result
Finite
Mehryar automaton
ofMohri
the- Speech A representing
Recognition
determinization of A. the set ofaredisallowed
Subsets indicated sequences.
page 17
at each (b)(c)Automaton
Courant
state. AutomatonBC
Institute, NYU
Other Filters
(Pereira and Riley, 1997)

ε1:ε1
x:x ε2:ε2

ε2:ε2
0 1
x:x

Sequential Filter.

Mehryar Mohri - Speech Recognition page 18 Courant Institute, NYU


This Lecture
Shortest-distance algorithms
Composition
Epsilon-removal
Determinization
Pushing
Minimization

Mehryar Mohri - Speech Recognition page 19 Courant Institute, NYU


Epsilon-Removal
Definition: given weighted transducer T , create
equivalent weighted transducer with no epsilon-
transition.
Algorithm components:

• Computation
!
of the ε-closure at each state:
" #
C[p] = (q, d! [p, q]) : d! [p, q] != 0) with d! [p, q] = w[π].

• Removal of εs.
π∈P (p,!,q)

• On-demand construction.
Mehryar Mohri - Speech Recognition page 20 Courant Institute, NYU
Illustration
!/0.9
!/0.9
!/0.1
!/0.1
!/0.4
!/0.4 3 3
b/0.8b/0.8
b/0.6
b/0.6 a/0.5
a/0.5
a/0.1 !/0.2 !/0.8 a/0.1a/0.1
a/0.1 !/0.2 !/0.8 0 1/1.21/1.2
00 11 2 2 4/0.2
4/0.2 0
c/0.9c/0.9
c/0.7
c/0.7 b/0.3
b/0.3

(a)
(a) (b) (b)
b/0.8 a/1.3
a/1.3
a/0.1
a/0.1
a/0.1 b/0.3 1 1 b/0.5b/0.5
0 1/1.2 4/0.2 0 0 a/0.3
a/0.3 b/0.6
b/0.6
c/0.9 a/0.7 b/0.8 4/0 4/0
b/0.8 a/0.7
a/0.7
c/0.7
c/0.7 2
2 b/1.8
b/1.8

(b) (c)(c)

!-removalininthe
Figure 1:1: !-removal
Figure thetropical semiring.
tropical semiring.(a)(a)
Weighted
Weighted autom
aut
b/0.5
1 Mehryar !-transitions. (b)
!-transitions.
Mohri - Speech Recognition (b)Weighted
Weightedautomaton
automaton BBequivalent to to
equivalent
page 21 Courant A result
Institute,
A NYU of th
result o
Main Algorithm
Shortest-distance algorithms:

• closed semirings: generalization of Floyd-


Warshall algorithm.
• k-closed semirings: single-source shortest-
distance algorithm.
Complexity: shortest-distance and removal.

• Acyclic T! : O(|Q|2 + |Q||E|(T⊕ + T⊗ )).

• General case, tropical semiring:


2
O(|Q||E| + |Q| log |Q|).
Mehryar Mohri - Speech Recognition page 22 Courant Institute, NYU
This Lecture
Shortest-distance algorithms
Composition
Epsilon-removal
Determinization
Pushing
Minimization

Mehryar Mohri - Speech Recognition page 23 Courant Institute, NYU


Determinization
Definition: given weighted transducer T , create
equivalent non-deterministic weighted transducer.
Algorithm (weakly left divisible semirings):

• generalization of subset constructions to


weighted labeled subsets
{(q1 , x1 , w1 ), . . . , (qm , xm , wm )} .
• complexity: exponential, but lazy implementation.
• not all weighted transducers are determinizable
but all acyclic weighted transducers are. Test? For
some cases, using the twins property.
Mehryar Mohri - Speech Recognition page 24 Courant Institute, NYU
fficient Algorithms for Testing the Twins Property b/1 1/0 b/5 7
nization ofb/3Weighted Automata -
Illustration
b/3
Illustration
a/1
b/3
1 c/5 b/3
(0,0) a/1 (1,0),(2,1) c/5 (3,0)/0
0 1 c/5
d/6b/1 3/0
a/1
a/2 b/3 d/7
a/1 2 (0,0) a/1 {(1,2),(2,0)}/2 c/5
(1,0),(2,1) (3,0)/0
0 a/2 b/3 d/6 b/3 3/0 a/1 d/7b/1
b/4 2
0 2 3/0 {(0,0)} {(3,0)}/0
a/3 (a) b/3 b/1 (b) b/3
(a)
b/1Determinization
Figure 1: b/5 of a weighted automaton (b) the tropical semiring.
over
1/0 {(1,0),(2,3)}/0
Figure 1: Determinization of a weighted automaton over the tropical semiring.
y:b
y:b 3
y:a 3
Determinizability & Pre-Determinization !:b (.,!)
y:a (5,b)!:b Definitions
1
x:a {(1,2),(2,0)}/2 x:a (5,b) (.,!)
x:a 1 x:b x:a
a/1 x:b b/1
0 x:! y:a (3,b), (4,!)
} 0 x:! z:a 5 (0,!)
{(3,0)}/0 x:! (1,a),(2,!)y:a y:b
x:! 5 (0,!) (1,a),(2,!) y:b (3,b), (4,!)
b/1 z:a b/3
22 y:a
y:a z:a z:a
y:b
y:b
{(1,0),(2,3)}/0 44 (5,!)(5,!)

Mehryar Mohri - Speech Recognition


(a) (b)page 25 Courant Institute, NYU
Determinization ofIllustration
Weighted Transducers – Illustration
a:b/0.6
b:eps/0.4
Program: fsmdeterminize
a:a/0.1
2
T.fsm >D.fsm
b:b/0.3
a:b/0.2 a:eps/0.5
Graphical0 Representation: 3 4/0.7
a:c/0.5
c:eps/0.7
1

{(0, b, 0)}
a:a/0.2

a:b/0.1

a:eps/0.1 {(1, c, 0.4),


{(0, eps, 0)} b:a/0.3 a:b/0.6
(2, a, 0)}
{(3, b, 0),
(4, eps, .1)}
/(eps, 0.8) a:b/0.5

{(4, eps, 0)}


/(eps, 0.7)
c:c/1.1

Mehryar Mohri - Speech Recognition page 26 Courant Institute, NYU


2 .............

(a) (b)
Non-Determinizable Transducer
Figure 3: (a) Non-determinizable weighted automaton over the tropical semiring; states 1
nd 2 are non-twin siblings. (b) The first states created by determinization applied to the
utomaton of Figure (a). The algorithm does not halt and produces an infinite number of
tates in this case.

y:b (5,b) !:b (.,!)


3
y:a
x:a
x:a 1
x:b
0 (0,!) x:! (1,a),(2,b) y:! (3,ab),(4,ba) y:! (1,aba),(2,bab) ....
x:b 5
z:a z:b
2 y:a
y:b 4 (5,a) !:a (., !)

(a) (b)
Figure 4: (a) Non-determinizable functional transducer, or weighted automaton over the
tring semiring; states 1 and 2 are non-twin siblings. (b) Determinization does not halt and
reates an infinite number of states in this case.

x canMehryar Mohri - Speech Recognition


be viewed as the remainder weight at state q. page 27 Courant Institute, NYU
T has the twins property if:
Twins Property
for any paths π1 ∈ P (I, x, q), π2 ∈ P (I, x, q ! ), c1
c2 ∈ P (q ! , y, q ! ), (Choffrut, 1978; MM 1997)
Definition: a weighted transducer
o[π1 ]−1 o[πT ] over
= the
o[π c ]−1
o[π2 c
2 1 1
tropical semiring has the twins property if for any
w[c1 ]the=following
two states q and q as in the figure,
! w[c2 ]
holds: y:v/c

• c=c; !
x:u/w q

• u u = (uv) u v .
−1 " −1 " " I y:v’/c’

x:u’/w’ q’
u’ u’ v’
Theorem 1 If T has the twins property then T
u w u v w
(Choffrut, 1978; Mohri, 1997)
Mehryar Mohri - Speech Recognition page 28 Courant Institute, NYU
Determinizability
(Choffrut, 1978; MM 1997; Allauzen and MM, 2002)
Theorem: a trim unambiguous weighted
automaton over the tropical semiring is
determinizable iff it has the twins property.
Theorem: let T be a weighted transducer over the
tropical semiring. Then, if T has the twins
property, then it is determinizable.
Algorithm for testing the twins property:

• unambiguous automata: O(|Q| + |E| ) . 2 2

• unweighted transducers: O(|Q| (|Q| + |E| )).


2 2 2

Mehryar Mohri - Speech Recognition page 29 Courant Institute, NYU


This Lecture
Shortest-distance algorithms
Composition
Epsilon-removal
Determinization
Pushing
Minimization

Mehryar Mohri - Speech Recognition page 30 Courant Institute, NYU


Pushing
(MM,1997; 2004)
Definition: given weighted transducer T , create
equivalent weighted transducer such the sum
(longest common prefix) of the weights (output
strings) of all outgoing paths be 1 (ε) at all states,
modulo initial states.
Algorithm components:
• Single-source shortest-distance computation
!
d[q] = w[π].
π∈P (q,F )

• Reweighting: w[e] ← (d[p[e]])−1 (w[e] ⊗ d[n[e]]) for


each transition e .
Mehryar Mohri - Speech Recognition page 31 Courant Institute, NYU
Main Algorithm
Automata: single-source shortest-distance.

• acyclic case: O(|Q| + |E|(T + T )). ⊕ ⊗

• general case tropical semiring: O(|Q| log |Q| + |E|).


Semiring Frameworks and Algorithms for Shortest-Distance Problems 1

• general case k-closed semirings !


O(|Q| + (T⊕ + T⊗ + C(A))|E| max N (q) + (C(I) + C(E)) N (q))
q∈Q


q∈Q

general case closed semirings Ω(|Q| 3


(T⊕ + T⊗ + T! )). algo
As mentioned before, the Generic-Single-Source-Shortest-Distance
rithm works with any queue discipline. Some queue disciplines are better than oth
ers. The appropriate choice depends on the semiring K and the specific restriction
imposed Transducers:
on G. With a O((|P
good choice,| the
max + 1)
maximum
|E|). number of times a vertex is in
serted in S (maxq∈Q N (q)) can be limited.9 The algorithm is then very efficient. I
the next sections, we will present some classical algorithms using the following queu
disciplines: topological order, shortest-first order, first-in first-out
Mehryar Mohri - Speech Recognition page 32
order. In the wors
Courant Institute, NYU
Illustration
a/0
a/0 a/0
a/0
e/0
e/0 e/0
e/0
b/1
b/1 11 b/1
b/1 11
f/1
f/1 f/1
f/1
c/4
c/4 c/4
c/4
00 3/0 00 3/0
3/0
d/0
d/0 e/10
e/10 d/10
d/10 e/0
e/0

e/1
e/1 f/11
f/11 e/11
e/11 f/1
f/1
22 22
a/0.3266
a/0.3266
a/0
a/0 e/0.3132
e/0
e/0 b/1.3266
b/1.3266 1 1 e/0.3132
b/1
b/1 11 f/1.3132
f/1
f/1 f/1.3132
c/4
c/4 c/4.3266
c/4.3266
00 3/03/0 0 0 3/-0.6393/-0.63
d/0 e/10
e/10 e/0.3132
e/0.3132
d/10.326
d/10.326

e/1
e/1 f/11
f/11
22
f/1.3132
f/1.3132
e/11.326
e/11.326 2 2

Mehryar Mohri - Speech Recognition page 33 Courant Institute, NYU


Ilustration
a:a a:d
a:a b: ! 3 c:d a:d
b: ! 3 c:d
c: ! a: ! a: ! f:d
0 c: ! 1 a: ! 2 d: ! 4 a: ! 5 f:d 6
0 b:d 1 2 d: ! 4 5 6
b:d

a:a a:d
a:a b: ! 3 c: ! a:d
c:d a:d b: ! 3 c: ! a: ! f: !
0 1 2 d:d 4 5 6
c:d! a:d a: ! f: !
0 b: 1 2 4 5 6
d:d
b: !

Mehryar Mohri - Speech Recognition page 34 Courant Institute, NYU


This Lecture
Shortest-distance algorithms
Composition
Epsilon-removal
Determinization
Pushing
Minimization

Mehryar Mohri - Speech Recognition page 35 Courant Institute, NYU


Minimization (MM,1997, 2000, 2005)

Definition: given deterministic weighted


transducer T , create equivalent deterministic
weighted transducer with the minimal number of
states (and transitions).
Algorithm components:

• apply pushing to create canonical representation.


• apply unweighted automata minimization after
encoding (input labels, output label, weight) as a
single label.

Mehryar Mohri - Speech Recognition page 36 Courant Institute, NYU


Algorithm (MM,1997, 2000, 2005)

Automata: pushing and automata minimization,


general (Hopcroft,1971) and acyclic case (Revuz 1992).

• acyclic case: O(|Q| + |E|(T + T )). ⊕ ⊗

• general case tropical semiring: O(|E| log |Q|).


Transducers:

• acyclic case: O(S + |Q| + |E| (|P max | + 1)).

• general case tropical semiring:


O(S + |Q| + |E| (log |Q| + |Pmax |)).

Mehryar Mohri - Speech Recognition page 37 Courant Institute, NYU


Illustration
22
b/1
b/1 c/2
c/2

d/0
d/0 c/1
c/1
d/3
d/3 33 e/2
e/2
a/0
a/0 e/1
e/1
00 11 d/4
d/4 66 7/0
7/0
e/3
e/3
b/1
b/1
55
a/2
a/2 c/3
c/3

44

22
b/0
b/0 c/0
c/0

d/0
d/0 c/1
c/1
d/6
d/6 33 e/0
e/0
a/0
a/0 e/0
e/0
00 11 d/6
d/6 66 7/6
7/6
e/0
e/0
b/1
b/1
55
a/3
a/3 c/0
c/0

44
d/0
d/0
b/0
b/0
c/1
c/1
a/0
a/0 a/3
a/3 22 c/0
c/0
00 11
b/1
b/1 e/0
e/0 e/0
e/0
d/6
d/6 33 66 7/6
7/6

Mehryar Mohri - Speech Recognition page 38 Courant Institute, NYU


b:! 2 c:! a:DB
b:B c:C a:DB
d:CDB e:C f: !
a:ABCDB 1 d:D 3 e:D 6 f:BC 7

Illustration
a:A 1 3 6 7
0 c:!
0 b:CCDDB c:D
b:C b:!
4 b:C 5
4 5
2
b:B c:C a:DB
Fig. 6. Transducer T2 obtained by quasi-determinization from T .
Fig. 5. Sequential transducer T .
d:D e:D f:BC
a:A 1 3 6 7
This ST is0 not minimal
The application considered as
of quasi-determinization
c:D
an automaton.
leads to theThe application
transducer of the
T2 (figure
b:C
automata
(6)) which computes4 theleads
minimization sameto5function.
b:C the reduced transducer
Only represented
the output in figure
labels differ from
(7) which
those of T is
. the minimal ST as previously defined.
2 Fig. 5. Sequential transducer T .
Transducers are often usedb:!in both directions,
c:! a:DB from inputs to outputs and vice

The application
versa. of quasi-determinization
The first stage of quasi-determinization
d:CDB
leadsin to the transducer
e:C the minimization
f: !
Talgorithm
2 (figure
a:ABCDB 1 3 6 7
(6))also
has whichan computes
interesting the same
effect on function. Only
the reverse the output
application labels
of the differtrans-
minimal from
0 c:!
those ofIndeed,
ducer. T . b:CCDDB
since it reduces
b:! the ambiguities, the cost of matching a string
4 5
with the output strings of the transducer
2
is reduced.
b:! c:! a:DB
a:DB
Fig. 6. Transducer T2 b:!
obtained
d:CDB
2 by c:!
quasi-determinization
e:C f: !
from T .
a:ABCDB 1
a:ABCDB 3 e:C 6 f: ! 7
0 1 3 5 6
d:CDBc:!
This ST is0 not minimal considered as an automaton. The application of the
b:CCDDB
b:CCDDB
b:!
automata minimization4 leads to5 the reduced transducer represented in figure
(7) which is the minimal
Fig. 7.ST as previously
Minimal sequentialdefined.
transducer T . 3
Fig. 6. Transducer T2 obtained by quasi-determinization
Mehryar Mohri - Speech Recognition page 39 from .
TInstitute,
Courant NYU
References
• Cyril Allauzen and Mehryar Mohri. Efficient Algorithms for Testing the Twins Property.
Journal of Automata, Languages and Combinatorics, 8(2):117-144, 2003.

• John E. Hopcroft. An n log n algorithm for minimizing the states in a finite automaton. In
The Theory of Machines and Computations, pages 189-196. Academic Press, 1971.

• Mehryar Mohri. Finite-State Transducers in Language and Speech Processing. Computational


Linguistics, 23:2, 1997.

• Mehryar Mohri. Minimization Algorithms for Sequential Transducers. Theoretical


Computer Science, 234:177-201, March 2000.

• Mehryar Mohri. Semiring Frameworks and Algorithms for Shortest-Distance Problems.


Journal of Automata, Languages and Combinatorics, 7(3):321-350, 2002.

• Mehryar Mohri. Generic Epsilon-Removal and Input Epsilon-Normalization Algorithms for


Weighted Transducers. International Journal of Foundations of Computer Science, 13(1):
129-143, 2002.

• Mehryar Mohri. Statistical Natural Language Processing. In M. Lothaire, editor, Applied


Combinatorics on Words. Cambridge University Press, 2005.
Mehryar Mohri - Speech Recognition page 40 Courant Institute, NYU
References
• Mehryar Mohri, Fernando C. N. Pereira, and Michael Riley. Weighted Automata in Text and
Speech Processing. In Proceedings of the 12th biennial European Conference on Artificial
Intelligence (ECAI-96),Workshop on Extended finite state models of language. Budapest,
Hungary, 1996. John Wiley and Sons, Chichester.

• Mehryar Mohri, Fernando C. N. Pereira, and Michael Riley. The Design Principles of a
Weighted Finite-State Transducer Library. Theoretical Computer Science, 231:17-32, January
2000.

• Mehryar Mohri and Michael Riley. A Weight Pushing Algorithm for Large Vocabulary
Speech Recognition. In Proceedings of the 7th European Conference on Speech
Communication and Technology (Eurospeech'01). Aalborg, Denmark, September 2001.

• Fernando Pereira and Michael Riley. Finite State Language Processing, chapter Speech
Recognition by Composition of Weighted Finite Automata. The MIT Press, 1997.

• Dominique Revuz. Minimisation of Acyclic Deterministic Automata in Linear Time.


Theoretical Computer Science 92(1): 181-189, 1992.

Mehryar Mohri - Speech Recognition page 41 Courant Institute, NYU

You might also like