Speech Recognition
Lecture 3: Automata and Transducer Algorithms
Mehryar Mohri
Courant Institute of Mathematical Sciences
mohri@cims.nyu.edu
This Lecture
Shortest-distance algorithms
Composition
Epsilon-removal
Determinization
Pushing
Minimization
Mehryar Mohri - Speech Recognition page 2 Courant Institute, NYU

Shortest-Distance Problem
Definition: for any regulated weighted transducer T ,
define the shortest distance from state q to F as
\[ d(q, F) = \bigoplus_{\pi \in P(q, F)} w[\pi]. \]
Problem: compute d(q, F ) for all states q ∈ Q .
Algorithms:
• Generalization of Floyd-Warshall.
• Single-source shortest-distance algorithm.
All-Pairs Shortest-Distance Algorithm
(MM, 2002)
Assumption: closed semiring (not necessarily
idempotent).
Idea: generalization of Floyd-Warshall algorithm.
Properties:
• Time complexity: $\Omega(|Q|^3 (T_\oplus + T_\otimes + T_*))$.
• Space complexity: $\Omega(|Q|^2)$ with an in-place implementation.

Closed Semirings
(Lehmann, 1977)
Definition: a semiring is closed if the closure is
well defined for all elements and if associativity,
commutativity, and distributivity apply to
countable sums.
Examples:
• Tropical semiring.
• Probability semiring when including infinity or
when restricted to well-defined closures.
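As a rough illustration of the two examples above, a semiring with a closure operation can be sketched as a small record of functions. The names `Semiring`, `TROPICAL`, and `PROBABILITY` are hypothetical, and the sketch assumes non-negative weights for the tropical case and non-negative reals for the probability case.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[float, float], float]   # the ⊕ operation
    times: Callable[[float, float], float]  # the ⊗ operation
    star: Callable[[float], float]          # closure: x* = ⊕ over n >= 0 of x^n
    zero: float                             # identity for ⊕
    one: float                              # identity for ⊗


def tropical_star(x: float) -> float:
    # min(0, x, 2x, ...) = 0 when x is non-negative (assumption of this sketch).
    if x < 0:
        raise ValueError("closure undefined here for negative weights")
    return 0.0


def probability_star(x: float) -> float:
    # Geometric series 1 + x + x^2 + ...; infinity when the series diverges.
    return 1.0 / (1.0 - x) if 0.0 <= x < 1.0 else math.inf


TROPICAL = Semiring(min, lambda a, b: a + b, tropical_star, math.inf, 0.0)
PROBABILITY = Semiring(lambda a, b: a + b, lambda a, b: a * b,
                       probability_star, 0.0, 1.0)
```

Returning infinity from `probability_star` mirrors the "including infinity" option; raising an error instead would correspond to restricting to well-defined closures.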

Pseudocode
Generic-All-Pairs-Shortest-Distance (G)
for i ← 1 to |Q|
    do for j ← 1 to |Q|
        do d[i, j] ← ⊕_{e ∈ P(i, j)} w[e]
for k ← 1 to |Q|
    do for i ← 1 to |Q|
        do for j ← 1 to |Q|
            do d[i, j] ← d[i, j] ⊕ (d[i, k] ⊗ d[k, k]* ⊗ d[k, j])
for k ← 1 to |Q|
    do d[k, k] ← 1
return d
Figure 7: Generic all-pairs shortest-distance algorithm.
Since K is closed for G, d_k is well-defined.
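The generalized Floyd-Warshall idea can be sketched in Python as follows. This is a minimal sketch, not the slide's in-place procedure: it builds a fresh matrix at each step k so that every term on the right-hand side comes from step k-1, and it adds the semiring one on the diagonal at the end to account for the empty path (an assumption of this sketch). The function name and the `(i, j, w)` edge format are hypothetical.

```python
import math


def all_pairs_shortest_distance(n, edges, plus, times, star, zero, one):
    """edges: list of (i, j, w) with 0 <= i, j < n; returns an n x n matrix."""
    d = [[zero] * n for _ in range(n)]
    for i, j, w in edges:
        d[i][j] = plus(d[i][j], w)          # sum of parallel transitions
    for k in range(n):
        loop = star(d[k][k])                # closure of the cycles through k
        d = [[plus(d[i][j], times(times(d[i][k], loop), d[k][j]))
              for j in range(n)]
             for i in range(n)]             # new matrix built from the old one
    for k in range(n):
        d[k][k] = plus(one, d[k][k])        # include the empty path
    return d


# Tropical semiring (min, +) with non-negative weights: closure is always 0.
tropical = all_pairs_shortest_distance(
    3, [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 5.0)],
    min, lambda a, b: a + b, lambda x: 0.0, math.inf, 0.0)

# Probability semiring (+, *): a self-loop of weight 0.5 has closure 2.
probability = all_pairs_shortest_distance(
    2, [(0, 0, 0.5), (0, 1, 0.5)],
    lambda a, b: a + b, lambda a, b: a * b,
    lambda x: 1.0 / (1.0 - x), 0.0, 1.0)
```

The second call shows why the closure matters outside idempotent semirings: the paths from 0 to 1 take the self-loop any number of times, and their total weight is 0.5 × (1 + 0.5 + 0.25 + ...) = 1.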
Single-Source Shortest-Distance Algorithm
(MM, 2002)
Assumption: k -closed semiring.
\[ \forall x \in K, \quad \bigoplus_{i=0}^{k+1} x^i = \bigoplus_{i=0}^{k} x^i. \]
Idea: generalization of relaxation, but must keep
track of weight added to d[q] since the last time q
was enqueued.
Properties:
• works with any queue discipline and any k-
closed semiring.
• Classical algorithms are special instances.
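As a small worked example of the k-closure condition (the sum of powers of x up to k+1 equals the sum up to k), consider the tropical semiring restricted to non-negative weights, where ⊕ is min, ⊗ is +, and the identity for ⊗ is 0. The restriction to non-negative weights is an assumption of this example.

```latex
\[
\bar{1} \oplus x = \min(0, x) = 0 = \bar{1}
\quad\Longrightarrow\quad
\bigoplus_{i=0}^{1} x^{i} = \bigoplus_{i=0}^{0} x^{i}
\qquad \text{for all } x \ge 0 .
\]
```

So this semiring is k-closed with k = 0, which is the setting in which Dijkstra-style shortest-path computation applies.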
Pseudocode
A straightforward extension of the relaxation technique would lead to
an algorithm that does not work with non-idempotent semirings. To deal properly
with multiplicities in the case of non-idempotent semirings, we keep track of the
changes to the tentative shortest distance from s to q after the last extraction of q
from the queue. The following is the pseudocode of the algorithm.
Generic-Single-Source-Shortest-Distance (G, s)
for i ← 1 to |Q|
    do d[i] ← r[i] ← 0
d[s] ← r[s] ← 1
S ← {s}
while S ≠ ∅
    do q ← head(S)
       Dequeue(S)
       r' ← r[q]
       r[q] ← 0
       for each e ∈ E[q]
           do if d[n[e]] ≠ d[n[e]] ⊕ (r' ⊗ w[e])
               then d[n[e]] ← d[n[e]] ⊕ (r' ⊗ w[e])
                    r[n[e]] ← r[n[e]] ⊕ (r' ⊗ w[e])
                    if n[e] ∉ S
                        then Enqueue(S, n[e])
d[s] ← 1
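The pseudocode can be transcribed into Python roughly as follows. This is a sketch under stated assumptions: a FIFO queue is used as the queue discipline, the semiring is passed as plain functions and constants, and the function name and `(p, q, w)` edge format are hypothetical.

```python
import math
from collections import deque


def single_source_shortest_distance(n, edges, s, plus, times, zero, one):
    """edges: list of (p, q, w); returns d with d[q] = distance from s to q."""
    out = [[] for _ in range(n)]
    for p, q, w in edges:
        out[p].append((q, w))
    d = [zero] * n
    r = [zero] * n                      # weight added to d[q] since last dequeue
    d[s] = r[s] = one
    queue, queued = deque([s]), {s}
    while queue:
        q = queue.popleft()
        queued.discard(q)
        r_q, r[q] = r[q], zero          # take the pending weight and reset it
        for nxt, w in out[q]:
            delta = times(r_q, w)
            updated = plus(d[nxt], delta)
            if updated != d[nxt]:
                d[nxt] = updated
                r[nxt] = plus(r[nxt], delta)
                if nxt not in queued:
                    queue.append(nxt)
                    queued.add(nxt)
    d[s] = one
    return d


# Tropical semiring (min, +): ordinary shortest paths.
shortest = single_source_shortest_distance(
    4, [(0, 1, 1.0), (0, 2, 4.0), (1, 2, 2.0), (2, 3, 1.0)], 0,
    min, lambda a, b: a + b, math.inf, 0.0)

# Non-idempotent (+, *) on an acyclic diamond: d counts the paths from 0.
counts = single_source_shortest_distance(
    4, [(0, 1, 1), (0, 2, 1), (1, 3, 1), (2, 3, 1)], 0,
    lambda a, b: a + b, lambda a, b: a * b, 0, 1)
```

The second call illustrates the role of `r`: only the weight accumulated since the last extraction is propagated, so each path is counted once.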
Notes
Complexity:
• depends on queue discipline used:
\[ O\Big(|Q| + (T_\oplus + T_\otimes + C(A))\,|E| \max_{q \in Q} N(q) + (C(I) + C(E)) \sum_{q \in Q} N(q)\Big) \]
• coincides with that of Dijkstra and Bellman-Ford for shortest-first and FIFO orders.
• linear for acyclic graphs using topological order: $O(|Q| + (T_\oplus + T_\otimes)|E|)$.
Approximation: ε-k-closed semiring, e.g., for graphs in probability semiring.
As mentioned before, the Generic-Single-Source-Shortest-Distance algorithm
works with any queue discipline. Some queue disciplines are better than others.
The appropriate choice depends on the semiring K and the specific restrictions
imposed on G. With a good choice, the maximum number of times a vertex is
inserted in S ($\max_{q \in Q} N(q)$) can be limited. The algorithm is then very efficient.
In the next sections, we will present some classical algorithms using the following
queue disciplines: topological order, shortest-first order, first-in first-out order.
In the worst case, since the number of simple paths from s to q may be exponential
in the size of the graph (|Q| + |E|), the complexity of the algorithm is exponential.
The results presented in this section can be extended to cover the case of
non-commutative semirings [28]. Our framework can also be generalized by
introducing right and left semirings [28].
In the next sections, we will examine the use of the generic algorithms just
presented for various semirings. This will show how the same algorithm can be
used in different contexts by just modifying the underlying algebra.

Composition
Definition: given two weighted transducers T1 and T2
over a commutative semiring, the composed
transducer T = T1 ◦ T2 is defined by
\[ (T_1 \circ T_2)(x, y) = \bigoplus_{z} T_1(x, z) \otimes T_2(z, y). \]
Algorithm:
• Epsilon-free case: matching transitions.
• General case: ε-filter transducer.
• Complexity: quadratic, O(|T1||T2|).
• On-demand construction.
Epsilon-Free Composition
States Q ⊆ Q1 × Q2.
Initial states I = I1 × I2.
Final states F = Q ∩ (F1 × F2).
Transitions
\[ E = \{((q_1, q_1'), a, c, w_1 \otimes w_2, (q_2, q_2')) : (q_1, a, b, w_1, q_2) \in E_1,\ (q_1', b, c, w_2, q_2') \in E_2\}. \]
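The epsilon-free construction can be sketched in Python as below. The dictionary layout (`"initial"`, `"final"`, `"arcs"`) and the function name `compose` are hypothetical. The sketch only builds states reachable from the initial pairs, and it assumes the final weight of a pair is the ⊗-product of the two final weights.

```python
from collections import deque


def compose(t1, t2, times):
    """Each transducer: {'initial': set, 'final': {state: weight},
    'arcs': [(src, input, output, weight, dst)]}. No epsilon labels."""
    arcs1 = {}
    for q, a, b, w, nq in t1["arcs"]:
        arcs1.setdefault(q, []).append((a, b, w, nq))
    arcs2 = {}
    for q, b, c, w, nq in t2["arcs"]:
        arcs2.setdefault((q, b), []).append((c, w, nq))   # index by input label
    initial = {(i1, i2) for i1 in t1["initial"] for i2 in t2["initial"]}
    states, arcs = set(initial), []
    queue = deque(initial)
    while queue:
        q1, q2 = queue.popleft()
        for a, b, w1, n1 in arcs1.get(q1, []):
            # Match the output label b of T1 with the input label b of T2.
            for c, w2, n2 in arcs2.get((q2, b), []):
                dst = (n1, n2)
                arcs.append(((q1, q2), a, c, times(w1, w2), dst))
                if dst not in states:
                    states.add(dst)
                    queue.append(dst)
    final = {(f1, f2): times(w1, w2)
             for f1, w1 in t1["final"].items()
             for f2, w2 in t2["final"].items()
             if (f1, f2) in states}
    return {"initial": initial, "final": final, "arcs": arcs, "states": states}


T1 = {"initial": {0}, "final": {1: 0.5}, "arcs": [(0, "a", "b", 0.5, 1)]}
T2 = {"initial": {0}, "final": {1: 0.5},
      "arcs": [(0, "b", "c", 0.25, 1), (0, "a", "c", 1.0, 1)]}
T = compose(T1, T2, lambda x, y: x * y)
```

Exploring pairs from the initial states outward is also what makes an on-demand construction possible, since a pair is expanded only when it is reached.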

Illustration
Program: fsmcompose A.fsm B.fsm >C.fsm
fstcompose A.fsm B.fsm >C.fsm
[Figure: two weighted transducers (states 0 to 3, final states 3/0.7 and 3/0.6) and the result of their composition, whose states are pairs such as (0, 0), (1, 1), (2, 1), (3, 1), (3, 2), with final state (3, 3)/.42.]

Redundant ε-Paths Problem
(MM et al. 1996)
[Figure: transducers T1 (0 a:a 1 b:ε 2 c:ε 3 d:d 4) and T2 (0 a:d 1 ε:e 2 d:a 3); the augmented transducers T̃1 (0 a:a 1 b:ε2 2 c:ε2 3 d:d 4, with ε:ε1 self-loops) and T̃2 (0 a:d 1 ε1:e 2 d:a 3, with ε2:ε self-loops); the grid of redundant ε-paths; and the filter transducer F with transitions x:x, ε1:ε1, ε2:ε1, and ε2:ε2.]
T = T̃1 ◦ F ◦ T̃2.
Correctness of Filter
Proposition: filter F allows a unique path between
two states of the following grid.
[Figure 3: (a) Redundant ε-paths on the grid of states (0,0) to (2,2), with transitions labeled ε1:ε1, ε2:ε2, and ε2:ε1. (b) Filter transducer allowing a unique ε-path, with states 0, 1, 2 and transitions x:x, ε1:ε1, ε2:ε2, and ε2:ε1.]
Proof: Observe that a necessary and sufficient
condition is that the following sequences be
forbidden: ab, ba, ac, and bc.
Correctness of Filter
Proof (cont.): Let σ = {a, b, c, x}; then the set of
forbidden sequences is exactly
\[ L = \sigma^* (ab + ba + ac + bc)\, \sigma^*. \]
An automaton representing the complement can be
constructed by determinization and complementation.
[Figure 4: (a) Finite automaton A representing the set of disallowed sequences. (b) Automaton resulting from the determinization of A; subsets are indicated at each state. (c) Automaton in which one state is marked non-coaccessible.]
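To make the forbidden-sequence condition concrete, the sketch below tests membership in the complement of the language of sequences containing ab, ba, ac, or bc as a factor. It gives both a direct factor check and a small deterministic automaton; the state numbering and the names `DELTA` and `accepts` are hypothetical and need not match the figure.

```python
FORBIDDEN = {("a", "b"), ("b", "a"), ("a", "c"), ("b", "c")}


def has_forbidden_factor(seq):
    """True if seq contains ab, ba, ac, or bc as two consecutive symbols."""
    return any(pair in FORBIDDEN for pair in zip(seq, seq[1:]))


DEAD = "dead"   # sink reached once a forbidden factor has been read
DELTA = {
    0: {"a": 1, "b": 2, "c": 0, "x": 0},        # last symbol was c or x (or start)
    1: {"a": 1, "b": DEAD, "c": DEAD, "x": 0},  # last symbol was a
    2: {"a": DEAD, "b": 2, "c": DEAD, "x": 0},  # last symbol was b
}


def accepts(seq):
    """Deterministic automaton for the complement: accept iff no forbidden factor."""
    state = 0
    for symbol in seq:
        state = DELTA[state][symbol]
        if state == DEAD:
            return False
    return True
```

Every non-dead state is accepting, so the automaton only needs to remember the last symbol read.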
Other Filters
(Pereira and Riley, 1997)
[Figure: two-state filter (states 0 and 1) with transitions labeled x:x, ε1:ε1, and ε2:ε2.]
Sequential Filter.


Epsilon-Removal
Definition: given weighted transducer T, create an
equivalent weighted transducer with no ε-transition.
Algorithm components:
• Computation of the ε-closure at each state:
\[ C[p] = \{(q, d_\epsilon[p, q]) : d_\epsilon[p, q] \neq \bar{0}\} \quad \text{with} \quad d_\epsilon[p, q] = \bigoplus_{\pi \in P(p, \epsilon, q)} w[\pi]. \]
• Removal of εs.
• On-demand construction.
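The two components can be sketched for the tropical semiring as follows. This is a simplified sketch under assumptions: weights are non-negative, the automaton is unweighted on the output side (arcs are `(src, label, weight, dst)`), and unreachable states and duplicate arcs are not cleaned up. `EPS` and both function names are hypothetical.

```python
import math

EPS = "<eps>"   # hypothetical marker for the empty label


def epsilon_closure(n, arcs, p):
    """Tropical ε-closure of p: {q: shortest ε-distance from p to q}."""
    dist = [math.inf] * n
    dist[p] = 0.0                       # the empty path has weight one (0 here)
    stack = [p]
    while stack:
        q = stack.pop()
        for src, label, w, dst in arcs:
            if src == q and label == EPS and dist[q] + w < dist[dst]:
                dist[dst] = dist[q] + w
                stack.append(dst)
    return {q: d for q, d in enumerate(dist) if d != math.inf}


def remove_epsilons(n, arcs, final):
    """Replace ε-arcs by the non-ε arcs and final weights of the closure."""
    new_arcs, new_final = [], {}
    for p in range(n):
        for q, d in epsilon_closure(n, arcs, p).items():
            for src, label, w, dst in arcs:
                if src == q and label != EPS:
                    new_arcs.append((p, label, d + w, dst))
            if q in final:
                new_final[p] = min(new_final.get(p, math.inf), d + final[q])
    return new_arcs, new_final


arcs_out, final_out = remove_epsilons(
    3, [(0, EPS, 0.5, 1), (1, "a", 1.0, 2)], {2: 0.0})
```

Because the closure of p is computed independently of the other states, it can be computed only when p is first visited, which is the basis of an on-demand construction.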
Illustration
[Figure 1: ε-removal in the tropical semiring. (a) Weighted automaton A with ε-transitions. (b) Weighted automaton B equivalent to A.]
Main Algorithm
Shortest-distance algorithms:
• closed semirings: generalization of Floyd-Warshall algorithm.
• k-closed semirings: single-source shortest-distance algorithm.
Complexity: shortest-distance and removal.
• Acyclic $T_\epsilon$: $O(|Q|^2 + |Q||E|(T_\oplus + T_\otimes))$.
• General case, tropical semiring: $O(|Q||E| + |Q|^2 \log |Q|)$.

Determinization
Definition: create a deterministic weighted transducer equivalent
to a given non-deterministic weighted transducer.
Algorithm (weakly left divisible semirings):
• generalization of subset construction to weighted labeled subsets
{(q1, x1, w1), ..., (qm, xm, wm)}.
• complexity: exponential, but lazy implementation.
• not all weighted transducers are determinizable,
but all acyclic weighted transducers are. Test: in
some cases, using the twins property.
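For weighted automata over the tropical semiring, the weighted subset construction can be sketched as below. This is a sketch under assumptions: a single initial state with weight 0, arcs given as `(src, label, weight, dst)`, and subsets of (state, residual weight) pairs without the string component used for transducers. It may not terminate on inputs that are not determinizable. The function name is hypothetical.

```python
import math
from collections import deque


def determinize(initial, arcs, final):
    """Returns (start subset, deterministic arcs, final weights by subset)."""
    out = {}
    for q, a, w, n in arcs:
        out.setdefault(q, []).append((a, w, n))
    start = frozenset({(initial, 0.0)})
    det_arcs, det_final = [], {}
    seen, queue = {start}, deque([start])
    while queue:
        subset = queue.popleft()
        finals = [x + final[q] for q, x in subset if q in final]
        if finals:
            det_final[subset] = min(finals)
        by_label = {}
        for q, x in subset:                     # x is the residual weight at q
            for a, w, n in out.get(q, []):
                by_label.setdefault(a, []).append((x + w, n))
        for a, reached in sorted(by_label.items()):
            w_min = min(w for w, _ in reached)  # weight of the new transition
            residual = {}
            for w, n in reached:                # leftover weight per destination
                residual[n] = min(residual.get(n, math.inf), w - w_min)
            dest = frozenset(residual.items())
            det_arcs.append((subset, a, w_min, dest))
            if dest not in seen:
                seen.add(dest)
                queue.append(dest)
    return start, det_arcs, det_final


start, det_arcs, det_final = determinize(
    0, [(0, "a", 1, 1), (0, "a", 2, 2), (1, "b", 3, 3), (2, "b", 1, 3)], {3: 0})
```

In this example the string ab has weight min(1 + 3, 2 + 1) = 3 in the input and 1 + 2 = 3 along the single path of the result. Subsets are expanded only when dequeued, which is what allows a lazy implementation.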
Illustration
[Figure 1: Determinization of a weighted automaton over the tropical semiring. (a) Input automaton. (b) Determinized automaton, whose states are weighted subsets such as {(0,0)}, {(1,0),(2,1)}, and {(3,0)}/0.]
Determinization of Weighted Transducers: Illustration
Program: fsmdeterminize T.fsm >D.fsm
Graphical representation:
[Figure: weighted transducer T (states 0 to 4, final state 4/0.7) and its determinization, whose states are weighted labeled subsets such as {(0, eps, 0)}, {(1, c, 0.4), (2, a, 0)}, {(3, b, 0), (4, eps, .1)}, and {(4, eps, 0)}.]
Non-Determinizable Transducer
[Figure 3: (a) Non-determinizable weighted automaton over the tropical semiring; states 1
and 2 are non-twin siblings. (b) The first states created by determinization applied to the
automaton of Figure (a). The algorithm does not halt and produces an infinite number of
states in this case.]
[Figure 4: (a) Non-determinizable functional transducer, or weighted automaton over the
string semiring; states 1 and 2 are non-twin siblings. (b) Determinization does not halt and
creates an infinite number of states in this case: (0,ε), then (1,a),(2,b), then (3,ab),(4,ba), then (1,aba),(2,bab), and so on.]
x can be viewed as the remainder weight at state q.
Twins Property
(Choffrut, 1978; MM 1997)
Definition: a weighted transducer T over the
tropical semiring has the twins property if for any
two states q and q' as in the figure, the following
holds:
• $c = c'$;
• $u^{-1} u' = (uv)^{-1} u' v'$.
[Figure: from an initial state I, a path labeled x:u/w reaches q and a path labeled x:u'/w' reaches q'; q has a cycle labeled y:v/c and q' has a cycle labeled y:v'/c'.]
In path notation, T has the twins property if for any paths $\pi_1 \in P(I, x, q)$, $\pi_2 \in P(I, x, q')$, $c_1 \in P(q, y, q)$, $c_2 \in P(q', y, q')$:
\[ o[\pi_1]^{-1} o[\pi_2] = o[\pi_1 c_1]^{-1} o[\pi_2 c_2], \qquad w[c_1] = w[c_2]. \]
Determinizability
(Choffrut, 1978; MM 1997; Allauzen and MM, 2002)
Theorem: a trim unambiguous weighted
automaton over the tropical semiring is
determinizable iff it has the twins property.
Theorem: let T be a weighted transducer over the
tropical semiring. If T has the twins property,
then it is determinizable.
Algorithm for testing the twins property:
• unambiguous automata: $O(|Q|^2 + |E|^2)$.
• unweighted transducers: $O(|Q|^2 (|Q|^2 + |E|^2))$.


Pushing
(MM,1997; 2004)
Definition: create an equivalent weighted transducer such that the sum
(longest common prefix) of the weights (output
strings) of all outgoing paths is 1 (ε) at all states,
modulo initial states.
• Single-source shortest-distance computation:
\[ d[q] = \bigoplus_{\pi \in P(q, F)} w[\pi]. \]
• Reweighting: $w[e] \leftarrow (d[p[e]])^{-1} (w[e] \otimes d[n[e]])$ for each transition e.
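In the tropical semiring the two steps become ordinary arithmetic: d[q] is the shortest distance from q to a final state, and the reweighting is w[e] + d[n[e]] - d[p[e]]. The sketch below assumes no negative cycles, computes d with Bellman-Ford-style relaxation rather than a tuned queue discipline, drops arcs into states that cannot reach a final state, and returns the weight pushed out at the initial state separately. The function name and argument layout are hypothetical.

```python
import math


def push_weights(n, arcs, final, initial):
    """arcs: list of (p, label, w, q); final: {state: final weight}.
    Returns (pushed arcs, pushed final weights, weight moved to the start)."""
    d = [final.get(q, math.inf) for q in range(n)]
    for _ in range(n):                          # n rounds suffice without
        for p, a, w, q in arcs:                 # negative cycles
            d[p] = min(d[p], w + d[q])
    pushed = [(p, a, w + d[q] - d[p], q)
              for p, a, w, q in arcs if d[q] != math.inf]
    new_final = {q: rho - d[q] for q, rho in final.items()}
    return pushed, new_final, d[initial]


pushed, new_final, shift = push_weights(
    4,
    [(0, "a", 0, 1), (0, "d", 0, 2), (1, "e", 0, 3),
     (2, "e", 10, 3), (2, "f", 11, 3)],
    {3: 0}, 0)
```

After pushing, the minimum over the outgoing weights of every state is 0, the tropical one, and each complete path keeps its total weight once `shift` is added back at the initial state.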
Main Algorithm
Automata: single-source shortest-distance.
• acyclic case: $O(|Q| + |E|(T_\oplus + T_\otimes))$.
• general case tropical semiring: $O(|Q| \log |Q| + |E|)$.
• general case k-closed semirings:
\[ O\Big(|Q| + (T_\oplus + T_\otimes + C(A))\,|E| \max_{q \in Q} N(q) + (C(I) + C(E)) \sum_{q \in Q} N(q)\Big) \]
• general case closed semirings: $\Omega(|Q|^3 (T_\oplus + T_\otimes + T_*))$.
Transducers: $O((|P_{\max}| + 1)\,|E|)$.
Illustration
[Figure: a weighted automaton (states 0 to 3, final state 3) shown before and after weight pushing.]

Illustration
[Figure: a transducer (states 0 to 6) shown before and after pushing of output labels.]


Minimization
(MM, 1997, 2000, 2005)
Definition: given deterministic weighted
transducer T, create an equivalent deterministic
weighted transducer with the minimal number of
states (and transitions).
• apply pushing to create canonical representation.
• apply unweighted automata minimization after
encoding (input label, output label, weight) as a
single label.
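The second step can be sketched as follows for a deterministic weighted automaton that has already been pushed. Each (label, weight) pair is treated as one symbol, and states are merged by simple partition refinement (Moore-style), used here for brevity in place of Hopcroft's algorithm. The function name and the `(p, label, w, q)` arc format are hypothetical.

```python
def minimize(states, arcs, final):
    """arcs: list of (p, label, w, q), deterministic; final: {state: weight}.
    Returns (block id per state, merged arcs, merged final weights)."""
    delta = {q: {} for q in states}
    for p, a, w, q in arcs:
        delta[p][(a, w)] = q                # (label, weight) as a single symbol
    ids = {}
    # Initial partition: group states by final weight (None if not final).
    block = {q: ids.setdefault(final.get(q), len(ids)) for q in states}
    while True:
        ids, refined = {}, {}
        for q in states:
            signature = (block[q],
                         tuple(sorted((sym, block[t])
                                      for sym, t in delta[q].items())))
            refined[q] = ids.setdefault(signature, len(ids))
        if len(ids) == len(set(block.values())):
            break                           # no block was split: stable
        block = refined
    min_arcs = sorted({(block[p], a, w, block[q]) for p, a, w, q in arcs})
    min_final = {block[q]: rho for q, rho in final.items()}
    return block, min_arcs, min_final


block, min_arcs, min_final = minimize(
    [0, 1, 2, 3],
    [(0, "a", 0, 1), (0, "b", 0, 2), (1, "c", 1, 3), (2, "c", 1, 3)],
    {3: 0})
```

If the two c-transitions carried different weights, states 1 and 2 would not be merged, which is why pushing is applied first to put the weights in a canonical position.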

Algorithm (MM, 1997, 2000, 2005)
Automata: pushing and automata minimization,
general (Hopcroft, 1971) and acyclic case (Revuz, 1992).
• acyclic case: $O(|Q| + |E|(T_\oplus + T_\otimes))$.
• general case tropical semiring: $O(|E| \log |Q|)$.
Transducers:
• acyclic case: $O(S + |Q| + |E|(|P_{\max}| + 1))$.
• general case tropical semiring: $O(S + |Q| + |E|(\log |Q| + |P_{\max}|))$.

Illustration
[Figure: three weighted automata: a deterministic automaton with states 0 to 7 and final state 7/0; the same automaton with modified weights and final state 7/6; and a smaller automaton with final state 7/6.]

Illustration
[Fig. 5. Sequential transducer T.]
The application of quasi-determinization leads to the transducer T2 (Fig. 6),
which computes the same function. Only the output labels differ from those of T.
[Fig. 6. Transducer T2 obtained by quasi-determinization from T.]
This sequential transducer (ST) is not minimal considered as an automaton. The
application of automata minimization leads to the reduced transducer
represented in Fig. 7, which is the minimal ST as previously defined.
[Fig. 7. Minimal sequential transducer T3.]
Transducers are often used in both directions, from inputs to outputs and vice
versa. The first stage of quasi-determinization in the minimization algorithm
also has an interesting effect on the reverse application of the minimal
transducer. Indeed, since it reduces the ambiguities, the cost of matching a
string with the output strings of the transducer is reduced.
References
• Cyril Allauzen and Mehryar Mohri. Efficient Algorithms for Testing the Twins Property.
Journal of Automata, Languages and Combinatorics, 8(2):117-144, 2003.
• John E. Hopcroft. An n log n algorithm for minimizing the states in a finite automaton. In
The Theory of Machines and Computations, pages 189-196. Academic Press, 1971.
• Mehryar Mohri. Finite-State Transducers in Language and Speech Processing. Computational
Linguistics, 23:2, 1997.
• Mehryar Mohri. Minimization Algorithms for Sequential Transducers. Theoretical
Computer Science, 234:177-201, March 2000.
• Mehryar Mohri. Semiring Frameworks and Algorithms for Shortest-Distance Problems.
Journal of Automata, Languages and Combinatorics, 7(3):321-350, 2002.
• Mehryar Mohri. Generic Epsilon-Removal and Input Epsilon-Normalization Algorithms for
Weighted Transducers. International Journal of Foundations of Computer Science, 13(1):
129-143, 2002.
• Mehryar Mohri. Statistical Natural Language Processing. In M. Lothaire, editor, Applied
Combinatorics on Words. Cambridge University Press, 2005.
• Mehryar Mohri, Fernando C. N. Pereira, and Michael Riley. Weighted Automata in Text and
Speech Processing. In Proceedings of the 12th biennial European Conference on Artificial
Intelligence (ECAI-96), Workshop on Extended finite state models of language. Budapest,
Hungary, 1996. John Wiley and Sons, Chichester.
• Mehryar Mohri, Fernando C. N. Pereira, and Michael Riley. The Design Principles of a
Weighted Finite-State Transducer Library. Theoretical Computer Science, 231:17-32, January
2000.
• Mehryar Mohri and Michael Riley. A Weight Pushing Algorithm for Large Vocabulary
Speech Recognition. In Proceedings of the 7th European Conference on Speech
Communication and Technology (Eurospeech'01). Aalborg, Denmark, September 2001.
• Fernando Pereira and Michael Riley. Finite State Language Processing, chapter Speech
Recognition by Composition of Weighted Finite Automata. The MIT Press, 1997.
• Dominique Revuz. Minimisation of Acyclic Deterministic Automata in Linear Time.
Theoretical Computer Science 92(1): 181-189, 1992.
