Download as pdf or txt
Download as pdf or txt
You are on page 1of 65

Speech Recognition

Lecture 2: Weighted Finite-StateTransducer


Software Library

Mehryar Mohri
Courant Institute of Mathematical Sciences
mohri@cims.nyu.edu

1
Software Libraries
FSM Library: Finite-State Machine Library. General
software utilities for building, combining,
optimizing, and searching weighted automata and
transducers (MM, Pereira, and Riley, 2000).
http://www.research.att.com/projects/mohri/fsm
OpenFst Library: Open-source Finite-state
transducer Library (Allauzen et al., 2007).
http://www.openfst.org

Mehryar Mohri - Speech Recognition page Courant2 Institute, NYU


2
Software Libraries
GRM Library: Grammar Library. General software
collection for constructing and modifying weighted
automata and transducers representing grammars
and statistical language models (Allauzen, MM, and
Roark, 2005).

http://www.research.att.com/projects/mohri/grm
DCD Library: Decoder Library. General software
collection for speech recognition decoding and
related functions (MM and Riley, 2003).
http://www.research.att.com/~fsmtools/dcd
Mehryar Mohri - Speech Recognition page Courant3 Institute, NYU
3
FSM Library
The FSM utilities construct, combine, minimize, and
search weighted finite-states transducers.
• User Program Level: Programs that read from
and write to files or pipelines, fsm(1):
fsmintersect in1.fsm in2.fsm >out.fsm
• C(++) Library Level: Library archive of C(++)
functions that implements the user program
level, fsm(3):
Fsm in1 = FSMLoad("in1.fsm");
Fsm in2 = FSMLoad("in2.fsm");
Fsm out = FSMIntersect(fsm1, fsm2);
FSMDump("out.fsm", out);

Mehryar Mohri - Speech Recognition page Courant4 Institute, NYU


4
• Definition Level: Specification of labels, of costs,
and of types of FSM representations.

Mehryar Mohri - Speech Recognition page Courant5Institute, NYU


5
This Lecture
Weighted automata and transducers
Rational operations
Elementary unary operations
Fundamental binary operations
Optimization algorithms
Search algorithms

Mehryar Mohri - Speech Recognition page 6 Courant Institute, NYU


6
FSM File Types
Textual format

• automata/acceptor files,
• transducer files,
• symbols files.
Binary format: compiled representation used by
all FSM utilities.

Mehryar Mohri - Speech Recognition page Courant7 Institute, NYU


7
Compiling, Printing, and Drawing
Compiling
• fsmcompile -s tropical -iA.syms <A.txt >A.fsm

• fsmcompile -s log -iA.syms -oA.syms -t <T.txt >T.fsm

Printing
• fsmprint -iA.syms <A.fsm >A.txt

• fsmprint -iA.syms -oA.syms <T.fsm | dot -Tps >T.ps

Drawing
• fsmdraw -iA.syms <A.fsm | dot -Tps >A.ps

• fsmdraw -iA.syms -oA.syms <T.fsm | dot -Tps >T.ps


Mehryar Mohri - Speech Recognition page Courant8 Institute, NYU
8
Weight Sets: Semirings
A semiring (K, ⊕, ⊗, 0, 1) is a ring that may lack
negation.

• sum: to compute the weight of a sequence


(sum of the weights of the paths labeled with
that sequence).

• product: to compute the weight of a path


(product of the weights of constituent
transitions).

Mehryar Mohri - Speech Recognition page Courant9 Institute, NYU


9
Semirings - Examples

Semiring Set ⊕ ⊗ 0 1
Boolean {0, 1} ∨ ∧ 0 1
Probability R+ + × 0 1
Log R ∪ {−∞, +∞} ⊕log + +∞ 0
Tropical R ∪ {−∞, +∞} min + +∞ 0

with ⊕log defined by: x ⊕log y = − log(e−x + e−y ).

Mehryar Mohri - Speech Recognition page Courant10Institute, NYU


10
Automata/Acceptors
Graphical Representation ( A.ps )
red/0.5

green/0.3 blue/0
0 1 2 /0.8
yellow/0.6

Acceptor file (A.txt)


0 0 red .5
0 1 green .3
1 2 blue
1 2 yellow .6
2 .8

Symbols file ( A.syms )


red 1
green 2
blue 3
yellow 4
Mehryar Mohri - Speech Recognition page Courant11Institute, NYU
11
Transducers
Graphical Representation ( T.ps )
red:yellow/0.5

green:blue/0.3 blue:green/0
0 1 2 /0.8
yellow:red/0.6

Transducer file ( T.txt )


0 0 red yellow .5
0 1 green blue .3
1 2 blue green
1 2 yellow red .6
2 .8

Symbols file ( T.syms )


red 1
green 2
blue 3
yellow 4
Mehryar Mohri - Speech Recognition page Courant
12Institute, NYU
12
Paths - Definitions and Notation
Path π input label output label

i[π]:o[π]
p[π] n[π]
previous state next state or
or source state destination state

Sets of paths
• P (R1 , R2 ) : paths from R1 ⊆ Q to R2 ⊆ Q .
• P (R1 , x, R2 ) : paths in P (R1 , R2 ) with input label x.
• P (R1 , x, y, R2 ) : paths in P (R1 , x, R2 ) with output
label y.
Mehryar Mohri - Speech Recognition page Courant
13Institute, NYU
13
General Definitions
Alphabets: input Σ , output ∆.
States: Q, initial states I , final states F.
Transitions: E ⊆ Q × (Σ ∪ {!}) × (∆ ∪ {!}) × K × Q.
Weight functions:

• initial: λ : I → K .
• final: ρ : F → K .

Mehryar Mohri - Speech Recognition page Courant


14Institute, NYU
14
Automata and Transducers - Definitions
Automaton A = (Σ, Q, I, F, E, λ, ρ)
∀x ∈ Σ∗ ,
!
[[A]](x) = λ(p[π]) ⊗ w[π] ⊗ ρ(n[π])
π∈P (I,x,F )

Transducer T = (Σ, ∆, Q, I, F, E, λ, ρ)
∀x ∈ Σ∗ , y ∈ ∆∗ ,
!
[[T ]](x, y) = λ(p[π]) ⊗ w[π] ⊗ ρ(n[π])
π∈P (I,x,y,F )

Mehryar Mohri - Speech Recognition page Courant


15Institute, NYU
15
Weighted Automata
b/0.6
b/0.2
a/0.1 1 a/0.4 b/0.3
2 3/0.1
0 a/0.5

[[A]](x) =
Sum of the weights of all successful
paths labeled with x

[[A]](abb) = .1 × .2 × .3 × .1 + .5 × .3 × .6 × .1

Mehryar Mohri - Speech Recognition page Courant


16Institute, NYU
16
Weighted Transducers
b:a/0.6
b:a/0.2
a:b/0.1 1 a:a/0.4 b:a/0.3
2 3/0.1
0 a:b/0.5

[[T ]](x, y) = Sum of the weights of all successful


paths with input x and output y.

[[T ]](abb, baa) = .1 × .2 × .3 × .1 + .5 × .3 × .6 × .1

Mehryar Mohri - Speech Recognition page Courant


17Institute, NYU
17
This Lecture
Weighted automata and transducers
Rational operations
Elementary unary operations
Fundamental binary operations
Optimization algorithms
Search algorithms

Mehryar Mohri - Speech Recognition page 18 Courant Institute, NYU


18
Rational Operations
Sum
[[T1 ⊕ T2 ]](x, y) = [[T1 ]](x, y) ⊕ [[T2 ]](x, y)

Product
!
[[T1 ⊗ T2 ]](x, y) = [[T1 ]](x1 , y1 ) ⊗ [[T2 ]](x2 , y2 ).
x=x1 x2
Closure y=y1 y2

!

[[T ∗ ]](x, y) = [[T ]]n (x, y)
n=0

Mehryar Mohri - Speech Recognition page Courant


19Institute, NYU
19
• Conditions (on the closure operation): condition
on T: e.g., weight of ε-cycles = 0 (regulated
transducers), or semiring condition: e.g.,1 ⊕ x = 1
as with the tropical semiring (more generally
locally closed semirings).

• Complexity and implementation:


• linear-time complexity:
O((|E1 | + |Q1 |) + (|E2 | + |Q2 |)) or
O(|Q| + |E|)

• lazy implementation.
Mehryar Mohri - Speech Recognition page Courant20Institute, NYU
20
Sum - Illustration
Program: fsmunion A.fsm B.fsm >C.fsm
red/0.5 green/0.4 1 /0

Graphical
0
green/0.3 representation:
1
blue/0
2 /0.8
0 blue/1.2
yellow/0.6 2 /0.3
red/0.5 green/0.4 1 /0
red/0.5 green/0.4 1 /0

green/0.3 blue/0 0 blue/1.2


0 green/0.3 1 A.fsa blue/0 2 /0.8 0 blue/1.2
0 1 2 /0.8 B.fsa
yellow/0.6
yellow/0.6 2 /0.3
2 /0.3

red/0.5

A.fsa A.fsa green/0.3 blue/0 B.fsaB.fsa


eps/0 0 1 2 /0.8
yellow/0.6
6
eps/0
green/0.4
3
red/0.5red/0.5 4 /0
blue/1.2
green/0.3
green/0.3 blue/0
blue/0
0 1 1 2 /0.8
2 /0.8
eps/0 eps/0 0 yellow/0.6
yellow/0.6
6 6 5 /0.3
eps/0 eps/0
green/0.4
3 3 green/0.4 4 /0 4 /0
blue/1.2
blue/1.2
Mehryar Mohri - Speech Recognition page Courant
21Institute, NYU
C.fsa 21
Product - Illustration
Program: fsmconcat A.fsm B.fsm >C.fsm
red/0.5 Graphical representation: green/0.4 1 /0

0 blue/1.2 1 /0
red/0.5 green/0.3 blue/0 green/0.4
0 red/0.5 1 2 /0.8 green/0.4 1 /0
yellow/0.6 2 /0.3
green/0.3 blue/0 0 blue/1.2
0 green/0.3 1 blue/0 2 /0.8 0 blue/1.2
0 1 2 /0.8
yellow/0.6
yellow/0.6 2 /0.3
2 /0.3
A.fsa B.fsa

A.fsa A.fsa B.fsaB.fsa


red/0.5 4 /0
green/0.4
green/0.3 blue/0 eps/0.8
0 1 2 3 blue/1.2
red/0.5red/0.5
yellow/0.6
green/0.3
green/0.3 blue/0
blue/0 5 /0.3
0 1 1 2 /0.8
2 /0.8
eps/0 eps/0 0 yellow/0.6
yellow/0.6
6 6
eps/0 eps/0
green/0.4
3 3 green/0.4 4 /0 4 /0
blue/1.2
blue/1.2 C.fsa
Mehryar Mohri - Speech Recognition page Courant
22Institute, NYU
22
Closure - Illustration
Program: fsmclosure B.fsm >C.fsm
Graphical representation:
green/0.4
green/0.4 1 /0
1 /0
1 /0
green/0.4 eps/0
eps/0
0 0 blue/1.2 3 /0 0
8 blue/1.2 blue/1.2

2 /0.3
2 /0.3 eps/0.3 2 /0.3

B.fsa

B.fsa C.fsa

blue/0
Mehryar Mohri - Speech Recognition page Courant
2 /0.8 23Institute, NYU
yellow/0.6 23
This Lecture
Weighted automata and transducers
Rational operations
Elementary unary operations
Fundamental binary operations
Optimization algorithms
Search algorithms

Mehryar Mohri - Speech Recognition page 24 Courant Institute, NYU


24
Elementary Unary Operations
Reversal
[[T!]](x, y) = [[T ]](!
x, y!)

Inversion
[[T −1
]](x, y) = [[T ]](y, x)

Projection
!
[[A]](x) = [[T ]](x, y)
y
Linear-time complexity, lazy implementation (not
for reversal).
Mehryar Mohri - Speech Recognition page Courant
25Institute, NYU
25
Reversal - Illustration
Program: fsmreverse A.fsm >C.fsm
Graphical representation:
red/0.5
green/1.2 3 /0

red/0.5 green/0.3 blue/0


0 1 2
yellow/0.6 green/1.2 blue/2 3 /0
green/0.3 blue/0 4 /0.3
0 1 2 blue/2
yellow/0.6
4 /0.3
A.fsa

red/0.5
eps/0 4 green/1.2A.fsa
blue/0 green/0.3
0 3 2 red/0.5 1 /0
eps/0.3 blue/2
yellow/0.6
eps/0 4 green/1.2
5
blue/0 green/0.3
0 3 2 1 /0
eps/0.3 blue/2
yellow/0.6
5 C.fsa
Mehryar Mohri - Speech Recognition page Courant
26Institute, NYU
26
Inversion - Illustration
Program: fsminvert A.fsm >C.fsm
red:bird/0.5
Graphical representation:
green:pig/0.3 blue:cat/0
0
red:bird/0.5 1 2 /0.8
yellow:dog/0.6
green:pig/0.3 blue:cat/0
0 1 2 /0.8
yellow:dog/0.6
A.fst
bird:red/0.5
A.fst
pig:green/0.3 cat:blue/0
0
bird:red/0.5 1 2 /0.8
dog:yellow/0.6
pig:green/0.3 cat:blue/0
0 1 2 /0.8
dog:yellow/0.6
C.fst
Mehryar Mohri - Speech Recognition page Courant
27Institute, NYU
27
Projection - Illustration
Program: fsmproject -1 T.fsm >A.fsm
red:bird/0.5
Graphical representation:
green:pig/0.3 blue:cat/0
0 1 2 /0.8
red:bird/0.5
yellow:dog/0.6
green:pig/0.3 blue:cat/0
0 1 2 /0.8
yellow:dog/0.6
T.fst
red/0.5
A.fst
bird:red/0.5green/0.3 blue/0
0 1 2 /0.8
yellow/0.6
pig:green/0.3 cat:blue/0
0 1 2 /0.8
dog:yellow/0.6

Mehryar Mohri - Speech Recognition A.fsa page Courant


28Institute, NYU
28
This Lecture
Weighted automata and transducers
Rational operations
Elementary unary operations
Fundamental binary operations
Optimization algorithms
Search algorithms

Mehryar Mohri - Speech Recognition page 29 Courant Institute, NYU


29
Some Fundamental Binary Operations
(Pereira and Riley, 1997; MM et al. 1996)

Composition ((K, ⊕, ⊗, 0, 1) commutative)


!
[[T1 ◦ T2 ]](x, y) = [[T1 ]](x, z) ⊗ [[T2 ]](z, y)
z
Intersection ( (K, ⊕, ⊗, 0, 1) commutative)

[[A1 ∩ A2 ]](x) = [[A1 ]](x) ⊗ [[A2 ]](x)

Difference (A2 unweighted and deterministic)

[[A1 − A2 ]](x) = [[A1 ∩ A2 ]](x)


Mehryar Mohri - Speech Recognition page Courant
30Institute, NYU
30
• Complexity and implementation:
• quadratic complexity:
O((|E1 | + |Q1 |) (|E2 | + |Q2 |))
• path multiplicity in presence of ε-transitions: ε-
filter;

• lazy implementation.

Mehryar Mohri - Speech Recognition page Courant31Institute, NYU


31
Composition - Illustration
Program: fsmcompose A.fsm B.fsm >C.fsm
Graphicalc:a/0.3
representation: a
c:a/0.3
c:a/0.3
a:b/0.6
a:b/0.6
a:b/0.1 1 a:a/0.4
a:b/0.1
a:b/0.1 1 1 a:a/0.4
a:a/0.4 b:c/0.3 a:b/0.4
0 b:b/0.5 3/0.6 0 1
b:a/0.2 b:c/0.3
b:c/0.3 a:b/0.4
a:b/0.4
0 0 3/0.6
3/0.6 0 0 1 1 2/0.72/0.7
b:a/0.2
b:a/0.2 2 b:b/0.5
b:b/0.5
2 2

c:b/0.9
c:b/0.9
c:b/0.9

c:b/0.7 (1, 2) (1, 2) a:b/1


c:b/0.7
c:b/0.7 (1, 2) a:b/1
a:b/1
a:c/0.4
a:c/0.4
(0,
(0,(0, 0)
0) 0) a:c/0.4 1)(1,
(1, (1, 1) (3, 2)/1.3 (3, 2)/1.3
1) a:b/0.8 (3, 2)/1.3
a:b/0.8
a:b/0.8

Mehryar Mohri - Speech Recognition page Courant


32Institute, NYU
32
Multiplicity and ε-Transitions - Problem
book on Speech Processing and Speech Communication
Springer Handbook on Speech Processing and Speech Communication
14

!:e
1,1 (!1:!1)a:d1,2 !:e 1,2 a:a b:! c:! d:d
0,0 (x:x) 1,1 0 1 2 3 4
:! b:e b:!
(!1:!1) a:a b:! c:!
0 1 2
!2) b:!
(!2:!1) (!2:!2) b:e b:!
(!2:!2) (!2:!1) (!2:!2)
!:e
2,1 2,2 !:e
(!1:!1) 2,2
2,1 (!1:!1) 0
a:d
1
!:e
2
d:a
3
! c:! 0
a:d
1
!:e
2
d:a
2) c:!
(!2:!2) c:!
(!2:!2) (!2:!2)
3,1 !:e 3,2
(!1:!1) !:e
3,1d:a (!1:!1) 3,2
(x:x) d:a
(x:x)
4,3
4,3
ndant !-paths. A straightforward generalization of the !-free case could generate all the paths
3, 2)Figure
when 8: Redundant
composing the!-paths. A straightforward
two simple transducers ongeneralization
the right-handofside.
the !-free case could gen
from (1, 1) to (3, 2) when composing the two simple transducers on the right-hand side.

Mehryar Mohri - Speech Recognition


⊗x. Thus, x can be viewed as the resid- nal state and its final weightpageis obtained
Courant
33Institute,
by summingNYU

λ[p[π]]⊗w[π]⊗x. Thus, x can be viewed as the resid- nal state and its final weight ! is obta
33
Solution - Filter F for Composition
!1:!1

!2:!1 !1:!1
x:x 1
x:x

0 !2:!2
!2:!2

x:x 2

Replace T1 ◦ T2 with T̃1 ◦ F ◦ T̃2 .

Mehryar Mohri - Speech Recognition page Courant


34Institute, NYU
34
Intersection - Illustration
Program: fsmintersect A.fsm B.fsm >C.fsm
/0.5 green/0.4

0
green/0.3 Graphical
1
representation:
blue/0
2 /0.8 red/0.2 yellow/1.3
0 /0 1 2 /0.5
yellow/0.6 blue/0.6
red/0.5
red/0.5 green/0.4
green/0.4
green/0.3
green/0.3 blue/0
blue/0 red/0.2 yellow/1.3
00 11 22 /0.8
/0.8
00 /0
/0 red/0.2
11
yellow/1.3
22 /0.5
/0.5
yellow/0.6
yellow/0.6 blue/0.6
A.fsa blue/0.6 B.fsa

A.fsa
A.fsa B.fsa
B.fsa
blue/0.6 3 /0.8

red/0.7 green/0.7
0 1 2 blue/0.6
yellow/1.9
blue/0.6 33 /0.8
/0.8

red/0.7
red/0.7 green/0.7
green/0.7
00 11 22 yellow/1.9
yellow/1.9 4 /1.3

44 /1.3
/1.3

C.fsa
Mehryar Mohri - Speech Recognition C.fsa
C.fsa page Courant
35Institute, NYU
35
Difference - Illustration
Program: fsmdifference A.fsm B.fsm >C.fsm
Graphical representation:
green
red/0.5
red/0.5 green
red/0.5 red green/0.4 yellow
green/0.3 blue/0 0 1 2
0 1 2 /0.8
green/0.3 blue/0
yellow/0.6 blue
red
red/0.2 yellow
yellow/1.3
0 0 green/0.3 1 1 blue/0 2 /0.8
2 /0.8 0 /0 0 11 22 /0.5
yellow/0.6
yellow/0.6 blue
blue/0.6

A.fsa B.fsa
A.fsa B.fsa
B.fsa
A.fsa red/0.5
red/0.5
red/0.5
red/0.5 1 2 green/0.3
3 /0.8
red/0.5
green/0.3
blue/0.6 blue/0
0 red/0.5 1 2 green/0.3 3 4 /0.8
red/0.7 green/0.7 yellow/0.6
blue/0
0 0 green/0.31 2 3
yellow/1.9 4 /0.8
yellow/0.6
4 /1.3

C.fsa
C.fsa
Mehryar Mohri - Speech Recognition C.fsa page Courant
36Institute, NYU
36
This Lecture
Weighted automata and transducers
Rational operations
Elementary unary operations
Fundamental binary operations
Optimization algorithms
Search algorithms

Mehryar Mohri - Speech Recognition page 37 Courant Institute, NYU


37
Optimization Algorithms
Connection: removes non-accessible/non-
coaccessible states.
ε-Removal: removes ε-transitions.
Determinization: creates equivalent deterministic
machine.
Pushing: creates equivalent pushed/stochastic
machine.
Minimization: creates equivalent minimal
deterministic machine.
Mehryar Mohri - Speech Recognition page Courant
38Institute, NYU
38
• Conditions: there are specific semiring conditions
for the use of these algorithms, e.g., not all
weighted automata or transducers can be
determinized using the determinization algorithm.

Mehryar Mohri - Speech Recognition page Courant39Institute, NYU


39
Connection - Illustration
Connection - Illustration
• Program: fsmconnect A.fsm >C.fsm
Program: fsmconnect A.fsm >C.fsm
• Graphical representation:
Graphical representation: green/0.2
3 4 /0.2

green/0.2
3
red/0.5 4 /0.2 blue/0 2 /0.8
green/0.3 yellow/0.6
0
red/0.5 1 blue/0
red/0 2 /0.8
green/0.3 yellow/0.6
0 1 5
red/0

A.fsa 5

red/0.5
A.fsa
green/0.3 blue/0
0 1 2 /0.8
yellow/0.6

C.fsa
Corinna
Mehryar Mohri - Speech Recognition Cortes and Mehryar Mohri - WFSTs in Computational Biology
page Courant
40Institute, NYU
40
Connection - Algorithm
Definition:
• Input: weighted transducer T1 .
• Output: equivalent weighted transducer T2 with
all states connected.
Description:
3. Depth-first search of T1 from I1 .
4. Mark accessible and coaccessible states.
5. Keep marked states and corresponding
transitions.
Complexity: linear O(|Q1 | + |E1 |).
Mehryar Mohri - Speech Recognition page Courant
41Institute, NYU
41
ε-Removal - Illustration
Program: fsmrmepsilon T.fsm >TP.fsm
Graphical representation:
!:!/0.3
4/0 1.6
b:b/0.5 1 0.6
!:!/0.3 2 b:a/0.7 0.2
!:!/0.3 !:!/1 4/0 1.6
a:b/0.1 1 b:b/0.5 4/0 1 1.6 1
0 !:!/0.6 b:b/0.5 2 1 0.8 0.6 3 2
2 b:a/0.7 4/0 0.2 0.6
!:!/0.2
a:b/0.1 !:!/1 a:a/0.8 b:a/0.7 0 0.2
0 a:b/0.1 1 !:!/0.6 !:!/1 1
0 1 !:!/0.6
b:a/0.9 0.8 0.3 3 1 2
!:!/0.2 b:a/0.4 3 a:a/0.8 4/0 0.8 3 2
!:!/0.2 a:a/0.8 4/0 0
0 0.3
b:a/0.4 b:a/0.9 3 0.3
b:a/0.4 b:a/0.9 3

a:a/1.6
a:a/1.6
a:a/1.6
b:a/1
b:a/1
b:a/1
b:a/1.7 b:a/1.5
b:a/2.3
0 b:a/1.5
a:b/0.1
b:a/1.7 b:a/1.5
b:a/1.7 a:a/1.4
b:a/2.3
1 b:a/2.3 4/0
00 a:b/0.1
a:b/0.1 b:b/0.7 b:b/0.5 a:a/1.4
11 a:a/1.4 b:a/0.7 4/04/0
b:b/0.7
b:b/0.7 b:b/0.5
b:b/0.5
2 b:a/0.7
b:a/0.7
b:a/0.4
b:a/0.9 22
b:a/0.4 a:a/0.8
b:a/0.4 b:a/0.9
b:a/0.9
a:a/0.8
a:a/0.8
b:a/1.7
3
b:a/1.7
b:a/1.7
33

Mehryar Mohri - Speech Recognition page Courant


42Institute, NYU
42
ε-Removal - Algorithm
(MM, 2001)
Definition:
• Input: weighted transducer T1 .
• Output: equivalent WFST T2 with no ε-transition.
Description:
• Computation of ε-closures.
• Removal of εs.
• Complexity:
• Acyclic T! : O(|Q|2 + |Q||E|(T⊕ + T⊗ )).
• General case (tropical semiring):
2
O(|Q||E| + |Q| log |Q|)
Mehryar Mohri - Speech Recognition page Courant
43Institute, NYU
43
Computation of ε-closures
Definition: for p in Q,
! "
C[p] = (q, w) : q ∈ ![p], d[p, q] = w "= 0 ,
!
where d[p, q] = w[π].
π∈P (p,",q)

Problem formulation: all-pairs shortest-distance


problem in T! (T reduced to its ε-transitions).
• closed semirings: generalization of Floyd-
Warshall algorithm.
• k-closed semirings: generic sparse shortest-
distance algorithm.
Mehryar Mohri - Speech Recognition page Courant
44Institute, NYU
44
Determinization - Algorithm
(MM,1997)
Definition:
• Input: weighted automaton or transducer T1
• Output: equivalent subsequential or deterministic
machine T2: has a unique initial state and no two
transitions leaving the same state share the same
input label.
Description:
3. Generalization of subset construction: weighted
subsets{(q1 , w1 ), . . . , (qn , wn )} , where wis are
remainder weights.
4. Computation of the weight of resulting transitions.
Mehryar Mohri - Speech Recognition page Courant
45Institute, NYU
45
Determinization - Conditions
Semiring: weakly left divisible semirings.
Definition: T is determinizable when the
determinization algorithm applies to T.

• All unweighted automata are determinizable.


• All acyclic machines are determinizable.
• Not all weighted automata or transducers are
determinizable.
• Characterization based on the twins property.
Complexity: exponential, but lazy implementation.
Mehryar Mohri - Speech Recognition page Courant
46Institute, NYU
46
Determinization of Weighted Automata -
Illustration
Program: fsmdeterminize A.fsm >D.fsm
Graphical representation:
a/1
2
b/1
b/1
a/1 b/3
b/4 2
b/3
b/4
0 3/0
0 a/3 b/3 3/0
a/3 b/3
b/1 b/5
1/0
b/1 b/5
1/0

{(1,2),(2,0)}/2
a/1 {(1,2),(2,0)}/2 b/1
a/1 b/1
{(0,0)} {(3,0)}/0
{(0,0)} b/1 b/3 {(3,0)}/0
b/1 b/3

{(1,0),(2,3)}/0
{(1,0),(2,3)}/0

Mehryar Mohri - Speech Recognition page Courant


47Institute, NYU
47
Determinization of Weighted Transducers -
Illustration
Program: fsmdeterminize T.fsm >D.fsm
Graphical representation: a:b/0.6
Determinization of Weighted Transducers
b:eps/0.4 – Illustration
a:a/0.1 2 b:b/0.3
• Program: fsmdeterminize
a:b/0.2 T.fsm >D.fsm a:eps/0.5
0 3 4/0.7
a:c/0.5
• Graphical Representation:
c:eps/0.7
1

{(0, b, 0)}
a:a/0.2

a:b/0.1

a:eps/0.1 {(1, c, 0.4),


{(0, eps, 0)} b:a/0.3 a:b/0.6
(2, a, 0)}
{(3, b, 0),
(4, eps, .1)}
/(eps, 0.8) a:b/0.5

{(4, eps, 0)}


/(eps, 0.7)
c:c/1.1

Mehryar Mohri - Speech Recognition page Courant


48Institute, NYU
48
Pushing - Algorithm
(MM,1997; 2004)
Definition:
• Input: weighted automaton or transducer T1
• Output: equivalent automaton or transducer T2
such that the longest common prefix of all
outgoing paths be ε or such that the sum of the
weights of all outgoing transitions be 1 modulo
the string or weight at the initial state.

Mehryar Mohri - Speech Recognition page Courant


49Institute, NYU
49
• Description:
1. Single-source shortest distance computation:
for each state q,
!
d[q] = w[π].
π∈P (q,F )

2. Reweighting: for each transition e such that


d[p[e]] != 0,

w[e] ← (d[p[e]])−1 (w[e] ⊗ d[n[e]])

Mehryar Mohri - Speech Recognition page Courant50Institute, NYU


50
• Conditions (automata case): weakly divisible
semiring, zero-sum free semiring or automaton.
• Complexity:
• automata case
•acyclic case: linear O(|Q| + |E|(T⊕ + T⊗ )).
•general case (tropical semiring):
O(|Q| log |Q| + |E|).
• transducer case:
O((|Pmax | + 1) |E|).

Mehryar Mohri - Speech Recognition page Courant51Institute, NYU


51
Weight Pushing - Illustration

Program: fsmpush -ic A.fsm >P.fsm


Graphical representation:

• Tropical semiring:
a/0
a/0 a/0
a/0
e/0
e/0 e/0
e/0
b/1
b/1 11 b/1
b/1 11
f/1
f/1 f/1
f/1
c/4
c/4 c/4
c/4
00 3/0 00 3/0
3/0
d/0
d/0 e/10
e/10 d/10
d/10 e/0
e/0

e/1
e/1 f/11
f/11 e/11
e/11 f/1
f/1
22 22

Mehryar Mohri - Speech Recognition page Courant


52Institute, NYU
52
• Log semiring:
a/0.3266
a/0.3266
a/0
a/0 e/0.3132
e/0
e/0 b/1.3266
b/1.3266 1 1 e/0.3132
b/1
b/1 11 f/1.3132
f/1
f/1 f/1.3132
c/4
c/4 c/4.3266
c/4.3266
00 3/03/0 0 0 3/-0.6393/-0.63
d/0 e/10
e/10 e/0.3132
e/0.3132
d/10.326
d/10.326

e/1
e/1 f/11
f/11
22
f/1.3132
f/1.3132
e/11.326
e/11.326 2 2

Mehryar Mohri - Speech Recognition page Courant53Institute, NYU


53
Label Pushing - Illustration

Program: fsmpush -il T.fsm >P.fsm


Graphical representation:
a:a
a:a a:d
a:d
b:
b:!! 33 c:d
c:d
c:c:!! a:a:!! a:!!
a: f:d
f:d
00 11 22 d: ! 44 55 66
b:d
b:d

a:a
a:a a:d
a:d
b:!
b: 3 c: !
c:d
c:d a:d
a:d a:
a:!! f:f:!!
00 11 22 d:d 44 55 66
b:b:!!

Mehryar Mohri - Speech Recognition page Courant


54Institute, NYU
54
Minimization - Algorithm (MM,1997)
Definition:
• Input: deterministic weighted automaton or
transducer T1 .
• Output: equivalent deterministic automaton or
transducer T2 with the minimal number of states
and transitions.
Description:
• Canonical representation: use pushing or other
algorithm to standardize input automata.
• Automata minimization: encode pairs (label,
weight) as labels and use classical unweighted
minimization algorithm.
Mehryar Mohri - Speech Recognition page Courant
55Institute, NYU
55
• Complexity:
• Automata case
• acyclic case: linear, O(|Q| + |E|(T⊕ + T⊗)).
• general case (tropical semiring): O(|E| log |Q|).
• Transducer case
• acyclic case: O(S + |Q| + |E| (|Pmax| + 1)).
• general case (tropical semiring):
O(S + |Q| + |E| (log |Q| + |Pmax |)).
Mehryar Mohri - Speech Recognition page Courant56Institute, NYU
56
Minimization - Illustration

Program: fsmminimize D.fsm >M.fsm


Graphical representation:
22
b/1
b/1 c/2
c/2

d/0
d/0 c/1
c/1
d/3
d/3 33 e/2
e/2
a/0
a/0 e/1
e/1
00 11 d/4
d/4 66 7/0
7/0
e/3
e/3
b/1
b/1
55
a/2
a/2 c/3
c/3

44

22
b/0
b/0 c/0
c/0

d/0
d/0 c/1
c/1
d/6
d/6 33 e/0
e/0
a/0
a/0 e/0
e/0
00 11 d/6
d/6 66 7/6
7/6
e/0
e/0
b/1
b/1
55
a/3
a/3 c/0
c/0

44
d/0
d/0
b/0
b/0
c/1
c/1
a/0
a/0 a/3
a/3 22 c/0
c/0
00 11
b/1
b/1 e/0
e/0 e/0
e/0
d/6
d/6 33 66 7/6
7/6

Mehryar Mohri - Speech Recognition page Courant


57Institute, NYU
57
Equivalence - Algorithm
Definition:
• Input: deterministic weighted automata A and B.
• Output: TRUE iff A and B equivalent.
Description (MM,1997):
• Canonical representation: use pushing or other
algorithm to standardize input automata.
• Automata minimization: encode pairs (label,
weight) as labels and use classical algorithm for
testing the equivalence of unweighted automata.
Complexity: (second stage is quasi-linear)
O(|E1 | + |E2 | + |Q1 | log |Q1 | + |Q2 | log |Q2 |).
Mehryar Mohri - Speech Recognition page Courant
58Institute, NYU
58
Equivalence - Illustration

Program: fsmequiv [-v] D.fsm M.fsm


Graphical representation:
blue/0.7 2 /0.3

red/0.3 blue/0.7 2 /0.3


0 1 yellow/0.9
red/0.3
0 1 yellow/0.9
3 /0.4

3 /0.4

D.fsa

D.fsa

red/0 blue/0
0 1 2 /1.3
red/0 yellow/0.3
blue/0
0 1 2 /1.3
yellow/0.3

M.fsa
M.fsa
Mehryar Mohri - Speech Recognition page Courant
59Institute, NYU
59
This Lecture
Weighted automata and transducers
Rational operations
Elementary unary operations
Fundamental binary operations
Optimization algorithms
Search algorithms

Mehryar Mohri - Speech Recognition page 60 Courant Institute, NYU


60
Single-Source Shortest-Distance Algorithms -
Illustration
Program: fsmbestpath [-n N] A.fsm >C.fsm
Graphical representation:
red/0.5
red/0.5
red/0.5
red/0.5 1 red/0.5 2 green/0.3
red/0.5 1 2 green/0.3
green/0.3 blue/0
0 green/0.3 3 blue/0 4 /0.8
0 3 yellow/0.6 4 /0.8
yellow/0.6

A.fsa
A.fsa

green/0.3 blue/0
0 green/0.3 1 blue/0 2 /0.8
0 1 2 /0.8

C.fsa
C.fsa

Mehryar Mohri - Speech Recognition page Courant


61Institute, NYU
61
Pruning - Illustration

Program: fsmprune -c1.0 A.fsm >C.fsm


Graphical representation:
red/0.5

red/0.5
red/0.5 1 red/0.5 2 green/0.3
red/0.5 1 2 green/0.3
green/0.3 blue/0
0 green/0.3 3 blue/0 4 /0.8
0 3 4 /0.8
yellow/0.6
yellow/0.6

A.fsa
A.fsa

green/0.3
green/0.3 blue/0
blue/0
00 1 1 22/0.8
/0.8
yellow/0.6

C.fsa
C.fsa

Mehryar Mohri - Speech Recognition page Courant


62Institute, NYU
62
Summary
FSM Library:

• weighted finite-state transducers (semirings);


• elementary unary operations (e.g., reversal);
• rational operations (sum, product, closure);
• fundamental binary operations (e.g., composition);
• optimization algorithms (e.g., ε-removal,
determinization, minimization);
• search algorithms (e.g., shortest-distance
algorithms, n-best paths algorithms, pruning).
Mehryar Mohri - Speech Recognition page Courant
63Institute, NYU
63
References
• Cyril Allauzen and Mehryar Mohri. Efficient Algorithms for Testing the Twins Property.
Journal of Automata, Languages and Combinatorics, 8(2):117-144, 2003.

• Cyril Allauzen, Mehryar Mohri, and Brian Roark. The Design Principles and Algorithms of a
Weighted Grammar Library. International Journal of Foundations of Computer Science, 16(3):
403-421, 2005.

• Cyril Allauzen, Michael Riley, Johan Schalkwyk, Wojciech Skut, and Mehryar Mohri.
OpenFst: a general and efficient weighted finite-state transducer library. In Proceedings of
the Ninth International Conference on Implementation and Application of Automata, (CIAA
2007), volume 4783 of Lecture Notes in Computer Science, pages 11-23. Springer, Berlin,
2007.

• Mehryar Mohri. Finite-State Transducers in Language and Speech Processing. Computational


Linguistics, 23:2, 1997.

• Mehryar Mohri. Weighted Grammar Tools: the GRM Library. In Robustness in Language and
Speech Technology. pages 165-186. Kluwer Academic Publishers, The Netherlands, 2001.

• Mehryar Mohri. Statistical Natural Language Processing. In M. Lothaire, editor, Applied


Combinatorics on Words. Cambridge University Press, 2005.
Mehryar Mohri - Speech Recognition page 64 Courant Institute, NYU
64
References
• Mehryar Mohri. Generic Epsilon-Removal and Input Epsilon-Normalization Algorithms for
Weighted Transducers. International Journal of Foundations of Computer Science, 13(1):
129-143, 2002.

• Mehryar Mohri, Fernando C. N. Pereira, and Michael Riley. The Design Principles of a
Weighted Finite-State Transducer Library. Theoretical Computer Science, 231:17-32, January
2000.

• Mehryar Mohri, Fernando C. N. Pereira, and Michael Riley. Weighted Automata in Text and
Speech Processing. In Proceedings of the 12th biennial European Conference on Artificial
Intelligence (ECAI-96),Workshop on Extended finite state models of language. Budapest,
Hungary, 1996. John Wiley and Sons, Chichester.

• Mehryar Mohri and Michael Riley. A Weight Pushing Algorithm for Large Vocabulary
Speech Recognition. In Proceedings of the 7th European Conference on Speech
Communication and Technology (Eurospeech'01). Aalborg, Denmark, September 2001.

• Fernando Pereira and Michael Riley. Finite State Language Processing, chapter Speech
Recognition by Composition of Weighted Finite Automata. The MIT Press, 1997.

Mehryar Mohri - Speech Recognition page 65 Courant Institute, NYU


65

You might also like