Deep Learning in Speech Processing: Potentials and Challenges
Dong Yu
Microsoft Research
Why Speech Recognition is Hard
Sequential multi-class problem
Variability
Articulation differences
Environment differences
Variability
Within each frame
Along the whole sequence
Variability
Mapping between different layers
Across different dialog contexts
11/04/2021 2 Dong Yu
Potentials of Deep Learning
Derive robust and discriminative features
Directly from waveforms and/or spectra
With many layers of transformations
Compact with regard to the number of parameters
Sparse with regard to the number of active features
Distributed with regard to the information storage
Learn and incorporate long-range dependencies
At different levels: semantic, syntactic, pronunciation
Discovered automatically
Learn when context is important and when it is not
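The slides do not say how a deep model gets access to context. One common input-side approach in DNN acoustic models is frame splicing: each spectral frame is concatenated with its neighbors so the network's weights can learn how much each context frame matters. The sketch below assumes frames are lists of floats and pads the utterance edges by repeating the first and last frame. The function name `splice_frames` is hypothetical and not taken from the talk.

```python
def splice_frames(frames, left, right):
    """Concatenate each frame with `left` preceding and `right` following frames.

    Positions beyond the utterance edges reuse the first or last frame
    (edge padding), so every output vector has the same length.
    """
    if not frames:
        return []
    n = len(frames)
    spliced = []
    for t in range(n):
        window = []
        for offset in range(-left, right + 1):
            k = min(max(t + offset, 0), n - 1)  # clamp index into [0, n - 1]
            window.extend(frames[k])
        spliced.append(window)
    return spliced


# Usage: 4 frames with 2 features each, one frame of context on each side.
frames = [[float(t), float(10 * t)] for t in range(4)]
out = splice_frames(frames, left=1, right=1)
print(out[0])  # [0.0, 0.0, 0.0, 0.0, 1.0, 10.0]
```

With `left=1, right=1`, each output has 3 x 2 = 6 values. A fixed window like this only makes context available; deciding when it matters is left to the learned weights.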
Challenges
Basic theory
Why does greedy layer-wise pre-training help?
Is there a better way to pre-train the models?
Basic model
How to integrate generative and discriminative abilities?
How to represent sequential patterns?
How to discover the linguistic hierarchy?
How to combine supervised, unsupervised, and lightly supervised learning?
Special considerations
Is it robust to mismatched test conditions?
Can we scale the learning process up to > 2000 hours of speech?
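To make the pre-training question concrete, here is a minimal sketch of greedy layer-wise pre-training, assuming the widely used choice of stacking Bernoulli restricted Boltzmann machines (RBMs) trained with one step of contrastive divergence (CD-1). The slides do not specify the pre-training method; the class `RBM` and function `pretrain_stack` are hypothetical names, and the supervised fine-tuning stage that normally follows is omitted.

```python
import math
import random


def sigmoid(x):
    # Numerically safe logistic function (avoids overflow in math.exp).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class RBM:
    """Bernoulli RBM trained with CD-1 (hypothetical illustration)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible biases
        self.c = [0.0] * n_hidden   # hidden biases
        self.rng = rng

    def hidden_probs(self, v):
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.c))]

    def visible_probs(self, h):
        return [sigmoid(self.b[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b))]

    def sample(self, probs):
        return [1.0 if self.rng.random() < p else 0.0 for p in probs]

    def cd1_update(self, v0, lr):
        h0 = self.hidden_probs(v0)                 # positive phase
        v1 = self.visible_probs(self.sample(h0))   # one Gibbs step: reconstruction
        h1 = self.hidden_probs(v1)                 # negative phase
        for i in range(len(v0)):
            for j in range(len(h0)):
                self.W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
            self.b[i] += lr * (v0[i] - v1[i])
        for j in range(len(h0)):
            self.c[j] += lr * (h0[j] - h1[j])


def pretrain_stack(data, layer_sizes, epochs=20, lr=0.1, seed=0):
    """Train one RBM per layer, bottom-up; each layer is frozen once trained."""
    rng = random.Random(seed)
    layers = []
    inputs = data
    n_in = len(data[0])
    for n_hidden in layer_sizes:
        rbm = RBM(n_in, n_hidden, rng)
        for _ in range(epochs):
            for v in inputs:
                rbm.cd1_update(v, lr)
        layers.append(rbm)
        # The trained layer's hidden probabilities become the next layer's data.
        inputs = [rbm.hidden_probs(v) for v in inputs]
        n_in = n_hidden
    return layers


# Usage: toy binary "frames" and a two-layer stack (8 -> 6 -> 3 units).
data = [[1, 1, 1, 1, 0, 0, 0, 0], [0, 0, 0, 0, 1, 1, 1, 1]] * 5
layers = pretrain_stack(data, layer_sizes=[6, 3], epochs=10)
top = layers[1].hidden_probs(layers[0].hidden_probs(data[0]))
```

The greedy aspect is that each RBM sees only the frozen outputs of the layer below it, so no gradient flows through the whole stack until fine-tuning. This is one reason the slide asks whether it is the best way to initialize deep models.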