
Deep Learning in Speech Processing: Potentials and Challenges

Dong Yu
Microsoft Research
Why Speech Recognition is Hard
- Sequential multi-class problem
- Variability
  - Articulation differences
  - Environment differences
- Variability
  - Within each frame
  - Along the whole sequence
- Variability
  - Mapping between different layers
  - Across different dialog contexts
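The "sequential multi-class problem" bullet means a recognizer does not label each frame in isolation: it searches for the best label sequence, so one frame's label depends on its neighbors. A textbook way to do this search is Viterbi decoding over per-frame scores combined with state-transition scores. The sketch below is a minimal, generic illustration; the function name, the two-state model, and all numbers are hypothetical and not taken from the talk. In a hybrid acoustic model the per-frame scores would come from the network.

```python
import math
from typing import List, Sequence


def viterbi(log_emissions: Sequence[Sequence[float]],
            log_trans: Sequence[Sequence[float]],
            log_init: Sequence[float]) -> List[int]:
    """Return the most likely state sequence (hypothetical helper).

    log_emissions[t][s]: log score of state s at frame t
    log_trans[i][j]:     log probability of moving from state i to state j
    log_init[s]:         log probability of starting in state s
    """
    n_states = len(log_init)
    # delta[s]: best log score of any path ending in state s at the current frame
    delta = [log_init[s] + log_emissions[0][s] for s in range(n_states)]
    backptr: List[List[int]] = []
    for frame in log_emissions[1:]:
        new_delta, ptrs = [], []
        for j in range(n_states):
            # Best predecessor state for landing in state j at this frame
            best_i = max(range(n_states), key=lambda i: delta[i] + log_trans[i][j])
            new_delta.append(delta[best_i] + log_trans[best_i][j] + frame[j])
            ptrs.append(best_i)
        delta = new_delta
        backptr.append(ptrs)
    # Trace back from the best final state
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for ptrs in reversed(backptr):
        state = ptrs[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    lg = math.log
    trans = [[lg(0.9), lg(0.1)], [lg(0.1), lg(0.9)]]  # "sticky" states
    init = [lg(0.5), lg(0.5)]
    probs = [[0.8, 0.2], [0.4, 0.6], [0.8, 0.2], [0.9, 0.1]]
    emissions = [[lg(p) for p in row] for row in probs]
    print(viterbi(emissions, trans, init))  # [0, 0, 0, 0]
```

Taken frame by frame, the second frame would be labeled state 1, but the sequence-level search keeps state 0 throughout because switching states costs more than the small local gain.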
11/04/2021 2 Dong Yu
Potentials of Deep Learning
- Derive robust and discriminative features
  - Directly from waveforms and/or spectra
  - With many layers of transformations
  - Compact with regard to the number of parameters
  - Sparse with regard to the number of active features
  - Distributed with regard to information storage
- Learn and incorporate long-range dependencies
  - At different levels: semantic, syntactic, pronunciation
  - Discovered automatically
  - Learn when context is important and when it is not
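The context bullets describe dependencies the model should learn on its own. As a point of reference, one simple and widely used way to give a feed-forward acoustic model short-range context is to splice each input frame with its neighbors before the first layer. This fixed window is far short of the automatically discovered, multi-level dependencies the slide envisions. The sketch below assumes edge replication at utterance boundaries; the function name and the `context` parameter are hypothetical. For example, `context=5` gives an 11-frame window.

```python
from typing import List, Sequence


def splice_frames(frames: Sequence[Sequence[float]], context: int) -> List[List[float]]:
    """Concatenate each frame with its `context` left and right neighbors.

    Positions beyond the sequence edges reuse the first/last frame (edge
    replication), so every output vector has (2 * context + 1) * dim values.
    """
    n = len(frames)
    spliced: List[List[float]] = []
    for t in range(n):
        window: List[float] = []
        for offset in range(-context, context + 1):
            # Clamp the index so edge frames are repeated instead of zero-padded
            idx = min(max(t + offset, 0), n - 1)
            window.extend(frames[idx])
        spliced.append(window)
    return spliced


if __name__ == "__main__":
    # Three toy 2-dimensional "feature frames" (e.g. two filterbank channels)
    feats = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
    for row in splice_frames(feats, context=1):
        print(row)
```

Running it prints `[1.0, 10.0, 1.0, 10.0, 2.0, 20.0]`, `[1.0, 10.0, 2.0, 20.0, 3.0, 30.0]`, and `[2.0, 20.0, 3.0, 30.0, 3.0, 30.0]`: the first and last frames are padded with copies of themselves.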
Challenges
- Basic theory
  - Why does greedy layer-wise pre-training help?
  - Is there a better way to pre-train the models?
- Basic model
  - How to integrate generative and discriminative abilities?
  - How to represent sequential patterns?
  - How to discover the linguistic hierarchy?
  - How to combine supervised, unsupervised, and lightly supervised learning?
- Special considerations
  - Is it robust to mismatched test conditions?
  - Can we scale the learning process up to more than 2,000 hours of speech?
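The first open question concerns greedy layer-wise pre-training. For readers unfamiliar with the procedure, below is a minimal sketch of one common form: stacking restricted Boltzmann machines (RBMs), each trained with one-step contrastive divergence (CD-1) on the outputs of the layer below, as in deep belief network pre-training. The class and function names, layer sizes, learning rate, and epoch count are hypothetical choices for illustration. The code is pure Python for self-containment, not speed.

```python
import math
import random
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class RBM:
    """Bernoulli-Bernoulli RBM trained with CD-1 (illustrative only)."""

    def __init__(self, n_visible: int, n_hidden: int, rng: random.Random):
        self.rng = rng
        # Small random weights break the symmetry between hidden units
        self.W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
        self.b_vis = [0.0] * n_visible
        self.b_hid = [0.0] * n_hidden

    def hidden_probs(self, v: List[float]) -> List[float]:
        return [sigmoid(self.b_hid[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.b_hid))]

    def visible_probs(self, h: List[float]) -> List[float]:
        return [sigmoid(self.b_vis[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b_vis))]

    def cd1_update(self, v0: List[float], lr: float) -> None:
        h0 = self.hidden_probs(v0)                       # positive phase
        h0_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
        v1 = self.visible_probs(h0_sample)               # mean-field reconstruction
        h1 = self.hidden_probs(v1)                       # negative phase
        for i in range(len(v0)):
            for j in range(len(h0)):
                self.W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
        for i in range(len(v0)):
            self.b_vis[i] += lr * (v0[i] - v1[i])
        for j in range(len(h0)):
            self.b_hid[j] += lr * (h0[j] - h1[j])

    def reconstruction_error(self, data: List[List[float]]) -> float:
        err = 0.0
        for v in data:
            r = self.visible_probs(self.hidden_probs(v))
            err += sum((a - b) ** 2 for a, b in zip(v, r))
        return err / len(data)


def greedy_pretrain(data: List[List[float]], layer_sizes: List[int],
                    epochs: int = 50, lr: float = 0.1, seed: int = 0) -> List[RBM]:
    """Train one RBM per layer; each trained layer is frozen and feeds the next."""
    rng = random.Random(seed)
    layers: List[RBM] = []
    inputs = data
    for n_hidden in layer_sizes:
        rbm = RBM(len(inputs[0]), n_hidden, rng)
        for _ in range(epochs):
            for v in inputs:
                rbm.cd1_update(v, lr)
        layers.append(rbm)
        # Map the data upward through the frozen layer to train the next one
        inputs = [rbm.hidden_probs(v) for v in inputs]
    return layers
```

For example, `greedy_pretrain(data, [4, 3])` returns two RBMs whose weights could initialize a 4-unit and a 3-unit hidden layer. In DBN-style pipelines the stack is then typically fine-tuned discriminatively with backpropagation, a step this sketch omits. Why the unsupervised initialization helps is exactly the open question the slide raises.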
