Recognizing GSM Digital Speech: Presented by Amarnath Shenoy K. 4so05ec004

RECOGNIZING GSM DIGITAL
SPEECH
Presented by
Amarnath Shenoy K.
4so05ec004
CONTENTS
 GSM
 Speech recognition in the GSM environment
 Remote Recognition
 Recognizing GSM Speech
 Uses
 Conclusion
 Cellular Network
• Macro Cells
• Micro Cells
• Pico Cells
• Femto Cells
• Umbrella Cells
 Speech Coding
• GSM uses Pulse Code Modulation (PCM, 64 kbps).
• The Full-Rate speech codec removes the
redundancy in the signal and achieves a bit
rate of 13 kbps.
 Channel Coding
• Convolutional coding: extra bits are added to
the bit stream so that the receiver can
detect and correct errors introduced during
transmission.
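The idea of adding redundant bits can be sketched with a small rate-1/2 convolutional encoder. This is a minimal illustration, not the complete GSM channel coder: the two generator polynomials (1 + D^3 + D^4 and 1 + D + D^3 + D^4) are the pair commonly cited for the GSM full-rate traffic channel and are an assumption here, and the name conv_encode is hypothetical.

```python
def conv_encode(bits, taps_g0=(0, 3, 4), taps_g1=(0, 1, 3, 4)):
    """Rate-1/2 convolutional encoder: emits two coded bits per input bit.

    taps_g0 / taps_g1 list the delays (in bits) that feed each output;
    the defaults are assumed values, see the lead-in.
    """
    memory = max(max(taps_g0), max(taps_g1))
    history = [0] * memory           # history[d - 1] is the bit from d steps ago
    coded = []
    for b in bits:
        window = [b] + history       # window[d] is the input delayed by d steps
        coded.append(sum(window[d] for d in taps_g0) % 2)
        coded.append(sum(window[d] for d in taps_g1) % 2)
        history = window[:memory]    # shift the register by one bit
    return coded


# Every input bit influences several output bits, which is what lets a
# decoder (for example a Viterbi decoder, not shown) correct isolated errors.
impulse_response = conv_encode([1, 0, 0, 0, 0])
```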
GSM
 Interleaving
• Aim is to avoid the risk of losing consecutive
data bits.
 Multiple Access
• Allows many users to use their cell phones at
the same time and share the limited
bandwidth.
• FDMA divides the spectrum into narrow
frequency slices, and TDMA then divides each
frequency slice into time slots.
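Why interleaving protects against the loss of consecutive bits can be shown with a generic block interleaver. This is only a sketch of the principle: the interleaving actually specified for GSM is more elaborate, and the function names and the 3 x 4 block size are assumptions chosen for illustration.

```python
def interleave(bits, rows, cols):
    """Write the block row by row, read it out column by column."""
    assert len(bits) == rows * cols
    return [bits[r * cols + c] for c in range(cols) for r in range(rows)]


def deinterleave(bits, rows, cols):
    """Inverse of interleave(): restore the original bit order."""
    assert len(bits) == rows * cols
    return [bits[c * rows + r] for r in range(rows) for c in range(cols)]


# Use the positions 0..11 as stand-ins for data bits.
original = list(range(12))
sent = interleave(original, rows=3, cols=4)
# sent == [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]

# A burst that wipes out three consecutive transmitted symbols ...
burst = sent[3:6]
# ... hits original positions 1, 5 and 9, which are four bits apart,
# so the channel decoder sees isolated errors instead of one long gap.
```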
GSM
 Ciphering
• It is used to encrypt the data so that no one
can overhear the conversation of another
user.
 Modulation
• Modulation used in GSM is Gaussian
minimum-shift keying (GMSK)
SPEECH RECOGNITION IN THE GSM
ENVIRONMENT
 Alternative Architectures for ASR in the GSM
Environment
• Local recognition (client-only processing)
• Remote recognition (server-only processing)
• Distributed recognition (client-server
processing)
LOCAL RECOGNITION
 Performs ASR at the user's local terminal.
 Coding distortion and transmission errors are
avoided.
 Down side:
• The ASR application should reside in the user
terminal.
• Increases computational load on local
terminal.
REMOTE RECOGNITION
 Server performs all the recognition process.
 The computational load at the local terminals
is negligible.
 Mobile terminal does not house the ASR
application.
 Down Side:
• Distortions degrade the performance of the
ASR system.
DISTRIBUTED RECOGNITION
 Compromise between the two previous
solutions.
 Part of the recognition process (namely, the
parameterization) runs at the client end, and
the remainder at the server end.
 Down Side:
• A standardized front-end is needed so that
every client terminal computes and transmits
the same parameters.
REMOTE RECOGNITION IS MORE
CONVENIENT
 Does not impose restrictive conditions on the
client terminal’s capabilities, nor does it
create the need for special settings or
agreements between client and server.
 Preserves the transmission bandwidth
requirements and the compatibility with the
existing standard-based voice applications.
REMOTE RECOGNITION: KEY ISSUES
 GSM environment problems for ASR:
 Noisy Scenarios:
• Calls made from almost anywhere: public
places, within a car, at the roadside, etc.
• ASR should be robust.
 Influence of Coding Distortion on Speech
Recognition:
• Quantization error of the parameters to be
transmitted.
• Relevant characteristics of the (encoded and)
decoded speech signal will be different from
those of the original one.
REMOTE RECOGNITION: KEY ISSUES
 Transmission Errors and Lost Frames:
• Channel coding influences the quality of
speech.
• GSM holes.
RECOGNIZING GSM SPEECH
DIFFERENCE BETWEEN
CONVENTIONAL AND IMPROVED
FRONT-ENDS
 The two front-ends differ in how the feature
vectors are derived from the source.
 “Decoded Speech Front-end” starts from the
decoded speech.
 “GSM Digital Front-end” starts from a
(quantized) LP spectrum plus a reduced set
of parameters extracted from the GSM bit
stream.
CONVENTIONAL DECODED SPEECH
FRONT-END:
 The feature extraction is carried out on the
decoded speech signal.
 The signal is analyzed once every 10 ms using
a 25 ms Hamming analysis window.
 Twelve Mel-frequency cepstral coefficients
(MFCC) are obtained using a mel-scaled filter
bank with 40 channels.
 The log-energy, the twelve delta-cepstral
coefficients and the delta-log energy are
appended, making a total vector dimension
of 26.
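The framing step and the size of the feature vector described above can be sketched as follows. The 8 kHz sampling rate is an assumption (the slides do not state it, but it is the usual rate for telephone-band speech), and the names frames and hamming are hypothetical helpers, not part of any particular toolkit.

```python
import math

SAMPLE_RATE = 8000                        # Hz, assumed
FRAME_LEN = SAMPLE_RATE * 25 // 1000      # 25 ms window = 200 samples
FRAME_SHIFT = SAMPLE_RATE * 10 // 1000    # 10 ms step   = 80 samples

# 12 MFCC + log-energy + 12 delta-MFCC + delta-log-energy
FEATURE_DIM = 12 + 1 + 12 + 1


def hamming(n):
    """Hamming window of n points."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def frames(signal):
    """Cut the signal into overlapping, Hamming-weighted analysis frames."""
    window = hamming(FRAME_LEN)
    out = []
    for start in range(0, len(signal) - FRAME_LEN + 1, FRAME_SHIFT):
        chunk = signal[start:start + FRAME_LEN]
        out.append([s * w for s, w in zip(chunk, window)])
    return out


# One second of (silent) speech gives one frame every 10 ms.
one_second = frames([0.0] * SAMPLE_RATE)
```

Each frame would then go through an FFT, the 40-channel mel filter bank and a DCT to give the 12 MFCCs; those stages are omitted here to keep the sketch short.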
DISADVANTAGE:
 The decoded speech is affected by both the
quantization distortion of every parameter
involved in the speech synthesis and the
inadequacies of the source-filter model.
GSM DIGITAL FRONT-END
 The purpose is to reduce the influence of
coding distortion on ASR systems
performance.
 Steps Involved:
• For each GSM frame, P available quantized
spectral parameters are extracted from the
bit stream.
• After the standard concealment procedures
recommended in the GSM standard, they
are converted into P LP coefficients a_r.
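The conversion from spectral parameters to LP coefficients can be sketched as below. Several things are assumptions rather than statements from the slides: that the spectral parameters are log-area ratios (LARs), as in the GSM Full-Rate codec, that the ideal relation LAR = log10((1 + r) / (1 - r)) is inverted directly (a real decoder uses a piecewise-linear approximation and dequantization tables), and that the LP polynomial is written A(z) = 1 + sum of a_r z^-r. The function names are hypothetical, and error concealment is not shown.

```python
def lar_to_reflection(lar):
    """Invert LAR = log10((1 + r) / (1 - r)) to get the reflection coefficient r."""
    p = 10.0 ** lar
    return (p - 1.0) / (p + 1.0)


def reflection_to_lp(refl):
    """Step-up recursion: reflection coefficients to LP coefficients a_1..a_P.

    Sign convention (assumed): A(z) = 1 + a_1 z^-1 + ... + a_P z^-P.
    """
    a = []
    for k in refl:
        m = len(a)
        # a_i(new) = a_i + k * a_(m+1-i) for i = 1..m, then append a_(m+1) = k
        a = [a[i] + k * a[m - 1 - i] for i in range(m)] + [k]
    return a


def frame_to_lp(lars):
    """P LARs from one frame to P LP coefficients."""
    return reflection_to_lp([lar_to_reflection(x) for x in lars])
```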
 Next, a 512-point spectral envelope, H(Ω), of
the speech frame is computed from the P LP
coefficients.
 A filter bank composed of M mel-scale
symmetrical triangular bands identical to the
one employed in the conventional front-end
is applied to weight |H[k]|, yielding 40
coefficients, which are subsequently
converted into 12 mel cepstrum coefficients
using a discrete cosine transform (DCT) of
their log-scaled magnitudes.
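The path from LP coefficients to mel cepstra can be sketched in a few functions. This is an illustration under stated assumptions, not the reference front-end: the LP polynomial is taken as A(z) = 1 + sum of a_r z^-r, only the 257 non-redundant bins of the 512-point grid are evaluated, the sampling rate is assumed to be 8 kHz, the mel formula is the common 2595 * log10(1 + f / 700) variant, and the triangles are drawn in Hz between mel-spaced edges. All function names are hypothetical.

```python
import cmath
import math


def lp_envelope(a, n_fft=512):
    """|H[k]| = 1 / |A(e^{j*2*pi*k/n_fft})| for k = 0 .. n_fft/2."""
    env = []
    for k in range(n_fft // 2 + 1):
        omega = 2.0 * math.pi * k / n_fft
        A = 1.0 + sum(c * cmath.exp(-1j * omega * (r + 1)) for r, c in enumerate(a))
        env.append(1.0 / abs(A))
    return env


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(env, n_bands=40, sample_rate=8000):
    """Weight the envelope with n_bands triangular, mel-spaced filters."""
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    edges = [mel_to_hz(top * i / (n_bands + 1)) for i in range(n_bands + 2)]
    bin_hz = nyquist / (len(env) - 1)
    outputs = []
    for b in range(1, n_bands + 1):
        lo, mid, hi = edges[b - 1], edges[b], edges[b + 1]
        total = 0.0
        for k, mag in enumerate(env):
            f = k * bin_hz
            if lo < f <= mid:
                total += mag * (f - lo) / (mid - lo)   # rising slope
            elif mid < f < hi:
                total += mag * (hi - f) / (hi - mid)   # falling slope
        outputs.append(total)
    return outputs


def cepstra(band_outputs, n_ceps=12):
    """DCT of the log filter-bank outputs; returns c_1 .. c_n_ceps."""
    m = len(band_outputs)
    logs = [math.log(max(e, 1e-10)) for e in band_outputs]
    return [sum(logs[j] * math.cos(math.pi * i * (j + 0.5) / m) for j in range(m))
            for i in range(1, n_ceps + 1)]


# Example with a single, arbitrary LP coefficient (a one-pole envelope).
mfcc = cepstra(mel_filterbank(lp_envelope([-0.9])))
```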
 The frame energy is extracted from the
bit stream, and its logarithm (the log-energy)
is appended to the feature vector.
 A band-limited interpolation FIR filter is used
to reduce the frame period from 20 ms to 10
ms (the period employed by the conventional
front-end).
 Finally, dynamic parameters are computed
for all the 12 MFCC and the log-energy,
making a total vector dimension of 26.
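The last two steps (doubling the frame rate, then appending dynamic parameters) can be sketched as follows. The slides do not give the filter or the delta formula, so both are assumptions: the interpolator is a short Hamming-windowed sinc with edge frames repeated, and the deltas use the common regression formula over plus or minus two frames. All names are hypothetical.

```python
import math


def interpolate_2x(track, half_len=3):
    """Insert one value between neighbouring frames (20 ms to 10 ms period).

    Windowed-sinc (band-limited) interpolation; the original frames pass
    through unchanged and edge frames are repeated where taps run off the end.
    """
    span = 2 * half_len
    taps = []
    for i in range(half_len):
        n = 2 * i + 1                                   # odd offsets 1, 3, 5, ...
        sinc = math.sin(math.pi * n / 2.0) / (math.pi * n / 2.0)
        taps.append(sinc * (0.54 + 0.46 * math.cos(math.pi * n / span)))
    norm = 2.0 * sum(taps)                              # unit gain for constants
    taps = [t / norm for t in taps]
    last = len(track) - 1
    out = []
    for k in range(len(track)):
        out.append(track[k])
        if k < last:
            out.append(sum(t * (track[max(k - i, 0)] + track[min(k + 1 + i, last)])
                           for i, t in enumerate(taps)))
    return out


def deltas(track, theta=2):
    """Regression-based dynamic parameters over +/- theta frames."""
    last = len(track) - 1
    denom = 2.0 * sum(t * t for t in range(1, theta + 1))
    return [sum(t * (track[min(n + t, last)] - track[max(n - t, 0)])
                for t in range(1, theta + 1)) / denom
            for n in range(len(track))]


def build_features(static_frames):
    """13-dim static vectors every 20 ms to 26-dim vectors every 10 ms."""
    n_dims = len(static_frames[0])
    tracks = [interpolate_2x([f[d] for f in static_frames]) for d in range(n_dims)]
    delta_tracks = [deltas(t) for t in tracks]
    return [[t[i] for t in tracks] + [d[i] for d in delta_tracks]
            for i in range(len(tracks[0]))]


# Five dummy frames of 12 MFCC + log-energy give nine 26-dim vectors.
features = build_features([[1.0] * 13 for _ in range(5)])
```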
WHY USE AUTOMATIC SPEECH
RECOGNITION IN GSM?
 Audio conferencing, echo cancellation,
speech recognition, and hands-free telephony.
 Deaf telephony, such as SpinVox voice-to-text
voicemail.
 Captioned telephone.
 Telecommunications Relay Service.

CONCLUSION
 The coding distortion can severely affect the
recognition figures and the more complex the
ASR task is, the bigger the recognition losses
become.
 Using the GSM digital front-end, the influence of
some sources of distortion pertaining to the
encoding-decoding process is circumvented.
