
RECOGNIZING GSM DIGITAL SPEECH
Presented by
Amarnath Shenoy K.
4so05ec004
CONTENTS
▪ GSM
▪ Speech recognition in the GSM environment
▪ Remote Recognition
▪ Recognizing GSM Speech
▪ Uses
▪ Conclusion
GSM
▪ Cellular Network
• Macro Cells
• Micro Cells
• Pico Cells
• Femto Cells
• Umbrella Cells
▪ Speech Coding
• Speech is first digitized with Pulse Code Modulation (PCM) at 64 kbps.
• The Full-Rate speech codec then removes the redundancy in the signal, reducing the bit rate to 13 kbps.
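The two bit rates can be related with a little arithmetic. This sketch assumes 8 kHz sampling with 8 bits per sample for the 64 kbps PCM stream, and a Full-Rate frame of 260 bits every 20 ms; these constants are supplied for illustration and are not stated on the slides.

```python
# Bit-rate arithmetic for GSM speech coding (constants are assumptions, see above).
SAMPLE_RATE_HZ = 8000      # telephone-band sampling rate
PCM_BITS_PER_SAMPLE = 8    # companded PCM sample size giving 64 kbps
FR_FRAME_BITS = 260        # Full-Rate codec: bits per speech frame
FR_FRAME_MS = 20           # Full-Rate codec: frame duration in ms


def pcm_bit_rate() -> int:
    """Bit rate of the PCM input stream in bit/s."""
    return SAMPLE_RATE_HZ * PCM_BITS_PER_SAMPLE


def full_rate_bit_rate() -> int:
    """Bit rate of the Full-Rate codec output in bit/s."""
    return FR_FRAME_BITS * 1000 // FR_FRAME_MS


samples_per_frame = SAMPLE_RATE_HZ * FR_FRAME_MS // 1000
compression_ratio = pcm_bit_rate() / full_rate_bit_rate()

if __name__ == "__main__":
    print(pcm_bit_rate(), full_rate_bit_rate(), samples_per_frame, round(compression_ratio, 2))
```

Under these assumptions each 20 ms frame of 160 samples is carried in 260 bits, roughly a 4.9:1 reduction.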

▪ Channel Coding
• Convolutional coding: redundant bits are added to the bit stream so that the receiver can detect and correct errors introduced during transmission.
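A rate-1/2 convolutional encoder emits two coded bits per input bit. The sketch below assumes the generator polynomials G0 = 1 + D^3 + D^4 and G1 = 1 + D + D^3 + D^4 used for the protected (class 1) bits of GSM full-rate speech; they come from the GSM channel-coding specification, not from the slides. Tail bits, parity bits and the unprotected class 2 bits are omitted, and `conv_encode` is a hypothetical name.

```python
def conv_encode(bits):
    """Rate-1/2 convolutional encoder with G0 = 1 + D^3 + D^4, G1 = 1 + D + D^3 + D^4."""
    reg = [0, 0, 0, 0]  # shift register: reg[0] = u[n-1], ..., reg[3] = u[n-4]
    out = []
    for u in bits:
        g0 = u ^ reg[2] ^ reg[3]           # taps at delays 0, 3, 4
        g1 = u ^ reg[0] ^ reg[2] ^ reg[3]  # taps at delays 0, 1, 3, 4
        out += [g0, g1]
        reg = [u] + reg[:3]                # shift the new bit in
    return out


if __name__ == "__main__":
    # Impulse response: a single 1 followed by zeros spreads over 5 output pairs
    print(conv_encode([1, 0, 0, 0, 0]))
```

Because every input bit influences five consecutive output pairs, the decoder (typically a Viterbi decoder) can use that redundancy to recover isolated bit errors.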
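▪ Interleaving
• Spreads consecutive bits over different positions before transmission, so that a burst of channel errors hits scattered bits rather than a run of consecutive ones, which the channel decoder can correct more easily.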
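The idea can be shown with a simple block interleaver: bits are written row by row into a matrix and read out column by column. This is a simplified stand-in; GSM's actual scheme spreads each coded speech frame diagonally over several bursts, and the function names here are hypothetical.

```python
def interleave(bits, rows, cols):
    """Write row by row into a rows x cols matrix, read column by column."""
    if len(bits) != rows * cols:
        raise ValueError("length must equal rows * cols")
    return [bits[r * cols + c] for c in range(cols) for r in range(rows)]


def deinterleave(bits, rows, cols):
    """Inverse of interleave(): restore the original bit order."""
    if len(bits) != rows * cols:
        raise ValueError("length must equal rows * cols")
    out = [0] * (rows * cols)
    i = 0
    for c in range(cols):
        for r in range(rows):
            out[r * cols + c] = bits[i]
            i += 1
    return out


if __name__ == "__main__":
    original = list(range(12))          # use positions as "bits" to track them
    tx = interleave(original, 3, 4)
    # A burst corrupting channel positions 3..5 hits these original positions:
    damaged = sorted(tx[p] for p in (3, 4, 5))
    print(tx, damaged)
```

With this 3 x 4 layout, a burst of three consecutive channel errors lands on original positions 1, 5 and 9, which are no longer adjacent.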

▪ Multiple Access
• Allows many users to make calls at the same time while sharing the limited bandwidth.
• FDMA divides the spectrum into narrow frequency channels (carriers), and TDMA then divides each carrier in time into time slots.
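A small sketch makes the FDMA/TDMA split concrete. It assumes primary GSM-900 figures that are not on the slides: 200 kHz carriers in a 25 MHz band (124 usable carriers), 8 time slots per carrier, and a TDMA frame of 120/26 ms. The `physical_channel` mapping is hypothetical and ignores control channels and frequency reuse between cells.

```python
from fractions import Fraction

# Primary GSM-900 figures (assumptions for illustration, not from the slides)
CARRIER_SPACING_KHZ = 200        # FDMA: width of one carrier
BAND_KHZ = 25_000                # one 25 MHz direction of the band
USABLE_CARRIERS = 124            # 125 raster positions, one left unused as guard
SLOTS_PER_FRAME = 8              # TDMA: time slots per carrier
FRAME_MS = Fraction(120, 26)     # TDMA frame duration, about 4.615 ms


def slot_duration_ms() -> Fraction:
    """Duration of one time slot (about 0.577 ms)."""
    return FRAME_MS / SLOTS_PER_FRAME


def physical_channel(user_index: int) -> tuple:
    """Hypothetical mapping of a user index to a (carrier, time slot) pair."""
    if not 0 <= user_index < USABLE_CARRIERS * SLOTS_PER_FRAME:
        raise ValueError("no free physical channel")
    return divmod(user_index, SLOTS_PER_FRAME)  # FDMA picks the carrier, TDMA the slot


if __name__ == "__main__":
    print(BAND_KHZ // CARRIER_SPACING_KHZ, USABLE_CARRIERS * SLOTS_PER_FRAME)
    print(float(slot_duration_ms()), physical_channel(9))
```

Under these assumptions, one carrier serves 8 users in turn, so the whole band offers 124 x 8 = 992 physical channels before any reuse.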
▪ Ciphering
• Encrypts the transmitted data so that other users cannot eavesdrop on a conversation.
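GSM ciphering works as a stream cipher: a keystream derived from the session key and the frame number is XORed with the burst data, and the receiver XORs the same keystream again to decrypt. The sketch below shows only that XOR structure. Its keystream generator is a hypothetical toy (SHA-256 in counter mode), not the A5 algorithms GSM actually uses.

```python
import hashlib


def toy_keystream(key: bytes, frame_number: int, n_bytes: int) -> bytes:
    """Hypothetical keystream: SHA-256 over key, frame number and a counter (NOT A5)."""
    out = b""
    counter = 0
    while len(out) < n_bytes:
        block = key + frame_number.to_bytes(4, "big") + counter.to_bytes(4, "big")
        out += hashlib.sha256(block).digest()
        counter += 1
    return out[:n_bytes]


def xor_cipher(data: bytes, key: bytes, frame_number: int) -> bytes:
    """Encrypt or decrypt: XOR with the keystream is its own inverse."""
    ks = toy_keystream(key, frame_number, len(data))
    return bytes(d ^ k for d, k in zip(data, ks))


if __name__ == "__main__":
    ct = xor_cipher(b"hello, GSM user", b"session-key", 42)
    print(ct.hex(), xor_cipher(ct, b"session-key", 42))
```

Tying the keystream to the frame number means each burst is encrypted with different keystream bits even though the session key stays the same.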

▪ Modulation
• GSM uses Gaussian minimum-shift keying (GMSK).
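GMSK smooths the bit stream with a Gaussian filter and uses it to steer the carrier phase, giving a constant-envelope signal. This baseband sketch assumes the GSM bandwidth-time product BT = 0.3 and a modulation index of 0.5 (a phase change of plus or minus pi/2 per bit); the parameters `sps` (samples per bit) and `span` (filter length in bits) are illustrative choices.

```python
import math


def gmsk_baseband(bits, bt=0.3, sps=8, span=4):
    """Return (I, Q) samples of a GMSK baseband signal for a list of 0/1 bits."""
    # NRZ mapping 0 -> -1, 1 -> +1, each bit held for sps samples
    nrz = [1.0 if b else -1.0 for b in bits for _ in range(sps)]
    # Gaussian filter over +/- span/2 bit periods (time in units of T):
    # h(t) ~ exp(-2 pi^2 (BT)^2 t^2 / ln 2), normalized to unit sum
    half = span * sps // 2
    coef = 2 * math.pi ** 2 * bt ** 2 / math.log(2)
    taps = [math.exp(-coef * ((k - half) / sps) ** 2) for k in range(2 * half + 1)]
    total = sum(taps)
    taps = [t / total for t in taps]
    # Centered convolution gives the smoothed instantaneous frequency in [-1, 1]
    freq = []
    for n in range(len(nrz)):
        acc = 0.0
        for k, t in enumerate(taps):
            m = n + half - k
            if 0 <= m < len(nrz):
                acc += t * nrz[m]
        freq.append(acc)
    # Integrate frequency into phase: modulation index 0.5 -> +/- pi/2 per bit
    phase = 0.0
    i_out, q_out = [], []
    for f in freq:
        phase += (math.pi / 2) * f / sps
        i_out.append(math.cos(phase))
        q_out.append(math.sin(phase))
    return i_out, q_out


if __name__ == "__main__":
    i, q = gmsk_baseband([1, 0, 1, 1, 0, 0, 1, 0])
    print(len(i), max(abs(a * a + b * b - 1) for a, b in zip(i, q)))
```

The constant envelope (I^2 + Q^2 = 1) is what lets the handset use an efficient nonlinear power amplifier.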
SPEECH RECOGNITION IN THE GSM ENVIRONMENT
▪ Alternative architectures for automatic speech recognition (ASR) in the GSM environment:
• Local recognition (client-only processing)
• Remote recognition (server-only processing)
• Distributed recognition (client-server processing)
LOCAL RECOGNITION
▪ ASR is performed entirely at the user's local terminal.
▪ Coding distortion and transmission errors are avoided.
▪ Downside:
• The ASR application must reside in the user terminal.
• It increases the computational load on the local terminal.
REMOTE RECOGNITION
▪ The server performs the entire recognition process.
▪ The computational load on the local terminal is zero.
▪ The mobile terminal does not need to house the ASR application.
▪ Downside:
• Coding distortion and transmission errors degrade the performance of the ASR system.
DISTRIBUTED RECOGNITION
▪ A compromise between the two previous solutions.
▪ Part of the recognition process (namely, the parameterization) is performed at the client end, and the rest at the server end.
▪ Downside:
• A standardized front-end is needed so that every client terminal computes and transmits the same parameters.
REMOTE RECOGNITION IS MORE CONVENIENT
▪ It imposes no restrictive conditions on the client terminal's capabilities and requires no special settings or agreements between client and server.
▪ It preserves the transmission bandwidth requirements and compatibility with existing standards-based voice applications.
REMOTE RECOGNITION: KEY ISSUES
▪ Problems that the GSM environment poses for ASR:
▪ Noisy scenarios:
• Calls can be made from almost anywhere: public places, inside a car, at the roadside, etc.
• ASR must therefore be robust to noise.
▪ Influence of coding distortion on speech recognition:
• The parameters to be transmitted are quantized, which introduces quantization error.
• Relevant characteristics of the (encoded and) decoded speech signal differ from those of the original.

▪ Transmission errors and lost frames:
• The channel coding scheme influences the quality of the received speech.
• GSM holes.
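Lost or corrupted frames have to be filled in before decoding. The sketch below illustrates a simple substitute-and-mute strategy: a bad frame is replaced by the last good frame, with its energy lowered a little more for each consecutive loss. It is a simplified illustration, not the exact procedure defined in the GSM standard; `conceal_frames`, the (spectral_params, log_energy_db) frame layout and the 3 dB muting step are all hypothetical.

```python
def conceal_frames(frames, bad, mute_db=3.0):
    """Replace bad frames by the last good one, muting its energy per consecutive loss.

    frames: list of (spectral_params, log_energy_db) tuples
    bad:    list of booleans, True where the frame was lost or corrupted
    """
    out = []
    last_good = None
    run = 0  # number of consecutive bad frames so far
    for frame, is_bad in zip(frames, bad):
        if not is_bad:
            last_good, run = frame, 0
            out.append(frame)
        elif last_good is None:
            out.append(None)  # nothing to repeat before the first good frame
        else:
            run += 1
            params, energy_db = last_good
            out.append((params, energy_db - mute_db * run))
    return out


if __name__ == "__main__":
    frames = [([1], 60.0), ([2], 62.0), ([9], 0.0), ([9], 0.0), ([3], 58.0)]
    print(conceal_frames(frames, [False, False, True, True, False]))
```

Repeating the previous spectrum keeps the decoder output plausible for short gaps, while the progressive muting avoids a long, unnatural stationary sound when many frames are lost in a row.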
RECOGNIZING GSM SPEECH
DIFFERENCE BETWEEN CONVENTIONAL AND IMPROVED FRONT-ENDS
▪ The two front-ends differ in the source from which the feature vectors are derived.
▪ The "Decoded Speech Front-end" starts from the decoded speech.
▪ The "GSM Digital Front-end" starts from a (quantized) linear prediction (LP) spectrum plus a reduced set of parameters extracted from the GSM bit stream.
CONVENTIONAL DECODED SPEECH FRONT-END
▪ Feature extraction is carried out on the decoded speech signal.
▪ The signal is analyzed every 10 ms using a 25 ms Hamming analysis window.
▪ Twelve mel-frequency cepstral coefficients (MFCC) are obtained using a mel-scaled filter bank with 40 channels.
▪ The log-energy, the twelve delta-cepstral coefficients and the delta-log-energy are appended, giving a total vector dimension of 26.
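The conventional front-end can be sketched with NumPy as below. Assumptions not taken from the slides: 8 kHz decoded speech, a 256-point FFT, a magnitude spectrum feeding the filter bank, cepstral coefficients c1 to c12 from a type-II DCT, and deltas computed by linear regression over plus or minus 2 frames. Pre-emphasis and liftering are omitted; all names are hypothetical.

```python
import numpy as np

FS, FRAME_LEN, FRAME_SHIFT = 8000, 200, 80   # 25 ms window, 10 ms shift at 8 kHz
NFFT, N_MELS, N_CEPS = 256, 40, 12


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels=N_MELS, nfft=NFFT, fs=FS):
    """Symmetric triangular filters equally spaced on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2), n_mels + 2)
    bins = np.floor((nfft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fb = np.zeros((n_mels, nfft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb


def deltas(feat, n=2):
    """Regression-based delta coefficients over +/- n frames (edges repeated)."""
    T = feat.shape[0]
    padded = np.pad(feat, ((n, n), (0, 0)), mode="edge")
    denom = 2 * sum(i * i for i in range(1, n + 1))
    return sum(i * (padded[n + i:n + i + T] - padded[n - i:n - i + T])
               for i in range(1, n + 1)) / denom


def decoded_speech_frontend(x):
    """12 MFCC + log-energy, plus their deltas: 26 features per 10 ms frame."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - FRAME_LEN) // FRAME_SHIFT
    window = np.hamming(FRAME_LEN)
    fb = mel_filterbank()
    k = np.arange(1, N_CEPS + 1)[:, None]
    m = np.arange(N_MELS)[None, :]
    dct = np.cos(np.pi * k * (m + 0.5) / N_MELS)   # keeps c1..c12
    static = np.empty((n_frames, N_CEPS + 1))
    for t in range(n_frames):
        frame = x[t * FRAME_SHIFT:t * FRAME_SHIFT + FRAME_LEN]
        spec = np.abs(np.fft.rfft(frame * window, NFFT))
        log_mel = np.log(np.maximum(fb @ spec, 1e-10))
        static[t, :N_CEPS] = dct @ log_mel
        static[t, N_CEPS] = np.log(max(np.sum(frame ** 2), 1e-10))
    return np.hstack([static, deltas(static)])     # 13 static + 13 delta = 26


if __name__ == "__main__":
    feats = decoded_speech_frontend(np.random.default_rng(0).standard_normal(FS))
    print(feats.shape)
```

One second of speech yields 98 frames of 26 features here; the frame count depends on how the signal edges are handled, which the slides do not specify.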
DISADVANTAGE:

▪ The decoded speech is affected both by the quantization distortion of every parameter involved in the speech synthesis and by the inadequacies of the source-filter model.
GSM DIGITAL FRONT-END
▪ The purpose is to reduce the influence of coding distortion on ASR system performance.
▪ Steps involved:
• For each GSM frame, the P available quantized spectral parameters are extracted from the bit stream.
• After the standard concealment procedures recommended in the GSM standard, they are converted into P LP coefficients a_r.
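In the GSM full-rate codec the quantized spectral parameters are P = 8 log-area ratios (LARs). A sketch of their conversion to LP coefficients, under stated assumptions: each LAR is inverted exactly from LAR = log10((1 + r)/(1 - r)) (the standard decoder uses a piecewise-linear approximation instead), and the step-up recursion uses the convention A(z) = 1 + sum of a_r z^-r. Function names are hypothetical.

```python
import math


def lar_to_reflection(lar):
    """Invert LAR = log10((1 + r) / (1 - r)) for each log-area ratio."""
    # r = (10**LAR - 1) / (10**LAR + 1), written with tanh for numerical stability
    return [math.tanh(x * math.log(10) / 2) for x in lar]


def reflection_to_lpc(k):
    """Step-up recursion: reflection coefficients -> a_1..a_P of A(z) = 1 + sum a_r z^-r."""
    a = []
    for i, ki in enumerate(k, start=1):
        # a_j(i) = a_j(i-1) + k_i * a_(i-j)(i-1) for j < i, and a_i(i) = k_i
        a = [a[j] + ki * a[i - 2 - j] for j in range(i - 1)] + [ki]
    return a


if __name__ == "__main__":
    lars = [0.8, -0.3, 0.2, -0.1, 0.05, 0.0, 0.02, -0.01]   # hypothetical decoded LARs
    print(reflection_to_lpc(lar_to_reflection(lars)))
```

Since tanh keeps every reflection coefficient strictly inside (-1, 1), the resulting A(z) is minimum phase, so the synthesis filter 1/A(z) is stable.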
▪ Next, a 512-point spectral envelope, H(Ω), of the speech frame is computed from the P LP coefficients.
▪ A filter bank of M = 40 symmetrical triangular mel-scale bands, identical to the one employed in the conventional front-end, is applied to weight |H[k]|, yielding 40 coefficients. These are then converted into 12 mel cepstrum coefficients using a discrete cosine transform (DCT) of their log-scaled magnitudes.
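The envelope-to-cepstrum step can be sketched as follows. Assumptions: the 512-point envelope is taken as |H[k]| = 1/|A(e^{jΩ_k})| evaluated by a 512-point FFT of [1, a_1, ..., a_P] (257 bins from 0 to π), with no gain term, since the energy is handled separately; the mel filter bank and DCT follow the same illustrative choices as a conventional MFCC front-end. Names are hypothetical.

```python
import numpy as np

FS, NFFT, N_MELS, N_CEPS = 8000, 512, 40, 12


def mel_filterbank(n_mels=N_MELS, nfft=NFFT, fs=FS):
    """Symmetric triangular filters equally spaced on the mel scale."""
    to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    bins = np.floor((nfft + 1) * to_hz(np.linspace(0.0, to_mel(fs / 2), n_mels + 2)) / fs).astype(int)
    fb = np.zeros((n_mels, nfft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb


def lp_envelope(a, nfft=NFFT):
    """|H[k]| = 1 / |A(e^{j 2 pi k / nfft})|, A(z) = 1 + sum a_r z^-r, k = 0..nfft/2."""
    A = np.fft.rfft(np.concatenate(([1.0], np.asarray(a, dtype=float))), nfft)
    return 1.0 / np.maximum(np.abs(A), 1e-10)


def envelope_to_mfcc(a):
    """Mel-weight |H[k]|, take logs, DCT to 12 cepstral coefficients (c1..c12)."""
    log_mel = np.log(np.maximum(mel_filterbank() @ lp_envelope(a), 1e-10))
    k = np.arange(1, N_CEPS + 1)[:, None]
    m = np.arange(N_MELS)[None, :]
    return np.cos(np.pi * k * (m + 0.5) / N_MELS) @ log_mel


if __name__ == "__main__":
    print(envelope_to_mfcc([0.6, 0.2]))   # hypothetical LP coefficients
```

Working from the LP envelope skips resynthesizing a waveform, which is the point of the digital front-end: the excitation and synthesis inadequacies of the decoder never enter the features.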
▪ The frame energy is extracted from the bit stream, and its logarithm (the log-energy) is appended to the feature vector.
▪ A band-limited interpolation FIR filter is used to reduce the frame period from 20 ms to the 10 ms employed by the conventional front-end.
▪ Finally, dynamic (delta) parameters are computed for the 12 MFCC and the log-energy, giving a total vector dimension of 26.
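The last two steps can be sketched as upsampling by 2 followed by delta computation. Assumptions: the band-limited interpolator is a Hamming-windowed sinc with 33 taps (the slides do not give the filter design), and deltas use linear regression over plus or minus 2 frames. All names are hypothetical.

```python
import numpy as np


def interpolate_by_2(static, half_len=8):
    """Band-limited 2x interpolation along time: (T, D) at 20 ms -> (2T, D) at 10 ms."""
    n = np.arange(-2 * half_len, 2 * half_len + 1)
    h = np.sinc(n / 2.0) * np.hamming(len(n))   # windowed-sinc low-pass, cutoff at half band
    T, D = static.shape
    up = np.zeros((2 * T, D))
    up[::2] = static                            # insert a zero between frames
    offset = 2 * half_len                       # group delay of the symmetric filter
    out = np.empty_like(up)
    for d in range(D):
        out[:, d] = np.convolve(up[:, d], h)[offset:offset + 2 * T]
    return out


def deltas(feat, n=2):
    """Regression-based delta coefficients over +/- n frames (edges repeated)."""
    T = feat.shape[0]
    padded = np.pad(feat, ((n, n), (0, 0)), mode="edge")
    denom = 2 * sum(i * i for i in range(1, n + 1))
    return sum(i * (padded[n + i:n + i + T] - padded[n - i:n - i + T])
               for i in range(1, n + 1)) / denom


def gsm_digital_dynamic(mfcc_20ms, log_energy_20ms):
    """Append log-energy, interpolate to 10 ms, add deltas: (T, 12) -> (2T, 26)."""
    static = np.column_stack([mfcc_20ms, log_energy_20ms])   # (T, 13)
    static_10ms = interpolate_by_2(static)                    # (2T, 13)
    return np.hstack([static_10ms, deltas(static_10ms)])      # (2T, 26)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = gsm_digital_dynamic(rng.standard_normal((50, 12)), rng.standard_normal(50))
    print(feats.shape)
```

Because the windowed sinc is 1 at its center and close to 0 at every other even tap, the original 20 ms frames pass through unchanged at even output positions, and only the in-between frames are interpolated.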
WHY USE AUTOMATIC SPEECH RECOGNITION IN GSM?
▪ Audio conferencing, echo cancellation, speech recognition, and hands-free telephony.
▪ Deaf telephony, such as SpinVox voice-to-text voicemail.
▪ Captioned telephone.
▪ Telecommunications Relay Service.


CONCLUSION
▪ Coding distortion can severely affect recognition performance, and the more complex the ASR task, the larger the recognition losses become.
▪ By using the GSM digital front-end, the influence of some sources of distortion pertaining to the encoding-decoding process is circumvented.
