Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 3

3.

SOPC-Based Word Recognition System

ABSTRACT:

Real-world processes generally produce observable outputs, which can be characterized as


signals. The signals can be discrete in nature (e.g., characters from a finite alphabet,
quantized vectors from a codebook), or continuous in nature (e.g., speech samples,
temperature measurements, music). The signal source can be stationary (i.e., its statistical
properties do not vary with time), or non-stationary (i.e., the signal properties vary over
time). The signals can be pure (i.e., coming strictly from a single source), or can be corrupted
from other signal sources (e.g., noise) or by transmission distortions, reverberation, etc.

A problem of fundamental interest is characterizing such real-world signals in terms of signal


models. There are several reasons to be interested in applying signal models. First, a signal
model can provide the basis for a theoretical description of a signal processing system,
which processes the signal to provide a desired output.A second reason why signal models
are important is that they are potentially capable of letting us learn a great deal about the
signal source (that is, the real-world process that produced the signal) without having the
source available. This property is especially important when the cost of getting signals from
the actual source is high. In this case, with a good signal model, we can simulate the source
and learn as
much as possible via simulations.Finally, signal models are important because they often
work extremely well in practice, and enable us to realize important practical systems (such
as prediction systems, recognition systems, identification systems, etc.) in a very efficient
manner. There are several possible choices for the type of signal model used for
characterizing the properties of a given signal. Broadly, one can dichotomize the types of
signal models into the class of deterministic models and the class of statistical models.
Deterministic models generally exploit some known specific properties of the signal (e.g.,
that the signal is a sine wave or a sum of exponentials). In these cases, specification of the
signal model is generally straightforward; all that is required is to determine (estimate) the
values of the signal model parameters (e.g., amplitude, frequency, phase of a sine wave,
amplitudes and rates of exponentials, etc.). The second broad class of signal models is the
set of statistical models in which one tries to characterize only the statistical properties of
the signal. The underlying assumption of the statistical model is that the signal is
characterized as a parametric random process, and that the parameters of the stochastic
process can be determined (estimated) in a precise, well-defined manner. The Hidden
Markov Model (HMM) falls in this second category.In simple terms, a HMM is a model used
to create another model about which we know nothing except its output sequence.
The HMM is trained to produce an output that closely matches the available output
sequence, and can be assumed to model the unknown model with sufficient
accuracy.Speech recognition systems have been developed for many real-world
applications, often using low-cost speech recognition software. However high-performance
and robust isolated word recognition, particularly for digits, is still useful for many
applications, such as recognizing telephone numbers for use by physically challenged
persons. This formed the motivation for taking up this project. Efficient implementation of a
complete system on a programmable chip (SOPC) got an impetus with the advent of high-
density FPGAs integrated with high-capacity RAMs and the availability of implementation
support for soft-core processors such as the Nios II processor. FPGAs enable the best of both
worlds to be used gainfully for an application—the microcontroller or RISC processor is
efficient for performing control and decision-making operations, while the FPGA efficiently
performs digital signal processing (DSP) operations and other computation intensive tasks.
We aim to produce an efficient hardware speech recognition system with an FPGA acting as
a coprocessor that is capable of performing recognition at a much faster rate than software.
Implementation of systems using an Altera-based system on a programmable chip enables
time-critical functions to be implemented on hardware synthesized with VHDL/Verilog HDL
code. The soft-core Nios II processor that is part of the FPGA can execute the programs,
written in a high-level language. Custom instructions enable the feasibility of implementing
the whole system on an FPGA with better partitioning of the software and hardware
implementation of the speech recognition system. Our project aims at developing a HMM-
based speech recognition system with a vocabulary of 10 digits(digits zero to nine). We
trained the system for three users for all the mentioned digits with are cognition accuracy of
nearly 100%. Energy and zero crossings-based voice activity detection (VAD) was used for
segmentation of the input samples and removing background noise. We used linear
predictive coding (LPC-10) analysis, followed by cepstral analysis for feature vector
extraction from speech frames. HMM was used for training the speech models and Viterbi
decoding for recognition.Vector quantization (VQ) was used for reducing the memory
requirement. We used direct parameter averaging of the HMM parameters during training,
which has several advantages over Rabiner’sapproach, such as a lower data requirement,
higher detection accuracy, and less computation complexity. We implemented the feature
extraction, training, and other preprocessing stages of HMM in software (C++/MATLAB) in
the offline mode and the recognition process, including floating-point multiplication
operation of the Viterbi decoding process, as custom hardware in hardware implementation
and as an online process.

You might also like