
VOICE VERIFICATION

LITERATURE SURVEY REPORT

SUBMITTED BY: RANA MUHAMMAD BILAL


COURSE: ADVANCED DIGITAL SYSTEM DESIGN
INSTRUCTOR: DR. REHAN HAFIZ
ABSTRACT

Voice processing is an emerging research area with many
applications in security and automation. Typical voice processing
systems implement feature extraction, storage and feature
matching techniques to characterize voice sources, store their
particulars and later match the features of a sample from a claiming
user against that user's previous record. The most common feature
extraction technique employed in voice processing is Mel Frequency
Cepstrum Coefficients (MFCC). For efficient storage and retrieval,
techniques such as vector quantization and the LBG algorithm (for
code book generation) are available. Similarly, a number of options
are available for feature matching, such as Euclidean distance and
correlation. This paper describes a top level block design of a
Voice Verification System that uses MFCC, LBG and Euclidean
distance. Calculation of MFCC is further detailed into the next
level blocks of Hamming Window, FFT, Mel Frequency Filter Bank
and DCT. A literature survey of the computation algorithms
available for DCT is then presented.
TABLE OF CONTENTS

1. DESCRIPTION OF PROJECT

a. OVERVIEW

b. VOICE PROCESSOR

i. DATA PATH

ii. Control Logic

c. VOICE VERIFICATION ALGORITHM

i. To enroll new user

ii. Current User Login

d. PROJECT PARTITIONING

2. APPLIED DCT ALGORITHMS

a. INTRODUCTION TO MFCC FUNCTIONAL BLOCK

b. BRIEF NOTE ON DCT

c. DCT IMPLEMENTATION ALGORITHMS

i. CHEN ET AL ALGORITHM

ii. LEE ALGORITHM

iii. LOEFFLER ALGORITHM

iv. LIU AND CHIU ALGORITHM

d. SUMMARY OF DCT IMPLEMENTATION ALGORITHMS

3. REFERENCES
DESCRIPTION OF PROJECT

a. OVERVIEW

The proposed hardware design for “Voice Verification” includes a Voice
Processor, RAM, Microphone, Analog to Digital Converter, Liquid Crystal
Display, Keypad, Storage medium (magnetic disc, tape or optical disc) and a
main Controller that manages all these resources to implement the desired
functionality. The interconnection of these blocks is depicted in the figure below.

[Figure: block diagram of the system. Blocks: MIC, ADC, Controller, LCD, Keypad, RAM, Voice Processor, Storage]

When the system starts, two options are displayed on the LCD, and the user
selects one with the help of the Keypad. These options are:

1. Current User Login
2. Enrolment Administration

If the user selects Enrolment Administration, the controller presents the
features Add new enrolment and Delete/Modify previous enrolments. The features
of concern in this paper are Add new enrolment and Current User Login.
When the user selects the Add new enrolment option, the controller asks for a
user name to be entered through the keypad and stores it in memory. Afterwards,
the controller generates signals to initiate a sequence of operations that
captures a voice sample of the new user, extracts characteristic features
(MFCC, LBG) from it and stores these features against the user name.
Similarly, when a user demands authentication against a user name, the
controller generates the necessary sequence of signals to capture a voice
sample, extract characteristic features and match them to the previously
stored features.

b. VOICE PROCESSOR

The Voice Processor is the core functional element of this architecture. The
design of this block is divided into data path and control logic, each of
which is discussed separately below.

i. DATA PATH
The data path includes a Floating Point Unit (FPU), RAM, two registers
(A and B), a Tri-State Buffer and a bus connecting the data pins of the
RAM to the inputs of the A and B registers and the output of the
Tri-State Buffer. The FPU supports only multiply and add operations.
The operands of the FPU are the outputs of the A and B registers. The
output of the FPU is transferred to the RAM through the bus by
enabling the Tri-State Buffer. The entire operation of the data path is
dictated by a control word (or instruction) that is generated by the
control logic and includes the RAM address concerned, the Read/Write
signal of the RAM, the Tri-State Buffer’s enable, the load signals of
the A and B registers and the operation code for the FPU.
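
The control word described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the field names, the class names and the three-cycle multiply sequence are hypothetical and are not taken from the actual design; they simply mirror the signals listed in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ControlWord:
    """Hypothetical control-word layout (field names are assumptions)."""
    address: int          # RAM address used in this cycle
    write: bool           # RAM Read/Write signal (True = write)
    buffer_enable: bool   # Tri-State Buffer drives the FPU result onto the bus
    load_a: bool          # load register A from the bus
    load_b: bool          # load register B from the bus
    opcode: str           # FPU operation: "add" or "mul"


class DataPath:
    """Software model of the FPU, A/B registers, buffer, bus and RAM."""

    def __init__(self, ram: List[float]) -> None:
        self.ram = ram
        self.a = 0.0
        self.b = 0.0

    def step(self, cw: ControlWord) -> None:
        # The FPU output is a function of the current A and B registers.
        fpu_out = self.a + self.b if cw.opcode == "add" else self.a * self.b
        if cw.write:
            # Write cycle: the buffer must be enabled for the FPU to drive the bus.
            if cw.buffer_enable:
                self.ram[cw.address] = fpu_out
        else:
            # Read cycle: the RAM drives the bus and registers may load from it.
            bus = self.ram[cw.address]
            if cw.load_a:
                self.a = bus
            if cw.load_b:
                self.b = bus


def run_binary_op(dp: DataPath, opcode: str, src1: int, src2: int, dst: int) -> None:
    """Issue three control words: load A, load B, store the FPU result."""
    program = [
        ControlWord(src1, False, False, True, False, opcode),
        ControlWord(src2, False, False, False, True, opcode),
        ControlWord(dst, True, True, False, False, opcode),
    ]
    for cw in program:
        dp.step(cw)
```

In this model every FPU operation costs three control words, which is one reason the number of additions and multiplications matters when a DCT algorithm is chosen later in the report.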

ii. Control Logic


The control logic is further divided into three functional units, namely
MFCC, LBG and EUCLIDEAN DISTANCE CALCULATOR. Each functional
unit implements its function by generating a sequence of
appropriate control words that process data from the RAM in the FPU
and store the results in the RAM again. Authority to generate the
control word (that is, to control the data path) is granted to the
desired unit by selection through a multiplexer, which is operated by
the Main Controller. The Main Controller also issues the flag signal
“Start” to the desired unit and receives the status signal “Done”.
In brief, the Main Controller implements the user interface (with the
help of the LCD and Keypad) and manages the sequence of operation of
its three subunits MFCC, LBG and EUCLIDEAN DISTANCE. It also
manages the operation of the external functional unit ADC, which
samples voice through the Microphone and stores it in the RAM.
MFCC, LBG and EUCLIDEAN DISTANCE perform their respective
operations when instructed and given authority over the data path by
the Main Controller.
[Figure: control logic of the Voice Processor. Blocks: MFCC, LBG and EUCLIDEAN DISTANCE CALCULATOR (each with START and DONE signals), MUX (with SELECT), CONTROL WORD, FPU, A and B registers, BUFFER, Scratch Pad-1, Scratch Pad-2, Scratch Pad-3, STORAGE, SAMPLE MEMORY]
c. VOICE VERIFICATION ALGORITHM

The hardware described above is versatile enough to host a range of voice
processing algorithms. A brief description of the intended algorithm follows.

i. To enroll new user

A voice sample is captured. The sample is sliced along the time axis
and each slice is passed through a Hamming window. Each windowed
slice is Fourier transformed. The magnitude spectrum is squared to
estimate power. The power spectrum is passed through a Mel frequency
filter bank to simulate human hearing characteristics. The output of
the filter bank is mapped onto a log scale and discrete cosine
transformed to generate Mel Frequency Cepstrum Coefficients
(MFCC). The Linde, Buzo and Gray (LBG) algorithm is applied to the
calculated MFCCs to determine a region around the sample MFCCs in
which other sample MFCCs from this user may lie. The coordinates of
this region, termed the sample finger print, are stored in memory
against this user.
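
The LBG step can be sketched as a code book generator that repeatedly splits and refines centroids. This is a minimal sketch under stated assumptions: the function names, the additive split offset `eps` and the fixed number of refinement passes are illustrative choices (a textbook LBG stops on a distortion threshold), and treating the resulting code book as the stored finger print is an interpretation of the text, not a statement of the project's design.

```python
from typing import List

Vector = List[float]


def sq_dist(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def centroid(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def lbg(vectors: List[Vector], size: int, eps: float = 0.01,
        passes: int = 20) -> List[Vector]:
    """Grow a code book by doubling; `size` is assumed to be a power of 2."""
    codebook = [centroid(vectors)]
    while len(codebook) < size:
        # Split every code word into two slightly offset copies.
        codebook = [[c + s for c in word] for word in codebook for s in (eps, -eps)]
        for _ in range(passes):
            # Assign each training vector to its nearest code word.
            clusters: List[List[Vector]] = [[] for _ in codebook]
            for v in vectors:
                nearest = min(range(len(codebook)),
                              key=lambda i: sq_dist(v, codebook[i]))
                clusters[nearest].append(v)
            # Move each code word to the centroid of its cluster
            # (an empty cluster keeps its previous code word).
            codebook = [centroid(c) if c else word
                        for c, word in zip(clusters, codebook)]
    return codebook
```

For example, four training vectors forming two tight groups yield one code word per group when `size` is 2.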

ii. Current User Login

The user is asked for a voice sample, and the sequence of operations
described above is carried out to calculate its finger print. The
Euclidean distance between this finger print and the stored one is
calculated and compared to a threshold. If the calculated distance is
less than the set threshold, the user is authenticated; otherwise the
user is rejected.
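
The login decision can be sketched as follows. The function names are hypothetical, each finger print is assumed to be flattened into a single list of coordinates, and the threshold value is left to the caller because the report does not specify one.

```python
import math
from typing import Sequence


def euclidean_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two finger prints of equal length."""
    if len(u) != len(v):
        raise ValueError("finger prints must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def is_authenticated(stored: Sequence[float], claimed: Sequence[float],
                     threshold: float) -> bool:
    """Authenticate only if the distance is strictly less than the threshold."""
    return euclidean_distance(stored, claimed) < threshold
```

On the proposed hardware the squares and the sum map directly onto the multiply and add operations of the FPU; the square root can be avoided by comparing the squared distance with the squared threshold.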

d. PROJECT PARTITIONING

The team working on this project includes three members: Rana Muhammad
Bilal, Waqar Akhter Khan and Mirza Qasim. I, Rana Muhammad Bilal, am
to work on the DCT functional unit. Mr. Mirza Qasim is to work on FFT and
Mr. Waqar Akhter Khan is to work on the LBG algorithm. The next chapter
describes in detail the literature review concerning applied DCT
algorithms.
APPLIED DCT ALGORITHMS

a. INTRODUCTION TO MFCC FUNCTIONAL BLOCK

The MFCC functional block houses an MFCC Controller, which allows the
overall function to be broken into smaller control blocks. These smaller
control blocks implement the Window, FFT, Mel Filter Bank and DCT
procedures. Authority to generate the control word is again granted to one
of these units using a multiplexer. Each functional unit receives a trigger
signal “Start” from the MFCC Controller and returns a status signal “Done”
to it.
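
The four stages sequenced by the MFCC Controller can be sketched in software for a single frame. This is an illustrative sketch only: the function names, the number of filters, the commonly used mel mapping 2595*log10(1 + f/700), the naive DFT standing in for the FFT block and the small constant added before the logarithm are all assumptions, not details of the proposed hardware.

```python
import cmath
import math
from typing import List


def hamming(n: int) -> List[float]:
    """Hamming window coefficients for a frame of n samples."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def power_spectrum(frame: List[float]) -> List[float]:
    """Squared magnitude of a naive DFT (stands in for the FFT block)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(n // 2 + 1)]


def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10 ** (m / 2595.0) - 1.0)


def mel_filterbank(num_filters: int, n_fft: int, sample_rate: float) -> List[List[float]]:
    """Triangular filters spaced evenly on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    mels = [top * i / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for j in range(1, num_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):          # rising edge
            row[k] = (k - left) / (center - left)
        for k in range(center, right):         # falling edge
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank


def dct(values: List[float]) -> List[float]:
    """Unscaled type-II DCT, evaluated directly."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i, v in enumerate(values)) for k in range(n)]


def mfcc_frame(frame: List[float], sample_rate: float, num_filters: int = 8) -> List[float]:
    """Window -> power spectrum -> mel filter bank -> log -> DCT."""
    windowed = [s * w for s, w in zip(frame, hamming(len(frame)))]
    power = power_spectrum(windowed)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    energies = [sum(w * p for w, p in zip(row, power)) for row in bank]
    log_energies = [math.log(e + 1e-10) for e in energies]  # guard against log(0)
    return dct(log_energies)
```

With 8 filters, the final stage is an 8 point DCT, which is the size the algorithms surveyed below are compared on.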

[Figure: MFCC functional block. Blocks: MFCC CONTROLLER (START, DONE), WINDOW, FFT, MEL SPECTRUM, DCT, MUX; CONTROL WORD output to the MUX]

b. BRIEF NOTE ON DCT

The Discrete Cosine Transform is a mathematical technique similar to the
Fourier Transform. It also transforms a signal from the time domain to the
frequency domain; however, unlike the Fourier Transform, it uses only real
numbers. The cosine components resulting from this transform are
considered more efficient than Fourier coefficients, as fewer are needed to
approximate a signal. The mathematical equation representing this
operation is:
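
One common form, assumed here because it agrees with the 8 point transform matrix given for the Chen algorithm in the next section (where Cn = cos(nπ/16)), is the type-II DCT. Any overall scale factor, such as the frequently used sqrt(2/N), is omitted:

```latex
y_k = c_k \sum_{n=0}^{N-1} x_n \cos\!\left( \frac{(2n+1)\,k\,\pi}{2N} \right),
\qquad k = 0, 1, \dots, N-1,
\qquad
c_k =
\begin{cases}
  \dfrac{1}{\sqrt{2}}, & k = 0 \\[4pt]
  1, & k \neq 0
\end{cases}
```

For N = 8 the k = 0 row reduces to cos(π/4) = C4 times the sum of the inputs.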
c. DCT IMPLEMENTATION ALGORITHMS

Listed below are some algorithms used for hardware computation of the
Discrete Cosine Transform.

i. CHEN ET AL ALGORITHM
In this algorithm, the 8 point DCT of an input can be written in matrix
form as

Y = AX

where X = [x0 x1 x2 x3 x4 x5 x6 x7]^T is the input signal,
Y = [y0 y1 y2 y3 y4 y5 y6 y7]^T is the output signal
and A is the transform matrix

 C4   C4   C4   C4   C4   C4   C4   C4
 C1   C3   C5   C7  -C7  -C5  -C3  -C1
 C2   C6  -C6  -C2  -C2  -C6   C6   C2
 C3  -C7  -C1  -C5   C5   C1   C7  -C3
 C4  -C4  -C4   C4   C4  -C4  -C4   C4
 C5  -C1   C7   C3  -C3  -C7   C1  -C5
 C6  -C2   C2  -C6  -C6   C2  -C2   C6
 C7  -C5   C3  -C1   C1  -C3   C5  -C7

in which Cn = cos(nπ / 16)
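
The matrix form Y = AX can be checked numerically. The sketch below assumes the standard type-II DCT definition, A[k][n] = cos((2n + 1)kπ/16), with the k = 0 row scaled to C4; under that assumption every entry evaluates to plus or minus Cm for some m between 1 and 7. The function names are illustrative.

```python
import math
from typing import List, Sequence


def dct_matrix(n: int = 8) -> List[List[float]]:
    """Transform matrix A of the n point type-II DCT (overall scale omitted)."""
    rows = []
    for k in range(n):
        # The first row is scaled by cos(pi/4), i.e. C4 when n = 8.
        scale = math.cos(math.pi / 4) if k == 0 else 1.0
        rows.append([scale * math.cos((2 * j + 1) * k * math.pi / (2 * n))
                     for j in range(n)])
    return rows


def apply_matrix(a: List[List[float]], x: Sequence[float]) -> List[float]:
    """Compute Y = AX with one multiply-accumulate chain per output."""
    return [sum(coef * v for coef, v in zip(row, x)) for row in a]
```

Evaluated this way, each output costs 8 multiplications and 7 additions, which is the baseline the factored forms below improve on.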

Due to symmetry, this matrix can be further broken down into two
matrices of lower order (4x4) for parallel computation; however, since
our proposed architecture supports only serial operation, this 4x4
breakdown is not of interest.

Calculating y0 with the above algorithm in our architecture requires
7 additions and one multiplication. This is achieved by applying the
distributive law (a*b + a*c = a(b + c)) and rewriting the equation for
y0 as:

y0 = C4 * (x0 + x1 + x2 + x3 + x4 + x5 + x6 + x7)

Similarly:
9 additions and 3 multiplications are required for y1
8 additions and 2 multiplications are required for y2
9 additions and 3 multiplications are required for y3
7 additions and 1 multiplication are required for y4
9 additions and 3 multiplications are required for y5
8 additions and 2 multiplications are required for y6
9 additions and 3 multiplications are required for y7
Overall, 66 additions and 18 multiplications are required.
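
The saving from factoring can be sketched for the two outputs whose rows contain only C4. The function names are illustrative; the y4 grouping follows the sign pattern of the fifth row of the transform matrix (C4 -C4 -C4 C4 C4 -C4 -C4 C4).

```python
import math
from typing import Sequence

C4 = math.cos(4 * math.pi / 16)


def y0_direct(x: Sequence[float]) -> float:
    """Row-by-row evaluation: 8 multiplications and 7 additions."""
    return sum(C4 * v for v in x)


def y0_factored(x: Sequence[float]) -> float:
    """Factored evaluation: 7 additions followed by 1 multiplication."""
    total = x[0]
    for v in x[1:]:
        total += v
    return C4 * total


def y4_factored(x: Sequence[float]) -> float:
    """7 additions/subtractions and 1 multiplication."""
    positive = x[0] + x[3] + x[4] + x[7]
    negative = x[1] + x[2] + x[5] + x[6]
    return C4 * (positive - negative)
```

Both factored forms give the same result as the direct evaluation while using a single multiplication.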

ii. LEE ALGORITHM

This algorithm is based on the even and odd decomposition of the
signal. Thus an N point DCT is broken down into two N/2 point DCTs.
This breakdown can be continued as long as N is an integral power
of 2. For an 8 point DCT, this boils down to 13 multiplications and
25 additions.
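
The recursive breakdown can be sketched in software as follows. This is a minimal floating point sketch of a Lee-style decomposition, not a description of the hardware sequence: it computes the unscaled DCT (the k = 0 output is not multiplied by C4), the function names are illustrative, and the operation count of this straightforward recursion is not claimed to match the figures quoted above.

```python
import math
from typing import List, Sequence


def lee_dct(x: Sequence[float]) -> List[float]:
    """Unscaled type-II DCT by recursive even/odd splitting (length 2^m)."""
    n = len(x)
    if n == 1:
        return [x[0]]
    if n % 2 != 0:
        raise ValueError("length must be an integral power of 2")
    half = n // 2
    # Input to the N/2 point DCT that yields the even-indexed outputs.
    even_in = [x[i] + x[n - 1 - i] for i in range(half)]
    # Input to the N/2 point DCT that yields the odd-indexed outputs.
    odd_in = [(x[i] - x[n - 1 - i]) / (2.0 * math.cos((2 * i + 1) * math.pi / (2 * n)))
              for i in range(half)]
    even_out = lee_dct(even_in)
    odd_out = lee_dct(odd_in)
    y = [0.0] * n
    for m in range(half):
        y[2 * m] = even_out[m]
        # Odd outputs are sums of neighbouring terms; the term past the end is zero.
        following = odd_out[m + 1] if m + 1 < half else 0.0
        y[2 * m + 1] = odd_out[m] + following
    return y


def naive_dct(x: Sequence[float]) -> List[float]:
    """Direct evaluation of the same unscaled transform, for comparison."""
    n = len(x)
    return [sum(v * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                for i, v in enumerate(x)) for k in range(n)]
```

For an 8 point input the recursion passes through 4 point and 2 point DCTs before reaching single samples.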

iii. LOEFFLER ALGORITHM

Proposed by Loeffler, this algorithm employs the block diagram given
below to calculate an 8 point DCT. Using the same factoring technique
as above (i.e. a*b + a*c = a(b + c)), 11 multiplications and 29
additions are required to calculate the transformed output.
iv. LIU AND CHIU ALGORITHM

In this approach, the number of input samples needs to be larger than
the intended number of DCT points. A running DCT of the desired
length (N) is calculated, and each subsequent DCT is obtained by
adding a difference term to the previous DCT. As our application does
not require a running DCT and has an on-demand processing
structure, this approach is not suitable.

d. SUMMARY OF DCT IMPLEMENTATION ALGORITHMS

The additions and multiplications required by the various DCT algorithms
are tabulated below.

ALGORITHM    ADDITIONS    MULTIPLICATIONS
CHEN         66           18
LEE          25           13
LOEFFLER     29           11

Each of these algorithms can be parallelized and pipelined to a different
extent. However, since only one set of inputs is processed at a time,
pipelining is not of interest here. Similarly, the single processing
element nature of the architecture makes parallelism unimportant. Thus,
from the perspective of minimizing operations, and multiplications in
particular, the LOEFFLER algorithm is the best option available for our
architecture. However, since this is an HID (Human Interface Device) and
HIDs typically have ample processing time available, CHEN’s algorithm
may be pursued owing to its simplicity and ease of implementation. The
final selection between the CHEN and LOEFFLER algorithms will be made on
the basis of timing information from the other functional units of the
architecture and the timing delays of the multiplier and adder.
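
The timing comparison can be sketched as a simple serial cost model that uses the operation counts tabulated above. The function names and the adder and multiplier delays passed in are hypothetical placeholders, since the real delays are not yet known, and the model ignores the control-word overhead of loading operands.

```python
from typing import Dict, List, Tuple

# (additions, multiplications) per 8 point DCT, taken from the summary table.
OPERATION_COUNTS: Dict[str, Tuple[int, int]] = {
    "CHEN": (66, 18),
    "LEE": (25, 13),
    "LOEFFLER": (29, 11),
}


def serial_time(additions: int, multiplications: int,
                t_add: float, t_mul: float) -> float:
    """Total time when a single FPU executes every operation in sequence."""
    return additions * t_add + multiplications * t_mul


def rank_algorithms(t_add: float, t_mul: float) -> List[str]:
    """Algorithm names ordered from fastest to slowest for the given delays."""
    return sorted(OPERATION_COUNTS,
                  key=lambda name: serial_time(*OPERATION_COUNTS[name], t_add, t_mul))
```

Under this model the ordering of LEE and LOEFFLER depends on how much slower the multiplier is than the adder, while CHEN remains the slowest for any positive delays.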
REFERENCES

1. An Efficient Implementation of the 1D DCT using FPGA Technology, by
Hassan EL-Banna, Alaa A. EL-Fattah and Waleed Fakhr, in 11th IEEE
International Conference and Workshop on the Engineering of
Computer-Based Systems (ECBS’04)

2. Implementation of Loeffler Algorithm on Stratix DSP Compared to
Classical FPGA Solutions, by A. Ben Atitallah, P. Kadionik, F. Ghozzi,
P. Nouel, N. Masmoudi and Ph. Marchegay

3. A Comparison of Bit Serial and Bit Parallel DCT Designs, by David
Crook and John Fulcher, in VLSI Design, 1995, Vol. 3, No. 1, pp. 59-65
