International Journal of Emerging Technology and Advanced Engineering

Website: www.ijetae.com (ISSN 2250-2459, Volume 2, Issue 6, June 2012)

Speech Compression with Voice Excited Linear Predictive Coding

Arpana Mishra¹, Javed Ashraf²

¹M. Tech. Scholar, AFSET Faridabad
²Assistant Professor, AFSET Faridabad

Abstract: The aim of this project is to develop a system for encoding good-quality speech at a low bit rate. To implement this, we used a very efficient speech analysis technique, Linear Predictive Coding (LPC), which provides accurate estimates of speech parameters. An alternate explanation is that linear prediction filters attempt to predict future values of the input signal based on past samples. The speech signals of male and female speakers were coded. The encoding process of LPC involves determining a set of accurate parameters for modeling the vocal tract during the production of a given speech signal. Decoding involves using the parameters acquired in the encoding and analysis to build a synthesized version of the original speech signal. The conclusion indicates that the project was successful in coding the speech signal at relatively low bit rates with good quality.

Keywords: ACR, CODER, LPC, RESIDUAL SIGNAL, SEGSNR, VOCODER

I. INTRODUCTION

Speech coding has been, and still is, a major issue in the area of digital speech processing. There exist many different types of speech compression that make use of a variety of different techniques. However, most methods of speech compression exploit the fact that speech production occurs through slow anatomical movements and that the speech produced has a limited frequency range. The frequency band carrying most of the intelligible content of human speech ranges from around 300 Hz to 3400 Hz. Most forms of speech coding are based on a lossy algorithm. Lossy algorithms are considered acceptable when encoding speech because the loss of quality is often undetectable to the human ear.

Another fact about speech production that can be taken advantage of is that, mechanically, there is a high correlation between adjacent samples of speech. Most forms of speech compression are achieved by modeling the process of speech production as a linear digital filter. The digital filter and its slowly changing parameters are usually encoded to achieve compression of the speech signal. There are many other characteristics of speech production that can be exploited by speech coding algorithms. One fact that is often used is that periods of silence take up more than 50% of conversations. An easy way to save bandwidth and reduce the amount of information needed to represent the speech signal is to not transmit the silence.

All vocoders, including LPC vocoders, have four main attributes: bit rate, delay, complexity, and quality. Any voice coder, regardless of the algorithm it uses, has to make trade-offs between these attributes.

The first attribute of vocoders, the bit rate, is used to determine the degree of compression that a vocoder achieves. Uncompressed speech is usually transmitted at 64 kb/s, using 8 bits/sample and a sampling rate of 8 kHz. Any bit rate below 64 kb/s is considered compression. The linear predictive coder transmits speech at a bit rate of 2.4 kb/s, an excellent rate of compression (roughly 27:1 relative to 64 kb/s).

Delay is another important attribute for vocoders that are involved with the transmission of an encoded speech signal. Vocoders that are involved with the storage of compressed speech, as opposed to transmission, are not as concerned with delay. The general delay standard for transmitted speech conversations is that any delay greater than 300 ms is considered unacceptable.

The third attribute of voice coders is the complexity of the algorithm used. The complexity affects both the cost and the power consumption of the vocoder. Linear predictive coding, because of its high compression rate, is very complex and involves executing millions of instructions per second. LPC often requires more than one processor to run in real time.

The final attribute of vocoders is quality. Quality is a subjective attribute and it depends on how the speech
sounds to a given listener. One of the most common tests for speech quality is the absolute category rating (ACR) test. This test involves subjects being given pairs of sentences and asked to rate them as excellent, good, fair, poor, or bad.

The speech coder that is developed is analyzed using both subjective and objective analysis. Subjective analysis consists of listening to the encoded speech signal and making judgments on its quality; the quality of the played-back speech is based solely on the opinion of the listener. Objective analysis is performed by computing the segmental signal-to-noise ratio (SEGSNR) between the original and the coded speech signals. The report concludes with a summary and some ideas for future work.
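As an illustration of the objective measure, the segmental SNR can be sketched as the average of per-frame SNR values in dB. The paper does not give its exact formula, so the following is a minimal sketch under stated assumptions: the function name `segmental_snr`, the 160-sample frame (20 ms at 8 kHz), and the rule of skipping silent frames are all choices made here for illustration.

```python
import numpy as np

def segmental_snr(original, coded, frame_len=160, eps=1e-12):
    """Mean of per-frame SNR values in dB (hypothetical helper).

    frame_len=160 assumes 20 ms frames at 8 kHz sampling.
    Frames whose signal energy is below eps are skipped.
    """
    original = np.asarray(original, dtype=float)
    coded = np.asarray(coded, dtype=float)
    n_frames = min(len(original), len(coded)) // frame_len
    frame_snrs = []
    for i in range(n_frames):
        s = original[i * frame_len:(i + 1) * frame_len]
        e = s - coded[i * frame_len:(i + 1) * frame_len]  # coding error
        signal_energy = np.sum(s ** 2)
        if signal_energy < eps:
            continue  # silent frame: SNR is undefined, so leave it out
        noise_energy = np.sum(e ** 2) + eps  # eps avoids division by zero
        frame_snrs.append(10.0 * np.log10(signal_energy / noise_energy))
    return float(np.mean(frame_snrs)) if frame_snrs else float("nan")
```

A negative result means that, on average, the coding error carries more energy than the signal itself within a frame.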

II. HUMAN SPEECH PRODUCTION

The process of speech production in humans can be summarized as air being pushed from the lungs through the vocal tract and out through the mouth to generate speech. In this type of description, the lungs can be thought of as the source of the sound and the vocal tract as a filter that produces the various types of sounds that make up speech.

FIGURE 1. PATH OF HUMAN SPEECH PRODUCTION

The idea of the air from the lungs as a source and the vocal tract as a filter is called the source-filter model of sound production. The source-filter model is the model used in linear predictive coding.

FIGURE 2. HUMAN VS. VOICE CODER SPEECH PRODUCTION

It is based on the idea of separating the source from the filter in the production of sound. This model is used in both the encoding and the decoding of LPC and is derived from a mathematical approximation of the vocal tract represented as a tube of varying diameter. The excitation of the air travelling through the vocal tract is the source. This airflow can be periodic, when producing voiced sounds through vibrating vocal cords, or it can be turbulent and random, when producing unvoiced sounds. The encoding process of LPC involves determining a set of accurate parameters for modeling the vocal tract during the production of a given speech signal. Decoding involves using the parameters acquired in the encoding and analysis to build a synthesized version of the original speech signal. LPC never transmits any estimates of the speech itself to the receiver; it only sends the model needed to produce the speech and some indications about what type of sound is being produced.
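The source-filter idea can be made concrete with a short decoding sketch: a source (an impulse train for voiced sounds, white noise for unvoiced sounds) drives an all-pole filter that stands in for the vocal tract. This is an illustrative sketch, not the authors' implementation; the function names, the pitch period of 80 samples, and the sign convention s[n] = G·e[n] + Σ a_k·s[n−k] are assumptions made here.

```python
import numpy as np

def make_excitation(n, voiced, pitch_period=80, seed=0):
    """Source signal: impulse train if voiced, white noise if unvoiced."""
    if voiced:
        e = np.zeros(n)
        e[::pitch_period] = 1.0  # one pulse per assumed pitch period
        return e
    return np.random.default_rng(seed).standard_normal(n)

def synthesize_frame(a, gain, excitation):
    """All-pole filter: s[n] = gain*e[n] + sum_k a[k-1] * s[n-k]."""
    order = len(a)
    s = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = gain * excitation[n]
        for k in range(1, order + 1):
            if n - k >= 0:
                acc += a[k - 1] * s[n - k]  # feedback from past outputs
        s[n] = acc
    return s

# Usage: a one-pole "vocal tract" excited by a single pulse.
frame = synthesize_frame([0.5], 1.0, make_excitation(4, voiced=True))
```

In a real decoder the filter state would be carried from one frame to the next; this sketch starts every frame from zero state for brevity.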

III. LPC MODEL
The particular source-filter model used in LPC is known as the linear predictive coding model. It has two key components: analysis (encoding) and synthesis (decoding). The analysis part of LPC involves examining the speech signal and breaking it down into segments or blocks.
The principle behind LPC is to minimize the sum of the squared differences between the original speech signal and the estimated speech signal over a finite duration. This minimization yields a unique set of predictor coefficients. These predictor coefficients are normally estimated once per frame, which is normally 20 ms long. The predictor coefficients are represented by a_k. Another important parameter is the gain (G). The transfer function of the time-varying digital filter is given by:
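In the standard formulation of LPC (an assumption here, since the equation is not reproduced in this text), the filter is an all-pole filter of order p, with p and the excitation u[n] introduced for illustration. Some texts write the denominator with a plus sign and oppositely signed coefficients.

```latex
H(z) = \frac{G}{1 - \sum_{k=1}^{p} a_k z^{-k}}
\qquad\Longleftrightarrow\qquad
s[n] = \sum_{k=1}^{p} a_k\, s[n-k] + G\, u[n]
```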

FIGURE 4. BLOCK DIAGRAM OF A VOICE EXCITED LPC VOCODER

The main idea behind voice excitation is to avoid imprecise pitch detection and the use of an impulse train when synthesizing the speech. The input speech signal in each frame is therefore filtered with the estimated transfer function of the LPC analyzer. The filtered signal is called the residual. If this signal is transmitted to the receiver, very good quality can be achieved. For a good reconstruction of the excitation, only the low frequencies of the residual signal are coded. To achieve a high compression rate, we employed the discrete cosine transform (DCT) of the residual signal. One way to compress the signal is thus to transmit only the coefficients that contain most of the energy.
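The coefficient selection step can be sketched as follows: transform the residual with an orthonormal DCT-II, keep only the largest coefficients, and invert. This is a minimal sketch under assumptions; the helper names `dct_matrix` and `compress_residual`, the selection by magnitude, and the number of coefficients kept are hypothetical, and the paper does not state how many coefficients it transmits.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so that C @ x is the DCT of x."""
    k = np.arange(n)[:, None]  # frequency index
    m = np.arange(n)[None, :]  # time index
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)    # scale the DC row for orthonormality
    return c

def compress_residual(residual, keep):
    """Keep the `keep` largest-magnitude DCT coefficients of a frame.

    Returns (indices, values, reconstruction); indices and values are
    what a coder would transmit, reconstruction is the decoder output.
    """
    residual = np.asarray(residual, dtype=float)
    c = dct_matrix(len(residual))
    coeffs = c @ residual
    idx = np.argsort(np.abs(coeffs))[-keep:]  # highest-energy coefficients
    kept = np.zeros_like(coeffs)
    kept[idx] = coeffs[idx]
    return idx, coeffs[idx], c.T @ kept       # C is orthonormal: inverse = C.T
```

Because the transform is orthonormal, the energy of the reconstruction error equals the energy of the discarded coefficients, which is why keeping the largest ones minimizes the error for a given budget.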
V. COMPARATIVE ANALYSIS OF LPC METHOD
A comparison of the original sentences against the plain LPC reconstruction and the voice-excited LPC reconstruction was carried out. In both cases, the reconstructed speech has lower quality than the input speech. The plain LPC reconstructed speech has a lower pitch than the original sound and seems to be whispered. The voice-excited LPC reconstructed speech sounds more spoken and less whispered.

FIGURE 3. BLOCK DIAGRAM OF AN LPC VOCODER

IV. QUANTIZATION OF LPC COEFFICIENTS

Direct quantization of the predictor coefficients is usually not considered. To ensure stability of the synthesis filter (its poles must lie within the unit circle in the z-plane), a relatively high accuracy (8-10 bits per coefficient) is required. This is because small changes in the predictor coefficients lead to relatively large changes in the pole positions. It is therefore common to quantize the reflection coefficients instead. These are intermediate values in the calculation of the well-known Levinson-Durbin recursion. Quantizing the intermediate values is less problematic than quantizing the predictor coefficients directly.
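The Levinson-Durbin recursion mentioned above can be sketched in a few lines. It solves for the predictor coefficients from the frame's autocorrelation and produces one reflection coefficient per order as a by-product. This is an illustrative sketch, not the authors' code; the function names and the convention that a[k-1] holds a_k in the predictor ŝ[n] = Σ a_k·s[n−k] are assumptions made here, and the frame is assumed to have nonzero energy.

```python
import numpy as np

def autocorrelation(frame, order):
    """Autocorrelation values r[0..order] of one speech frame."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    return np.array([np.dot(frame[:n - lag], frame[lag:])
                     for lag in range(order + 1)])

def levinson_durbin(r, order):
    """Return (predictor coefficients, reflection coefficients, error).

    Assumes r[0] > 0. a[k-1] holds a_k; refl[i] is the reflection
    coefficient computed at order i+1.
    """
    a = np.zeros(order)
    refl = np.zeros(order)
    err = r[0]
    for i in range(order):
        # Correlation not yet explained by the order-i predictor.
        acc = r[i + 1] - np.dot(a[:i], r[i:0:-1])
        k = acc / err
        refl[i] = k
        prev = a[:i].copy()
        a[:i] = prev - k * prev[::-1]  # update lower-order coefficients
        a[i] = k
        err *= (1.0 - k * k)           # prediction error shrinks each order
    return a, refl, err
```

For a valid autocorrelation sequence every reflection coefficient has magnitude below one, so a quantizer that preserves that bound keeps the synthesis filter stable, which is the property that makes these values easier to quantize.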

A. Power Signal to Noise Ratio

Where A is the samples of the original signal and MSE is the Mean Square Error, given by,

VI. WAVEFORMS ANALYSIS

The waveforms of the plain LPC reconstructed and voice-excited LPC reconstructed speech signals give an idea of the quality of the signals.

A. Original Speech Signal

FIGURE 5. WAVEFORM OF ORIGINAL SPEECH SIGNAL

B. LPC Reconstructed Speech Signal

FIGURE 6. WAVEFORM OF LPC RECONSTRUCTED SPEECH SIGNAL

C. Voice Excited LPC Reconstructed Speech Signal

FIGURE 7. WAVEFORM OF VOICE EXCITED LPC RECONSTRUCTED SPEECH SIGNAL

VII. CONCLUSION

We can see that the voice-excited waveform looks closer to the original sound than the plain LPC reconstructed one. Voice discrimination tests indicate that voice identity is well preserved. Crucial factors influencing the quality of the reconstructed speech are the accuracy of spectral flattening and the impulse response of the analyzer low-pass filters. Although quality improves with the voice-excited method, the bits per sample increase, which increases the bandwidth of the signal. At the same time, when the SNR of both cases was compared, the plain LPC output was found to be noisier, with a negative SNR, whereas the voice-excited LPC output had a positive SNR.
