Adaptive Algorithm For Speech Compression Using Cosine Packet Transform

International Conference on Intelligent and Advanced Systems 2007
P.Prakasam and M.Madheswaran
Center for Advanced Research, Department of Electronics and Communication Engineering
Muthayammal Engineering College, Rasipuram 637 408, Tamilnadu, India.
Phone: +91 4287 226737, Fax: +91 4287 226537
email: prakasamp@gmail.com, madheswaran.dr@gmail.com
Abstract: This paper presents a new adaptive algorithm for
speech compression using the Cosine Packet Transform. The
proposed algorithm uses packet decomposition, which reduces the
computational complexity of the system. The paper compares the
compression ratios obtained with the Wavelet Transform, the Cosine
Transform, the Wavelet Packet Transform and the proposed adaptive
algorithm using the Cosine Packet Transform for different speech
signal samples. The mean compression ratio is calculated for each
method and compared. The results show that the proposed
compression algorithm gives the best performance of the four
methods for speech signals.
Keywords: Discrete Cosine Transform, Discrete Wavelet Transform,
Wavelet Packet Transform, Cosine Packets, adaptive thresholding.
I. INTRODUCTION
With the rapid deployment of speech compression technologies,
more and more speech content is stored and transmitted in
compressed formats. Speech signals have unique properties that
distinguish them from general audio/music signals: speech is
more structured, and it is band-limited to around 4 kHz.
These two facts can be exploited through different models and
approaches and, in the end, make speech easier to compress. Today,
applications of speech compression involve real-time
processing in mobile satellite communications, cellular
telephony, internet telephony, and audio for videophones or video
teleconferencing systems, among others. Other applications
include storage and synthesis systems used, for example,
in voice mail systems, voice memo wristwatches, voice logging
recorders and interactive PC software [1]. The idea of speech
compression is to reduce the storage space and the transmission
bandwidth that a speech signal requires. To meet this
goal, different compression methods have been designed
and developed by various researchers [2-7]. Speech
compression is used in digital telephony, in multimedia and in
the security of digital communications. Before the introduction
of packet-based transform techniques, audio coding techniques
used the DFT and the DCT with window functions such as rectangular
and sine-taper functions. However, these early coding
techniques failed to fulfil the contradictory requirements
imposed by high-quality audio coding. For example, with a
rectangular window the analysis/synthesis system is critically
sampled, i.e., the overall number of transform-domain
samples is equal to the number of time-domain samples, but the
system suffers from poor frequency resolution and block
effects, which are introduced after quantization or other
manipulation in the frequency domain. Overlapped windows
allow for better frequency response functions but carry the
penalty of additional values in the frequency domain, so the
system is no longer critically sampled. The Discrete Cosine Packet
Transform is currently regarded as the best solution, since it
resolves this conflict satisfactorily.
Speech compression is performed either by linear
prediction or by orthogonal transform methods. Building on
the classical papers of Shannon [8] and
Kolmogorov [9], a strong connection was recently highlighted
between the systems proposed in many lossy compression
standards and harmonic analysis [10]. All these systems
use orthogonal transforms. The algorithm described in this
paper belongs to the second category. Unfortunately, no fast
algorithm exists for computing the theoretically optimal orthogonal
transform, which is why other orthogonal transforms
are used in practice. The quality of a compression system can be
assessed by means of its rate-distortion function: one compression
system is better than another if, at equal distortion, it achieves a
higher compression rate. The compression rate can therefore be
maximized if the orthogonal transform is well chosen.
This paper is organized as follows. The mathematical model
of the speech signal and a description of the Discrete Cosine
Transform are presented in Section II. The proposed adaptive
algorithm for speech compression is explained, with the necessary
mathematical modeling, in Section III. In Section IV,
the developed algorithm is tested on various speech signal
samples and compared with the Wavelet Transform, the
Cosine Transform and the Wavelet Packet Transform. Finally,
Section V concludes the paper with some discussion.
II. MATHEMATICAL MODEL
Mathematical model of speech signal
Every spoken word is a sequence of tones with different
intensities, frequencies and durations. Every tone is a sinusoidal
signal with a certain amplitude, frequency and duration.
It is therefore possible to represent any speech signal by a
sinusoidal model. A mathematical description of this model is
given by
1-4244-1355-9/07/$25.00 ©2007 IEEE
$$x(t) = \sum_{i=1}^{Q(t)} A_i \cos\theta_i(t) \tag{1}$$
where $A_i$ is the amplitude of the $i$-th component, $\theta_i(t)$ is its
instantaneous phase (which carries its frequency), and $t$ is time.
Every term of this sum is a signal with double modulation,
so x(t) is not a stationary signal. Speech is nevertheless frequently
treated as a sequence of stationary signals: the
speech signal is divided into segments, each
shorter than 25 ms, over which it can be
regarded as stationary. On each segment the speech
model takes the form
$$x_s(t) = \sum_{i=1}^{n} A_i \cos\omega_i t \tag{2}$$
This decomposition is very similar to the decomposition of
the signal $x_s(t)$ into a cosine packet.
The energy of the signal $x_s(t)$ can be computed using the
following relation:
$$E_x = \sum_{i=1}^{n} |A_i|^2 \tag{3}$$
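The segment model in (2) can be checked numerically. The following is a minimal sketch, assuming $\omega_i = 2\pi f_i$, an 8 kHz sampling rate and two arbitrary tones; none of these values comes from the paper. It also assumes that the energy in (3) is read as the sum of squared amplitudes. When every tone completes a whole number of periods inside the segment, the sum of squared samples is that quantity scaled by half the segment length.

```python
import math


def synth_segment(amps, freqs_hz, fs_hz, n_samples):
    """Sample x_s(t) = sum_i A_i cos(2*pi*f_i*t) at t = k / fs_hz."""
    return [
        sum(a * math.cos(2.0 * math.pi * f * k / fs_hz)
            for a, f in zip(amps, freqs_hz))
        for k in range(n_samples)
    ]


def sample_energy(x):
    """Sum of squared samples of the segment."""
    return sum(v * v for v in x)


# Illustrative values (assumptions): 160 samples at 8 kHz is a 20 ms segment,
# and both tones complete a whole number of periods within it.
amps = [1.0, 0.5]
freqs_hz = [400.0, 1200.0]
segment = synth_segment(amps, freqs_hz, fs_hz=8000.0, n_samples=160)

energy = sample_energy(segment)           # energy measured on the samples
model_energy = sum(a * a for a in amps)   # sum of squared amplitudes

print(energy, model_energy, energy / model_energy)
```

The constant factor between the two quantities does not affect which coefficients are kept as long as the same energy convention is used consistently when the threshold is computed.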
The Discrete Cosine Transform
The most common DCT [11] definition of a 1-D sequence of
length N is
$$C_i = \alpha_i \sum_{x=0}^{N-1} f(x) \cos\left[\frac{\pi (2x+1) i}{2N}\right] \tag{4}$$
for $i = 0, 1, 2, \ldots, N-1$, where
$$\alpha_i = \begin{cases} \sqrt{1/N} & \text{for } i = 0 \\ \sqrt{2/N} & \text{for } i \neq 0 \end{cases} \tag{5}$$
It is clear from (4) that for $i = 0$,
$$C_0 = \sqrt{\frac{1}{N}} \sum_{x=0}^{N-1} f(x) \tag{6}$$
Thus, the first transform coefficient is proportional to the average
value of the sample sequence. In the literature, this value is referred
to as the DC coefficient. All other transform coefficients are called
the AC coefficients.
III. PROPOSED ALGORITHM

The proposed adaptive algorithm for speech compression
using the Cosine Packet Transform is shown in Fig 1. The speech
signal to be compressed is divided into packets of finite
duration. The Discrete Cosine Transform is applied to each
packet and the transform coefficients are computed. The
coefficients are extracted and fed into the adaptive threshold
detector, which nullifies the small coefficients for better
compression.
Fig 1. Flow diagram for the proposed adaptive algorithm: input speech signal to be compressed, packet decomposition, computation of DCT, extracting the coefficients (Ci), adaptive threshold detector, compressed speech signal.
Selection of best packets
The main reason for choosing the Cosine Packet Transform is
the cost functional used to select the best packet. This transform is
an adaptive one: the result of using it in a given application
can be optimized through the best-packet selection procedure, an
efficient procedure that can considerably enhance the
quality of a given signal processing method. Several
cost functions can be minimized to select
the best cosine packet. The most widely used is the entropy, but
it does not maximize the compression
rate. The optimal cost functional for compression is the one
that minimizes the number of coefficients $C_i$ that exceed
a given threshold t. With this cost functional, the number of
coefficients above the threshold is minimal, because the best packet
was selected with the cost functional appropriate to that goal; this
is why it maximizes
the compression rate. As the threshold value t increases, the
number of coefficients above it decreases and the compression rate
increases. Hence, the threshold detector must be adaptive. Another
parameter of the DCPT that must be considered when
optimizing the compression is its number of iterations.
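The best-packet selection of Section III can be illustrated with a small best-basis search. The sketch below is a simplification under stated assumptions, not the authors' implementation: it uses plain non-overlapping blocks split in halves (cosine packets normally use smooth overlapping windows), a direct O(N^2) DCT following (4) and (5), and the hypothetical helper names `cost` and `best_packets`. A segment is split only when the two halves together have fewer coefficients above the threshold than the unsplit segment; `max_depth` plays the role of the number of iterations.

```python
import math


def dct(frame):
    """Orthonormal DCT of (4)-(5), written as a direct O(N^2) sum."""
    n = len(frame)
    out = []
    for i in range(n):
        alpha = math.sqrt((1.0 if i == 0 else 2.0) / n)
        out.append(alpha * sum(
            f * math.cos(math.pi * (2 * x + 1) * i / (2 * n))
            for x, f in enumerate(frame)))
    return out


def cost(coeffs, t):
    """Cost functional: number of coefficients whose magnitude exceeds t."""
    return sum(1 for c in coeffs if abs(c) > t)


def best_packets(signal, t, max_depth):
    """Return (total cost, packet lengths) of the cheapest dyadic split."""
    whole = cost(dct(signal), t)
    if max_depth == 0 or len(signal) < 2:
        return whole, [len(signal)]
    mid = len(signal) // 2
    left_cost, left = best_packets(signal[:mid], t, max_depth - 1)
    right_cost, right = best_packets(signal[mid:], t, max_depth - 1)
    if left_cost + right_cost < whole:
        return left_cost + right_cost, left + right
    return whole, [len(signal)]


# Toy signal (assumption): one cosine burst in the first half, silence after.
signal = [math.cos(math.pi * (2 * x + 1) * 2 / 16) for x in range(8)] + [0.0] * 8
total_cost, lengths = best_packets(signal, t=0.1, max_depth=2)
print(total_cost, lengths)
```

Because the cost is additive over packets, comparing a parent with the sum of its children at every level is enough to find the cheapest segmentation among the dyadic candidates.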

Adaptive Threshold Detector
One of the most important stages of the proposed
compression algorithm is the threshold detector. Its main role
is to nullify all coefficients obtained from
the Cosine Packet Transform that are smaller than a threshold value;
this is in fact the compression mechanism. The detector is
adaptive: it automatically chooses the threshold
value depending on the transform coefficient values and
repeats the process until a stopping condition is met.
Let $a$, with $a < 1$, be the distortion parameter of the compression
system, $N$ the number of samples of the signal to be
processed and $E_x$ the energy of the input speech signal. The
threshold value is then defined as
$$t = \frac{a \cdot E_x}{N} \tag{7}$$
The constant $a$ can be related to the signal-to-noise ratio of
the input signal x(t) through the quantity $b$, defined as
$$b = -10 \log_{10} a \tag{8}$$
From the above equation, $a$ is given by
$$a_n = 10^{-b/10} \tag{9}$$
where $a_n$ is the lower bound of $a$.
Using eqns (7) and (9), the lower bound of the
threshold is obtained as
$$t_n = 10^{-b/10} \cdot \frac{E_x}{N} \tag{10}$$
For a threshold value $t$ greater than $t_n$, an output
signal-to-noise ratio greater than $b$ is obtained. Unfortunately, the
exact value of $E_x$ is not known a priori, which is
why an adaptive algorithm for the selection of the threshold
value is recommended. This algorithm can use the value $t_n$
from (10) for initialization.
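Relations (8) to (10) amount to a short calculation. The sketch below uses hypothetical function names and made-up input values to show how the initial threshold $t_n$ follows from a chosen $b$, the signal energy and the number of samples.

```python
import math


def distortion_from_b(b):
    """Invert (8): a_n = 10 ** (-b / 10), as in (9)."""
    return 10.0 ** (-b / 10.0)


def initial_threshold(b, energy, n_samples):
    """Lower-bound threshold t_n of (10)."""
    return distortion_from_b(b) * energy / n_samples


# Illustrative numbers only (not taken from the paper).
a_n = distortion_from_b(20.0)
t_n = initial_threshold(b=20.0, energy=512.0, n_samples=512)
print(a_n, t_n)
```

A larger $b$ gives a smaller $a_n$ and therefore a lower starting threshold, so fewer coefficients are discarded before the adaptive search begins.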
The flow diagram of the adaptive threshold detector is shown in
Fig 2. The energy of the input signal to be compressed is
computed and the value of b is initialized. The initial threshold
is calculated using eqn (10), and the threshold is then increased
starting from this value. At every iteration the value $E_x$ is
computed. If this value is higher than b, each extracted
coefficient is compared with the threshold value t: if it is less than
the threshold, the coefficient is
replaced with zero; otherwise its value is
kept unchanged. The proposed adaptive algorithm
stops the first time the value $E_x$ becomes smaller
than b.
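Fig 2. Flow diagram for the adaptive threshold detector: coefficients from DCT (Ci); compute the energy of the input signal (Ex); initialize b = -10 log(a), a < 1; compute the threshold (t); while Ex > b, set Ci = 0 if t > Ci and keep Ci otherwise, compute the energy (Ex) and compute the new threshold t = t + 0.1; stop when Ex is no longer greater than b.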
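The loop of Fig 2 can be sketched in a few lines. This is a literal reading of the flow diagram under stated assumptions, not the authors' MATLAB code: the energy is taken as the sum of squared coefficients, coefficients are compared by magnitude (DCT coefficients can be negative), and the energy is compared directly with b as the figure shows. Since the text around (10) describes b as a signal-to-noise bound, a practical variant might compare an output SNR in dB instead. The function name and the sample values are hypothetical.

```python
def adaptive_threshold(coeffs, b, t_start, step=0.1):
    """Zero coefficients below a growing threshold, following Fig 2."""
    if step <= 0 or b < 0:
        raise ValueError("step must be positive and b non-negative")
    kept = list(coeffs)
    t = t_start
    energy = sum(c * c for c in kept)      # assumed energy definition
    while energy > b:
        # Nullify every coefficient whose magnitude is below the threshold.
        kept = [0.0 if abs(c) < t else c for c in kept]
        energy = sum(c * c for c in kept)
        t += step                          # "t = t + 0.1" in Fig 2
    return kept, t


# Illustrative coefficients and bound (assumptions, not data from the paper).
kept, final_t = adaptive_threshold([4.0, -2.0, 0.5, 0.05], b=17.0, t_start=0.1)
print(kept)
```

The loop always terminates for a non-negative b, because the threshold eventually exceeds every coefficient magnitude and the remaining energy then drops to zero.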
IV. SIMULATION RESULTS AND DISCUSSION

Various speech signal samples are simulated using
MATLAB. One generated speech signal sample is shown in Fig
3. The generated speech signal is segmented into 15 packets
of 512 samples each (the duration of each packet being shorter than
25 ms). The Discrete Cosine Transform is computed
for each packet using eqn (4). The transform coefficients are
extracted for further processing. The energy of the input signal
is computed and the threshold value is calculated using eqn (10).
The input energy is compared with b. If it is higher,
every transform coefficient is compared
with the threshold value and the coefficients below it are nullified.
The new energy of the signal is then calculated and compared with
b. If the energy is still higher than b, the above process is repeated
with a new threshold value; otherwise the compression process is
stopped.
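The segmentation and per-packet transform described above can be sketched as follows. This is an illustration only: it uses a short synthetic signal and 16-sample packets so that the direct O(N^2) form of eqn (4) stays fast, whereas the simulation in the paper uses 15 packets of 512 samples. The helper names are hypothetical.

```python
import math


def split_into_packets(signal, packet_len):
    """Cut the signal into consecutive, non-overlapping packets."""
    usable = len(signal) - len(signal) % packet_len
    return [signal[k:k + packet_len] for k in range(0, usable, packet_len)]


def dct(frame):
    """Orthonormal DCT: direct evaluation of eqn (4) with the scaling of (5)."""
    n = len(frame)
    out = []
    for i in range(n):
        alpha = math.sqrt((1.0 if i == 0 else 2.0) / n)
        out.append(alpha * sum(
            f * math.cos(math.pi * (2 * x + 1) * i / (2 * n))
            for x, f in enumerate(frame)))
    return out


# Small stand-in signal (assumption); real input would be speech samples.
signal = [math.sin(0.3 * k) + 0.2 for k in range(64)]
packets = split_into_packets(signal, packet_len=16)
coeffs = [dct(p) for p in packets]

for p, c in zip(packets, coeffs):
    # With this scaling the transform preserves energy, and the first
    # coefficient equals the packet sum divided by sqrt(N).
    print(sum(v * v for v in p), sum(v * v for v in c), c[0])
```

Because the transform preserves energy, the energy needed for the threshold can be computed either from the samples or from the coefficients.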
Fig 3. Speech Signal Sample
For 20 different speech signals, compression is performed
using the Discrete Cosine Transform, the Discrete Wavelet Transform,
the Wavelet Packet Transform and the proposed adaptive
algorithm. The compression ratios achieved with these
methods are tabulated for the various speech signal samples.
TABLE I COMPARISON OF COMPRESSION RATIO

Speech Signal Sample    DWT       DCT       WPT        Proposed Adaptive Algorithm
 1.                     6.1229    6.1421    11.8444    11.7985
 2.                     6.1462    6.1421    12.1766    12.2632
 3.                     6.1462    6.1421    12.4397    11.1433
 4.                     6.1473    6.1421    12.3433    12.8633
 5.                     6.1452    6.1421    42.1520    45.8856
 6.                     6.1482    6.1421    51.5191    55.7188
 7.                     6.1473    6.1421    23.8063    23.9820
 8.                     6.1482    6.1421    40.9006    43.0466
 9.                     6.1482    6.1421    26.5968    28.5052
10.                     6.1482    6.1421    35.1952    36.0922
11.                     6.1479    6.1421    19.9917    20.7104
12.                     6.1337    6.1421    21.5817    21.7237
13.                     6.1477    6.1421    13.8164    13.9029
14.                     6.1272    6.1421    30.9609    31.1392
15.                     6.1461    6.1421    15.5718    15.8481
16.                     6.1468    6.1421    30.4582    31.6369
17.                     6.1461    6.1421    22.4461    23.2086
18.                     6.1482    6.1421    29.2490    30.8083
19.                     6.1452    6.1421    26.8928    27.1709
20.                     6.1443    6.1421    60.5990    68.0507

Table I compares the compression ratios of the
various methods and shows the good performance
of the proposed adaptive algorithm. Its
smallest compression rate, 11.1433, was obtained on the 3rd
sample and its best compression rate, 68.0507, was obtained
on the 20th sample. The proposed algorithm gives the best
compression ratio for most of the speech samples. For easier
comparison, the compression ratios for speech signal samples
1 to 10 and 11 to 20 are plotted in Fig 4 and Fig 5,
respectively.
Fig 4. Comparison of Compression ratio (Speech signal sample 1-10). Axes: Speech Signal Sample, Compression Ratio; series: DWT, DCT, WPT, Proposed Algorithm.
Fig 5. Comparison of Compression ratio (Speech signal sample 11-20). Axes: Speech Signal Sample, Compression Ratio; series: DWT, DCT, WPT, Proposed Algorithm.
The figures show that, out of the 20
signal samples, only 2 have a lower compression ratio than
the WPT method, while still exceeding the other
two methods. The mean compression ratio of each method
is computed and tabulated in Table II.
TABLE II MEAN COMPRESSION RATIO

DWT      DCT      WPT       Proposed Adaptive Algorithm
6.144    6.142    27.027    28.275

Table II shows that the mean compression
ratio achieved over the 20 samples with the proposed adaptive
algorithm is 28.275. This is a sufficiently high value, considering
that no lossless compression method was
used.
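The means in Table II can be reproduced from the per-sample ratios of Table I. The short check below copies the WPT and proposed-algorithm columns, recomputes their means, and lists the samples on which the proposed algorithm falls below WPT. The variable names are illustrative.

```python
# Compression ratios copied from Table I (samples 1 to 20).
wpt = [11.8444, 12.1766, 12.4397, 12.3433, 42.1520, 51.5191, 23.8063,
       40.9006, 26.5968, 35.1952, 19.9917, 21.5817, 13.8164, 30.9609,
       15.5718, 30.4582, 22.4461, 29.2490, 26.8928, 60.5990]
proposed = [11.7985, 12.2632, 11.1433, 12.8633, 45.8856, 55.7188, 23.9820,
            43.0466, 28.5052, 36.0922, 20.7104, 21.7237, 13.9029, 31.1392,
            15.8481, 31.6369, 23.2086, 30.8083, 27.1709, 68.0507]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


mean_wpt = mean(wpt)
mean_proposed = mean(proposed)

# 1-based sample numbers where the proposed algorithm is below WPT.
worse = [k + 1 for k, (w, p) in enumerate(zip(wpt, proposed)) if p < w]

print(round(mean_wpt, 3), round(mean_proposed, 3), worse)
```

The recomputed means agree with Table II to three decimals, and the two samples flagged are the ones referred to in the discussion of Fig 4 and Fig 5.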
V. CONCLUSION
A new compression method based on an adaptive threshold
detector is proposed and tested. The simulation results show that
the proposed algorithm gives a better mean compression ratio
than the other methods considered: a mean
compression rate of 28.275 was obtained in the simulation,
against 27.027 for WPT, 6.144 for DWT and 6.142 for DCT.
Using a fast DCT algorithm, the proposed
method can be implemented on a Digital Signal Processor. The
proposed system is a good alternative to speech
compression systems based on linear prediction
approaches.
REFERENCES
[1]. R. W. Yeung, A First Course in Information Theory, New York:
Kluwer Academic/Plenum Publishers, 2002.
[2]. A. Gersho, "Advances in speech and audio compression,"
Proceedings of the IEEE, vol. 82, pp. 900-918, June 1994.
[3]. J. L. Flanagan, M. R. Schroeder, B. S. Atal, R. E. Crochiere,
N. S. Jayant and J. M. Tribolet, "Speech coding," IEEE Transactions
on Communications, vol. 27, pp. 710-737, April 1979.
[4]. P. Noll, "Wideband speech and audio coding," IEEE
Communications Magazine, pp. 34-44, Nov. 1993.
[5]. K. Sayood and J. C. Borkenhagen, "Use of residual redundancy in
the design of joint source/channel coders," IEEE Transactions on
Communications, vol. 39, no. 6, pp. 838-846, June 1991.
[6]. B. Edler, "Coding of audio signals with overlapping block
transform and adaptive window functions" (in German),
Frequenz, vol. 43, pp. 252-256, 1989.
[7]. Q. Memon and T. Kasparis, "Transform coding of signals using
approximate trigonometric expansions," Journal of Electronic
Imaging, vol. 6, no. 4, pp. 494-503, October 1997.
[8]. C. E. Shannon, "A mathematical theory of communication," Bell
System Technical Journal, vol. 27, pp. 379-423, 623-656, 1948.
[9]. A. N. Kolmogorov, "On the Shannon theory of information
transmission in the case of continuous signals," Trans. IRE, vol. IT-2, pp. 102-108, 1956.
[10]. D. L. Donoho, M. Vetterli, R. A. DeVore, and I. Daubechies, "Data
compression and harmonic analysis," IEEE Trans. Inf. Theory, vol.
44, no. 6, pp. 2435-2476, 1998.
[11]. N. Ahmed, T. Natarajan, and K. R. Rao, "Discrete cosine
transform," IEEE Transactions on Computers, vol. C-23, pp. 90-93,
Jan. 1974.