2017 6th International Conference on Modern Circuits and Systems Technologies (MOCAST)

Recursive Odd-Even Sorter for Vector Quantizer


Berkin Atila and Burak Kelleci
College of Engineering
Okan University
Akfirat-Tuzla-Istanbul/TURKEY, 34959
Email: burak.kelleci@okan.edu.tr

Abstract-A recursive odd-even transposition sorter based vector quantizer, which is used in mismatch shaping algorithms, is presented. Although recursive parallel sorting algorithms require less area than fully parallel sorting algorithms, they are slower. A widely used recursive parallel sorting algorithm is the perfect shuffle, which requires multiple clock cycles to shuffle and sort the data. The proposed recursive algorithm uses fewer clock cycles than the perfect shuffle to sort fewer than 80 inputs. An area efficient version is also proposed, which sorts fewer than 16 inputs faster than the perfect shuffle algorithm. To compare the performance of various sorting algorithms suitable for vector quantizers, they are realized and synthesized in TSMC 40nm low-power technology. Speed and area results indicate that, for 32 inputs, the proposed algorithm sorts at a 42% faster rate than the perfect shuffle sorter while using 14% fewer components, and at an 80% slower rate than the Bitonic sorter while using only about 27% of its components. For 32 inputs, the area efficient version uses 32% fewer components than the perfect shuffle sorter, which in turn sorts at a 21% faster rate.

Index Terms-Vector Quantizer, Sorting Algorithm, Mismatch Shaping, Odd-Even Sorter, Recursive Sorter, Perfect Shuffle Sorter.

I. INTRODUCTION

The low jitter sensitivity and high signal-to-noise ratio requirements of sigma-delta digital-to-analog converters (DACs) are met by multi-bit converters. However, multi-bit converters suffer from static and dynamic mismatches among the unit current sources. Various techniques to mitigate the effect of static and dynamic mismatches have been reported in the literature [1], [2]. These algorithms track the usage of each current source and select the least used ones. The selection is performed by a vector quantizer, which sorts the usage information of the individual current sources to determine which of them should be turned on for the given input signal.

The vector quantizer is the most area-hungry block of modern mismatch shaping algorithms because of the sorting operation, which must be fast enough to finish within one sample period of the input data. For example, the DAC in [2] operates at a sampling frequency of 3.072 MHz (a sampling period of about 326 ns) and uses 32 current sources in each stage of a cascaded modulator. Although there are many area efficient data sorting algorithms, the timing requirement limits the available options to fast, parallel ones. For example, the quicksort algorithm requires on the order of $n^2$ steps in the worst case [3]. Sorting 32 data values with quicksort would therefore take about $32^2 = 1024$ clock cycles; in other words, the sorting block would have to operate above 3 GHz ($1024 \times 3.072\ \text{MHz} \approx 3.15\ \text{GHz}$), which results in significant power consumption and requires state-of-the-art technology.

Fully parallel sorter architectures are well known in the literature. The fastest fully parallel sorters are the Bitonic and Batcher odd-even sorters [4], [5]. However, the area requirement of these sorting algorithms limits their usage to high performance but high cost systems. A solution to this problem is to use recursive sorting algorithms, such as the perfect shuffle algorithm [6]. However, the perfect shuffle is slower than fully parallel sorters. Moreover, the number of clock cycles it requires increases with the number of inputs, so its usage is limited to low sample rate systems with few elements to sort. For example, the sample rate for sorting 32 input elements is 2.4 MSample/s in 0.35 µm technology. However, with the advance of nanometer technologies, recursive algorithms have become fast enough for practical applications, such as the audio DAC in [2].

In this paper, a recursive sorting algorithm is proposed for use in a vector quantizer; it requires a smaller area than fully parallel designs and operates at a higher speed than the perfect shuffle algorithm when sorting fewer than 80 elements. An area efficient version is also proposed, which is faster than the perfect shuffle algorithm when sorting fewer than 16 elements. The rest of the paper is organized as follows. Current parallel and recursive sorting algorithms are reviewed in Section II. The proposed design is presented in Section III. A performance comparison of the proposed design is reported in Section IV, and concluding remarks are given in Section V.

II. REVIEW OF PARALLEL SORTING ALGORITHMS

The fundamental component of a sorter is the compare and swap operation shown in Fig. 1a. Two inputs are compared; the minimum of the two is routed to the top output and the maximum to the bottom output. To simplify the analysis, the compare and swap operation is drawn as a line in the data flow representation shown in Fig. 1b.

Fig. 1. Compare and Swap Block: (a) Block, (b) Data Flow (inputs X and Y; outputs min(X,Y) and max(X,Y))

978-1-5090-4386-6/17/$31.00 ©2017 IEEE

Parallel sorting algorithms are based on the parallel execution of compare and swap operations. A relatively simple
algorithm is the odd-even transposition sorter, shown in Fig. 2 for sorting 8 elements. Input elements are compared and swapped as odd and even pairs in alternating steps, and after N steps the output is sorted. Since N/2 comparisons are performed in each odd step and N/2 - 1 comparisons in each even step, and there are N/2 steps of each kind, the number of comparators is given as

$$N_C = \frac{N(N-1)}{2} \tag{1}$$

where $N_C$ is the number of comparators. For large N, (1) is approximately $N^2/2$. Since the number of comparators increases with the square of the number of inputs, the odd-even transposition sorter is not suitable for a large number of inputs.

Fig. 2. Odd-Even Transposition Sorter (inputs X0-X7, outputs Y0-Y7)

The main advantage of the odd-even transposition sorting algorithm is that compare and swap operations are performed only between adjacent elements, so the routing complexity between blocks is low. Allowing connections between arbitrary elements, however, reduces the number of levels from input to output.

Since merging two already sorted data sequences is easy, merge-sort based algorithms have been proposed. Batcher's odd-even mergesort algorithm is an efficient implementation of the odd-even sorter [4], [5]. The inputs are divided into two halves, which are sorted separately. Then the odd- and even-indexed entries are sorted. As a last step, one more stage of compare and swap operations is performed to completely sort the inputs. The data flow of the Batcher odd-even sorter for 8 inputs is shown in Fig. 3.

Fig. 3. Batcher Odd-Even Sorter (8 inputs, Steps 1-6)

The number of steps for the Batcher odd-even sorter is given as

$$N_S = \frac{(\log_2 N)^2 + \log_2 N}{2} \tag{2}$$

where $N_S$ is the number of steps. The number of comparators is

$$N_C = \frac{N\left((\log_2 N)^2 - \log_2 N + 4\right)}{4} - 1 \tag{3}$$

where $N_C$ is the number of comparators [5]. The shortest path of the Batcher odd-even sorting algorithm is $\log_2 N$ steps, and the longest path is equal to the number of steps.

Another merge-sort algorithm is the Bitonic sorting algorithm, which is based on merging one ascending sequence and one descending sequence [4], [5]. The data flow of the Bitonic sorter for 8 inputs is shown in Fig. 4.

Fig. 4. Bitonic Sorter (8 inputs, Steps 1-6)

The number of steps for the Bitonic sorter is given as

$$N_S = \frac{(\log_2 N)^2 + \log_2 N}{2} \tag{4}$$

where $N_S$ is the number of steps. Since N/2 compare and swap operations are used in each step, the total number of comparators is $N_S \cdot N/2$. Although the Batcher odd-even sorter requires fewer comparators than the Bitonic sorter, their latencies are identical because the longest path delay of the Batcher odd-even sorter is equal to the path delay of the Bitonic sorter.

To reduce the area requirement of the Bitonic sorting algorithm, a recursive sorting algorithm based on the perfect shuffle has also been proposed [6]. At every clock cycle, N/2 comparisons are performed, as shown in Fig. 5. The compare and swap module is more complex than the one used in the Bitonic sorter because, depending on the control input value, the minimum of the inputs goes to the top output or to the bottom output, or the compare and swap operation is bypassed. The sequence of control inputs is generated by a finite state machine to ensure correct sorting. The number of required clock cycles is

$$N_{CLK} = (\log_2 N)^2 \tag{5}$$

where $N_{CLK}$ is the number of clock cycles. The perfect shuffle requires more clock cycles than the number of Bitonic steps because the data must be shuffled to arrange the sequence of pivots.
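As a behavioural illustration of Fig. 1, Fig. 2 and equation (1), the compare and swap block and the odd-even transposition network can be modelled in a few lines of Python. This is a software sketch, not the VHDL used in this work; the function names are hypothetical, and the exhaustive test relies on the standard 0-1 principle for comparator networks.

```python
from itertools import product


def compare_and_swap(x, y):
    """Fig. 1: route the minimum to the top output and the maximum to the bottom."""
    return (x, y) if x <= y else (y, x)


def odd_even_transposition_network(n):
    """Return n steps; each step is a list of adjacent (i, i + 1) comparator pairs.

    Following the text, an 'odd' step compares (0,1), (2,3), ... (N/2 comparators)
    and an 'even' step compares (1,2), (3,4), ... (N/2 - 1 comparators).
    """
    steps = []
    for s in range(n):
        start = 0 if s % 2 == 0 else 1
        steps.append([(i, i + 1) for i in range(start, n - 1, 2)])
    return steps


def apply_network(steps, values):
    """Pass the values through every compare and swap block, step by step."""
    v = list(values)
    for step in steps:
        for i, j in step:
            v[i], v[j] = compare_and_swap(v[i], v[j])
    return v


def comparator_count(steps):
    return sum(len(step) for step in steps)


if __name__ == "__main__":
    for n in (4, 8):
        net = odd_even_transposition_network(n)
        # 0-1 principle: a comparator network that sorts every 0/1 input sorts every input.
        assert all(apply_network(net, bits) == sorted(bits)
                   for bits in product((0, 1), repeat=n))
    for n in (8, 16, 32):
        # Equation (1): N_C = N(N - 1) / 2
        assert comparator_count(odd_even_transposition_network(n)) == n * (n - 1) // 2
    print(apply_network(odd_even_transposition_network(8), [5, 3, 7, 0, 6, 2, 4, 1]))
    # prints [0, 1, 2, 3, 4, 5, 6, 7]
```

Because every comparator touches only neighbouring indices, the sketch also makes the low routing complexity of Fig. 2 visible: no pair ever spans more than one position.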

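The step and comparator counts in (2), (3) and (4) can be checked by generating the networks explicitly. The Python sketch below assumes the common textbook recursive formulation of Batcher's odd-even merge sort and the usual iterative formulation of the Bitonic sorter, both for power-of-two N; the helper names are hypothetical and the code is not taken from [4]-[6]. The longest path is measured by scheduling every comparator as early as its inputs allow.

```python
from itertools import product


def oddeven_merge(lo, hi, r):
    """Comparators merging the two sorted halves of indices lo..hi (inclusive), stride r."""
    step = r * 2
    if step < hi - lo:
        yield from oddeven_merge(lo, hi, step)        # even-indexed subsequence
        yield from oddeven_merge(lo + r, hi, step)    # odd-indexed subsequence
        yield from ((i, i + r) for i in range(lo + r, hi - r, step))  # last clean-up stage
    else:
        yield (lo, lo + r)


def batcher_network(lo, hi):
    """Batcher odd-even merge sort on indices lo..hi inclusive (power-of-two length)."""
    if hi - lo >= 1:
        mid = lo + (hi - lo) // 2
        yield from batcher_network(lo, mid)           # sort first half
        yield from batcher_network(mid + 1, hi)       # sort second half
        yield from oddeven_merge(lo, hi, 1)           # merge the two sorted halves


def bitonic_network(n):
    """Iterative bitonic sorter; each inner list is one step of n/2 comparators.

    A pair (a, b) sends the minimum to index a and the maximum to index b, so
    descending comparators are written with their indices reversed.
    """
    steps = []
    k = 2
    while k <= n:
        j = k // 2
        while j >= 1:
            step = []
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    step.append((i, partner) if ascending else (partner, i))
            steps.append(step)
            j //= 2
        k *= 2
    return steps


def depth(n, comparators):
    """Longest comparator path when each comparator is scheduled as early as possible."""
    level = [0] * n
    for a, b in comparators:
        level[a] = level[b] = max(level[a], level[b]) + 1
    return max(level)


def sorts_all_binary_inputs(n, comparators):
    """0-1 principle: a comparator network sorts every input iff it sorts every 0/1 input."""
    for bits in product((0, 1), repeat=n):
        v = list(bits)
        for a, b in comparators:
            if v[a] > v[b]:
                v[a], v[b] = v[b], v[a]
        if v != sorted(v):
            return False
    return True


if __name__ == "__main__":
    for n in (4, 8, 16):
        p = n.bit_length() - 1                        # log2(N) for a power of two
        ns = (p * p + p) // 2                         # equations (2) and (4)
        batcher = list(batcher_network(0, n - 1))
        bitonic_steps = bitonic_network(n)
        bitonic = [c for step in bitonic_steps for c in step]
        assert depth(n, batcher) == ns                # longest Batcher path equals N_S
        assert len(batcher) == n * (p * p - p + 4) // 4 - 1   # equation (3)
        assert len(bitonic_steps) == ns and len(bitonic) == ns * n // 2
        if n <= 8:                                    # keep the exhaustive check small
            assert sorts_all_binary_inputs(n, batcher)
            assert sorts_all_binary_inputs(n, bitonic)
        print(n, ns, len(batcher), len(bitonic))
```

For N = 8 the sketch reports 6 steps, 19 Batcher comparators and 24 Bitonic comparators, consistent with the six steps drawn in Figs. 3 and 4 and with the statement that the Batcher sorter needs fewer comparators for the same latency.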
Fig. 5. Perfect Shuffle Sorter

Fig. 6. Recursive Odd Even Sorter
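The clock-cycle count of the perfect shuffle in (5) can be tabulated against the counts used for the recursive odd-even sorter in Section III (N/2 cycles, or N cycles when compare and swap blocks are shared through muxes). The sketch below is a hypothetical helper, not part of the original design flow; it assumes power-of-two N for the table and scans integer N upward from 8, the smallest size in Table I, for the crossover.

```python
import math


def perfect_shuffle_cycles(n):
    """Equation (5): N_CLK = (log2 N)^2."""
    return math.log2(n) ** 2


def recursive_cycles(n, shared_blocks=False):
    """Recursive odd-even sorter: one odd and one even step per clock gives N/2 cycles;
    sharing the compare and swap blocks through muxes doubles this to N."""
    return n if shared_blocks else n / 2


def crossover(shared_blocks, start=8, stop=1024):
    """First N (scanning from `start`) at which the recursive sorter no longer
    needs fewer clock cycles than the perfect shuffle."""
    for n in range(start, stop):
        if recursive_cycles(n, shared_blocks) >= perfect_shuffle_cycles(n):
            return n
    return None


if __name__ == "__main__":
    for n in (8, 16, 32, 64, 128, 256):
        print(n, round(perfect_shuffle_cycles(n)),
              round(recursive_cycles(n)), recursive_cycles(n, shared_blocks=True))
    print(crossover(shared_blocks=False), crossover(shared_blocks=True))  # prints 80 16
```

The loop reproduces the perfect shuffle and recursive clock-cycle columns of Table I, and the two crossover values match the limits of about 80 and 16 inputs derived in Section III.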

III. RECURSIVE ODD-EVEN SORTER

The odd-even transposition sorter shown in Fig. 2 performs N/2 odd and N/2 even compare and swap steps. This repetitive property is used to propose a recursive odd-even transposition sorter, shown in Fig. 6. At every clock cycle, one odd and one even compare and swap step are performed. Unlike the compare and swap operation used in the perfect shuffle sorter, these operations do not require an extra control input; therefore, the combinational circuit complexity is lower than that of the perfect shuffle. The number of clock cycles required to sort N inputs is given as

$$N_{CLK} = \frac{N}{2} \tag{6}$$

which scales linearly with N, unlike the clock count of the perfect shuffle in (5), which scales with $(\log_2 N)^2$. Therefore, there is a limit value of N below which the proposed algorithm requires fewer clock cycles than the perfect shuffle algorithm. To find this value, (5) is set equal to (6) and solved for N. The result is approximately 80. As a result, to sort fewer than 80 inputs, the sorting algorithm based on the perfect shuffle requires more clock cycles than the proposed recursive odd-even transposition sorter.

The area of the recursive odd-even sorter is reduced by sharing compare and swap blocks through multiplexers, as shown in Fig. 7. However, this technique doubles the number of clock cycles required for sorting, to N. Therefore, to find the limit of N below which this version is faster than the perfect shuffle, (5) is set equal to N and solved for N. The result is 16. Therefore, to sort fewer than 16 inputs, the recursive odd-even sorter with muxes is faster than the perfect shuffle sorter.

Fig. 7. Recursive Odd Even Sorter with Mux

IV. COMPARISON

Vector quantizers based on the Bitonic, Batcher odd-even, odd-even transposition, perfect shuffle and recursive odd-even sorting algorithms are designed in VHDL for 8, 16, 32, 64, 128 and 256 inputs. The outputs of the sorters are also registered to prevent glitches, since the sorter outputs are assumed to drive the unit current sources of the digital-to-analog converter. The Bitonic, Batcher odd-even and odd-even transposition sorter based vector quantizers are designed as combinational circuits, whereas the perfect shuffle and recursive odd-even sorter based vector quantizers are designed as finite state machines. The index values of the inputs are also tracked during sorting in order to turn on the least used current sources.

To compare the area and speed performance of the Bitonic, Batcher odd-even, odd-even transposition, perfect shuffle and recursive odd-even sorting algorithms, the vector quantizers are synthesized using the Cadence Genus logic synthesizer with the same synthesis parameters for TSMC 40nm low-power technology. The Bitonic, Batcher odd-even and odd-even transposition sorter based vector quantizers require only one clock cycle, and their speed is limited by the delay of the combinational circuits. Although the structures of these three algorithms are suitable for pipelining, the latency of pipelined designs would be greater than that of non-pipelined designs due to the setup and hold time requirements of the pipeline registers. The perfect shuffle and proposed recursive odd-even transposition sorter based algorithms require multiple clock
cycles to determine the output. Moreover, as indicated by (5) and (6), the number of clock cycles shown in Table I increases with the number of inputs.

TABLE I
NUMBER OF CLOCK CYCLES FOR VECTOR QUANTIZER

Number of Inputs                   8      16      32      64     128     256
Bitonic                            1       1       1       1       1       1
Batcher Odd-Even                   1       1       1       1       1       1
Odd-Even                           1       1       1       1       1       1
Perfect Shuffle                    9      16      25      36      49      64
Recursive Odd-Even                 4       8      16      32      64     128
Recursive Odd-Even with Mux        8      16      32      64     128     256

The symbol rate of a vector quantizer is defined as its maximum sorting and selection speed. The speeds of the Bitonic, Batcher odd-even and odd-even transposition algorithms are determined by combinational circuit delays and the setup requirements of the output registers. The symbol rates of the perfect shuffle and proposed recursive odd-even algorithms are determined by the number of clock cycles, with the clock period limited by combinational delays and the setup requirements of the flip-flops. The symbol rates for vector quantizers based on the different sorting algorithms are presented in Table II. As expected, fully parallel algorithms are faster than recursive algorithms. Doubling the number of elements to sort halves the speed of the odd-even transposition sorter. On the other hand, the speeds of the Bitonic and Batcher odd-even sorters scale logarithmically with N, and this property results in higher performance than the odd-even transposition sorter for a large number of elements.

TABLE II
SYMBOL RATE COMPARISON OF VECTOR QUANTIZERS (MSample/s)

Number of Inputs                   8      16      32      64     128     256
Bitonic                        362.6     229   153.5   105.4    69.6      46
Batcher Odd-Even               374.3   236.1   158.6   107.2    71.5    46.9
Odd-Even                       294.4   157.9    82.5    41.7      21    10.6
Perfect Shuffle                114.3    49.3      21     8.8     3.8     1.6
Recursive Odd-Even             204.2    84.7    29.8     9.5     2.8     0.8
Recursive Odd-Even with Mux    133.4    53.6    17.3     5.2     1.5     0.4

Vector quantizers based on fully parallel sorters require memory elements only to register the output. Recursive algorithms also require memory to save element values between steps and to track states. The required sequential elements are summarized in Table III. The number of combinational logic components required by each algorithm is summarized in Table IV. Fully parallel algorithms require more logic components than recursive algorithms. Except at 128 inputs, the perfect shuffle based vector quantizer requires more logic components than the proposed algorithm due to the complex compare and swap component used in the perfect shuffle.

TABLE III
NUMBER OF SEQUENTIAL COMPONENTS

Number of Inputs                   8      16      32      64     128     256
Bitonic                            8      16      32      64     128     256
Batcher Odd-Even                   8      16      32      64     128     256
Odd-Even                           8      16      32      64     128     256
Perfect Shuffle                  108     229     485    1030    2182    4615
Recursive Odd-Even                99     212     453     966    2055    4360
Recursive Odd-Even with Mux      122     237     480     995    2086    4393

TABLE IV
NUMBER OF LOGIC COMPONENTS

Number of Inputs                    8       16       32       64      128      256
Bitonic                          2300     8780    27996    84852   245854   762783
Batcher Odd-Even                 1676     6660    22501    68707   203749   664460
Odd-Even                         2702    13010    57807   234441   938339  4107787
Perfect Shuffle                  1377     3135     8401    22088    71018   319970
Recursive Odd-Even               1060     2936     7220    20180    72161   305664
Recursive Odd-Even with Mux       863     2123     5528    17761    64308   304767

V. CONCLUSION

A recursive odd-even transposition sorter based vector quantizer is introduced. The proposed sorter requires fewer clock cycles than the perfect shuffle sorter to sort fewer than 80 inputs. An area efficient version, which requires fewer clock cycles to sort fewer than 16 inputs, is also proposed. Their area requirement is lower than that of fully parallel sorters such as the Bitonic, Batcher odd-even and odd-even transposition sorters. Therefore, the proposed sorting algorithms are promising solutions to reduce the digital complexity of mismatch shaping algorithms in high performance sigma-delta digital-to-analog converters.

REFERENCES

[1] Schreier, R. and Temes, G. C., 'Understanding delta-sigma data converters', IEEE Press/Wiley, 2005
[2] Risbo, L., Hezar, R., Kelleci, B., Kiper, H. and Fares, M., 'Digital approaches to ISI-mitigation in high-resolution oversampled multi-level D/A converters', IEEE Journal of Solid-State Circuits, 2011, 46, pp. 2892-2903
[3] Hoare, C. A. R., 'Algorithm 64: Quicksort', Communications of the ACM, 1961, 4, (7), p. 321
[4] Knuth, D. E., 'Art of Computer Programming, Volume 1: Fundamental Algorithms', Addison-Wesley Educational Publishers Inc, 1968
[5] Batcher, K. E., 'Sorting networks and their applications', Proceedings of AFIPS '68, 1968, pp. 307-314
[6] Stone, H. S., 'Parallel processing with the perfect shuffle', IEEE Transactions on Computers, 1971, C-20, pp. 153-161
