
An Implementation of Distributed Arithmetic Adaptive Filter
Using Coefficient Distribution without Look-Up Table
Anirut TRAKULTRITRUNG
Ekkawin THANANGCHUSIN
Sorawat CHIVAPREECHA
Department of Telecommunications Engineering,
Faculty of Engineering,
King Mongkut's Institute of Technology Ladkrabang,
Thailand
Outline
Introduction
Conventional Distributed Arithmetic
Coefficient-Distributive Distributed Arithmetic
Proposed Structure LMS Adaptive Filter
Synthesis & Experimental Results
Conclusion
1.Introduction
Adaptive filters are widely used in many applications:
Noise Cancellation
Reference: Adaptive Noise Cancelling: Principles and Applications
Widrow,B., Glover, J.R., Jr., McCool, J.M., Kaunitz, J., Williams, C.S.,
Hearn, R.H., Zeidler, J.R., Eugene Dong, Jr. and Goodlin, R.C.
Echo Cancellation
Reference: An adaptive acoustic echo cancellation without double-talk detection
Xiao Hu, Ai-Qun Hu, Youg Chen and Xiao-Hong Zeng
Channel Equalization
Reference: Adaptive Channel Equalization based on RLS Algorithm
Linghui Wang, Wei He', Kaihong Zhou and Zhen Huang
Most DSP algorithms require multiplications.
[Figure: 6-bit by 6-bit binary multiplication. Multiplicand $A_5 A_4 A_3 A_2 A_1 A_0$, multiplier $B_5 B_4 B_3 B_2 B_1 B_0$. Row $j$ of the partial products is $A_5B_j \; A_4B_j \; \dots \; A_0B_j$ for $j = 0, \dots, 5$, and the product bits are $p_{11} \dots p_0$.]
Product = Sum of Partial Products
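The partial-product view above can be sketched in a few lines of Python. This is only an illustration of the principle for unsigned operands; the function name `shift_add_multiply` and the default width of 6 bits are assumptions chosen to match the figure, not something taken from the slides.

```python
def shift_add_multiply(a: int, b: int, bits: int = 6) -> int:
    """Multiply two unsigned `bits`-bit integers by summing partial products."""
    assert 0 <= a < (1 << bits) and 0 <= b < (1 << bits)
    product = 0
    for j in range(bits):
        b_j = (b >> j) & 1                 # bit j of the multiplier
        partial = (a << j) if b_j else 0   # row j: multiplicand gated by b_j, shifted by j
        product += partial                 # product = sum of partial products
    return product
```

Each loop iteration corresponds to one row of the figure, which is why a parallel multiplier needs one row of AND gates and one adder stage per multiplier bit.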
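Conventional Distributed Arithmetic (DA)
Example: $y = a_1 x_1 + a_2 x_2$, with $x_1 = x_{10}.x_{11}x_{12}x_{13}$ and $x_2 = x_{20}.x_{21}x_{22}x_{23}$
LUT contents, addressed by the input bits $x_{1n}\,x_{2n}$:
  Input ($x_{1n}$ $x_{2n}$)    LUT
  0 0                          $0$
  0 1                          $a_2$
  1 0                          $a_1$
  1 1                          $a_1 + a_2$
[Figure: the LUT output feeds a scaling accumulator that produces $y$.]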
2.Conventional Distributed Arithmetic
FIR Filter
$$y(n) = h_0 x(n) + h_1 x(n-1) + h_2 x(n-2) + \dots + h_{N-1} x(n-(N-1))$$
$$y(n) = \sum_{k=0}^{N-1} h_k \, x(n-k)$$
$N$: Number of Filter Coefficients
$h_k$: Filter Coefficients
[Figure: FIR filter structures built from unit delays $Z^{-1}$, coefficient multipliers $h_0, h_1, \dots, h_{N-1}$ and adders, with input $x(n)$ and output $y(n)$.]
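As a reference point for the distributed-arithmetic forms that follow, the FIR sum can be written directly. This is a minimal sketch that assumes $x(n) = 0$ for $n < 0$; the function name `fir_filter` is illustrative only.

```python
def fir_filter(h, x):
    """Direct-form FIR: y(n) = sum_k h[k] * x(n - k), taking x(n) = 0 for n < 0."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(len(h)):
            if n - k >= 0:          # samples before the start are treated as zero
                acc += h[k] * x[n - k]
        y.append(acc)
    return y
```

Feeding a unit impulse returns the coefficients themselves, which is a quick way to check any alternative implementation against this one. Every output sample costs N multiplications here, and that cost is what the DA structures avoid.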
The input samples are represented as B-bit words in 2's complement format:
$$x(n-k) = x_k = -x_{k0} + \sum_{j=1}^{B-1} x_{kj} 2^{-j}$$
The output y(n) is
$$y(n) = -\sum_{k=0}^{N-1} h_k x_{k0} + \sum_{j=1}^{B-1} \left[ \sum_{k=0}^{N-1} h_k x_{kj} \right] 2^{-j}$$
$$y(n) = \sum_{j=0}^{B-1} C_j 2^{-j}$$
where
$$C_j = \sum_{k=0}^{N-1} h_k x_{kj}, \quad j = 1, \dots, B-1$$
and
$$C_0 = -\sum_{k=0}^{N-1} h_k x_{k0}$$
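The decomposition can be checked numerically. The sketch below uses exact rational arithmetic so that the bit-level sum can be compared with the direct inner product without rounding effects; the helper names `to_bits` and `da_output` are assumptions made for this illustration, and bit index 0 is the sign bit as in the equations above.

```python
from fractions import Fraction

def to_bits(value, B):
    """Return [x_0, ..., x_{B-1}] such that value = -x_0 + sum_{j>=1} x_j * 2^-j."""
    scaled = Fraction(value) * 2 ** (B - 1)
    assert scaled.denominator == 1, "value is not representable in B bits"
    X = int(scaled)
    assert -(2 ** (B - 1)) <= X < 2 ** (B - 1), "value out of range"
    word = X & ((1 << B) - 1)                    # two's complement bit pattern
    return [(word >> (B - 1 - j)) & 1 for j in range(B)]

def da_output(h, x, B):
    """Evaluate y = sum_j C_j * 2^-j with C_j built from bit j of every input word."""
    bits = [to_bits(xk, B) for xk in x]
    y = Fraction(0)
    for j in range(B):
        c_j = sum(hk * b[j] for hk, b in zip(h, bits))
        if j == 0:
            c_j = -c_j                           # the sign bit carries negative weight
        y += c_j * Fraction(1, 2 ** j)
    return y
```

Each $C_j$ only ever adds or skips a coefficient, so no multiplier is needed; the factor $2^{-j}$ is a shift in hardware.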
Hardware Architecture (N=4)
[Figure: the input $x(n)$ is loaded into a PISO register followed by three SISO registers; their serial output bits $x_{0j}, x_{1j}, x_{2j}, x_{3j}$ address the LUT, and the LUT output $C_j$ goes to a scaling accumulator ($2^{-1}$) that produces $y(n)$.]
LUT contents:
  Input ($x_{3j}$ $x_{2j}$ $x_{1j}$ $x_{0j}$)    $C_j$
  0 0 0 0                                        $0$
  0 0 0 1                                        $h_0$
  0 0 1 0                                        $h_1$
  0 0 1 1                                        $h_1 + h_0$
  0 1 0 0                                        $h_2$
  0 1 0 1                                        $h_2 + h_0$
  0 1 1 0                                        $h_2 + h_1$
  0 1 1 1                                        $h_2 + h_1 + h_0$
  1 0 0 0                                        $h_3$
  1 0 0 1                                        $h_3 + h_0$
  1 0 1 0                                        $h_3 + h_1$
  1 0 1 1                                        $h_3 + h_1 + h_0$
  1 1 0 0                                        $h_3 + h_2$
  1 1 0 1                                        $h_3 + h_2 + h_0$
  1 1 1 0                                        $h_3 + h_2 + h_1$
  1 1 1 1                                        $h_3 + h_2 + h_1 + h_0$
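A behavioural model of the LUT-based structure might look as follows. It is a sketch under stated assumptions: samples are treated as B-bit two's complement integers and processed LSB first, so the sign bit is bit B-1 rather than bit 0 as in the fractional notation of the slides, and the names `build_lut` and `lut_da` are hypothetical.

```python
def build_lut(h):
    """LUT[addr] = sum of h[k] over every set bit k of addr (2^N entries)."""
    n = len(h)
    return [sum(h[k] for k in range(n) if (addr >> k) & 1)
            for addr in range(1 << n)]

def lut_da(h, x, B):
    """Inner product sum_k h[k] * x[k] for B-bit two's complement integers x[k]."""
    lut = build_lut(h)
    mask = (1 << B) - 1
    words = [xk & mask for xk in x]         # two's complement bit patterns
    acc = 0
    for j in range(B):                      # one clock cycle per bit, LSB first
        addr = 0
        for k, w in enumerate(words):
            addr |= ((w >> j) & 1) << k     # bit j of x[k] drives address line k
        term = lut[addr] << j               # scaling by the bit weight
        acc += -term if j == B - 1 else term  # the sign bit is subtracted
    return acc
```

In an adaptive filter the table returned by `build_lut` has to be rebuilt whenever the coefficients change, which is the update cost the coefficient-distributive structure is designed to avoid.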
3.Coefficient-Distributive DA
The idea of Distributed Arithmetic is swapped: the filter coefficients, rather than the input samples, are distributed at bit level in B-bit 2's complement format:
$$h_k = -h_{k0} + \sum_{j=1}^{B-1} h_{kj} 2^{-j}$$
The output y(n) is
$$y(n) = -\sum_{k=0}^{N-1} x(n-k)\, h_{k0} + \sum_{j=1}^{B-1} \left[ \sum_{k=0}^{N-1} x(n-k)\, h_{kj} \right] 2^{-j}$$
$$y(n) = \sum_{j=0}^{B-1} D_j 2^{-j}$$
where
$$D_j = \sum_{k=0}^{N-1} x(n-k)\, h_{kj}, \quad j = 1, \dots, B-1$$
and
$$D_0 = -\sum_{k=0}^{N-1} x(n-k)\, h_{k0}$$
To reduce the complexity of the control signals, we replace the LUT with an adder network and a multiplexer.
Hardware Architecture (N=4)
[Figure: hardware architecture of the coefficient-distributive DA filter for N = 4, which forms the partial products $D_j$ and accumulates them into $y(n)$.]
Adder Network and MUX (N=4)
[Figure: adder network and multiplexer that generate the partial product.]
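One way to picture the adder network and multiplexer is the behavioural sketch below. It assumes integer arithmetic with B-bit two's complement coefficients processed LSB first (so the sign bit is bit B-1), and it models the adder network as the full set of $2^N$ sums of the delayed inputs; the slides do not specify the network at this level of detail, and the names `adder_network` and `cd_da` are hypothetical.

```python
def adder_network(x):
    """All 2^N sums of the delayed inputs: entry sel adds x[k] where bit k of sel is 1."""
    n = len(x)
    return [sum(x[k] for k in range(n) if (sel >> k) & 1)
            for sel in range(1 << n)]

def cd_da(h, x, B):
    """Inner product sum_k h[k] * x[k] with B-bit two's complement integer h[k]."""
    sums = adder_network(x)                 # combinational, nothing is stored
    mask = (1 << B) - 1
    words = [hk & mask for hk in h]         # coefficient bit patterns
    acc = 0
    for j in range(B):                      # one clock cycle per coefficient bit
        sel = 0
        for k, w in enumerate(words):
            sel |= ((w >> j) & 1) << k      # bit j of h[k] drives MUX select line k
        d_j = sums[sel]                     # MUX picks the partial product D_j
        acc += -(d_j << j) if j == B - 1 else (d_j << j)
    return acc
```

Because the sums are formed from the current inputs, a coefficient update only changes the select bits and no table has to be rewritten. The list returned by `adder_network` has $2^N$ entries, which reflects the exponential growth with filter length.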
4.FIR Adaptive Filter
x(n) : Input signal
d(n) : Desired signal
y(n) : Output signal
e(n) : Error signal
Output Signal: $y(n) = \sum_{k=0}^{N-1} h_k(n)\, x(n-k)$
Error Signal: $e(n) = d(n) - y(n)$
LMS algorithm: $h_k(n+1) = h_k(n) + \mu\, e(n)\, x(n-k)$; $\mu$: convergence factor
To implement a multiplierless adaptive filter, we use hardware scaling and a barrel shifter to approximate the values of $\mu\, e(n)\, x(n-k)$.
[Figure: adaptive filter block diagram with input $x(n)$, output $y(n)$, desired signal $d(n)$ and error signal $e(n)$.]
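The slides do not spell out how the quantizer and barrel shifter approximate $\mu\, e(n)\, x(n-k)$. One plausible reading, shown here purely as an assumption, is to round the error down to a signed power of two and to fold $\mu = 2^{-\text{mu\_shift}}$ into the same shift. Signals and coefficients are modelled as fixed-point integers with `frac_bits` fractional bits; the names `quantize_pow2`, `lms_update`, `mu_shift` and `frac_bits` are hypothetical.

```python
def quantize_pow2(e: int):
    """Return (sign, shift) with sign * 2**shift approximating e; (0, 0) when e == 0."""
    if e == 0:
        return 0, 0
    sign = 1 if e > 0 else -1
    shift = abs(e).bit_length() - 1        # floor(log2(|e|))
    return sign, shift

def lms_update(h, x_taps, e, mu_shift=4, frac_bits=7):
    """h_k <- h_k + mu * e * x(n-k), using only shifts and add/subtract.

    h, x_taps and e are integers scaled by 2**frac_bits; mu = 2**-mu_shift.
    """
    sign, p = quantize_pow2(e)
    net = p - frac_bits - mu_shift         # combined shift for e, scaling and mu
    new_h = []
    for hk, xk in zip(h, x_taps):
        delta = (xk << net) if net >= 0 else (xk >> -net)  # barrel-shifter model
        new_h.append(hk + sign * delta)    # sign selects add, subtract or hold
    return new_h
```

With `mu_shift=4` the step size is 0.0625, the convergence factor quoted in the experiments. Right shifts of negative values round toward minus infinity in Python, which matches an arithmetic shift in hardware.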
4.The Proposed Structure LMS Adaptive Filter
[Figure: block diagram of the proposed structure. Labeled blocks: PIPO registers holding x(n), x(n-1), x(n-2), x(n-3); adder network (Adder N/W); MUX; buffers; PISO registers for the coefficients $h_0$ to $h_3$; four barrel shifters; quantizer for e(n); ADD/SUB unit; accumulator (ACC) with a $2^{-1}$ shifter. Control signals: lacc, clacc, s/a, lr, upd, clk, Shift 0, Shift 1, Shift 2. Inputs x(n) and d(n); output y(n). Parallel and serial data paths are distinguished.]
5.Synthesis Results
[Figure: Control Unit Simulation Result]
[Figure: Hardware Simulation Result]
When constant values are applied to the input and the desired signal, the adaptive filter produces its output and keeps adapting until the error signal reaches zero, at which point the output equals the desired signal.
Performance Comparison
                                                    Conventional DA    Coefficient-Distributive DA
Registered Performance (Maximum Clock Frequency)    5.47 MHz           25.06 MHz
Memory Bits                                         320 bits           0 bits
Logic Cells Utilized                                506 LCs            1,521 LCs
The throughput is defined as the ratio of the clock rate to the number of clock cycles $t$ required to process one signal sample:
$$\text{Throughput} = \frac{\text{Clock Rate}}{t}$$
For an N-tap FIR adaptive filter, both the LUT-based and the CD-DA adaptive filters require B clock cycles to compute each output sample.
The LUT-based structure requires N additional clock cycles to update the LUT.
The CD-DA structure requires a constant 4 clock cycles (independent of the number of taps) to update the filter coefficients directly, thus
$$\text{Throughput}_{\text{LUT-based}} = \frac{\text{Clock Rate}}{B + N}$$
$$\text{Throughput}_{\text{CD-DA}} = \frac{\text{Clock Rate}}{B + 4}$$
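The two throughput expressions are easy to tabulate. The sketch below simply evaluates the formulas above; the function names are illustrative, and the numbers it produces follow from the formulas rather than being read from the plot.

```python
def throughput_lut(clock_hz: float, B: int, N: int) -> float:
    """LUT-based DA: B cycles for the output plus N cycles for the LUT update."""
    return clock_hz / (B + N)

def throughput_cd_da(clock_hz: float, B: int, update_cycles: int = 4) -> float:
    """CD-DA: B cycles for the output plus a constant number of update cycles."""
    return clock_hz / (B + update_cycles)

if __name__ == "__main__":
    clock_hz, B = 2e6, 8
    for N in (4, 8, 16, 32):
        print(N, throughput_lut(clock_hz, B, N), throughput_cd_da(clock_hz, B))
```

The two structures coincide at N = 4, and for longer filters the LUT-based throughput falls while the CD-DA throughput stays constant.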
Throughput Comparison
(Clock Rate = 2 MHz, B=8 bits)
[Figure: Throughput Comparison of LUT-less versus LUT-based Structure. Horizontal axis: Number of Filter Coefficients (4, 8, 16, 32). Vertical axis: Number of Samples processed per second ($\times 10^5$). Curves: LUT-based, LUT-less.]
5.Experimental Results
[Figure: adaptive filter test setup with input $x(n)$, output $y(n)$, desired signal $d(n)$ and error signal $e(n)$.]
Input x(n): Sinusoidal High Freq. + Sinusoidal Low Freq.
Desired d(n): Sinusoidal Low Freq.
High Frequency: 7 kHz
Low Frequency: 500 Hz
Sampling Frequency: 30 kHz
Convergence Factor: 0.0625
4-tap FIR Adaptive Filter
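The test setup above can be reproduced in software with a plain floating-point LMS loop. This sketch is not the multiplierless hardware: it uses ordinary multiplications, assumes unit-amplitude sinusoids (the slides do not give amplitudes), and the function name `run_lms` and the default of 3000 samples are arbitrary choices for illustration.

```python
import math

def run_lms(num_samples=3000, fs=30000.0, f_low=500.0, f_high=7000.0,
            mu=0.0625, taps=4):
    """4-tap LMS: input = low + high sinusoid, desired = low sinusoid only."""
    h = [0.0] * taps                 # coefficients h_k(n), start at zero
    delay = [0.0] * taps             # x(n), x(n-1), ..., x(n-taps+1)
    errors = []
    for n in range(num_samples):
        t = n / fs
        d = math.sin(2 * math.pi * f_low * t)        # desired signal d(n)
        x = d + math.sin(2 * math.pi * f_high * t)   # input signal x(n)
        delay = [x] + delay[:-1]                     # shift the delay line
        y = sum(hk * xk for hk, xk in zip(h, delay)) # filter output y(n)
        e = d - y                                    # error signal e(n)
        h = [hk + mu * e * xk for hk, xk in zip(h, delay)]  # LMS update
        errors.append(e)
    return h, errors
```

Comparing the mean squared error over the first and last hundred samples gives a quick indication of whether the filter has adapted under these assumed conditions.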
[Figure: desired, input and output waveforms; simulation result and experimental result.]

[Figure: adaptive filter setup for the ECG experiment, with a 50 Hz sine reference, the corrupted ECG, and the signals $y(n)$ and $e(n)$.]
Sampling Frequency: 30 kHz
Convergence Factor: 0.0625
4-tap FIR Adaptive Filter
[Figure: corrupted ECG and filtered ECG; simulation result and experimental result.]
Conclusion
A multiplierless adaptive filter can be implemented.
The LUT is replaced by an adder network and a multiplexer.
The structure is very simple.
The throughput is constant (no extra time is required for LUT updating).
The complexity of the adder network increases exponentially with the filter length.
Thank you
