A 1.4 mW Low-Power Feed-Back FxLMS ANC VLSI Design for In-ear Headphones

Hong-Son Vu¹ and Kuan-Hung Chen²
¹Ph.D. Program of Electrical and Communications Engineering, Feng Chia University, Taichung 40724, Taiwan, R.O.C.
²Department of Electronic Engineering, Feng Chia University, Taichung 40724, Taiwan, R.O.C.

Abstract—This work proposes a dedicated VLSI hardware architecture design that considers both audio signal processing requirements and hardware costs to achieve critical real-time active noise cancellation (ANC) performance for in-ear headphones. This new approach allows ANC systems to operate at a lower clock frequency, which results in lower power consumption, and facilitates higher performance than that achieved by conventional ANC headphones. Compared with the existing works, the proposed design attenuates a wider bandwidth of noise and saves 99.41% of the power consumption.

Fig. 1. Proposed feedback FxLMS algorithm and its application on an ANC headphone.

I. INTRODUCTION

Unlike a feedforward FxLMS ANC system, which can physically acquire a reference signal of the noise [1]–[4], the feedback FxLMS ANC system needs to generate a virtual reference signal based on the internal model control structure, i.e. it uses the estimated anti-noise signal y’(n) and the error signal e(n) to synthesize the reference signal x(n). This results in an additional computation burden caused by the calculation of the estimated noise x’(n) and that of the estimated anti-noise signal y’(n).
In this design, we compute the estimated noise x’(n) and the estimated anti-noise signal y’(n) in parallel so that both signals are obtained at the same time. This gives the proposed system two advantages. First, the 28 cycles for calculating the estimated anti-noise signal y’(n) are removed. Therefore, the total computation burden is reduced from 161 to 133 cycles, i.e. a 17.4% reduction. Second, these two FIR filters need only one common control logic unit since they share the same coefficients of the estimated secondary path S’(z), and they access their corresponding data and coefficients at the same time by using the proposed interleaving memory structure. The data-path consists of only a multiply-accumulate (MAC) unit and several registers, so it is very compact and low cost. With the above design elaboration, the proposed design achieves at least 2.56 times faster processing speed than previous works, so that lowering the operating frequency and the resulting power dissipation becomes possible. Consequently, the proposed work achieves a 99.41% power consumption saving and attenuates a wider bandwidth of noise when compared with the existing designs.

The proposed ANC circuit design has been successfully implemented by using Taiwan Semiconductor Manufacturing Company (TSMC) 90nm CMOS technology. In addition, the proposed design has been verified under various noise scenarios for its real-time ANC performance by using an FPGA platform, i.e. XILINX ZEDBOARD. Experimental results show that the proposed design can attenuate broadband pink noise between 100–600 Hz, with a maximum attenuation of 15 dB.

978-1-4673-9498-7/16/$31.00 ©2016 IEEE

II. DEDICATED ARCHITECTURE DESIGN

Fig. 1 presents a headphone with the proposed feedback ANC system. In this system, y(n) is the anti-noise signal and e(n) is the residual error signal, which is the superposition of the primary noise d(n) and the played-back anti-noise y(n). In addition, P(z) and S(z) respectively denote the primary path and the secondary path. Meanwhile, W(z) indicates the filter weights of the ANC controller, implemented based on the least mean square (LMS) adaptive filter algorithm. The primary noise d(n), which is attenuated by the anti-noise y’(n), is unavailable during the noise cancellation process. Hence, the primary noise must be estimated through an accurate estimated secondary path model S’(z). Moreover, the on-line modeling in [4] is adopted to identify the dynamic equations for the secondary path, to avoid unstable situations that come with small shifts of the in-ear headphone position or small mismatches across different users.

Let s’_m(n) be the coefficient vector of the estimated secondary path S’(z) and W(n) be the coefficient vector of the adaptive filter W(z) at time instant n. The estimate of the primary noise x(n) then becomes

$$ x(n) \equiv d'(n) = e_1(n) + y'(n) = e_1(n) + \sum_{m=0}^{M-1} s'_m(n)\, y(n-m), \quad m = 0, 1, 2, \ldots, M-1, \tag{1} $$

where the output of the adaptive filter at time instant n, i.e. y(n), is given by

$$ y(n) = \sum_{l=0}^{L-1} w_l(n)\, x(n-l), \quad l = 0, 1, 2, \ldots, L-1. \tag{2} $$

Moreover, an estimated primary noise is calculated as

$$ x'(n) = \sum_{m=0}^{M-1} s'_m(n)\, x(n-m), \quad m = 0, 1, 2, \ldots, M-1. \tag{3} $$

The resulting FxLMS algorithm can be represented by (4).

$$ w_l(n+1) = w_l(n) + \mu_1\, x'(n-l)\, e_1(n), \quad l = 0, 1, 2, \ldots, L-1. \tag{4} $$

The identification of the secondary path can be described as follows:

$$ e_1(n) = e(n) + y_1(n) = e(n) + \sum_{m=0}^{M-1} s'_m(n)\, v(n-m), \quad m = 0, 1, 2, \ldots, M-1, \tag{5} $$

$$ s'_m(n+1) = s'_m(n) + \mu_2\, v(n-m)\, e_1(n), \quad m = 0, 1, 2, \ldots, M-1, \tag{6} $$

where v(n) indicates the white noise signal that is required to identify the secondary path on-line.

For a complete system design, the FPGA prototype is developed on a platform named XILINX ZEDBOARD. The top-level block diagram of the proposed ANC system architecture is illustrated in Fig. 2, where the ZEDBOARD platform is used to synthesize and implement our design. Because this design is targeted specifically at the digital design of the ANC signal processing, we adopt the existing A/D and D/A converters of the I2S Audio CODEC on the platform to carry out the mixed-signal processing. This platform includes an ARM Cortex-A9 processor, an I2S Audio CODEC, and our proposed design. The ARM processor is only responsible for allocating tasks to each module of the system through an AXI4-Lite interface. The proposed design handles all the processing details of the ANC controller based on the acquired error signal to produce the electronic anti-noise signal y(n). Finally, this signal is played through the loudspeaker inside the ear-cup to produce the acoustic anti-noise signal y’(n), which attenuates the primary acoustic noise d(n) in the feedback ANC system. As shown in Fig. 2, the proposed design contains the following submodules: 1) two FIR filters, i.e. S’(z)_FIR_24_tap_x’(n) and S’(z)_FIR_24_tap_y’(n), to produce the signals x’(n) and y’(n), respectively; 2) two 24-tap adaptive filters, i.e. LMS_24_tap_W(z) to generate the electronic anti-noise signal y(n), and LMS_24_tap_S’(z) for estimating the secondary path on-line; and 3) an adder for summing the residual noise signal e(n) and the anti-noise signal y’(n) to reconstruct the virtual noise signal. In this paper, we adopt 24-tap filters (M = L = 24) for computing both the W(z) and S’(z) models. Step sizes of 0.01 and 0.004 were used in the adaptive filter W(z) and the on-line secondary path modeling S’(z), respectively.

Fig. 2. Top-level block diagram of the proposed ANC system architecture.

We optimize the proposed design with strategies specialized for both audio signal processing requirements and hardware costs to achieve critical real-time ANC performance and low power. The detailed hardware architecture of the FIR filter for calculating both x’(n) and y’(n) is shown in Fig. 3(a), where the data and coefficients are stored in their corresponding buffers. A three-stage pipelined MAC architecture is adopted to increase the data throughput rate of the FIR filters. In each cycle, the data and coefficients are read out from the interleaving buffers to the registers. Then, the two operands of the convolution are multiplied and one adder accumulates the partial products to obtain the filter output. Two counters are used to handle the addresses for each buffer. A comparator (Coeff addr0 == M-2) is used to drive the enable signal that disables the counter for one cycle every output sample, and writes new samples into the data buffers every M cycles. These new samples are also sent to the outputs Dout x[n] and Dout y[n] through two 16-bit 2-to-1 multiplexers. Fig. 3(b) shows the detailed hardware architecture of the proposed 24-tap LMS algorithm, which produces the electronic anti-noise signal y(n) and continually adjusts the coefficients of the adaptive filter based on the measured error signal and the reconstructed noise stored in the data buffers. This hardware architecture contains the MAC FIR filter, the weight update controller W[n], and three data buffers that are used to store the synthesized data x[n], the adjusted coefficients, and the output data x’[n]. In addition, to keep track of the changes of S(z) due to small position variations of the in-ear headphone, we adopt the on-line secondary path estimation module described in [4].

Based on the hardware architecture described above, we summarize the operation cycles of the baseline design and the proposed design in Fig. 4. To optimize hardware cost, power consumption, and data throughput rate, we use a three-stage pipelined MAC architecture, an interleaving memory organization, and a parallel implementation for computing the estimated noise x’(n) and the estimated anti-noise signal y’(n). As shown in Fig. 4, the baseline design needs 48 cycles for calculating each convolution and 72 cycles for updating the coefficients of the adaptive filter. As a result, the baseline design needs up to 336 cycles to finish all calculations for each sample, while the proposed design needs only 133 cycles. Therefore, for the case of a 96 kHz sampling rate, the optimal operating frequency can be reduced from 50 MHz to 20 MHz, and a 60% power saving is subsequently reached.

III. IMPLEMENTATION RESULTS

The proposed design has been successfully implemented through a standard cell-based design flow and fabricated in the TSMC 90nm CMOS technology. The total chip size including the I/O pads is 0.923 × 0.923 mm².
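As an illustration of Eqs. (1)–(6) in Section II, the following is a minimal per-sample sketch in plain Python. It is not the authors' hardware or reference code. The helper names (`push`, `fir`, `lms_update`, `new_state`, `fxlms_step`) are hypothetical, and the order of operations within a sample is an assumption. In particular, the m = 0 term of Eq. (1) is dropped here because y(n) has not yet been computed when x(n) is formed; the paper does not state how this term is handled.

```python
def push(hist, sample):
    """Return a copy of `hist` with `sample` at index 0 (newest first)."""
    return [sample] + hist[:-1]


def fir(coeffs, hist):
    """Sum over k of coeffs[k] * hist[k], the form of Eqs. (1)-(3) and (5)."""
    return sum(c * h for c, h in zip(coeffs, hist))


def lms_update(coeffs, hist, err, mu):
    """coeffs[k] + mu * hist[k] * err, the form of Eqs. (4) and (6)."""
    return [c + mu * h * err for c, h in zip(coeffs, hist)]


def new_state(m=24, l=24):
    """All-zero controller state; M = L = 24 as stated in the paper."""
    return {"w": [0.0] * l, "s": [0.0] * m,    # W(z) and S'(z) coefficients
            "x": [0.0] * l, "xf": [0.0] * l,   # histories of x(n) and x'(n)
            "y": [0.0] * m, "v": [0.0] * m}    # histories of y(n) and v(n)


def fxlms_step(st, e, v, mu1=0.01, mu2=0.004):
    """Advance the controller by one sample and return y(n).

    e is the measured error e(n); v is the white-noise sample v(n).
    The default step sizes are the values quoted in the paper.
    """
    st["v"] = push(st["v"], v)
    e1 = e + fir(st["s"], st["v"])                     # Eq. (5)
    # Eq. (1). ASSUMPTION: the m = 0 term is dropped (y(n) is not known yet),
    # so a zero is placed where y(n) would sit in the history.
    x = e1 + fir(st["s"], push(st["y"], 0.0))
    st["x"] = push(st["x"], x)
    y = fir(st["w"], st["x"])                          # Eq. (2)
    st["y"] = push(st["y"], y)
    st["xf"] = push(st["xf"], fir(st["s"], st["x"]))   # Eq. (3)
    st["w"] = lms_update(st["w"], st["xf"], e1, mu1)   # Eq. (4)
    st["s"] = lms_update(st["s"], st["v"], e1, mu2)    # Eq. (6)
    return y
```

A caller would create the state once with `new_state()` and then call `fxlms_step(state, e, v)` for every sample. The sketch does not model the acoustic paths P(z) and S(z) or the fixed-point word lengths of the circuit.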
Fig. 3. (a) The pipelined simple MAC-based FIR filter structure for calculating both x’(n) and y’(n). (b) The proposed 24-tap LMS filter architecture.
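The single-MAC FIR structure of Fig. 3(a) can be approximated behaviourally as below. This is a sketch under stated assumptions, not a model of the actual circuit: the class name `MacFir` is hypothetical, the data buffer is modelled as a circular buffer, and the three-stage pipeline, the 16-bit truncation, and the exact counter-enable timing are not represented. Two instances built on the same coefficient list mimic the two filters that share the S’(z) coefficients.

```python
class MacFir:
    """Behavioural model of a single-MAC FIR filter with a circular data buffer."""

    def __init__(self, coeffs):
        self.coeffs = coeffs               # coefficient buffer, may be shared
        self.data = [0.0] * len(coeffs)    # data buffer, one slot per tap
        self.newest = 0                    # address of the most recent sample

    def step(self, sample):
        """Write one new sample, then run one multiply-accumulate per tap."""
        m = len(self.coeffs)
        self.newest = (self.newest - 1) % m     # overwrite the oldest slot
        self.data[self.newest] = sample
        acc = 0.0
        for k in range(m):                      # one MAC operation per cycle
            acc += self.coeffs[k] * self.data[(self.newest + k) % m]
        return acc


# Two filters that read the same coefficient memory, as the x'(n) and y'(n)
# filters do in the paper (the coefficient values here are made up).
shared = [1.0, 2.0, 3.0]
fir_x = MacFir(shared)
fir_y = MacFir(shared)
```

Because both instances hold a reference to the same list, an update to the coefficients is seen by both filters, which is the software analogue of one coefficient buffer serving two data-paths.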

Fig. 4. Performance increment using pipelining, parallel processing, and interleaving memory organization and scheduling. Non-pipelined baseline: 336 cycles per sample (stage boundaries at 0, 48, 96, 144, 216, 264, 336). With pipelining, parallel processing, and interleaving memory organization: 133 cycles per sample (stage boundaries at 0, 28, 56, 80, 108, 133).
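The cycle counts in Fig. 4 can be related to the clock frequencies quoted in Section II with a few lines of arithmetic. The sketch below is illustrative only. The function names are hypothetical, the split of the baseline into four convolutions and two coefficient updates is read off the stage boundaries in Fig. 4, and the 60% figure is reproduced under the assumption that power scales linearly with clock frequency at a fixed supply voltage.

```python
FS_HZ = 96_000  # sampling rate quoted in Section II


def baseline_cycles(conv=48, update=72, n_conv=4, n_update=2):
    """Cycles per sample of the non-pipelined baseline (4 x 48 + 2 x 72)."""
    return n_conv * conv + n_update * update


def min_clock_hz(cycles_per_sample, fs_hz=FS_HZ):
    """Lowest clock that finishes all cycles within one sampling period."""
    return cycles_per_sample * fs_hz


def relative_saving(before, after):
    """Fractional reduction when going from `before` to `after`."""
    return (before - after) / before


# Baseline: 336 cycles need at least 32.256 MHz, below the 50 MHz quoted.
# Proposed: 133 cycles need at least 12.768 MHz, below the 20 MHz quoted.
baseline_min = min_clock_hz(baseline_cycles())
proposed_min = min_clock_hz(133)

# ASSUMPTION: power is proportional to clock frequency at fixed voltage.
clock_power_saving = relative_saving(50e6, 20e6)

# Parallel x'(n)/y'(n) computation: 161 -> 133 cycles.
parallel_saving = relative_saving(161, 133)
```

Both quoted operating frequencies leave headroom above the minimum clock computed here; the paper does not state how the 50 MHz and 20 MHz values were selected.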

The total equivalent gate count is 111.60 k, the total on-chip RAM is only 432 bytes, and the power consumption is 1.40 mW when operated at 20 MHz and 0.7 V to meet the real-time ANC performance requirement. Besides, the maximum measured operating frequency is 87 MHz. To evaluate the hardware efficiency of the proposed design, we adopt performance indexes including data processing speed, power consumption, and noise reduction performance. The design in [5] showed that the FxLMS algorithm executed on a TMS320 costs 341 cycles, while the proposed design needs only 133 cycles, i.e. the proposed design has a speed-up of 341/133 ≅ 2.56 times in processing speed. The authors of [6] proposed utilizing a low-cost microcontroller instead of the DSP. However, [6] needs an even longer processing time, i.e. 355 cycles, so it is slower than the proposed design by a factor of 355/133 ≅ 2.67.

The experimental setup for testing the proposed ANC in-ear headphone is illustrated in Fig. 5. A broadband pink noise with a bandwidth of 100–800 Hz is used. Fig. 6 shows the measurement result of the proposed design, i.e. 15 dB noise reduction and up to 600 Hz attenuation bandwidth, compared to 15–20 dB at 350–400 Hz and 20 dB at 250–300 Hz in [6], and around 10 dB at 200–350 Hz in [7]. In summary, the proposed work outperforms the works [6] and [7] in attenuating broadband noise with a bandwidth of 100–600 Hz.

Fig. 5. Experimental setup for testing the proposed ANC in-ear headphone.

Fig. 6. Spectrum of the residual noise signal for cancelling broadband pink noise, solid line: background noise, dotted line: ANC OFF, dash-dot line: ANC ON. (Axes: frequency in Hz, 100 to 1000; magnitude in dB re 20 µPa, 45 to 90. Legend: Background, Passive, Active.)

TABLE I
CHARACTERISTICS COMPARISON OF THE PROPOSED DESIGN, LOW-COST MICROCONTROLLER, AND DSP

Item                        | Proposed design          | PIC24H           | TMS320C6747
Frequency (MHz)             | 20                       | 40               | 300
Math function               | Dedicated ANC operations | Basic operations | Complex math (floating-point)
Memory (byte)               | 432                      | 16K              | 32K + external memory
Core supply voltage (V)     | 0.7                      | 3.3              | 1.2
Core power consumption (mW) | 1.40                     | 237.60 [8]       | 305.40 [9]

Moreover, the characteristics of the proposed design, a PIC24H microcontroller, and a floating-point DSP shown in Table I indicate that the proposed design consumes only 1.40 mW, i.e. the proposed work achieves a power consumption saving of (237.6 – 1.40) × 100/237.6 ≅ 99.41% relative to the PIC24H. Although the proposed design is more expensive to implement, at USD 82.37 per chip for an eight-piece prototype package against USD 5.08 per PIC24H and USD 15.84 per DSP at a purchasing volume of 5K, its advances in noise reduction performance and power consumption are remarkable. Moreover, the cost of implementing an ASIC chip may be greatly reduced when mass-produced.

IV. CONCLUSION

We have successfully implemented a power-efficient feedback FxLMS ANC circuit based on the TSMC 90nm CMOS technology, which considers both audio signal processing requirements and hardware costs to achieve critical real-time ANC performance for in-ear headphones. Compared with the existing designs, the proposed work achieves at least 2.56 times faster processing speed, attenuates a wider bandwidth of noise, and saves 99.41% of the power consumption.

REFERENCES

[1] H. S. Vu, K. H. Chen, S. F. Sun, T. M. Fong, C. W. Hsu, and L. Wang, “A power-efficient circuit design of feed-forward FxLMS active noise cancellation for in-ear headphones,” in Proc. Int. Symp. VLSI Design, Automation and Test (VLSI-DAT), 2015, pp. 1–4.
[2] H. S. Vu, K. H. Chen, S. F. Sun, T. M. Fong, C. W. Hsu, and L. Wang, “A 6.42 mW low-power feed-forward FxLMS ANC VLSI design for in-ear headphones,” in Proc. IEEE Int. Symp. Circuits and Syst. (ISCAS), 2015, pp. 2585–2588.
[3] H. S. Vu, K. H. Chen, and T. M. Fong, “Active noise control for in-ear headphones: Implementation and evaluation,” in Proc. IEEE Int. Conf. on Consumer Electronics-Taiwan (ICCE-TW), 2015, pp. 264–265.
[4] H. S. Vu and K. H. Chen, “A low-power broad-bandwidth noise cancellation VLSI circuit design for in-ear headphones,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst. [Online]. Available: http://ieeexplore.ieee.org
[5] S. M. Kuo, I. Panahi, K. M. Chung, T. Horner, M. Nadeski, and J. Chyan, “Design of active noise control systems with the TMS320 family,” Texas Instruments, Stafford, TX, USA, Tech. Rep. SPRA042, Jun. 1996.
[6] C. Y. Chang and S. T. Li, “Active noise control in headsets by using a low-cost microcontroller,” IEEE Trans. Ind. Electron., vol. 58, no. 5, pp. 1936–1942, Jul. 2011.
[7] K. K. Shyu, C. Y. Ho, and C. Y. Chang, “A study on using microcontroller to design active noise control systems,” in Proc. IEEE Asia Pacific Conf. on Circuits and Systs. (APCCAS), 2014, pp. 599–602.
[8] PIC24H Family Data Sheet. High-Performance, 16-bit Microcontrollers. [Online]. Available: http://www.microchip.com/downloads/en/DeviceDoc/70175d.pdf, accessed Jan. 2015.
[9] The power consumption of the Texas Instruments C6748/46/42. [Online]. Available: http://processors.wiki.ti.com/index.php/C6747/45/43_Power_Consumption_Summary, accessed Jan. 2015.
