Yang et al., 2019, "Design of an Always-On Deep Neural Network-Based 1-μW VAD," IEEE Journal of Solid-State Circuits, vol. 54, no. 6, June 2019.
Authorized licensed use limited to: Johns Hopkins University. Downloaded on June 19,2024 at 02:08:56 UTC from IEEE Xplore. Restrictions apply.
YANG et al.: DESIGN OF AN ALWAYS-ON DEEP NEURAL NETWORK-BASED 1-μW VAD 1765
1766 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 54, NO. 6, JUNE 2019
Fig. 3. VAD system architecture using analog acoustic feature extraction and digital classification, with event-driven analog-to-digital conversion.
The time interval between two adjacent events is a function of the integrated v_oBPF_k.

In audio signal processing, a frame length of 25 ms is typically used, with a 10-ms frame shift. Following this convention, one input neuron in the BNN takes the value of the event number (EN) accumulated in a 25-ms window, which can be seen as the quantized area under the waveform of the rectified v_oBPF_k within 25 ms. Incorporating multiple contextual neighboring frames can improve the classification accuracy [25]. As a tradeoff between accuracy and power, only two neighboring frames are used, i.e., to classify frame n, frames (n−3), n, and (n+3) are used together as the input, and hence, the number of input neurons is 3× the front-end channel number. Because this scheme utilizes frames in the future, it incurs a latency of 30 ms in this case. Latency is an important metric in applications like VAD-assisted hearing aids [28] and requires a careful tradeoff with accuracy and power.

Assuming a passive microphone that gives a 30-μVrms output signal at a 65-dB sound pressure level (SPL) for a normal-loudness conversation at 1-m distance [24], and a minimum 10-dB SNR at the input of the VAD system, an IRN of 10 μVrms needs to be achieved. A maximum LNA gain of 42 dB and a maximum BPF gain of 18 dB ensure a more than 50-dB dynamic range (DR) of the FWR output current with a 10-pA leakage current. The event number within a 25-ms frame is designed to be nominally less than 255 in response to speech, i.e., 8 bit, even though the theoretical maximum is about 900 given the 5-nA maximum value of f_v→i, 0.9-pF C_int, and 0.15-V V_refdn in (1).

The audio dataset we use is built on the Aurora4 corpus [29]. Three hundred clean utterances with a 16-kHz sampling rate and a 16-bit resolution are randomly selected from Aurora4 as the basis for the training dataset, and another 30 and 300 clean utterances for the development and test datasets [25]. The utterances in each dataset are concatenated. The total lengths of the training and development audio are about 37 and 3 minutes, respectively. An additional non-speech section is added to the test audio to balance the speech and non-speech periods, so the test audio has a total length of about 1 hour. The noise corpus DEMAND contains 18 different noise scenarios including metro and restaurant [30]. Noise audio of different scenarios is mixed with the clean speech at different SNR levels using the noise mixing software program downloaded from aurora.hsnr.de.

The acoustic features for off-line training of the BNN are generated by the customized front-end Python model. The LNA is modeled by a high-pass and a low-pass transfer function. An extra low-pass transfer function is added to model the LNA's larger than 20-dB/decade roll-off. The BPFs are modeled by 2nd-order bandpass transfer functions with geometrically scaled central frequencies. The FWRs are modeled by the transfer functions of the output current in relation to the input voltage. The IAF models the integration of the input current on a capacitor; whenever the capacitor potential exceeds a predefined threshold, the capacitor voltage is reset to 0 and an event is generated. We fit the models of all the building blocks with parameters from Spectre simulations with non-idealities considered, such as the frequency dependency of the FWR transfer functions, the finite bandwidth of the comparator in the IAF, etc. To demonstrate the efficacy of our AFE model, we show in Fig. 4(a) the comparison of the extracted features, i.e., EN along the frame sequence, using both the Python model and the transient Spectre simulation, with an utterance from Aurora4 shown in Fig. 4(c) as the input. The two sets of features almost overlap in all 16 channels. Fig. 4(b) shows the small differences between the two sets of features.

The off-line training of the BNN classifier uses the features generated by the Python model. The BNN has three hidden layers with 60, 24, and 11 neurons, respectively. The number of hidden layers and neurons is heuristically selected for the best possible hit-rate performance within the power budget. The activations of the two output neurons are compared without binarization to classify a frame as either a voice or a noise frame. The optimization algorithm is the modified stochastic gradient descent for BNN [26]. For training, the feature frames are randomly shuffled, but the contextual window including frames n−3, n, and n+3 is maintained. The important training parameters are: batch size 200, learning rate 0.0003, number of epochs 2000, and dropout rate of hidden neurons 0.2. The BNN models are separately trained for each noisy speech with a specific noise type and SNR value. This is called noise-dependent training [25], in contrast to the noise-independent
Fig. 4. (a) Comparison of the 16-channel acoustic features in event number (EN) over a frame sequence generated by the customized python model (red
hollow circle) and by the post-layout transient Spectre simulation (black line). (b) EN difference (ΔEN). (c) Audio waveform of the utterance that is selected
from the Aurora4 corpus, and used to generate the features (maximum amplitude normalized to 1).
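The front-end Python model described in the text (bandpass filterbank, full-wave rectification, and IAF event counting over 25-ms frames with a 10-ms shift, plus the three-frame context window) can be sketched as follows. The biquad Q, the v-to-i gain, and the 16 geometrically spaced center frequencies are illustrative placeholders, not the fitted Spectre parameters behind Fig. 4.

```python
import numpy as np

FS = 16000                      # Aurora4 sampling rate (Hz)
FRAME = int(0.025 * FS)         # 25-ms frame -> 400 samples
HOP = int(0.010 * FS)           # 10-ms shift -> 160 samples
N_CH = 16                       # front-end channel count (Fig. 4)

def biquad_bandpass(x, f0, q=2.0, fs=FS):
    """2nd-order bandpass biquad (RBJ cookbook, 0-dB peak gain); q is illustrative."""
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * q)
    b0, b1, b2 = alpha, 0.0, -alpha
    a0, a1, a2 = 1 + alpha, -2 * np.cos(w0), 1 - alpha
    b0, b1, b2, a1, a2 = b0 / a0, b1 / a0, b2 / a0, a1 / a0, a2 / a0
    y = np.empty_like(x)
    s1 = s2 = 0.0
    for n, xn in enumerate(x):  # direct form II transposed
        yn = b0 * xn + s1
        s1 = b1 * xn - a1 * yn + s2
        s2 = b2 * xn - a2 * yn
        y[n] = yn
    return y

def event_counts(x, f0):
    """Rectify the BPF output, convert to current, and fire IAF events per frame."""
    # FWR output current: illustrative linear v-to-i map, clipped at the 5-nA maximum
    i_in = 5e-9 * np.minimum(np.abs(biquad_bandpass(x, f0)), 1.0)
    v, dt, events = 0.0, 1.0 / FS, []
    for n, i in enumerate(i_in):
        v += i * dt / 0.9e-12          # integrate input current on C_int
        if v >= 0.15:                  # V_refdn threshold crossed
            v = 0.0                    # reset to 0 (one event max per sample step)
            events.append(n)
    ev = np.asarray(events)
    starts = range(0, len(x) - FRAME + 1, HOP)
    return [int(((ev >= s) & (ev < s + FRAME)).sum()) for s in starts]

f0s = np.geomspace(100, 5000, N_CH)    # geometrically scaled center frequencies
x = 0.05 * np.sin(2 * np.pi * 1000 * np.arange(int(0.2 * FS)) / FS)  # test tone
feat = np.array([event_counts(x, f0) for f0 in f0s]).T  # (frames, channels) EN matrix
# contextual input: frames n-3, n, n+3 stacked -> 3x channel count input neurons
stacked = np.array([np.r_[feat[n - 3], feat[n], feat[n + 3]]
                    for n in range(3, feat.shape[0] - 3)])
```

With a 0.2-s input, the sketch yields an 18×16 EN matrix and 48-dimensional stacked input vectors, matching the 3× 16-channel input-neuron count described in Section II.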
Fig. 5. LNA block diagram.
Fig. 6. Removal of one tail current of the inverter-based input stage.
one tail current for extra voltage headroom that is important for
robust operation over process, voltage and temperature (PVT)
variations under a low supply voltage. One possibility is to
remove Mtail , bias the gate of M p at a different dc from Mn ,
and ac couple the input signal like in [37]. However, in the
audio frequency range, this requires the use of pseudo-resistors
whose resistance is highly susceptible to PVT variations, and
consequently the associated high-pass corner frequency and
the recovery time of the amplifier from any common-mode
disturbance are poorly defined [33].
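The PVT sensitivity argument above can be made concrete with a one-line corner calculation: for an ac-coupling network, f_hp = 1/(2πRC), so a decade of pseudo-resistor spread moves the high-pass corner by a decade. The element values below are illustrative, not taken from the paper.

```python
import math

def highpass_corner(r_ohm, c_farad):
    # f_hp = 1 / (2*pi*R*C) for a simple ac-coupling network
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)

C = 1e-12          # 1-pF coupling capacitor (illustrative)
R_NOM = 1e12       # 1-TOhm pseudo-resistor, nominal (illustrative)
for scale in (0.1, 1.0, 10.0):   # one decade of resistance spread either way
    r = R_NOM * scale
    print(f"R = {r:8.1e} Ohm -> f_hp = {highpass_corner(r, C):10.4f} Hz")
```

The corner (and, with it, the common-mode recovery time) tracks the resistance one-for-one, which is why the replica-inverter bias of Fig. 7 is preferred.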
Another possibility is to remove Ibias , as shown on the right
side of Fig. 6. Due to the lack of current source for Mn ,
the input dc voltage, which is the output dc voltage of the
DSL amplifier, needs to be set to the trip point of the inverter.
Fig. 8. Simulated frequency response of the main amplifier in the LNA with (top) and without (bottom) the positive feedback networks.
This trip point can be provided by a replica inverter, as shown
in Fig. 7. The replica inverter is composed of M0 and M1 ,
which are compound transistors and topologically match with
the input transistors M2 –M5 but with scaled sizes to reduce the
bias current. The trip point voltage Vb is tied to the body of
M2 –M5 to avoid large width that can lower the non-dominant
pole frequency [38] and elevate the IRN of the LNA [39]
due to parasitic capacitance. The use of compound transistors
is for the pseudo-cascode compensation [40], [41] using Cc
and Rc , instead of the Miller compensation, for higher PM
and unity-gain frequency. To reduce the bias current of the 2nd
stage, we use the positive feedback via C_f and R_f adapted from the bandwidth enhancement techniques [42], [43] to have sufficient PM. The simulated frequency response of the main amplifier and the LNA at a 42-dB gain in Fig. 8 shows the efficacy of the positive feedback used in conjunction with the pseudo-cascode compensation. The 3-dB bandwidth of the main amplifier increases by 3.3×, and the PM is improved from 20° to 56°. The bandwidth of the LNA is improved from 2.5 kHz to 6.1 kHz with a steeper roll-off near the low-pass corner frequency. The output dc is set to V_mid, i.e., VDD/2, via the CMFB amplifier composed of M9–M13, and Rcm and Ccm. The pseudo-resistor Rcm with symmetric resistance [19] is implemented with two pFETs connected as M14 and M15.

Fig. 9. Schematic of the DSL amplifier in the LNA.

The DSL amplifier is shown in Fig. 9. Its input and output are the output and input of the main amplifier, respectively. The output dc voltage is set to V_b via its CMFB amplifier composed of M4–M9. The amplifier is biased at the picoampere level to give a well-defined high-pass corner frequency of the LNA less than 100 Hz. The stability of the closed loop formed by the main and DSL amplifiers is guaranteed by the very low frequency of a dominant pole at the node v_x.

C. Bandpass Filter

In off-line MLP training using the features generated by the front-end Python model, we found that the classification results are satisfying with low quality factors and low BPF orders, and hence, we did not choose to use the source-follower-based 4th-order BPF in [19], which can synthesize high quality factor
Fig. 10. Schematic of the SSF-based BPF with output buffer and input dc bias. The fabricated circuit is a differential version.
Fig. 11. Simulated differential SSF-based BPF output noise spectral density of channel 15 at Vout.
Fig. 13. Schematic of (a) FWR and IAF and (b) pseudo-resistor Rfb that is composed of eight diode-connected pFETs connected in series.
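As a quick sanity check of the 8-bit event-number budget discussed in Section II, the theoretical per-frame maximum follows from the IAF charge balance implied by (1), events ≈ I_max·T_frame/(C_int·V_refdn), using the 5-nA f_v→i maximum, 0.9-pF C_int, and 0.15-V V_refdn quoted in the text:

```python
I_MAX = 5e-9       # maximum FWR output current f_v->i (A), from the text
C_INT = 0.9e-12    # integration capacitor C_int (F)
V_REFDN = 0.15     # IAF threshold V_refdn (V)
T_FRAME = 25e-3    # frame length (s)

# events per frame = integrated charge / charge removed per event
max_events = (I_MAX * T_FRAME) / (C_INT * V_REFDN)
print(round(max_events))   # 926, i.e., the "about 900" theoretical maximum
```

The nominal design target of fewer than 255 events (8 bit) therefore sits comfortably below this ceiling, and the 9-bit ripple counters of the digital back-end absorb any overflow.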
and the input-slope-dependent delay are not detrimental; they are included in the front-end model, and the MLP can adapt accordingly through training.

E. Binarized Multilayer Perceptron

The block diagram of the digital binarized MLP is illustrated in Fig. 16. Ripple counters are used as the interface between the asynchronous feature extraction front-end and the synchronous inference computing. Nine-bit counters, one more bit than the specified 8 bits, are used in case of overflow. Three counters are used for each channel. Each counter counts events in a 25-ms window, and the three counters work in a cyclic way with a 10-ms shift. The features of one frame are stored in a 16×9 block of the data memory DMEM. Because three frames are needed for one classification, as discussed in Section II, the features, e.g., in block 0, block 3, and block 6, are used to classify the frame with its features stored in block 3. Old features are overwritten by the new ones, e.g., the features of the 8th frame would be written into block 0, and so on. The 1-b weights stored in the weight memory WMEM are obtained from off-line training, and they are loaded through a scan chain when the chip is powered up before normal operations. Both DMEM and WMEM use a latch-based design to work robustly under a low supply voltage. The addresses for the memories and the control signals for the counters and the computing engine are generated by the controller block.

The 100-classification/s throughput requires that all add and accumulate operations finish within 10 ms, and a clock of 500 kHz is thus sufficient for using one single accumulator. Given that only the 1st hidden layer has 9-bit input, and the other layers have 1-bit input, part of the register is clock-gated for power saving. The full register and a 15-bit adder are used for the 1st hidden layer computation, indicated by l1, and 7 bits of the full register and a 7-bit adder are used for the subsequent layers. The input operand of the accumulator is selected via a mux according to the weight value w. For the 1st hidden layer, the input feature d9 is selected if w = 1, and otherwise its complement is selected. For the rest of the hidden layers, 0, +1, or −1 is selected depending on both w and the activation of the previous layer. The activation function hard sigmoid HS(·) is simply the inversion of the sign bit of the register after completing the accumulation of each neuron, and the 1-bit activation values are stored in an 84-bit register file. HS(·) is not applied to the output layer. Instead, the difference of the accumulated values of the two output neurons is computed, controlled by lo2, and the sign bit is the class, 1 for speech and 0 for non-speech.

IV. EXPERIMENTAL RESULTS

The chip was fabricated in 1P6M 0.18-μm CMOS. The microphotograph of the die is shown in Fig. 17 with the building blocks labeled. The core area is 1.66×1.52 mm². The simulated power breakdown of the front-end at 0.6 V is shown
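The accumulator datapath of the binarized MLP can be mirrored in a few lines of Python. The layer sizes (48 = 16 channels × 3 context frames, then 60, 24, and 11 hidden neurons and 2 outputs) follow the text; the weights are random stand-ins for the trained WMEM contents, and the hardware's select-input-or-complement trick is modeled simply as multiplication by ±1.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [48, 60, 24, 11, 2]   # 16 channels x 3 context frames -> 60 -> 24 -> 11 -> 2
# Binary weights in {-1, +1}, standing in for the 1-b WMEM contents
weights = [rng.choice([-1, 1], size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]

def classify(en_frames):
    """en_frames: (48,) vector of 9-bit event counts for frames n-3, n, n+3."""
    x = np.asarray(en_frames, dtype=np.int64)
    for i, w in enumerate(weights):
        acc = x @ w               # multiply-free in hardware: add d9 or its complement
        if i < len(weights) - 1:
            # hard sigmoid HS(.) in hardware = inverted sign bit of the accumulator
            x = (acc >= 0).astype(np.int64) * 2 - 1   # {-1, +1} activations
        else:
            # no binarization at the output: compare the two accumulated values
            return int(acc[1] > acc[0])  # 1 = speech, 0 = non-speech

label = classify(rng.integers(0, 256, size=48))
```

Note how only the first layer needs wide (9-bit) operands; every later layer reduces to signed accumulation of ±1 terms, which is what allows the clock-gated narrow adder in the actual design.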
Fig. 17. Die microphotograph with building blocks and dimension labeled.
Fig. 19. Measured transfer functions (top) and input-referred noise spectral
density (bottom) of the LNA in four different gain settings.
Fig. 24. (a) Clean speech sample and label sequence. (b) Corresponding
metro 5-dB noisy speech sample and measured classification sequence.
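Hit rates such as those summarized for the measured classification sequence of Fig. 24 can be computed per frame: the speech hit rate is the fraction of labeled speech frames classified as speech, and likewise for non-speech. This is a generic evaluation sketch, not the authors' scoring script.

```python
import numpy as np

def hit_rates(labels, preds):
    """Per-frame speech and non-speech hit rates (1 = speech, 0 = non-speech)."""
    labels, preds = np.asarray(labels), np.asarray(preds)
    speech = labels == 1
    speech_hr = float((preds[speech] == 1).mean())
    nonspeech_hr = float((preds[~speech] == 0).mean())
    return speech_hr, nonspeech_hr

# toy 6-frame example: 3 of 4 speech frames hit, 1 of 2 non-speech frames hit
s_hr, n_hr = hit_rates([1, 1, 1, 0, 0, 1], [1, 0, 1, 0, 1, 1])
print(s_hr, n_hr)   # 0.75 0.5
```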
TABLE II
COMPARISON OF AFE
TABLE III
COMPARISON OF AUDIO CLASSIFICATION SYSTEMS
digital implementation of the BPF based on the charge-recovery design [59], this work, including other building blocks besides the BPF, shows 5× higher power efficiency when the frequency range is equalized.

Table III shows the comparison of this VAD with other audio classification systems. While our design consumes the lowest power among VADs, it is fair to also consider the classification rate. The 32-nm VAD [1] has slightly better energy efficiency in terms of class/W/s. However, it may not be the best choice for minimized power consumption in low-throughput real-time intelligent sensing applications, because its energy efficiency decreases quickly with power supply voltages below 0.65 V due to leakage, as indicated in the measurement, even though the design already uses ULP transistors with extremely low leakage and ultra-high Vth [1]. Techniques like transistor sizing and power gating can further suppress leakage current, but the energy efficiency can hardly get better than the optimized peak value at 0.65 V. The system in [60] for acoustic object detection consumes only 12 nW, but the system bandwidth is less than 500 Hz, and the classification rate is too low for VAD. The systems in [61] and [62] based on spiking NNs are not very energy efficient. The reason is that IBM's TrueNorth is more of a general-purpose platform that can support various NN algorithms. The digital VAD in [2] is used to power-gate the automatic speech recognizer. It actually implemented three different algorithms: energy-based (EB, 8.5 μW), harmonicity (HM, 24.4 μW), and modulated frequency with NN (MF + NN, 22.3 μW). It is reported that MF + NN gives the best performance. With no fast Fourier transform (FFT), the EB algorithm consumes 3× less power than the other two FFT-based ones, and HM consumes more power than MF + NN because it takes two short-term FFTs per frame even though no NN computation is required. This indicates the power-hungry nature of the conventional digital FFT approach for frequency analysis.

With the help of the customized feature extraction front-end model during the design time, we are able to use the same classifier parameters for all test chips without costly chip-wise feature measurement and classifier training, and a relatively small spread in speech and non-speech hit rates across dies
is achieved, despite the fact that, for high power efficiency, the front-end is designed in deep-subthreshold analog, which is much more prone to PVT variations than its digital counterpart. One open question is whether it is possible to further reduce the classification spread across dies by leveraging the generalization capability of deep learning models without increasing the area of the analog circuits for feature extraction. As an initial attempt, we performed software emulation of the VAD system with consideration of AFE parameter variation during BNN training. In other words, we slightly modified the flow shown in Fig. 2 so that the features for classifier training come from both the "model with fit parameter" and the "model with parameter variation." Features generated from different sets of the "model with parameter variation" are used for inference. Despite the 20× augmentation of the training datasets, the relative reduction of the hit-rate variation is marginal: less than 10% across several runs. Further studies need to be done to explore the limit of this scheme for variation reduction.

ACKNOWLEDGMENT

The authors would like to thank N. Mesgarani, Y. Tsividis, M. Verhelst, and X.-L. Zhang for the valuable discussion and help.

REFERENCES

[1] A. Raychowdhury, C. Tokunaga, W. Beltman, M. Deisher, J. W. Tschanz, and V. De, "A 2.3 nJ/frame voice activity detector-based audio front-end for context-aware system-on-chip applications in 32-nm CMOS," IEEE J. Solid-State Circuits, vol. 48, no. 8, pp. 1963–1969, Aug. 2013.
[2] M. Price, J. Glass, and A. P. Chandrakasan, "A scalable speech recognizer with deep-neural-network acoustic models and voice-activated power gating," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2017, pp. 244–245.
[3] R. Sarpeshkar, "Analog versus digital: Extrapolating from electronics to neurobiology," Neural Comput., vol. 10, no. 7, pp. 1601–1638, 1998.
[4] L. T. Lin, H.-F. Tseng, D. B. Cox, S. S. Viglione, D. P. Conrad, and R. G. Runge, "A monolithic audio spectrum analyzer," IEEE J. Solid-State Circuits, vol. JSSC-18, no. 1, pp. 40–45, Feb. 1983.
[5] N. C. Bui, J. J. Monbaron, and J. G. Michel, "An integrated voice recognition system," IEEE J. Solid-State Circuits, vol. JSSC-18, no. 1, pp. 75–81, Feb. 1983.
[6] Y. Kuraishi, K. Nakayama, K. Miyadera, and T. Okamura, "A single-chip 20-channel speech spectrum analyzer using a multiplexed switched-capacitor filter bank," IEEE J. Solid-State Circuits, vol. JSSC-19, no. 6, pp. 964–970, Dec. 1984.
[7] J. S. Chang and Y. C. Tong, "A micropower-compatible time-multiplexed SC speech spectrum analyzer design," IEEE J. Solid-State Circuits, vol. 28, no. 1, pp. 40–48, Jan. 1993.
[8] R. F. Lyon and C. Mead, "An analog electronic cochlea," IEEE Trans. Acoust., Speech Signal Process., vol. ASSP-36, no. 7, pp. 1119–1134, Jul. 1988.
[9] L. Watts, D. A. Kerns, R. F. Lyon, and C. A. Mead, "Improved implementation of the silicon cochlea," IEEE J. Solid-State Circuits, vol. 27, no. 5, pp. 692–700, May 1992.
[10] R. Sarpeshkar, M. W. Baker, C. D. Salthouse, J. J. Sit, L. Turicchia, and S. M. Zhak, "An analog bionic ear processor with zero-crossing detection," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2005, pp. 78–79.
[11] E. Fragniere, "A 100-channel analog CMOS auditory filter bank for speech recognition," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2005, pp. 140–141.
[12] V. Chan, S.-C. Liu, and A. van Schaik, "AER EAR: A matched silicon cochlea pair with address event representation interface," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 54, no. 1, pp. 48–59, Jan. 2007.
[13] T. J. Hamilton, C. Jin, A. van Schaik, and J. Tapson, "An active 2-D silicon cochlea," IEEE Trans. Biomed. Circuits Syst., vol. 2, no. 1, pp. 30–43, Mar. 2008.
[14] A. G. Katsiamis, E. M. Drakakis, and R. F. Lyon, "A biomimetic, 4.5 μW, 120+ dB, log-domain cochlea channel with AGC," IEEE J. Solid-State Circuits, vol. 44, no. 3, pp. 1006–1022, Mar. 2009.
[15] B. Wen and K. Boahen, "A silicon cochlea with active coupling," IEEE Trans. Biomed. Circuits Syst., vol. 3, no. 6, pp. 444–455, Dec. 2009.
[16] S.-C. Liu, A. van Schaik, B. A. Minch, and T. Delbruck, "Asynchronous binaural spatial audition sensor with 2×64×4 channel output," IEEE Trans. Biomed. Circuits Syst., vol. 8, no. 4, pp. 453–464, Aug. 2014.
[17] G. Yang, R. F. Lyon, and E. M. Drakakis, "A 6 μW per channel analog biomimetic cochlear implant processor filterbank architecture with across channels AGC," IEEE Trans. Biomed. Circuits Syst., vol. 9, no. 1, pp. 72–86, Feb. 2015.
[18] S. Wang, T. J. Koickal, A. Hamilton, R. Cheung, and L. S. Smith, "A bio-realistic analog CMOS cochlea filter with high tunability and ultra-steep roll-off," IEEE Trans. Biomed. Circuits Syst., vol. 9, no. 3, pp. 297–311, Jun. 2015.
[19] M. Yang, C. H. Chien, T. Delbruck, and S. C. Liu, "A 0.5 V 55 μW 64×2 channel binaural silicon cochlea for event-driven stereo-audio sensing," IEEE J. Solid-State Circuits, vol. 51, no. 11, pp. 2554–2569, Nov. 2016.
[20] P. R. Kinget, "Device mismatch and tradeoffs in the design of analog circuits," IEEE J. Solid-State Circuits, vol. 40, no. 6, pp. 1212–1224, Jun. 2005.
[21] B. Murmann, "Digitally assisted analog circuits," IEEE Micro, vol. 26, no. 2, pp. 38–47, Mar. 2006.
[22] Y. Chiu, "Equalization techniques for nonlinear analog circuits," IEEE Commun. Mag., vol. 49, no. 4, pp. 132–139, Apr. 2011.
[23] J. Zhang, L. Huang, Z. Wang, and N. Verma, "A seizure-detection IC employing machine learning to overcome data-conversion and analog-processing non-idealities," in Proc. IEEE Custom Integr. Circuits Conf. (CICC), Sep. 2015, pp. 1–4.
[24] K. M. H. Badami, S. Lauwereins, W. Meert, and M. Verhelst, "A 90 nm CMOS, 6 μW power-proportional acoustic sensing frontend for voice activity detection," IEEE J. Solid-State Circuits, vol. 51, no. 1, pp. 291–302, Jan. 2016.
[25] X. L. Zhang and D. Wang, "Boosting contextual information for deep neural network based voice activity detection," IEEE/ACM Trans. Audio, Speech, Language Process., vol. 24, no. 2, pp. 252–264, Feb. 2016.
[26] I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, "Binarized neural networks," in Proc. Adv. Neural Inf. Process. Syst., vol. 29, 2016, pp. 4107–4115.
[27] J. Lee, C. Kim, S. Kang, D. Shin, S. Kim, and H.-J. Yoo, "UNPU: A 50.6 TOPS/W unified deep neural network accelerator with 1 b-to-16 b fully-variable weight bit-precision," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, San Francisco, CA, USA, Feb. 2018, pp. 218–220.
[28] Y.-J. Chen, C.-W. Wei, Y. FanChiang, Y.-L. Meng, Y.-C. Huang, and S.-J. Jou, "Neuromorphic pitch based noise reduction for monosyllable hearing aid system application," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 61, no. 2, pp. 463–475, Feb. 2014.
[29] N. Parihar, J. Picone, D. Pearce, and H. G. Hirsch, "Performance analysis of the Aurora large vocabulary baseline system," in Proc. Eur. Signal Process. Conf., 2004, pp. 553–556.
[30] J. Thiemann, N. Ito, and E. Vincent, "The diverse environments multichannel acoustic noise database: A database of multichannel environmental noise recordings," J. Acoust. Soc. Amer., vol. 133, no. 5, p. 3591, May 2013.
[31] E. Vittoz and J. Fellrath, "CMOS analog integrated circuits based on weak inversion operations," IEEE J. Solid-State Circuits, vol. 12, no. 3, pp. 224–231, Jun. 1977.
[32] Y.-P. Chen et al., "An injectable 64 nW ECG mixed-signal SoC in 65 nm for arrhythmia monitoring," IEEE J. Solid-State Circuits, vol. 50, no. 1, pp. 375–390, Jan. 2015.
[33] P. Harpe, H. Gao, R. V. Dommele, E. Cantatore, and A. H. M. van Roermund, "A 0.20 mm² 3 nW signal acquisition IC for miniature sensor nodes in 65 nm CMOS," IEEE J. Solid-State Circuits, vol. 51, no. 1, pp. 240–248, Jan. 2016.
[34] S. Rai, J. Holleman, J. N. Pandey, F. Zhang, and B. Otis, "A 500 μW neural tag with 2 μVrms AFE and frequency-multiplying MICS/ISM FSK transmitter," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2009, pp. 212–213.
[35] X. Zou, W. S. Liew, L. Yao, and Y. Lian, "A 1 V 22 μW 32-channel implantable EEG recording IC," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2010, pp. 126–127.
[36] D. Han, Y. Zheng, R. Rajkumar, G. Dawe, and M. Je, "A 0.45 V 100-channel neural-recording IC with sub-μW/channel consumption in 0.18 μm CMOS," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2013, pp. 290–291.
[37] B. Sporrer et al., "A fully integrated dual-channel on-coil CMOS receiver for array coils in 1.5–10.5 T MRI," IEEE Trans. Biomed. Circuits Syst., vol. 11, no. 6, pp. 1245–1255, Dec. 2017.
[38] B. Razavi, Design of Analog CMOS Integrated Circuits, 1st ed. Boston, MA, USA: McGraw-Hill, 2000.
[39] R. R. Harrison, "The design of integrated circuits to observe brain activity," Proc. IEEE, vol. 96, no. 7, pp. 1203–1216, Jul. 2008.
[40] V. Saxena and R. J. Baker, "Compensation of CMOS op-amps using split-length transistors," in Proc. IEEE Int. Midwest Symp. Circuits Syst. (MWSCAS), 2008, pp. 109–112.
[41] M. Taherzadeh-Sani and A. A. Hamoui, "A 1-V process-insensitive current-scalable two-stage opamp with enhanced DC gain and settling behavior in 65-nm digital CMOS," IEEE J. Solid-State Circuits, vol. 46, no. 3, pp. 660–668, Mar. 2011.
[42] A. Vasilopoulos, G. Vitzilaios, G. Theodoratos, and Y. Papananos, "A low-power wideband reconfigurable integrated active-RC filter with 73 dB SFDR," IEEE J. Solid-State Circuits, vol. 41, no. 9, pp. 1997–2008, Sep. 2006.
[43] M. Abdulaziz, M. Törmänen, and H. Sjöland, "A compensation technique for two-stage differential OTAs," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 61, no. 8, pp. 594–598, Aug. 2014.
[44] M. D. Matteis, A. Pezzotta, S. D'Amico, and A. Baschirotto, "A 33 MHz 70 dB-SNR super-source-follower-based low-pass analog filter," IEEE J. Solid-State Circuits, vol. 50, no. 7, pp. 1516–1524, Jul. 2015.
[45] Y. Xu, S. Leuenberger, P. K. Venkatachala, and U.-K. Moon, "A 0.6 mW 31 MHz 4th-order low-pass filter with +29 dBm IIP3 using self-coupled source follower based biquads in 0.18 μm CMOS," in Proc. IEEE Symp. VLSI Circuits, Jun. 2016, pp. 132–133.
[46] M. De Matteis and A. Baschirotto, "A biquadratic cell based on the flipped-source-follower circuit," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 64, no. 8, pp. 867–871, Aug. 2017.
[47] A. Thanachayanont, "Low-voltage low-power high-Q CMOS RF bandpass filter," Electron. Lett., vol. 38, no. 13, pp. 615–616, Jun. 2002.
[48] Z. Gao, J. Ma, M. Yu, and Y. Ye, "A fully integrated CMOS active bandpass filter for multiband RF front-ends," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 55, no. 8, pp. 718–722, Aug. 2008.
[49] M. S. Hosny and J. Hanson, "A wide-band, high-precision CMOS rectifier," Analog Integr. Circuits Signal Process., vol. 5, no. 2, pp. 183–190, Mar. 1994.
[50] E. Rodriguez-Villegas, P. Corbishley, C. Lujan-Martinez, and T. Sanchez-Rodriguez, "An ultra-low-power precision rectifier for biomedical sensors interfacing," Sens. Actuators A, Phys., vol. 153, no. 2, pp. 222–229, Aug. 2009.
[51] Z. Wang, "Novel pseudo RMS current converter for sinusoidal signals using a CMOS precision current rectifier," IEEE Trans. Instrum. Meas., vol. 39, no. 4, pp. 670–671, Aug. 1990.
[52] Z. Wang, "Full-wave precision rectification that is performed in current domain and very suitable for CMOS implementation," IEEE Trans. Circuits Syst. I, Fundam. Theory Appl., vol. 39, no. 6, pp. 456–462, Jun. 1992.
[53] S.-C. Liu, J. Kramer, G. Indiveri, T. Delbruck, and R. Douglas, Analog VLSI: Circuits and Principles, 1st ed. Cambridge, MA, USA: A Bradford Book, 2002.
[54] M. S. J. Steyaert, W. Dehaene, J. Craninckx, M. Walsh, and P. Real, "A CMOS rectifier-integrator for amplitude detection in hard disk servo loops," IEEE J. Solid-State Circuits, vol. 30, no. 7, pp. 743–751, Jul. 1995.
[55] S. M. Zhak, M. W. Baker, and R. Sarpeshkar, "A low-power wide dynamic range envelope detector," IEEE J. Solid-State Circuits, vol. 38, no. 10, pp. 1750–1753, Oct. 2003.
[56] M. S. J. Steyaert and W. M. C. Sansen, "A micropower low-noise monolithic instrumentation amplifier for medical purposes," IEEE J. Solid-State Circuits, vol. JSSC-22, no. 6, pp. 1163–1168, Dec. 1987.
[57] T. Fawcett, "An introduction to ROC analysis," Pattern Recognit. Lett., vol. 27, no. 8, pp. 861–874, Jun. 2006.
[58] G. Chen, C. Parada, and G. Heigold, "Small-footprint keyword spotting using deep neural networks," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., May 2014, pp. 4087–4091.
[59] H. S. Wu, Z. Zhang, and M. C. Papaefthymiou, "A 13.8 μW binaural dual-microphone digital ANSI S1.11 filter bank for hearing aids with zero-short-circuit-current logic in 65 nm CMOS," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2017, pp. 348–349.
[60] S. Jeong et al., "A 12 nW always-on acoustic sensing and object recognition microsystem using frequency-domain feature extraction and SVM classification," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2017, pp. 362–363.
[61] S. K. Esser et al., "Convolutional networks for fast, energy-efficient neuromorphic computing," Proc. Nat. Acad. Sci. USA, vol. 113, pp. 11441–11446, Aug. 2016.
[62] W.-Y. Tsai et al., "Always-on speech recognition using TrueNorth, a reconfigurable, neurosynaptic processor," IEEE Trans. Comput., vol. 66, no. 6, pp. 996–1007, Jun. 2017.

Minhao Yang (S'11–M'16) received the Ph.D. degree in physics from ETH Zurich, Zürich, Switzerland, in 2015. He was a Post-Doctoral Researcher with Columbia University, New York, NY, USA. He is currently a Collaborateur Scientifique with EPFL, Lausanne, Switzerland. His post-doctoral research was partly supported by the Early Postdoc Mobility Fellowship from the Swiss National Science Foundation. His research interests include ultra-low-power (ULP) inference sensing systems, event-driven sensors like spiking silicon retina and cochlea, and spike coding and processing.

Chung-Heng Yeh received the B.S. degree in electrical engineering from National Taiwan University, Taipei, Taiwan, in 2010, and the M.S. degree in electrical engineering from Columbia University, New York, NY, USA, in 2013, where he is currently pursuing the Ph.D. degree in electrical engineering. His current research interests include information processing in the spike domain, large-scale neural system emulation, and neural-inspired algorithms.

Yiyin Zhou received the B.E. degree in electrical engineering from Shanghai Jiao Tong University, Shanghai, China, in 2007, and the M.S. and Ph.D. degrees in electrical engineering from Columbia University, New York, NY, USA, in 2009 and 2015, respectively. He is currently a Post-Doctoral Research Scientist with the Department of Electrical Engineering, Columbia University. He is an active team member of the Fruit Fly Brain Observatory Team, Columbia University, and has co-organized three Fruit Fly Brain Hackathons. He is also interested in massively parallel neural computation on high-performance computing devices. His research interests include formal methods for spike-time-based representation of sensory information and system identification, and the logic of information processing of the fruit fly brain on both neuroinformation processing and neural circuit levels. Dr. Zhou received the Jury Award from the Department of Electrical Engineering, Columbia University, in 2016, for outstanding achievement by a graduate student in the area of systems, communications, and signal processing.
Joao P. Cerqueira (S’17) received the B.S. degree (Hons.) in electrical engineering from the University of Brasília, Brasília, Brazil, in 2014, and the M.S. degree in electrical engineering from Columbia University, New York, NY, USA, in 2016, where he is currently pursuing the Ph.D. degree in electrical engineering.

His current research interests include energy-efficient integrated circuits and computer architecture.

Mr. Cerqueira received the Science Without Borders Fellowship from CAPES, the Lemann Foundation Fellowship, and the Qualcomm Innovation Fellowship.

Aurel A. Lazar (S’77–M’80–SM’90–F’93–LF’16) was a Principal Investigator leading a number of computer networking research groups in the Department of Electrical Engineering, Columbia University, New York, NY, USA, for 20 years. He covered a broad set of research topics, including building major switching hardware, architecting broadband kernels for programmable networks, and creating game theory models for resource allocation. He also ran a networking start-up as a CEO. Some 15 years ago, predicting that Moore's law would soon reach its limits, he switched his field of research to computational neuroscience in search of principles for building cognitive machines with capabilities well beyond those powered by von Neumann architectures. He currently leads research projects in computing with fruit fly brain circuits, in building interactive computing tools for the Fruit Fly Brain Observatory, and in creating neuroinformation processing machines. His research has drawn support from a number of funding agencies, including the AFOSR, NIH, and NSF.

Dr. Lazar received the 2003 IFIP/IEEE Dan Stokesberry Memorial Award in recognition of "the most distinguished technical contributions to the growth and understanding of the field of network management."

Mingoo Seok (S’05–M’11–SM’18) received the B.S. degree (summa cum laude) in electrical engineering from Seoul National University, Seoul, South Korea, in 2005, and the M.S. and Ph.D. degrees in electrical engineering from the University of Michigan, Ann Arbor, MI, USA, in 2007 and 2011, respectively.

In 2011, he joined Texas Instruments Inc., Dallas, TX, USA, as a Technical Staff Member. Since 2012, he has been with the Department of Electrical Engineering, Columbia University, New York, NY, USA, where he is currently an Associate Professor. His current research interests include variation-, voltage-, aging-, and thermal-adaptive circuits and architecture, ultra-low-power (ULP) SoC design for emerging embedded systems, machine-learning VLSI architecture and circuits, and nonconventional hardware design.

Dr. Seok received the 1999 Distinguished Undergraduate Scholarship from the Korea Foundation for Advanced Studies, the 2005 Doctoral Fellowship from the Korea Foundation for Advanced Studies, the 2008 Rackham Pre-Doctoral Fellowship from the University of Michigan, the 2009 AMD/CICC Scholarship Award for picowatt voltage reference work, the 2009 DAC/ISSCC Design Contest Award for the 35-pW sensor platform design, and the 2015 NSF CAREER Award. He has served as an Associate Editor for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I from 2013 to 2015, and for the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION SYSTEMS since 2015 and IEEE SOLID-STATE CIRCUITS LETTERS since 2017.