Professional Documents
Culture Documents
Low Power Unsupervised Anomaly Detection by Non-Parametric Modeling of Sensor Statistics
Low Power Unsupervised Anomaly Detection by Non-Parametric Modeling of Sensor Statistics
Low Power Unsupervised Anomaly Detection by Non-Parametric Modeling of Sensor Statistics
Sliding Outlier
Estimated Window
PDF Estimated
Evidences of CDF
R.V. x
Gaussian Evidences of
Kernels R.V. x
Outlier
Sigmoid
Kernels
Sliding
Window
(a) (b) Outlier
Fig. 2: (a) Non-parametric PDF estimation using Gaussian kernels at observed
random variable (RV) samples, (b) Non-parametric CDF estimation using
Sigmoid kernels at observed RV samples. Outlier
using digitally-stored sensor data, a four-bit current steering
digital-to-analog converter, and hold cells. A mixed-signal
pipeline is discussed to synthesize sensor data density function,
which is used to detect outliers in runtime. To benchmark Fig. 3: Sliding window for real-time KDE-based density estimation of
AEGIS, we use time-series dataset from Yahoo [13]. An outlier streaming sensor data considering a limited number of previously validated
detection with an f1-score of more than 0.87 is achieved by samples.
optimizing the programmable parameters such as the length of using Gaussian kernels. Fig. 2(b) shows CDF estimation using
the sliding window and decision thresholds in the discussed Sigmoid kernels. Parameter h can be optimized to minimize
implementation. Our approach on average consumes 75µW the mean integrated square error (MISE) between the esti-
for a sampling rate of 2 MHz while using ten recent inlier mated and true PDF/CDF of a random variable (RV). A rule-
samples for density estimation. of-thumb [16] simplifies the choice of h as ≈ 1.06 × σ × N − 5
1
The rest of the paper is organized as follows: Sec. II to minimize the MISE, where σ is the standard deviation of
provides the background on KDE and its application for observed samples of x.
outlier detection. Sec. III describes a CMOS GGC design
for KDE-based probability density function (PDF) estimation. B. Outlier detection by statistical characterization of data
The overall architecture of AEGIS is described in Sec. IV. In Fig. 3, using a sliding window and Eq. (1), the probability
Sec. V discusses temperature and process variation-induced density of a subsequent data (Psample ) can be extracted as
inaccuracies in the learned PDF along with optimization of N −1
1 X xsample − xi
sliding window length and the sampling frequency for reliable Psample = k (2)
N i=0 h
PDF estimation. Sec. VI presents the optimization of algorith-
mic and architectural parameters for efficient outlier detection. where xsample denotes the incoming sensor data sample and
Sec. VII discusses adaptability of AEGIS to different IoT xi are previously validated samples. xsample is classified as
applications. Finally, Sec. VIII summarizes the key results and an inlier or outlier following the expression below
concludes. (
Outlier, Psample < PT hres
II. BACKGROUND xsample = (3)
Inlier, Psample ≥ PT hres
A. Kernel density estimation (KDE) where PT hres is a probability threshold for outlier detec-
In KDE, if x1 , x2 , ..., xN are the observed samples of x, tion. For Gaussian kernels, the standard deviation of kernel
probability density function (PDF) of x, F̃ (x), is estimated as (σKernel ) along with PT hres control the classification bound-
N −1
1 X x − xi ary. If xsample is an inlier then it is included in the window
F̃ (x) = k (1) and the statistical model is updated in a first-in-first-out (FIFO)
N i=0 h
manner, i.e, by replacing the earliest inlier with xsample .
Here, k(x) is a kernel function for PDF estimation, h is Otherwise, xsample is flagged as an anomalous sample.
the kernel function width, and N is the number of ob-
III. G ILBERT G AUSSIAN C IRCUIT-BASED S TATISTICAL
served samples. For probability density function (PDF) es-
D ENSITY F UNCTION C HARACTERIZATION
timation, functions such as Gaussian, Uniform, Triangular,
and Epanechnikov [14] are applicable as k(x). A cumulative A. Gilbert Gaussian Circuit-based Implementation of Kernel
density function (CDF) can also be similarly estimated where Function
functions such as Inverse-tangent and Sigmoid are applicable We use a Gilbert Gaussian circuit (GGC) to implement a
as k(x) [15]. Fig. 2(a) shows the KDE-based PDF estimation kernel function for KDE-based PDF synthesis. Henceforth, we
3
VDD RL
VST=0V
MP7 Radj MP8 VST=0.25V VST1 Icol
VST=0.5V GGC
MP1 S0 S1 S2 VST= A1
MP2 0.75V Vout
R 2R 4R
Ideal
Gaussian VST2 VCM
GGC
Ibias MP3 MP4 Increasing Increasing
Vin VST Radj Radj
Iout
MP5 MP6
VSTN
Hold VST2
Bank
VDD
Hold Cell Maximum
Maximum
1x 2x 4x 8x VDD Avg. RMSE
Avg. RMSE
Vbias Vbp Minimum
Minimum
Avg.RMSE
Avg.RMSE
VSTi[0] VSTi[1] VSTi[2] VSTi[3] Mh2 Mh3
Cf2
DAC IDAC Vd2
EN Vd1
TG Cf1 Vbn
Mh4
RST TG CS VSTi Mh1
(a) (b)
Fig. 8: Avg. RMSE of learned PDF vs. NIN for (a) Gaussian and (b) Gaussian
mixture functions.
Fig. 6: Integrated 4-bit Current steering DAC & Hold-cell. [Inset: Transmis-
sion Gates are used to control Enable and Reset operations] in the sub-threshold regime at minimal power consumption.
After PDF learning, analog inlier samples held on sampling
Fig. 5 shows the overall architecture of AEGIS comprising
capacitance (CS ) will degrade due to thermal leakage in CS ,
of a PDF learner, digital-to-analog converter (DAC) array,
Cf 1 & Cf 2 and sub-threshold leakage in transmission gates.
analog voltage hold cells, and digital control logic. The design
We utilize negative feedback provided by amplifier stages
of PDF learner was discussed in Fig. 4. Since PDF learner
through feedback capacitors (Cf 1 & Cf 2 ) to compensate VST i .
operates in analog mode, Current-steering DAC (CSDAC)
Degrading VST i will alter the bias of Mh1 and Mh3 , thereby,
shown in Fig. 6 is used to convert the inliers in sample bank
increases Vd1 & Vd2 due to negative gain of CS stages. This
(V ST i ) (xi in Eq. (2)) to analog domain. CSDAC uses binary-
results in a potential difference across Cf 1 & Cf 2 which in turn
weighted PMOS current sources to generate a current (IDAC )
drives a current to restore VST i on CS , therefore, improves
proportional to the digital word of a sample. IDAC charges
retention time of the hold cell.
the sampling capacitance CS on the hold cell [Fig. 6] for a
For online outlier detection, AEGIS utilizes a sliding win-
duration TS , developing VST i . Overdrive voltage constraints
dow of past samples. Using a window of NIN latest inliers,
on PMOS current sources in CSDAC limit VST i to reliably
PDF learner in Fig. 5 learns the PDF of the sensor stream.
vary within 0−3VDD /4 range. Therefore, we use GGC with
Incoming sample Vsamp is applied to the input of PDF learner,
PMOS input stages for compatibility. Sampling duration TS is
and PDF learner predicts its likelihood based on the earlier
controlled using EN . Analog samples are retained on the hold
inliers. The control logic in Fig. 5 determines if the likelihood
cell during PDF learning and outlier detection. Conversely,
of Vsamp is high enough based on the past inliers and if Vsamp
while updating the detection model, content on the hold cells
should be accepted as an inlier. If Vsamp is an inlier, control
are reset using RST pulse.
logic activates ADC and V samp (digitized Vsamp ) is added
to the sample bank where it replaces the first arrived sample.
Overdrive Voltage Limits Otherwise, Vsamp is marked as an outlier.
Overdrive Voltage
Limit
Simulated V. D ISCUSSIONS
Simulated PDF
PDF KDE In this section, we discuss the accuracy of PDF learner to
KDE learn various statistical distributions using a sliding window
Ideal
Ideal PDF PDF of past observations. We present an analysis to determine
the optimal number of GGCs (NIN ). Further, the impact of
temperature & process variability and sampling frequency on
PDF learner performance is discussed.
A. Accuracy to learn various statistical densities
The accuracy of the PDF learner to learn the statistics
(a) (b) of streaming data using a sliding window is analyzed by
Fig. 7: Non-parametric kernel density estimation enables PDF learning of testing with the sensor data following Gaussian and Gaussian
samples from arbitrary statistics. PDF extracted by PDF learner against ground mixture statistics. In Fig. 7, ideal PDFs and learned PDFs
truth for (a) Gaussian, and (b) Gaussian Mixture distributions.
are compared. The learned PDFs are obtained using Eq.
(1) and HSPICE simulations. A hundred randomly drawn
Hold cell is shown in Fig. 6. A hold cell consists of samples from the corresponding statistics are used to learn
common source (CS) amplifiers to improve retention time their PDF. In Fig. 7(a), the samples are drawn from the
characteristics. We use both NMOS (Mh1 ) and PMOS (Mh3 ) Gaussian statistics N (0.4, 0.05). In Fig. 7(b), the samples
input stage based amplifiers to allow 0-3VDD /4 range for are drawn from the Gaussian Mixture statistics represented
the hold voltage VST i . CS amplifiers are designed to operate
5
Avg.RMSE
B. Effect of Sliding Window Width
0.1
We analyze the robustness of PDF learning in AEGIS Gaussian Mixture
considering (i) the number of samples, i.e., NIN and (ii) by Uniform
0.05
limiting the maximum power consumption of the circuit. Using Gaussian
300
perature sensitivity of learned PDFs for different statistics and
Mean
200 Avg.RMSE is given by sensitivity=∆Avg.RM SE/∆T |T0 =30o C . Temper-
ature variation has a substantial impact on the learned PDF
100 and in Sec. VI, we will show that optimizing KDE parameters
enables AEGIS to manifest high resilience against temperature
0 20 variations.
10 30 50
NIN E. Effect of Sampling Frequency
(a) (b)
We analyze the impact of sampling frequency on the accu-
racy of the PDF learner. Ideally, PDF learner should be able to
Fig. 9: (a) Average power consumption of PDF learner and integrated DAC
& hold cell modules for different sliding window length (NIN ). (b) Impact
operate at a wide range of frequencies to support varying sam-
of process variation on learned PDF is analyzed in terms of Maximum and pling rates in sensor nodes. However, factors such as the slew-
Mean average RMSE for Gaussian statistics. rate of OPAMP and retention time of hold cells impose bounds
on the sampling frequency (fsamp ). Fig. 10(b) shows the
average RMSE vs. fsamp for different sensor stream statistics.
C. Effect of Process Variability While operating at higher frequencies, OPAMP output fails
To study the impact of process variation on the learned PDF, to attain precise values in short duration, resulting in higher
we consider variations in threshold voltage (VT H ) of transis- average RMSE. On the other hand, samples held on the hold
tors in PDF learner. The resilience of PDF learner against pro- cells degrade over time due to various leakage components,
cess variation is analyzed in terms of maximum average RMSE which in turn results in inaccurate PDF estimation at the
(MaxAvg.RM SE ) and mean average RMSE (MeanAvg.RM SE ) slower sampling frequency. Therefore, PDF learner functions
compared against the ground truth. Maximum average RMSE optimally in a certain fsamp range. Various design components
and mean average RMSE are computed for Gaussian statistics in PDF learner, such as hold cells and OPAMP, can be
over hundred Monte Carlo iterations. In Fig. 9(b), VT H optimized to match fsamp to the target sampling rate in a
variations have a significant impact on the accuracy of learned sensor node.
PDF. MaxAvg.RM SE and MeanAvg.RM SE increases with VT H
variance. Nonetheless, in Sec. VI, we will discuss that, despite VI. S IMULATION R ESULTS
the process variability, a high accuracy outlier detection can In this section, we provide an overview of Yahoo real
be obtained by optimizing the top-level KDE parameters such time-series dataset used to benchmark AEGIS performance.
as σKernel . Process variation induces error in classification A detailed analysis to determine the optimal parameters for
hyper-plane, thus, degrades detection efficiency. Optimizing high accuracy outlier detection is presented. We have also
σKernel can partially restore the classification boundary to investigated the impact of environmental noise on the detection
suppress the impact of process variability on classification accuracy. Subsequently, we analyze the impact of temperature
hyper-plane and allows high accuracy detection.
6
Timeseries 4
Timeseries 1 Timeseries 3
Timeseries 2 Timeseries 5
σVth=20mV
T=90oC
σNoise=10mV σVth=15mV
nDAC=4 T=60oC
σNoise=25mV σVth=10mV
nDAC=6
σVth=5mV T=30oC
σNoise=50mV nDAC=8
σVth=0mV T=0oC
False
False Positives False
False False
Positives Positives Negatives Positives
False
Negatives
the detection model depending on application specifics by “Distributed Deviation Detection in Sensor Networks,” SIGMOD Rec.,
programming NIN , σKernel , and PT hres . vol. 32, no. 4, pp. 77–82, Dec. 2003.
[12] S. Haji Alizad, “A Nonparametric Cumulative Distribution Function
Estimation and Random Number Generator Circuit,” UIC Dissertations
R EFERENCES and Theses, http://hdl.handle.net/10027/21929.
[13] YahooLabs, “S5 - a labeled anomaly detection dataset,” https://
[1] Y. Zhang, N. Meratnia, and P. Havinga, “Outlier Detection Techniques
webscope.sandbox.yahoo.com/, accessed: 11/08/2018.
for Wireless Sensor Networks: A Survey,” IEEE Communications Sur-
[14] V. Epanechnikov, “Non-Parametric Estimation of a Multivariate Proba-
veys Tutorials, vol. 12, no. 2, pp. 159–170, Feb. 2010.
bility Density,” Theory of Probability & Its Applications, vol. 14, no. 1,
[2] E. M. Knorr and R. T. Ng, “Algorithms for Mining Distance-Based
pp. 153–158, 1969.
Outliers in Large Datasets,” in Proc. International Conference on Very
[15] M. Lejeune and P. Sarda, “Smooth Estimators of Distribution and
Large Data Bases. San Francisco, CA, USA: Morgan Kaufmann
Density Functions,” Computational Statistics & Data Analysis, vol. 14,
Publishers Inc., 1998, pp. 392–403.
no. 4, pp. 457 – 471, 1992.
[3] S. Ramaswamy, R. Rastogi, and K. Shim, “Efficient Algorithms for
[16] B. W. Silverman, Density Estimation for Statistics and Data Analysis.
Mining Outliers from Large Data Sets,” in Proc. ACM SIGMOD
CRC Press, 1986.
International Conference on Management of Data, 2000, pp. 427–438.
[17] https://www.eda.ncsu.edu/wiki/FreePDK.
[4] E. M. Jordaan and G. Smits, “Robust Outlier Detection Using SVM Re-
[18] K. Kang and T. Shibata, “An On-Chip-Trainable Gaussian-Kernel Ana-
gression,” in IEEE International Joint Conference on Neural Networks,
log Support Vector Machine,” IEEE Trans. on Circuits and Systems I:
vol. 3, Jul. 2004, pp. 2017–2022.
Regular Papers, vol. 57, no. 7, pp. 1513–1524, Jul. 2010.
[5] S. M. Erfani, S. Rajasegarar, S. Karunasekera, and C. Leckie, “High-
[19] K. Kang and T. Shibata, “An On-Chip-Trainable Gaussian-kernel Analog
Dimensional and Large-Scale Anomaly Detection Using a Linear One-
Support Vector Machine,” in IEEE International Symposium on Circuits
Class SVM with Deep Learning,” Pattern Recognition, vol. 58, pp. 121–
and Systems, May 2009, pp. 2661–2664.
134, 2016.
[20] S. P. Singh, J. V. Hansom, and J. Vlach, “A New Floating Resistor for
[6] S. Rajasegarar, C. Leckie, M. Palaniswami, and J. C. Bezdek, “Quar-
CMOS technology,” IEEE Trans. on Circuits and Systems, vol. 36, no. 9,
ter Sphere Based Distributed Anomaly Detection in Wireless Sensor
pp. 1217–1220, Sep. 1989.
Networks,” in IEEE International Conference on Communications, Jun.
[21] B. Razavi, Design of Analog CMOS Integrated Circuits, 1st ed. New
2007, pp. 3864–3869.
York, USA: McGraw-Hill, Inc., 2001.
[7] E. Elnahrawy and B. Nath, “Context-Aware Sensors,” in Wireless Sensor
[22] M. Munir, S. Erkel, A. Dengel, and S. Ahmed, “Pattern-Based Con-
Networks. Springer Berlin Heidelberg, 2004, pp. 77–93.
textual Anomaly Detection in HVAC Systems,” in IEEE International
[8] J. W. Branch, C. Giannella, B. Szymanski, R. Wolff, and H. Kargupta,
Conference on Data Mining Workshops (ICDMW), 2017, pp. 1066–1073.
“In-network Outlier Detection in Wireless Sensor Networks,” Knowledge
[23] S. Srinivasan, A. Vasan, V. Sarangan, and A. Sivasubramaniam, “Bugs
and Information Systems, vol. 34, no. 1, pp. 23–54, Jan. 2013.
In The Freezer: Detecting Faults in Supermarket Refrigeration Systems
[9] S. Subramaniam, T. Palpanas, D. Papadopoulos, V. Kalogeraki, and
Using Energy Signals,” in Proc. ACM International Conference on
D. Gunopulos, “Online Outlier Detection in Sensor Data Using Non-
Future Energy Systems, 2015, pp. 101–110.
parametric Models,” in Proc. International Conference on Very Large
[24] S. Derrible, “An Approach to Designing Sustainable Urban Infrastruc-
Data Bases, 2006, pp. 187–198.
ture,” MRS Energy & Sustainability, vol. 5, 2018.
[10] S. Erich, Z. Arthur, and K. Hans-Peter, Generalized Outlier Detection
[25] T. Wartzek, B. Eilebrecht, J. Lem, H. Lindner, S. Leonhardt, and
with Flexible Kernel Density Estimates. SDM, 2014, pp. 542–550.
M. Walter, “ECG on the Road: Robust and Unobtrusive Estimation of
[11] T. Palpanas, D. Papadopoulos, V. Kalogeraki, and D. Gunopulos, Heart Rate,” IEEE Trans. on Biomedical Engineering, vol. 58, no. 11,
pp. 3112–3120, Nov. 2011.