
Machine Learning-based Real-Time Sensor Drift

Fault Detection using Raspberry Pi


Umer Saeed1, Sana Ullah Jan2, Young-Doo Lee3, Insoo Koo4*
Department of Electrical Engineering
University of Ulsan
Ulsan, South Korea
umarsaeed454@gmail.com1, sanaullahjan1990@gmail.com2, leeyd1004@naver.com3, iskoo@ulsan.ac.kr4
* corresponding author iskoo@ulsan.ac.kr

Abstract—From smart industries to smart cities, sensors in the modern world play an important role, covering a large number of applications. However, sensors sometimes become faulty, leading to serious outcomes in terms of safety, economic cost and reliability. This paper presents an analysis and comparison of the performances achieved by machine learning techniques for real-time drift fault detection in sensors using a low-computational-power system, i.e., Raspberry Pi. The machine learning algorithms under observation include artificial neural network, support vector machine, naïve Bayes classifier, k-nearest neighbors and decision tree classifier. The data for this research were acquired from a digital relative temperature/humidity sensor (DHT22). Drift fault was injected into the normal data using an Arduino Uno microcontroller. Statistical time-domain features were extracted from normal and faulty signals and pooled together into training data. Trained models were tested in an online manner, where the models were used to detect drift fault in the sensor output in real time. The performance of the algorithms was compared using precision, recall, f1-score, and total accuracy parameters. The results show that support vector machine (SVM) and artificial neural network (ANN) outperform the other given classifiers.

Keywords—Sensor fault, fault detection, drift fault, classification, raspberry-pi.

I. INTRODUCTION

Modern technologies such as industrial systems or wireless sensor networks (WSNs) often consist of hundreds of sensors that may be deployed in relatively harsh and complex environments. Natural factors, electromagnetic interference, and many other factors can affect the performance of the sensors. When a sensor becomes faulty, it may completely stop generating signals or produce incorrect signals, or it may jump between normal and faulty states unstably. To improve safety and data quality, shorten response time, strengthen network security and prolong network lifespan, many studies have focused on sensor fault detection. A fault can be expressed as an unusual property or behavior of a system or machine [1].

Mainly since the 1980s, studies have been carried out on the detection and diagnosis of defects in industrial facilities using physics-based or mathematical approaches. These approaches were limited to specific environments and conditions, and it is difficult to determine their many model parameters due to system complexities. To overcome these limitations, data-driven approaches using machine learning techniques have been proposed, which analyze data to develop the best models. These models use historical data to find hidden patterns and identify expected outcomes. As modern systems become more complex, the earlier approaches are becoming difficult to implement. On the other hand, data-driven models can be developed to adequately approximate real systems based on the collected data [2].

Faults occur in actuators, sensors and other mechanical systems. In the past, algorithms for fault detection in rolling elements of machines have been explored in a vast number of studies reporting efficient results [3]-[9]. However, sensors also fail frequently, leading to serious consequences in terms of safety and operation. Therefore, sensor fault detection is very important to ensure the safety and reliability of systems. Over time, several studies have discussed a number of faults that can possibly occur in sensors. The present study focuses on the most frequently occurring sensor fault, i.e., the drift fault, which can be defined as follows:

Drift Fault: The output of the sensor keeps increasing or decreasing linearly from the normal state [10]-[13]. An example of a normal and a faulty (drift) signal appears in Figure 1.

Figure 1. Normal and faulty signal sample plots.
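To make the drift definition concrete, the fault can be sketched in a few lines of Python. This is an illustrative simulation only; the constant 25 °C baseline and the 0.01 drift rate are assumed placeholder values, not readings from the paper's DHT22 dataset:

```python
def inject_drift(signal, drift_rate):
    """Add a linearly growing offset to each successive sample,
    matching the drift-fault definition: the output keeps increasing
    (or, with a negative rate, decreasing) linearly from normal."""
    return [x + drift_rate * i for i, x in enumerate(signal)]

normal = [25.0] * 100                # idealized constant temperature trace
faulty = inject_drift(normal, 0.01)  # drift rate of 0.01 per sample
```

A negative `drift_rate` produces the downward-drifting case mentioned in the definition.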

Authorized licensed use limited to: East China Univ of Science and Tech. Downloaded on June 03,2023 at 02:20:06 UTC from IEEE Xplore. Restrictions apply.
Lately, machine learning techniques such as support vector machine (SVM) [4], [6] and neural network (NN) [5]-[7] have gained prominence in fault detection and diagnosis for rolling elements and sensors. Techniques for bearing fault detection and sensor fault detection are similar; however, the signal characteristics of sensor faults differ from those of rolling elements. Hence, using the same features for both does not guarantee the same accuracy in results.

The data required for this research were obtained from the temperature/humidity sensor (DHT22). The signals obtained from the sensor through an Arduino Uno microcontroller are sent to a Raspberry Pi for training. The drift fault is simulated in the output signal from the sensor, and statistical time-domain features are extracted from the signal. The data are used to train the classifiers discussed in detail in the fault detection methodology section. For testing, drift faults are generated randomly using the Arduino Uno microcontroller and given to the classifiers on the Raspberry Pi in an online manner to examine the fault detection results. Figure 2 shows the applied system model for fault detection in the present study.

Furthermore, the contributions of this paper are outlined as follows:

1) Light-weight System: A low-computational-power system (Raspberry Pi) is used for fault detection with a DHT22 temperature sensor. The Raspberry Pi can be described as a small general-purpose single-board computer running mainly on a Debian-based OS built on the Linux kernel. In the future, these small general-purpose computers could be widely used in industry for AI applications. These systems are cheap, easy to deploy, and require little space while offering decent computational power.

2) Real-Time Fault Detection: The proposed system adopts a machine learning approach, which learns from the collected data and detects sensor faults. A signal from the temperature sensor is given to the Raspberry Pi in an online manner. The algorithms are trained using scikit-learn, a well-known machine learning library for the Python programming language. The trained classifiers are then used to detect faults in the sensor in real time. The experimental setup used for this research is shown in Figure 3.

Figure 3. Real-Time fault detection system.

The rest of this paper is structured as follows: Section II presents the fault detection methodology. In Section III, simulation results are discussed. Section IV concludes the paper and outlines future research.

II. FAULT DETECTION METHODOLOGY


A. Data-Driven Approach
The data-driven approach has been applied in many real-world applications to develop accurate models, and a large number of data-driven techniques have been applied to solve fault detection problems. Statistically based methods and those based on artificial intelligence techniques are the two main families of methods in the data-driven approach. Figure 4 illustrates the approach towards fault detection: after data collection and feature extraction, intelligent detection is employed.

Figure 2. The framework of the applied system for fault detection.

Figure 4. Steps towards fault detection.

B. Machine Learning for Classification

Classification is a supervised machine learning approach, which can be defined as a means of categorizing unknown items into a discrete set of classes. In this work, a binary classification approach is used, which distinguishes between two classes, i.e., normal and faulty. The classification techniques used in this work are explained as follows [16]:

1) Support Vector Machine (SVM): Developed in the 1970s, SVM builds on statistical learning theory. In the field of machine learning, particularly for fault detection and classification, SVM is one of the best-performing algorithms and deals essentially with two-class classification problems [14]. A linear line or hyperplane is generated as a decision boundary between the datasets of the two classes. The data points nearest to the hyperplane, which govern the construction of the hyperplane, are called support vectors [15]. The optimized hyperplane can be mathematically expressed as

    w · x + b = 0,    (1)

where w is the vector of weights, x is an input vector, and b represents the bias. The equations of the support vectors of each class are given as

    w · x_i + b >= +1  for  d_i = +1,
    w · x_i + b <= -1  for  d_i = -1,    (2)

where d_i corresponds to the respective class, i.e., d_i = +1 for class A, and d_i = -1 for class B.

In this research, a binary-class SVM-based classifier with a linear kernel function is used to analyze the results for sensor fault classification. The cost parameter C was set to its default (C=1).

2) Artificial Neural Network (ANN): The multi-layer perceptron (MLP), a class of feedforward ANN, consists of at least three layers of nodes: an input layer, a hidden layer, and an output layer. Each node except the input nodes is a neuron that uses a nonlinear activation function. For training, the MLP utilizes the supervised learning technique known as backpropagation. An example of an MLP is shown in Figure 5.

The number of nodes in the hidden layers of a neural network can be decided according to the dataset; there is no hard-and-fast rule declaring a specific number of layers. Too many nodes can overfit the training data, while too few can lead to high prediction error; in both cases, the classifier will not generalize well. An optimal number of hidden layers, as well as nodes, should therefore be chosen to minimize the error. In this research, one hidden layer with 2 neurons is used throughout for each drift fault value to examine the performance of the classifier in different scenarios. The default constant learning rate with relu as the activation function is set.

Figure 5. The workflow of Multi-Layer Perceptron.

3) Naïve Bayes Classifier (NB): The naïve Bayes classifier is a probabilistic machine learning model used for classification tasks. The core of the classifier is Bayes' theorem,

    P(A|B) = P(B|A) P(A) / P(B).    (3)

There are several NB classifiers used for different tasks, i.e., Multinomial Naïve Bayes, Bernoulli Naïve Bayes, and Gaussian Naïve Bayes. In the past, NB classifiers have mostly been used in spam filtering, recommendation systems, sentiment analysis, etc. In this research, the Gaussian Naïve Bayes classifier is adopted to compare its performance with the rest of the well-known classifiers for fault detection.

4) K-Nearest Neighbors (KNN): Following the path of the nearest neighbor (NN) method, k-nearest neighbor data description is a one-class classification method in which, instead of choosing only the first nearest neighbor, k nearest neighbors are selected, where k is the number of nearest neighbors to an object considered by the classifier. The distance metric used for the current research is Minkowski with power parameter p=2. The number of neighbors is set to 2, with the default leaf size of 30 and uniform weights. A working sample of the KNN classifier is shown in Figure 6.

Figure 6. Sample of K-Nearest Neighbor Classification.
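As a sketch, the classifiers used in this work can be instantiated in scikit-learn with the stated settings (linear-kernel SVM with C=1, a one-hidden-layer MLP with 2 neurons and relu, Gaussian NB, KNN with k=2 and Minkowski p=2, and a depth-2 decision tree). The tiny two-feature dataset below is invented for illustration, standing in for the (mean, max) feature vectors extracted later; it is not the DHT22 data:

```python
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Invented (mean, max) feature pairs: label 0 = normal, 1 = drift fault.
X = [[25.0, 25.1], [25.1, 25.2], [25.0, 25.0], [25.2, 25.3],
     [25.5, 26.4], [25.6, 26.5], [25.4, 26.3], [25.7, 26.6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

classifiers = {
    "SVM": SVC(kernel="linear", C=1.0),
    "ANN": MLPClassifier(hidden_layer_sizes=(2,), activation="relu",
                         learning_rate="constant", max_iter=5000,
                         random_state=0),
    "NB":  GaussianNB(),
    "KNN": KNeighborsClassifier(n_neighbors=2, p=2, weights="uniform"),
    "DT":  DecisionTreeClassifier(criterion="gini", splitter="best",
                                  max_depth=2),
}

predictions = {}
for name, clf in classifiers.items():
    clf.fit(X, y)                                   # train on pooled features
    predictions[name] = int(clf.predict([[25.6, 26.5]])[0])
```

The tiny MLP may or may not converge on such a small toy set; the other four models separate these points cleanly.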

5) Decision Tree (DT): Decision trees are all-purpose machine learning algorithms; they perform well even on multioutput tasks. Like SVMs, DTs are very powerful algorithms, capable of fitting complex datasets [16]. A tree-like structure builds both classification and regression models: the DT breaks the data down into smaller subsets while an associated decision tree is incrementally developed. The outcome is a tree with decision nodes and leaf nodes.

Figure 7. Binary Decision Tree working model.

In this research, a DT classifier with max depth 2 is used with the default Gini criterion, which measures the quality of a split. Between the best and random splitter functions, best is used, which chooses the best split. Figure 7 shows a general example of how a DT works for binary classification problems.

III. SIMULATION RESULTS

A. Data Acquisition and Feature Extraction
The data were acquired from the digital relative Temperature/Humidity Sensor DHT22 developed by Adafruit Industries, available in a 4-pin package. The data were obtained serially from the sensor using an Arduino Uno microcontroller through the Arduino IDE and PLX-DAQ, a Parallax microcontroller data acquisition tool. The output of the sensor was connected to one of the Arduino Uno's I/O pins. The experimental setup is shown in Figure 3.

A serial communication link was established between the Arduino Uno and the workstation, with the baud rate set to 9600 bps. A total of 10,000 normal data elements and 50,000 faulty data elements were obtained at room temperature (approximately 24~26°C); the faulty data were generated through simulations. A block diagram of the setup is shown in Figure 8.

Figure 8. Block diagram of the experimental setup.

For each considered drift fault value, data were generated as 120 samples, each sample consisting of 100 data elements, the first 50 normal and the last 50 faulty, as demonstrated in Figure 9. Of the 120 samples, the first 60 were faulty and the last 60 normal.

Figure 9. The plot sample of normal and faulty signals.

The drift fault was induced in the acquired temperature data by simulation. The faulty temperature output T_f, for temperature T at sample index t, is given by

    T_f(t) = T(t) + δt,    (4)

where δ is the drift fault value, i.e., 0.01-0.05. The injection of faulty data into healthy sensor data samples is a common approach among researchers due to the unavailability of online datasets [14].

A knowledge-based fault detection technique is adopted, which only requires historical data for training. The data received from the Arduino Uno were stored on the Raspberry Pi for further processing and simulation purposes. The model of the connections between the Raspberry Pi and the Arduino Uno microcontroller, and between the sensor and the Arduino Uno, is shown in Figure 8.

The data were divided into 120 samples, each sample consisting of 100 data elements. Then, drift faults, i.e., 0.01-0.05 as shown in Figure 9, were simulated in the obtained data. For each considered drift value we obtained 120 samples, so the resultant dataset consisted of 5*120*100 data elements for the five drift classes. Furthermore, for feature extraction and to reduce the dimensions, max and mean features were extracted from the normal and faulty signal data and then pooled together to generate the training data (Figure 2). The mean and maximum values are considered good features to calculate when the defect affects the overall mean and max of the signal amplitude, as shown in Table 1.

Table 1. Time-domain features

    Feature      Equation
    Mean (x̄)     x̄ = (1/N) Σ_{i=1}^{N} x_i
    Peak (x_p)   x_p = max_i x_i
B. Training and Testing
The classifiers were trained on the Raspberry Pi using the machine learning library scikit-learn for the Python programming language [17]. For training the SVM, the built-in function SVC, based on the one-versus-rest scheme with a linear kernel function, was used. The linear kernel function K(x_i, x_j), with transformation function φ and the dot product, is shown in Equation (5). Data transformed into a higher dimension can be easily separated using the hyperplane function (Equation (6)), where s_i is a support vector, α_i is a Lagrange multiplier and d_i is the membership class label (+1, -1), with i = 1, 2, 3, ..., N:

    K(x_i, x_j) = φ(x_i) · φ(x_j) = x_i · x_j,    (5)

    f(x) = Σ_i α_i d_i K(s_i, x) + b.    (6)

Furthermore, for training the neural network, the function MLPClassifier with hidden layer size 2 was used. Naïve Bayes (NB) was trained using the GaussianNB built-in function. For KNN, the function KNeighborsClassifier with 2 neighbors was used, and finally, for the decision tree, the function DecisionTreeClassifier with maximum depth 2 was used.

Additionally, the Python pickle module was used for serializing and de-serializing Python object structures. Pickle serializes an object before writing it to file; pickling converts a Python object (list, array) into a character stream containing all the information necessary to reconstruct the object in another Python script. In machine learning, pickling is used for the delivery of trained models once the desired parameters are achieved. Pickle's dump function, which writes a pickled representation of an object to an open file, and its load function, which reads a pickled object representation from an open file, were used. The training (a) and testing (b) phases of the present research are illustrated in Figure 10.

Figure 10. Illustration of (a) Training and (b) Testing phase.

For testing, the Arduino microcontroller was coded to randomly generate a binary number β. The temperature output, with the fault injected into the normal temperature, is given by

    T_out(t) = T(t) + βδt.    (7)

For each considered drift fault value, pickle files were generated and used for testing the performances of the classifiers on the Raspberry Pi. Moreover, real-time temperature signals were obtained serially from the Arduino Uno, and faults were introduced using simulations for testing purposes. Normal and faulty signals were generated randomly using the Arduino IDE to examine the performances of the algorithms. Using a Python script, the signals were read serially on the Raspberry Pi, with a delay time of 30 ms and a baud rate of 9600 bps. After completion of each cycle, which contained 100 data elements (1 sample), the max and mean features were extracted from the sample and given to the trained pickled model to predict the data as normal or faulty. The results of the initial 100 samples for each classifier were stored on the Raspberry Pi to check the accuracy of the models.

Precision, recall, f1-score, and total accuracy were taken as the result parameters, Eqs. (8)-(11):

    Precision = TP / (TP + FP),    (8)
    Recall = TP / (TP + FN),    (9)
    F1 = 2 (Precision × Recall) / (Precision + Recall),    (10)
    Accuracy = (TP + TN) / (TP + TN + FP + FN).    (11)

The total accuracies of the classifiers, along with the rest of the parameters, are shown in Table 2. The total accuracy results of the fault detection scheme using SVM, ANN, NB, KNN, and DT are shown in Figure 11. Comparing the performance among the five models, SVM and ANN outperformed the rest of the classifiers. The experimental results show that KNN and DT abruptly improved in performance with the increase in fault value, while the naïve Bayes classifier (NB) gradually caught up with the rest of the classifiers.
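The pickle round-trip and one cycle of the test loop can be sketched as follows. To keep the example self-contained, a hypothetical stand-in threshold model replaces the trained scikit-learn classifier, the signal is simulated rather than read serially from the Arduino, and in-memory `dumps`/`loads` stand in for the file-based `dump`/`load` described above:

```python
import pickle

class ThresholdModel:
    """Hypothetical stand-in for a trained classifier: labels a sample
    faulty (1) when its peak feature exceeds a learned bound."""
    def __init__(self, peak_bound):
        self.peak_bound = peak_bound

    def predict(self, features):
        mean_val, peak_val = features
        return 1 if peak_val > self.peak_bound else 0

# Serialize the "trained" model and restore it, mimicking how the
# pickled classifiers are shipped to the real-time test script.
blob = pickle.dumps(ThresholdModel(peak_bound=26.0))
model = pickle.loads(blob)

# One test cycle: a 100-element sample arrives (simulated here; on the
# Raspberry Pi it is read serially from the Arduino Uno), the mean and
# max features are extracted, and the restored model labels the sample.
sample = [25.0 + 0.03 * i for i in range(100)]           # drifting signal
features = (sum(sample) / len(sample), max(sample))
label = model.predict(features)                           # 1 = faulty
```

In the actual deployment the restored object would be one of the five trained scikit-learn classifiers, loaded once and reused for every 100-element cycle.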

Table 2. Simulation Results
(per-class Prec / Recall / f1; Acc. = total accuracy)

Drift 0.01
            Class 0               Class 1               Acc.
    SVM     100% / 98% / 99%      98% / 100% / 99%      99%
    ANN     100% / 98% / 99%      98% / 100% / 99%      99%
    NB       45% / 100% / 62%      0% / 0% / 0%         45%
    KNN      87% / 100% / 93%    100% / 87% / 93%       93%
    DT       51% / 100% / 67%    100% / 20% / 33%       56%

Drift 0.02
            Class 0               Class 1               Acc.
    SVM     100% / 100% / 100%   100% / 100% / 100%    100%
    ANN     100% / 100% / 100%   100% / 100% / 100%    100%
    NB      100% / 20% / 33%      60% / 100% / 75%      64%
    KNN     100% / 100% / 100%   100% / 100% / 100%    100%
    DT       89% / 18% / 30%      60% / 98% / 75%       63%

Drift 0.03
            Class 0               Class 1               Acc.
    SVM     100% / 100% / 100%   100% / 100% / 100%    100%
    ANN     100% / 100% / 100%   100% / 100% / 100%    100%
    NB      100% / 62% / 77%      76% / 100% / 87%      83%
    KNN     100% / 100% / 100%   100% / 100% / 100%    100%
    DT      100% / 100% / 100%   100% / 100% / 100%    100%

Drift 0.04
            Class 0               Class 1               Acc.
    SVM     100% / 100% / 100%   100% / 100% / 100%    100%
    ANN     100% / 100% / 100%   100% / 100% / 100%    100%
    NB      100% / 69% / 82%      80% / 100% / 89%      86%
    KNN     100% / 100% / 100%   100% / 100% / 100%    100%
    DT      100% / 100% / 100%   100% / 100% / 100%    100%

Drift 0.05
            Class 0               Class 1               Acc.
    SVM     100% / 100% / 100%   100% / 100% / 100%    100%
    ANN     100% / 100% / 100%   100% / 100% / 100%    100%
    NB      100% / 87% / 93%      90% / 100% / 95%      94%
    KNN     100% / 100% / 100%   100% / 100% / 100%    100%
    DT      100% / 100% / 100%   100% / 100% / 100%    100%
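For reference, the four evaluation parameters of Eqs. (8)-(11) can be computed from confusion-matrix counts as below; the label vectors are invented for illustration and do not reproduce the paper's measurements:

```python
def evaluate(y_true, y_pred):
    """Precision, recall, f1-score and accuracy for the faulty class (1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp)                          # Eq. (8)
    recall = tp / (tp + fn)                             # Eq. (9)
    f1 = 2 * precision * recall / (precision + recall)  # Eq. (10)
    accuracy = (tp + tn) / len(y_true)                  # Eq. (11)
    return precision, recall, f1, accuracy

# Invented labels: 8 samples with one false negative and one false positive.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
prec, rec, f1, acc = evaluate(y_true, y_pred)  # each equals 0.75 here
```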

Figure 11. Performance comparison of models. SVM and ANN show the best performance in terms of detection accuracy.

IV. CONCLUSION AND FUTURE WORKS

In this paper, the authors address the drift fault in the sensor fault detection problem. A low-computational-power system (Raspberry Pi) was proposed, which can effectively be used in smart systems for intelligent fault detection in real time using AI techniques. Several machine learning classification algorithms were used to classify data as normal or faulty. Experimental results show that SVM and ANN performed very well, even with the fewest features and without requiring a large amount of data.

For future work, a more capable single-board computer can be used instead of a Raspberry Pi to handle more complex operations, and various sensors, such as an accelerometer or pressure sensor, can be used instead of a temperature sensor to study other kinds of sensor faults. Also, fault diagnosis and prognosis can be performed following the data-driven approach.

V. ACKNOWLEDGMENT

This work was supported by the Business for Cooperative R&D between Industry, Academy, and Research Institute through the Korea Small and Medium Business Administration in 2016 under Grant C0398156.

REFERENCES
[1] Z. Gao, C. Cecati, and S. X. Ding, "A survey of fault diagnosis and fault-tolerant techniques-Part I: Fault diagnosis with model-based and signal-based approaches," IEEE Trans. Ind. Electron., vol. 62, no. 6, pp. 3757-3767, Jun. 2015.
[2] D. Park, S. Kim, Y. An, and J. Jung, "LiReD: A light-weight real-time fault detection system for edge computing using LSTM recurrent neural networks," Sensors, vol. 18, no. 7, p. 2110, Jun. 2018.
[3] J. Tian, C. Morillo, M. H. Azarian, and M. Pecht, "Motor bearing fault detection using spectral kurtosis-based feature extraction coupled with K-nearest neighbor distance analysis," IEEE Trans. Ind. Electron., vol. 63, no. 3, pp. 1793-1803, Apr. 2016.
[4] T. W. Rauber, F. de A. Boldt, and F. M. Varejão, "Heterogeneous feature models and feature selection applied to bearing fault diagnosis," IEEE Trans. Ind. Electron., vol. 62, no. 1, pp. 637-646, Sep. 2015.
[5] O. Castro, C. Sisamón, and J. Prada, "Bearing fault diagnosis based on neural network classification and wavelet transform," in Proc. 6th WSEAS Int. Conf. Wavelet Anal. Multirate Syst., Bucharest, Romania, pp. 22-29, Oct. 2006.
[6] B. Samanta, "Gear fault detection using artificial neural networks and support vector machines with genetic algorithms," Mech. Syst. Signal Process., vol. 18, no. 3, pp. 625-644, 2004.
[7] B. Sreejith, A. K. Verma, and A. Srividya, "Fault diagnosis of rolling element bearing using time-domain features and neural networks," in Proc. IEEE Region 10 3rd Int. Conf. Ind. Inf. Syst., vol. 1, pp. 1-6, Sep. 2008.
[8] Y. Wang, J. Xiang, R. Markert, and M. Liang, "Spectral kurtosis for fault detection, diagnosis and prognostics of rotating machines: A review with applications," Mech. Syst. Signal Process., vols. 66-67, pp. 679-698, Apr. 2016.
[9] Q. Xiao, Z. Luo, and J. Wu, "Fault detection and diagnosis of bearing based on local wave time-frequency feature analysis," in Proc. 11th Int. Conf. Natural Comput. (ICNC), pp. 808-812, 2015.
[10] J. L. Yang, Y. S. Chen, L. L. Zhang, and Z. Sun, "Fault detection, isolation, and diagnosis of self-validating multifunctional sensors," Rev. Sci. Instrum., vol. 87, no. 6, p. 065004, 2016.
[11] R. Dunia, S. J. Qin, T. F. Edgar, and T. J. McAvoy, "Identification of faulty sensors using principal component analysis," Process Syst. Eng., vol. 42, no. 10, pp. 2797-2812, 1996.
[12] J. Kullaa, "Detection, identification, and quantification of sensor fault in a sensor network," Mech. Syst. Signal Process., vol. 40, no. 1, pp. 208-221, Sep. 2013.
[13] Y. Yu, W. Li, D. Sheng, and J. Chen, "A novel sensor fault diagnosis method based on modified ensemble empirical mode decomposition and probabilistic neural network," Measurement, vol. 68, pp. 328-336, May 2015.
[14] S. U. Jan, Y.-D. Lee, J. Shin, and I. Koo, "Sensor fault classification based on support vector machine and statistical time-domain features," IEEE Access, vol. 5, pp. 8682-8690, 2017.
[15] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed. Upper Saddle River, NJ, USA: Prentice-Hall, 1994.
[16] A. Géron, Hands-On Machine Learning with Scikit-Learn & TensorFlow. Sebastopol, CA, USA: O'Reilly, 2017.
[17] Scikit-learn library. Available online: scikit-learn.org (accessed 5 Sep. 2019).
