From Cloud Down To Things An Overview of Machine Learning in Internet

This article has been accepted for publication in a future issue of this journal, but has not been
fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2019.2893866, IEEE Internet of
Things Journal
IEEE INTERNET OF THINGS JOURNAL (IOT-J) 1
From Cloud Down to Things: An Overview of

Machine Learning in Internet of Things
Farzad Samie, Lars Bauer, and Jörg Henkel, Fellow, IEEE
Abstract—With the numerous IoT devices, the cloud-centric
User
Connected
User
User
data processing fails to meet the requirement of all IoT appli-
Clouds
device
cations. The limited computation and communication capacity
of the cloud necessitate the Edge Computing, i.e., starting the
IoT data processing at the edge and transforming the connected
devices to intelligent devices. Machine learning, the key means
for information inference, should extend to the cloud-to-things
Fogs
< Decision Making >

continuum too.
This article reviews the role of machine learning in IoT from
the cloud down to embedded devices. Different usages of machine
Gateways
learning for application data processing and management tasks
are studied. The state-of-the-art usages of machine learning in
IoT are categorized according to their application domain, input
data type, exploited machine learning techniques, and where they
belong in the cloud-to-things continuum. The challenges and
research trends toward efficient machine learning on the IoT
Devices
edge are discussed. Moreover, the publications on the ‘machine
IoT
learning in IoT’ are retrieved and analyzed systematically using

machine learning classification techniques. Then, the growing Smart
topics and application domains are identified. device
Index Terms—Internet of Things, IoT, Embedded Systems, Fig. 1: Hierarchical architecture of IoT processing layers
Machine Learning, Edge Computing, Embedded Intelligence.
In the traditional cloud computing paradigm, the whole
I. I NTRODUCTION processing is performed on the cloud, making the IoT devices
remote & connected sensors/actuators. However, the number
W ITH the rapid advancement in different technologies
such as semiconductor, wireless and sensor, we have
witnessed the interconnected devices permeating our daily
of IoT devices is expanding rapidly and the massive amount
of collected data is hard to manage by central clouds. The
reasons are the huge workload on the IoT network and the
lives. These embedded devices with sensors and communica-
unreliable and long latency [2], [1]. The edge computing
tion means are the key components of the Internet of Things
paradigm suggests pushing the data processing to the edge
(IoT) [1], [2]. IoT offers advanced control and monitoring
of IoT (comprising gateways and embedded devices) close to
services in order to provide new applications or to improve
where the data is collected [4], [5]. In many IoT applications,
the efficiency of existing ones [1].
the computation can be distributed on different layers. For
The embedded IoT devices interact with the physical world
instance, the IoT device may perform the pre-processing on
using sensors and/or actuators. Each IoT device belongs to
the data and transmits the intermediate results to the fog where
a wide continuum from connected devices to smart devices
the rest of the processing is performed [6].
depending on where its data inference and decision making
take place. As shown in Figure 1, the hierarchical architecture Machine learning (ML) is a key enabling means for IoT
of IoT consists of several processing layers [3]. The collected which provides information inference, data processing and
data can be acted upon at any of these layers or even by the intelligence for IoT devices. From Big-Data processing on
user. The connected devices have no intelligence to process the the cloud to the embedded intelligence, ML is an effective
input data and make decisions, instead, they rely on the user promising solution in different IoT application domains. Sev-
or other processing entities such as cloud to make the decision eral surveys on ML and IoT were presented recently each of
and control them remotely. On the contrary, smart devices are which covers a different aspect of ML in IoT as summarized
able to process the sensed data, assess the situation and act in Table I.
independently. For instance, using a smartphone application In this article, we review the usage of ML in different IoT
to turn on the heating system remotely depends on the user’s domains, the efforts to bring the intelligence to the edge, appli-
decision while a smart thermostat can autonomously tune the cations of ML in both data processing and management tasks
heating according to the home’s occupancy or user comfort. of IoT, and new trends, research directions and challenges.
We also analyze all the publications on ML in IoT using a
F. Samie, L. Bauer and J. Henkel are with Chair for Embedded Systems,
Karlsruhe Institute for Technology. systematic approach based on machine learning classification
Manuscript received January 11, 2019; revised X. techniques.
2327-4662 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2019.2893866, IEEE Internet of
Things Journal
TABLE I: Existing surveys on IoT and ML

predicted output can be evaluated by a positive or negative
Ref. Brief description
reward which indicates how good or bad the predicted
[7] context aware computing, ambient intelligence, using output was.
sensor data to understand the changes in the environ-
ment, and respond accordingly, big data analytics.
The usage of ML can be divided into four main categories:
[8] basic concepts, introduction to deep learning
methodologies A. Classification
[9], [10] reliable, secure, energy efficient and scalable ma- The output of classification belongs to a finite set of pre-
chine learning architectures for edge devices defined discrete values or classes. Depending on the number
[11] deep learning algorithms, IoT analytics for big data of classes, the classification problems fall into one of these
and streaming data
two categories: i) binary classification: it involves with only
[12] hardware and algorithmic techniques for embedded
deep learning two labels such as False/True or 0/1, and ii) multi-class
[13] hardware design, architecture, hardware-friendly al- classification: it involves with more than two classes. For
gorithms, mixed-signal circuits, and advanced tech- instance, monitoring the data traffic of an IoT device and
nologies for machine learning in image classification detecting whether it is participating in a DDoS attack or
The main contributions of this article are as follows: not is a binary classification. But human activity recognition
by wearable devices (to detect ‘walking’, ‘running’, ‘sitting’,
• We investigate the state-of-the-art on ML in IoT, catego-
‘standing’, etc.) is a multi-class classification.
rize them according to their application domain, where
There are several classification models which include –
they stand in the cloud-to-things continuum, and what
but not limited to– Logistic Regression (LoR), Support Vec-
they infer from data.
tor Machine (SVM), Decision Tree (DT), Naive Bayes, k-
• We present a comprehensive view on different ML tech-
Nearest Neighbors (KNN), artificial neural networks (ANN)
niques, the characteristics, the role of different processing
and Linear Discriminant Analysis (LDA). Note that LoR –
layers from cloud to fog down to embedded devices in
despite its name– is a classification and not regression model.
inference, suitability of ML techniques for each layer
The internal operations of some classification models are
and opportunities to integrate IoT devices with embedded
schematically described in Figure 2.
intelligence.
Deep neural networks (DNN), unlike the ANN, do not
• Using a machine learning approach, we analyze the
rely on the selected features by the designer. Instead, they
publications on ML in IoT by training accurate classifiers
learn the features automatically and directly from the data.
which label the publications based on the key phrases
To do so, DNN usually requires more hidden layers in order
in their title and abstract. The presented details allow
to hierarchically detect the features from different layers of
other researchers to reproduce the analysis with updated
abstraction. The higher number of hidden layers make these
publications at a later time.
networks ‘deep’. Convolutional neural networks (CNN) are a
• Based on a thorough analysis of both IoT and machine
class of deep neural networks which are the main classification
learning literature, we present challenges and potential
model for image classification and computer vision. CNN
research trends on efficient ML for IoT with a focus on
preserves the spatial structure of subfigures in a hierarchical
edge computing paradigm.
fashion and uses convolution filters to detect feature maps. In
The remainder of this article is organized as follows: Sec- recurrent neural networks (RNN), the output of some neurons
tion II presents relevant background on ML techniques and can flow back and feed the inputs for the neurons of the
their applications. Section III reviews the application of ML in same layer or previous layers (see Figure 3). These feedback
different IoT domains, categorize their role and identify their loops in the network structure help the RNN to persist time
place in the cloud-to-edge continuum. In Section IV, the open series information and temporal patterns. RNNs perform very
challenges and research opportunities for efficient ML in IoT well for speech recognition, video classification and other
are discussed. Then in Section V, we quantitatively analyze applications whose inputs or outputs are sequences.
the publications on ML and IoT and use a machine learning Ensemble models consist of several complementary classi-
approach to classify and label the publications. Finally, Sec- fiers that might even be weak individually but can be combined
tion VI concludes the article. to form a strong and accurate classifier (see Figure 2). For
instance, Random Forest (RF) ensembles several DTs each
II. BACKGROUND trained with different independent subsets of data. Boosted
ML aims at creating models based on observations and trees (e.g. gradient boosting trees) also ensembles trees but
experiences, and then use the model to predict future data or the trees are not trained independently in parallel. The new
discover patterns. Depending on the type of data to create the tree is trained in a way that mispredicted inputs of the last
model, ML techniques can be divided into three categories: tree are prioritized. Ensemble models are more widely used
• supervised learning uses training datasets which are for classification but can be applied for regression too.
labeled with correct output, Some classification models such as RF, ANN or kNN are
• unsupervised learning uses a collection of unlabeled intrinsically able to deal with multiple classes. Some models
data to find underlying patterns, such as SVM and LoR shall use multiple binary classifications
• in reinforcement learning, the correct output is not in order to handle a multi-class problem. Two main approaches
available a priori but in a trial-and-error fashion, the are:
Things Journal
LoR DT SVM
+
+++++ + + -
++ +
++ - -- -- + - -- + - --
++ ++ - - - - kernel + + - -- - - -
+ - -- T1 - + --- φ ++ --
- -- - -- -- ++ - + -
- -- -- + + + ++ - SV+ + +
T2
#features T
F1≤ T1 F1>T1
features
Feature values
1
weights
F2>T2
#SVs
∑ Thr. Class 1 F2≤ T2 F2≤ T2 F2>T2 * * -bias
...
Class 0
...
0 0
Linear
model Log. transform class 0 class 1 class 1 class 0
ANN Ensemble trees

Computation=LoR + - + - + - + -
Input output ++++ + + - ++++ + + - ++++ ++ - ++++ ++ -
layer layer + - -- + - -- + - -- + - --
- - - -
+ - - -- - + - - -- - + - - -- - + - - -- -
Class 0
Weak Weak Weak Strong
Class 1 classifier 1 classifier 2 classifier 3 classifier
...
...
...
Ensemble
Class N trees
Hidden layers
Fig. 2: The internal operations of some classification models
RNN Pre-processing Classification
Inputs
Input Hidden layers output denoising, filter, ... SVM, DT, LoR,
layer layer ANN, KNN, NB,
Feature Extraction Boosted trees, ...
Convolution,
Rectification,
...
CNN
...
...
Max-pooling, Post-processing
Normalization
Kalman filter,
FFT, DCT, voting, …
(Bio)signals DWT, …
Recurrent Output
...
layers
Fig. 4: Different stages of classification task at runtime
Fig. 3: Recurrent Neural Network architecture regression belongs to a continuous range. For instance, in a
1) one-versus-all (OvA): In this approach, one binary clas- smart grid application, a regression model can predict energy
sifier is used to separate each label from all other labels. usage based on past information. Linear Regression (LiR) is
Therefore, this approach requires N classifier to handle the most widely used model for regression which P formulates
an N -class problem. a linear equation of input variables (e.g. p = fi · wi where
2) one-versus-one (OvO): In this approach, one binary clas- fi is the value of i-th feature, wi is the weight of that feature
sifier is used to separate each pair of classes. Hence, it and p is the predicted value). Some of the models such as DT
results in N ×(N −1)/2 classifiers in an N -class problem. and KNN can also be used for regression even though their
The typical pipeline for machine learning classification common usage is for classification. Polynomial regression is
includes multiple stages as shown in Figure 4: pre-processing a similar model with the difference that the power of features
(i.e. filtering, denoising, etc.), feature extraction (e.g. Fast (i.e. fi ) can be more than 1. This model has the risk of over-
Fourier Transform, Discrete Wavelet Transform, Discrete Co- fitting which makes the model perform very good for the test
sine Transform, Max Pooling, Rectification, Convolution, Nor- data but it is generalized for the new data.
malization, etc.), classification, and finally post-processing to Linear regression model may suffer from multicollinearity
smooth out the output (e.g. Kalman filter, majority voting, (i.e. high correlation between features). It causes high vari-
etc.). Each stage contains different computation operations ance in the value of features’ weight when performing the
each of which could benefit from efficient hardware imple- training phase multiple times. To avoid this issue, two similar
mentation or software optimization. regression models are typically used, e.g. Ridge regression
and Least Absolute Shrinkage Selector Operator (LASSO)
B. Regression regression. While these models also use linear predictors, they
Regression models the relationship between input variables constrain the value of features’ weights in the training phase.
and output to provide a predictive model. The output of Consequently, the large weights and variance between them
Things Journal
are avoided. An extra advantage of LASSO is that it performs D. Reinforcement Decision Making
the feature selection by removing the unimportant features as Reinforcement learning is mainly used for decision making.
it makes the small weights to become 0. Hence, the difference The outcome of each decision will be considered as the
between Ridge, LASSO, and linear regression is the weights reward or penalty to evaluate the decision and update the
of the features in their linear model. When the number of decision function. Reinforcement learning does not require
features is very large, LASSO regression is more suitable. offline training and hence is suitable for IoT systems with
no prior knowledge about the relation between their input and
C. Clustering correct output (e.g. decision). The learner (known as Agent)
Clustering is an unsupervised learning method that extracts learns from experiences and trial-and-error. It takes actions,
the structure and hidden patterns from data. It clusters the observes the reward of the action plus the change in the state
data into few groups which have similar features and common of the environment, and then improves its decision making
structure and the data points in different clusters are dissimilar. accordingly (see Figure 6).
The main applications of clustering include marketing (e.g. Agent
Action
discover customer segments to target lookalike customers with Actions
b le
ta
similar advertisements), recommender systems (e.g. retail, Environment Agent 1 2 ... M
Q-
music, video-on-demand), outliers detection (e.g. detecting reward 1
States
lu rd
2
va wa
fraudulent patterns in insurance, smart grid and etc.), content observe state
es
...
Re
management (e.g. clustering the articles, books and other N
contents), etc.
K-Means is one of the most popular models for this task Reinforcement learning Example: Q-learning
which clusters the input data into k clusters. Fig. 6: Components of reinforcement learning and their interac-
Given the number of clusters, k, the k-means algorithm tions
follows these steps: The reinforcement learning problems can be formalized by
1) Select k centroids (e.g. randomly) for the clusters. the Markov Decision Process which is described by:
2) Find the closest centroid for each data point and assign • a finite set of states,
the data to this cluster. The metric to find the closest • a finite set of possible actions in each state,
centroid is often the Euclidean distance. • the transitions between states,
3) Update the centroids by calculating the average position • the rewards associated with each transition.
of data points in the clusters. The objective of the reinforcement learner is to maximize the
4) Repeat Steps 2 and 3 until the position of centroids sum of rewards in the long run:
converge.
t→∞
The complexity of k-Mean clustering is O(n), but the disad- X
γ t × r s(t), a(t)

(1)
vantage is that it requires prior knowledge of k. t=0
Hierarchical clustering is another technique which has more
flexibility for the number of created cluster. The hierarchical where r s(t), a(t) denotes the reward associated with taking
clustering follows several steps: action a at state s. The γ is a discount factor between 0
and 1 which determine the importance of future rewards. To
1) Start with N clusters (e.g. one for each data point if it is
solve this Markov Decision Process problem, we need to find
small).
a policy to take the best action at each state such that the
2) Merge the two closest clusters.
accumulated reward is maximized [14].
3) Update the calculated distance between clusters.
Q-Learning is the most popular reinforcement learning
4) Repeat Steps 2 and 3 until there is only one cluster.
algorithm as it is easy to implement and proven to converge
Figures 5a and 5b show examples of k-means and hierarchical to the optimal solution (see [15] for more details). It creates
clustering, respectively. an action-value function (called Q-function) which estimates
the expected reward of taking a certain action while being in
a certain state. The initial value of Q-function of all inputs
is a fixed arbitrary value. At each timestamp, the function is
updated according to the taken action and the received reward.
Q(st , at ) ← Q(st , at ) + |{z} α ×
| {z }
old value learning rate

rt + γ · max Q(st+1 , a) − Q(st , at ) (2)
|{z} |{z} a | {z }
reward discount
| {z } old value
factor estimate of optimal future value
(a) (b) | {z }
learned value
Fig. 5: Examples of (a) k-means clustering with k=3, (b) hierar- where α –the learning rate– indicates the weight of learned
chical clustering
value. When setting α = 1, the old value is completely replaced
Things Journal
TABLE II: The characteristics of some ML techniques

activity recognition, monitoring lifestyle, assist patients to
pros. The designer does not require deep understanding of cope with daily difficulties, etc.
Clustering
the input data and their features The heart electrical activity signal or Electrocardiogram
not embedded-friendly, computation intensive, mem- (ECG) can be used to detect stress, cardiovascular diseases,
cons. ory intensive, requires very large amount of data to
perform well etc. These applications train supervised classifiers such as
low computation and low memory requirement, hence Bayesian Network, SVM, Logistic Regression, etc. Azariadi
Reinforce-
pros. it is embedded-friendly. It can be deployed with no et al. [18] use SVM classifier to detect heartbeat abnormality
ment
prior knowledge about input data from ECG signals on the IoT device. In [19], cardiovascular
cons. learns from runtime experiences, hence may make diseases are predicted from the ECG signal using classification
inefficient decisions
techniques. The ECG signal is captured from wearable devices
The accuracy can be further increased even after
Ensemble
Boosted /
pros. and transmitted to a remote processing unit. The classifier

Trees
deployment. Traversing trees can be parallelized.

Large memory requirement to keep the trees data is based on the Bayesian Network model with four classes:
cons.
structure Normal, Premature Atrial Contraction (PAC), Premature Ven-
tricular Contraction (PVC) and Myocardial Infarction (MI).
with the learned value. The updating Q-function is basically
In [20], the level of driver’s stress is detected from his/her
an exponential moving average of learned values. Note that
ECG signal. First, several features are extracted from the ECG
Q-learning considers the achievable reward for subsequent
including heartbeat, heartbeat variation, etc. Then, several
actions, too. The Q-function is usually implemented as a table
classifiers have been used to detect three levels of stress,
(called Q-table) whose size is bounded by the number states
among which Naive Bayes achieves the highest accuracy.
multiplied by the maximum number of actions for a state (see
Another application of classification in stress detection is
Figure 6).
presented in [21]. The heartbeat rate is considered as the main
The reinforcement learning, unlike other machine learning
feature and two classification models (LoR and SVM) are
models, faces a trade-off between exploitation and exploration
evaluated. The design is intended for IoT devices.
at runtime: whether to exploit the actions that have been
already tried and shown to be effective, or to explore new The brain electrical activity signal or Electroencephalo-
actions and discover more effective ones. For instance, in Q- gram (EEG) can be used in several applications for health
learning, we can either choose the action with the highest or well-being purposes. In [22], a light-weight supervised
expected reward or take a random action some percentage classification model is proposed to predict epileptic seizures
of the time. Some techniques use the ‘-greedy’ method for by classifying the EEG input data using Logistic Regression
exploration which implies that with the probability of they and Gradient Boosting Trees. Several features form time-
choose a random action with uniform distribution instead of domain and frequency-domain of EEG signal are used for
exploiting the so-far best-known action. classification. The LoR classifier, which is low-overhead, is
Characteristics of Machine Learning Techniques: implemented on the IoT devices, while the XGB classifier is
IoT devices at the edge are usually resource-constrained (i.e. offloaded to the gateway. In [23], the drowsiness level of the
limited energy budget, memory, and computation capability driver is classified on a wearable IoT devices which uses EEG
[16]). On the other hand, the resource usage of machine signal. The supervised classifier is based on SVM.
learning techniques highly depends on the application. How- In [24], the k-means clustering is used to analyze the
ever, some techniques are inherently resource-hungry or light- physiological data and discover the patterns in the collected
weight. Lane et al. [17] analyzed the deep learning algorithm data from the wearable health devices. The algorithms are
on wearable IoT devices to process audio and image data. implemented on low resource fog nodes and gateways, close
They investigate the performance, resource usage and the to the data source.
execution characteristics of the deep neural network (DNN) 2) Smart Home: A self-learning home management system
and convolutional neural network (CNN) models. is presented in [25] which uses different machine learning
Table II summarizes some of the characteristics of ML techniques for price clustering and price prediction. Detection
techniques. These properties make some ML techniques more of human presence, activity recognition, self-organizing home
suitable for some processing layers and less suitable for others. appliances, controlling air-conditioning based on user comfort,
etc. are enabled by machine learning. Particularly, the activity
recognition has received considerable attention [26] due to its
III. M ACHINE L EARNING IN I OT
wide application for assisted living, home automation, etc.
A. Machine Learning in Data Processing For an activity recognition application, [27] uses embedded
Machine learning plays an important role in providing classification to process the accelerometer data on the IoT
the application service to the user. Most of IoT applications device instead of transmitting raw data to the cloud server.
require to classify the input data as the output of their service. They consider six activities and use an SVM model for
Many application domains benefit from machine learning classification. To deal with the multi-class problem, they use
advantages as discussed in the following. an OvA approach. The activity recognition application may
1) Healthcare and Well-being: Wearable health monitoring fall into assisted living and healthcare domains as well.
devices are able to capture biomedical signals from the user’s Idowu et al. [28] use multiple regression models to forecast
body. The collected data can be used to detect early diseases, the heat load for a multi-family smart building. This forecast
Things Journal
can help to optimize the energy consumption of building in 1) unbalanced data as the number of failures in the dataset
the smart grid era. is low, and 2) to predict and prevent the failure, we need to
3) Smart Agriculture: IoT solutions have been proposed to detect before it happens when the observations show no fault.
improve productivity and reduce the cost of maintenance in The proposed solution in [35], is to use multiple classifiers
agricultural systems and farming. for each of which the training dataset is manipulated as
In [29], the image classification based on convolutional follows: a specific number of observations before the actual
neural network is used to detect pests and diseases of plants in failure are labeled as ‘failure’. Increasing this number results
a farm. In a similar application, Ref. [30] uses linear Support in a more conservative maintenance recommender system.
Vector classifier, kNN and ExtaTree models to detect the In [36], unsupervised clustering is proposed for discovering
crop diseases from plant images. The images can be taken degradation in cyber-physical systems (CPS). Afterward, the
with a smartphone and then uploaded to a server where the failures and root-cause can be predicted and reasoned.
classification takes place. The application involves a multi- 5) Smart Grid: Using the connected meters, smart grid
class classification where four diseases and the health status aims to improve the efficiency of the power grid. The main
will be detected. concepts in the smart grid domain (dynamic pricing, integra-
Ref. [31] monitors the environmental parameters for a grape tion of renewable energies, etc.) are shown to benefit from
plant including temperature, humidity, leaf wetness, etc. The different machine learning techniques including classification,
captured data is transmitted to a server where an application regression, reinforcement learning, etc.
identifies the chances of grape diseases in its early stages and Wei et al. [37] present a Q-Learning method to manage the
notifies the farmer. In [32], CNN is used to detect the flowers battery charge/discharge from renewable sources to minimize
and seedpods from the images taken and sent from the plant the cost of power from the grid. O’Neill et al. [38] take
in the farm to monitor the growth rate and environmental advantage of Q-learning to estimate the future energy price
parameters. in smart grid (with energy real-time price), and then schedule
In summary, the aims of these solutions can be categorized the home appliances to minimize the energy cost. Thapa et
into two main goals: 1) detection of diseases and pests [29], al. [39] use reinforcement learning principle to schedule the
[31], [30], and 2) maintaining the required environment of loads in the smart grid where the consumers must decide
plants such as temperature, moisture, soil’s condition, etc. [32]. whether to request for power supply in the current time
Machine learning techniques have been utilized to assess the slot or not. Sharma et al. [40] study the correlation between
environment or plant using either surveillance means such as weather parameters and solar intensity. Then, they use machine
image and video [29], [30] or environmental sensors such as learning regression models to predict the solar intensity based
temperature, humidity, etc. [31]. on the weather forecasts parameters. Chen et al. [41] exploit
One of the drawbacks in these existing solutions is the multiple regression models based on the ensemble of extreme
dependency on the cloud servers. The complexity of data gradient boosting trees (XGboost) and Ridge regression to
processing in these applications calls for efficient and powerful predict the annual energy consumption of households.
hardware to enable ML on embedded IoT devices. In [42], four regression models based on ANN, SVM, RF,
4) Smart Industry: Smart industry can benefit from its and XGBoost are evaluated for the electric load forecasting in
available data and ML techniques to improve its maintenance the smart grid. Results show that the time scale for forecasting
management, quality assurance, scheduling, etc. Mangal et al. affects the accuracy.
[33] used machine learning classification in a Bosch assembly Moreover, energy fraud detection –one of the issues in the
line to predict failure in the components and therefore improve conventional grid– can benefit from machine learning in the
the production yield. One challenge is that their training smart grid realm. For instance, Ford et al. [43] use machine
dataset contains more than 4000 features. They divide the learning to model the consumption behavior of customers and
dataset into two subsets and trained a model for each. Then then detect the abnormal behaviors with fraudulent activities.
each model is used to calculate the prediction probability for In some application domains such as smart grid, multiple
the other subset. This value is added as a new feature in the similar learners may exist in the IoT network in the same
final classification model which is based on extreme gradient vicinity (e.g. each consumer may predict the real-time price).
boosting trees (XGBoost). This opportunity enables collaborative and distributed learning
The application of machine learning in forecasting the which can improve the accuracy and reduce the overhead (see
supply chain demand is studied in [34]. ANN, RNN and SVM Section IV).
and LiR models are investigated in this study. Note that the
problem is to forecast the demand (a continuous variable) and
hence it is a regression problem. As mentioned before, the B. Machine Learning in Management & Organization
SVM and neural network models can be used for regression Besides the IoT services, machine learning can be used in
too. management and organizing IoT systems/devices. The man-
In [35], ML classifiers are used to predict the maintenance agement tasks such as resource allocation, power management,
issues in the manufacturing process and hence reduce the scheduling, security state assessment, etc. can benefit from
downtime and associated costs. They used SVM and kNN efficient machine learning approaches.
models to detect the failures that are caused by accumulative 1) Security & Privacy: Due to the pervasive presence of
wear and tear effects. Two challenges in this problem are: IoT devices in people’s lives, security and privacy are the
Things Journal
major concerns of users. To improve the privacy of user or [59] propose a reinforcement learning technique in which IoT
assure the security state, ML techniques can come into play. device –considering its battery level– determines ‘how much’
The efforts can be divided into two categories: 1) detect, and 2) to offload its computation.
prevent. Detecting device tampering, data injection, malware, Demirel et al. [60] develop a deep reinforcement learning
abnormal behavior [44] as well as intrusion detection [45] are method to schedule the sensor-to-controller communication
among the first category. in the networked control systems where there are multiple
Wei et al. [46] use deep learning to detect false data injecting independent controlled subsystems that are connected through
in the smart grid. It exploits the hidden correlation between a shared network.
data readings –due to the physical coherence– and evaluates An unsupervised clustering method is suggested for group-
the trustworthiness of data. Sharmeen et al. [47] consider ing low power fog nodes in a 5G network [61]. Each group
the use of mobile devices in industrial IoT and analyze then is assigned to a high power fog node in order to reduce
malware detection aspects on these mobile devices. They ana- the latency of communication.
lyze datasets, feature extraction techniques, and classification The state-of-the-art usage of ML in IoT can be categorized
models with regard to their benefits and limitations. based on different aspects. Each of these aspects is impor-
Sun et al. [48] consider machine learning to model the attack tant from the design exploration perspective. These aspects
activities of botnets in IoT. They identify the activity pattern include the ML techniques and models, application domain,
and the structure of botnets by clustering the attackers. the processing layer, the type of input data to the model,
Other research works attempt to prevent security/privacy etc. We summarize these categorizations in two tables as
vulnerabilities. To improve the privacy of users, Ref. [49] follows: Table III summarizes the state-of-the-art using ML in
suggests not to transmit the raw data to the cloud. Instead, IoT, categorized according to the machine learning techniques,
only partially-processed feature data that are needed for NN where they belong in the cloud-to-edge continuum, and what
are sent. Chauhan et al. [50] use recurrent neural networks to they use ML for. The reinforcement learning has been mainly
authenticate users based on breathing acoustics on medium- used for the management on the constrained devices. In
range IoT devices (e.g. smartphone or single-board comput- addition, the clustering technique is mainly used on the cloud
ers). The goal is to detect if the sample belongs to predefined server. Classification, on the other hand, has been used on the
registered users. cloud, edge devices and distributed between multiple layers.
2) Power/Battery Management: IoT embedded devices Table IV summarizes the state-of-the-art using ML in IoT,
have various parameters that can be adapted at runtime to categorized by the exploited ML technique and the type
improve their energy efficiency. Their sensing task, processor, of inputs. The type of inputs is divided into 5 categories:
processing task, and communication task can benefit from ML 1) Multimedia including image, video, speech and acoustic
techniques to determine the efficient operating parameters. In data, 2) (Bio-)signal including ECG, EEG, and other signals
[51], the input data sampling rate is adapted at runtime based captured from body, 3) Env. Parameters including temperature,
on reinforcement learning to save energy on sensor nodes. The humidity, etc., 4) Power Grid information, and 5) Other. The
algorithm is based on Q-learning and can be implemented on type of input data usually affects the pre-processing stage as
either cloud, gateway or IoT device. Golanbari et al. [52] use well as the feature extraction stage (see Figure 4). For instance,
a regression model to find the optimal power supply voltage when the input data is a signal (e.g. ECG, acoustic), the appli-
for chips to improve the energy efficiency of low-power IoT cation involves filtering or transformations (e.g. FFT). These
devices at runtime. Chafii et al. [53] use reinforcement learning operations perform more efficiently on the hardware than
to enable a dynamic spectrum access procedure in Narrow software. Hence, having co-processors and accelerators on the
Band IoT (NB-IoT) wireless modules which enhances battery IoT edge devices improves the energy efficiency significantly.
lifetime of devices. Ref. [54] evaluated ML techniques for Moreover, when the input data is multimedia, transmitting the
predicting the solar energy for energy-harvester low-power IoT whole data to the cloud server is inefficient due to its overhead
nodes based on public weather forecast. on the transmission energy of the IoT device and its load on
3) Resource Allocation/Task Scheduling: Resource man- the IoT network. It requires to enable the image classification
agement/allocation problems are generally formulated as con- and recognition application on the edge devices. In addition
strained optimization problems. Sometimes these problems –and especially for multimedia data–, approximate computing
are intractable or have unscalable solutions. Even though ML paradigm offers opportunities to improve the efficiency of ML
techniques cannot guarantee the exact (optimal) solution, they techniques in terms of reducing the computational complexity,
are low-overhead and scalable solutions. saving power consumption and improving the performance.
Cui et al. [55] present a framework that decides whether
IV. N EW T RENDS & O PEN C HALLENGES
to process the data on the gateway, at the edge, or on the
cloud server. He et al. [56] use a Deep Reinforcement Learning A. Toward Machine Learning at IoT Edge
(DRL) method for cache allocation on content-centric comput- IoT systems face an increasing need for data processing and
ing nodes and determining transmission rate between nodes. decision making at the IoT edge which requires pushing the
Iranfar et al. [57] use ML for allocating hardware accelerators computation, including machine learning algorithms, close to
and processing cores to the video streams on an IoT edge the embedded devices. However, the IoT devices at the edge
device. In [58] a reinforcement learning algorithm is presented are resource-constrained (i.e. limited energy budget, computa-
for the scheduling problem in distributed systems. Min et al. tion capability, and memory) [16]. This challenge calls for 1)
Things Journal
TABLE III: State of the art usage of ML in IoT for management & organization (Mng.) and application data processing (ADP)
Cloud Things Distributed
[46]§ detects the false data injection in smart grid using

deep learning,
[44]§ detects abnormal behavior of IoT devices (when
some data are modified), [47]§ ? malware detection in industrial - Mng.
mobile IoT devices
[45]§ detects intrusion in IoT network layer using
Classification
expanded version of LoR,

[43]† uses ANN to detect fraud in smart grid.
[22]‡ uses two classifiers (one on IoT device
and one on the gateway) to predict epileptic
[18]‡ detects abnormally in heart from seizure from EEG signals,
[33]? uses gradient boosting tree classifier to detect ECG signal using SVM,
[49] performs some layers of ANN on IoT
the failure in the products in assembly line, [23]‡ detects driver’s drowsiness from EEG device and offload the rest to cloud,
[29]∗ uses deep CNN to detect pests and diseases of signal using SVM,
[62] a framework to decide which parts of ADP
plants from images, [50]§ uses RNN to authenticate users based DNN must be offloaded to the cloud,
[32]∗ uses CNN to detect and monitor the flowers and on breathing acoustics,
[63] distributes CNN layers between cloud
seedpods from images. [27]‡ ¶ uses SVM to recognize the human and IoT devices,
daily activities.
[64] distributes CNN layers between cloud
and edge nodes.
[52] selects the optimal supply voltage for

[40]¶ † uses SVM and linear regression to predict IoT chips at runtime,
solar intensity based on weather forecast for smart [54] predicts the solar energy for energy- - Mng.
homes harvester IoT devices based on public
Regression
weather forecast.
[41]† uses ensemble to predict the annual energy
consumption of households,
[28]† + uses regression models to forecast heat load in
multi-family buildings to optimize energy consumption
in smart grid, - - ADP
[34]? forecasts the demand in supply chain of smart
factory using SVM, ANN, RNN and linear regression,
[42]† uses SVM, ANN, RF and XGBoost to forecast
the load in smart grid
[59] manages computation offloading for

IoT devices based on their battery,
[51] adapts input sampling rate on IoT
[37]¶ † uses Q-Learning to manage the battery device at runtime,
charge/discharge from renewable sources in smart grid, [57] allocates hardware accelerators and
[56] uses deep reinforcement learning for
[38]¶ † uses Q-learning to estimate the price of power cores to the video streams, cache allocation and transmission rate con- Mng.
in smart grid and then schedule the home appliances, [60] schedules the sensor-to-controller trol between IoT computing nodes (e.g. fog
nodes).
Reinforcement
[65] uses cooperative reinforcement learning for power communication,

allocation in device-to-device communication. [53] schedules wireless spectrum access in
NB-IoT to enhances battery lifetime,
[58] schedule tasks in heterogeneous dis-
tributed systems.
[66]+ uses deep reinforcement learning for localiza- - - ADP
tion in smart buildings.
[25]¶ self-learning home management system for price

clustering and price prediction of power,
[48]§ clusters the botnet attack activities, - - Mng.
Clustering
[61] clusters the fog nodes in 5G network to reduce

the latency.
- [24]‡ uses k-means clustering for physiolog- ADP

[36]? uses clustering to discover degradation in CPS.
ical data on fog and gateways.
¶ Smart Home, + Smart Building, † Smart Grid, ‡ Healthcare, ? Industry, § Security, ∗ Agriculture
Mng.: Management & Organization, ADP: Application Data Processing
Things Journal
TABLE IV: The machine learning techniques exploited in the state-of-the-art categorized by their type of inputs
type of DNN/
SVM DT LoR NB kNN Ensemble ANN Reinforcement LiR k-Means
inputs trees CNN/RNN
Multimedia [30] [30] [30] [49] [32], [50], [63], [24]
[12], [62], [64],
[32], [29], [67]
(Bio-)Signal [18], [20], [20] [20], [20] [20] [22],
[21], [23], [22], [21] [20]
[27]
Env.
[68], [40] [40]
Parameters
Power Grid [42] [41], [41], [37], [69], [38] [28]
[42] [43], [42]
Other [34], [70], [33], [45] [70], [33] [34], [34], [25] [56], [51], [60], [52], [70],
[54], [44], [54], [54], [73], [53], [66], [74] [44],
[35] [35], [45] [71], [72] [57], [59], [58], [36],
[65] [25], [74]
more efficient hardware platforms to carry on machine learning extraction involves with FFT, DWT, convolution, etc. which
algorithms and 2) optimizing machine learning techniques to are computation intensive.
reduce their power consumption, memory requirement, and Chippa et al. [77] demonstrate that the SVM classifiers
computation intensity. can be approximated by ignoring less important support
Shafique et al. [9] present an overview of the architectures vectors and features. Consequently, the number of required
for efficient deep learning on IoT devices. Venkataramani et al. dot products reduces at the cost of loss in classification
[75] studied the computational requirements of deep learning accuracy. In [78] and [71] approximation methods for artificial
techniques and capabilities of current embedded computing neural networks are presented which first determine the less
platforms. To bridge the computational gap, embedded devices critical neurons and then approximate their computation to
shall benefit from approaches such as hardware accelerators, achieve energy efficiency and memory access reduction under
approximate computing, and emerging post-CMOS technolo- quality constraints. To reduce the number of multiplications in
gies. convolutional neural networks, Huan et al. [79] predict near-
zero multiplications and approximate them. Sheng [80] et al.
exploit approximation in feature extraction of CNN for image
B. Efficient Hardware for Embedded ML
classification by presenting a quantization-friendly separable
Machine learning algorithms can be accelerated using Ap- convolution architecture. Gao et al. [70] carry out some basic
plication Specific Instruction Processor (ASIP), Application arithmetic operations (e.g. DCT, FFT, SVM, k-Means and
Specified Integrated Circuit (ASIC), FPGA or co-processors. KNN) by approximate function units and show the potentials
An ASIP accelerator for machine learning is presented in to save the power consumption.
[74] which also takes advantage of approximate computing Venkataramani et al. [81] introduce scalable-effort classi-
to further improve the energy efficiency. Zhang et al. [67] fiers, cascaded classifiers with growing accuracy and complex-
present design techniques for implementing DNNs on FPGAs ity. Depending on the difficulty of input data, the number of
to achieve energy efficiency for image and video processing. stages used for classifying is adjusted at runtime.
While most of the hardware accelerators target computa- Previous work, despite their accomplishments, mainly are
tional primitives, Chen et al. [76] focus on the impact of ad hoc methods which target specific applications.
memory on accelerators’ efficiency and introduce a series
of hardware accelerators for neural networks to minimize
D. Distribute Computation among Layers (Offloading)
memory transfers.
As shown in Figure 1, the IoT system has a hierarchical
architecture of computational entities. From embedded devices
C. Approximate Computing in ML to the gateways to the fog computer and finally cloud servers,
Approximate Computing has shown promising opportuni- each layer is able to perform a certain amount of computation
ties to improve performance and energy efficiency of both (computational capability increases with the same order).
hardware and software implementation at the cost of quality Offloading the computation to a higher level introduces a
degradation. The main constraints on the embedded IoT de- trade-off between the computation and communication costs
vices include computation capability and memory. In different as well as latency.
ML techniques, either of feature extraction and classification Castillo et al. [63] propose to distribute the layers of Deep
stages might be computation or data intensive. For instance, CNN between embedded device and cloud. This study shows
boosted tree techniques are usually data intensive as they that the task of image recognition can be distributed among
consist of many small trees whose information must be read multiple smart cameras to improve response time and energy
during classification. Moreover, in some models, the feature efficiency. Li et al. [64] consider offloading parts of learning
Elsevier IEEE ACM Arxiv
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final
papers 165publication.
271Citation information:
12 DOI 10.1109/JIOT.2019.2893866, IEEE Internet of
139
Things Journal records 6000 4400 11000 2030
layers from IoT devices to the cloud in a deep learning

application. The intermediate CNN layers are performed on the 25% 19% Elsevier 28% Elsevier
edge nodes and the rest of the layers are offloaded to the cloud
IEEE IEEE
servers. In [22], the feature extraction and logistic regression
9% ACM 46% ACM
classification are performed on the embedded device while
24%
the other classifier –Gradient Boosting Tree– is implemented Arxiv Arxiv
47% 2%
on the gateway due to its memory consumption. Eshratifar et
al. [62] break down the computation of deep neural network, Fig. 7: Left: Total retrieved records from search, Right: Papers
perform some parts on the edge device and offload the rest to on both machine learning and IoT
the cloud server.
6000
5000
IoT ML
#publications
E. Distributed and Collaborative Machine Learning 4000
Except the reinforcement learning, other ML models require 3000
2000
a very large amount of data for training which is usually done 1000
on a central cloud at design time. A promising alternative 0
is distributed machine learning where the learning model
parameters are obtained on different edge nodes without Year
sending the raw data to the central cloud. Wang et al. [82]
(a) Number of publications on ML/IoT
present a gradient-descent based distributed learning in which 250
local learning parameters are obtained from edge devices, then 200
224
#publications
126
aggregated on another node, and sent back to the edge devices 150
as global parameters. The frequency of global aggregation 100
46
defines a trade-off between computation cost on edge devices
19
50
12
11
6
2
2
2
1
1
1
1
1
0
0
0
0
0
0
and communication cost on the network infrastructure. A 0
control algorithm is proposed to determine this frequency such
that the loss function of the learner is minimized under a
Year
resource budget constraint.
(b) Number of publications on ML and IoT
Portelli et al. [83] introduce an online distributed learning
model in which the edge devices train their local regression Fig. 8: Number of publications over past years
model independently, and then the parameters of models are
including IEEE Xplore, ACM digital libraries, arXiv reposi-
sent to a central cloud. An adaptive method aggregates the
tory, and Elsevier. We retrieve all the entries that matched
regression models to improve accuracy.
the keywords including ‘machine learning’ and ‘Internet of
As mentioned before, one drawback of reinforcement learn-
Things’ or ‘IoT’. We collect the following information from
ing is the consequent penalty of decisions in some experiences.
the result queries: title, abstract, date, and key terms (when
Reinforcement learners –without knowing it a priori– make
provided1 ).
inefficient decisions and then learn from them. In a collabora-
tive learning approach, the embedded devices may share their In total, we retrieved more than 19,000 records. Then,
experiences with other devices in order to avoid inefficient we processed the records and cross-checked the ‘title’ and
decisions by other devices. In an application-specific attempt, ‘abstract’ with the keywords in machine learning and IoT
[73] uses distributed multi-agent reinforcement learning for domain. The number of records that have the keyword of both
traffic light control. Marinescu et al. [69] introduce another ML and IoT reduced to 605 records. Figure 7 shows the share
case-study for distributed reinforcement learning in the smart of each publication database in the total retrieved records (on
grid where electric vehicles must determine when to charge the left), and the share of filtered publication with both IoT
their batteries based on the demand prediction. Due to the and ML content2 .
distributed nature of some IoT applications, online collabo- Figure 8a shows the publication year of the retrieved papers.
rative learning is one of the promising research direction in The number of publications has grown exponentially in recent
ML for IoT. In [65], a cooperative reinforcement learning years. However, the growth rate for the publications on IoT is
algorithm is used for runtime power allocation in device-to- higher than ML. Note that the data is collected in early 2018,
device communication in order to reduce the interference with and thus the number of publications in 2018 is not available
the cellular users. The neighboring devices share their value yet. Figure 8b shows the publication year of the papers on
functions to increase the explored space. both ML and IoT.
V. A NALYSIS OF P UBLISHED R ESEARCHES

We aim to quantitatively analyze the number of research 1 Key terms are only provided from Elsevier and IEEE. Moreover, ACM
works that have been published as conference papers or journal only provides title and date.
articles that target the domain of ML and IoT. We use the 2 The records from the ACM do not contain the abstract. Therefore fewer
available APIs from academic databases and search engines papers could be categorized as ML and IoT.
Things Journal
A. Classification of Papers Using Machine Learning Autonomous

Survey BigData
Since there are a large number of records retrieved from the Surveillance Cloud
inquiries, we exploit a systematic approach based on machine 100%
SmartPhone Connected-cars
learning to process the records and classify/categorize them.
Each paper may have multiple labels (e.g. privacy & smart 90% LoR
SmartHome DT DNN
home), hence, we must train a binary classifier for each label.
KNN
We randomly select 250 papers from the collected data and 80%
LDA
SmartGrid Fog
manually inspect them by reading the ‘title’, ‘abstract’ and GNB
‘terms’ (if available) and labeling them. We associate each SVM
SmartCity Health
label with several keywords. For instance, the keywords for the
security label include but not limited to ‘security’, ‘malware’, Security Industry
‘side channel’, ‘attack’, ‘encryption’, etc. Management
Reinforement
Privacy Multimedia
B. Feature Selection & Classification

Fig. 10: Accuracy of different classifiers for each label
We extract the following features for our classification
model: (=71%) which is because most of the publications use the
• the number of occurrence of each keyword in the ‘title’
related keywords to ‘Cloud Computing’ in their Abstract. The
and sum of them, average accuracy and F1 score for all classes is 98% and 89%,
• the number of occurrence of each keyword in the ‘ab-
respectively.
stract’ and sum of them,
• whether the author-provided ‘terms’ are available,
• the number of occurrence of each keyword among author-
provided ‘terms’ and sum of them (if they are provided
by API).
Search results ~19,000 Have both ML & IoT
from APIs papers
605 papers
Random split
250 papers 355 papers
Manually labeled
Random split Fig. 11: The classification model which performs best for each
195 for train 55 for test topic in terms of accuracy and F1 score
Train ML models
Evaluate
DT SVM LoR GNB ... C. Analysis
Model Tuning We extract the following classes and labels for the queries:
Choose bests Security, Privacy, BigDada, Cloud, Smart-Building, Smart-
Use models to classify
Home, Assisted Living (also include gesture recognition,
brain-machine-interface, and activity recognition), Smart-Grid,
Fig. 9: Using a machine learning approach to classify the Smart-Industry, Agriculture, Health & Wellbeing, Smart City
published papers about ML in IoT (traffic, pollution, navigation), Smart-Phone, Smart Cars (also
Figure 9 shows the flow of our systematic approach to ana- includes driver monitoring), surveillance, multimedia (image,
lyze the published papers about ML in IoT. For each label, we video, and speech), Reinforcement, DNN, hardware (proposed
train binary classifiers using different models including LoR, hardware architectures and designs), energy (for buildings and
DT, KNN, Linear Discriminant Analysis (LDA), Gaussian NB homes), power (for IoT devices and cloud servers), Edge,
(GNB), and SVM. Figure 10 shows the accuracy of these Survey (overview, survey and perspective papers, tutorials,
classifiers for each label. DT outperforms the other classifiers keynotes), robotic (drones, assistant robots, unmanned vehi-
with an average accuracy of 96.7%. cles), Management (scheduling, resource allocation, operation
One of the widely used performance metrics for the classi- mode selection), Autonomous (self-learning, self-organizing,
fiers is F1 score which combines the precision and recall in self-adapting systems), and ‘other’ (that did not fall into any
one metric. Figure 11 reports the classification model which other category).
performs the best for each class in terms of F1 score and Figure 12 shows the word cloud of the keywords in the
accuracy. For the classes like Smart Grid and Reinforcement, ‘Title‘ and ‘Abstract’ of all the 605 retrieved papers. This
the F1 score and accuracy are 100%. This is due to distinctive figure shows which keywords have been used more frequent in
and specific terms and keywords in these two topics. On the ‘Titles’ (Figure 12a) and ‘Abstract’ (Figure 12b) of these
the other hand, the Cloud class has the lowest F1 score papers.
Things Journal
R EFERENCES
[1] F. Samie, L. Bauer, and J. Henkel, “IoT Technologies for Embedded
Computing: A Survey,” in CODES+ISSS, 2016.
[2] W. Yu, F. Liang, X. He, W. G. Hatcher, C. Lu, J. Lin, and X. Yang,
(a) Titles “A Survey on the Edge Computing for the Internet of Things,” IEEE
Access, vol. 6, pp. 6900–6919, 2018.
[3] F. Samie, V. Tsoutsouras, S. Xydis, L. Bauer, D. Soudris, and J. Henkel,
“Computation Offloading and Resource Allocation for Low-power IoT
Edge Devices,” in IEEE WF-IoT, 2016.
[4] C. Raphael, “Why edge computing is crucial
for the IoT,” Online: http://www.rtinsights.com/
why-edge-computing-and-analytics-is-crucial-for-the-iot/, 2016,
published on November 12, 2015, visited on July 6, 2016.
[5] F. Samie, “Resource management for edge computing in internet of
things (IoT),” Ph.D. dissertation, Karlsruhe Institute of Technology
(KIT), 2018. [Online]. Available: https://publikationen.bibliothek.kit.
edu/1000081031/7142406
(b) [6] F. Samie, V. Tsoutsouras, L. Bauer, S. Xydis, D. Soudris, and J. Henkel,
Abstracts “Oops: Optimizing Operation-mode Selection for IoT Edge Devices,”
ACM Trans. on Internet Technology (TOIT), Accepted to appear, 2018.
[7] O. B. Sezer, E. Dogdu, and A. M. Ozbayoglu, “Context-Aware Com-
puting, Learning, and Big Data in Internet of Things: A Survey,” IEEE
Internet of Things Journal, 2018.
[8] U. S. Shanthamallu, A. Spanias, C. Tepedelenlioglu, and M. Stanley,
“A brief survey of machine learning methods and their sensor and IoT
Fig. 12: The frequent keywords in the (a) title and (b) abstract applications,” in International Conference on Information, Intelligence,
of published papers on machine learning and IoT Systems & Applications (IISA), 2017, pp. 1–8.
[9] M. Shafique, T. Theocharides, C.-S. Bouganis, M. A. Hanif, F. Khalid,
Figure 13 shows the number of papers in each class. To R. Hafiz, and S. Rehman, “An Overview of Next-Generation Architec-
show the overlapping classes, a shared square node is used tures for Machine Learning: Roadmap, Opportunities and Challenges in
which is connected to both classes. the IoT Era,” in DATE, 2018.
[10] M. Shafique, R. Hafiz, M. U. Javed, S. Abbas, L. Sekanina, Z. Vasicek,
Figure 14 shows the trend of publication number in ML and V. Mrazek, “Adaptive and Energy-Efficient Architectures for Ma-
usage for some selected IoT topics. We choose the topics chine Learning: Challenges, Opportunities, and Research Roadmap,” in
Annual Symposium on VLSI (ISVLSI), 2017, pp. 627–632.
whose trend present interesting or unexpected information. [11] M. Mohammadi, A. Al-Fuqaha, S. Sorour, and M. Guizani, “Deep
The highest boost in the number of publications in recent years Learning for IoT Big Data and Streaming Analytics: A Survey,” IEEE
belongs to the usage of deep neural networks in IoT. Moreover, Communications Surveys & Tutorials, 2018.
[12] M. Verhelst and B. Moons, “Embedded Deep Neural Network Process-
the security, healthcare and edge computing has also received ing: Algorithmic and Processor Techniques Bring Deep Learning to IoT
increasing attention in past few years. On the contrary, the and Edge Devices,” IEEE Solid-State Circuits Magazine, vol. 9, no. 4,
number of publications on the usage of ML in smart home pp. 55–65, 2017.
has decreased after moderate growth. Note that the data is for [13] V. Sze, Y.-H. Chen, J. Einer, A. Suleiman, and Z. Zhang, “Hardware for
machine learning: Challenges and opportunities,” in Custom Integrated
the papers published till 2018. However, the presented details Circuits Conference (CICC), 2017, pp. 1–8.
in this section allow other researchers to reproduce the analysis [14] V. Maini and S. Sabri, “Machine learning for humans,” Online: https://
with updated publications at a later time. medium.com/machine-learning-for-humans, 2017, published on August
19, 2017, visited on September 6, 2018.
[15] R. S. Sutton, A. G. Barto et al., Reinforcement learning: An introduction.
MIT press, 1998.
VI. C ONCLUSION [16] J. Henkel, S. Pagani, H. Amrouch, L. Bauer, and F. Samie, “Ultra-
low power and dependability for IoT devices (invited paper for IoT
technologies),” in DATE, 2017, pp. 954–959.
With the massive increase in the collected data from IoT [17] N. D. Lane, S. Bhattacharya, P. Georgiev, C. Forlivesi, and F. Kawsar,
devices, the traditional cloud computing paradigm is no longer “An early resource characterization of deep learning on wearables,
smartphones and internet-of-things devices,” in International Workshop
fulfilling the diverse requirements of IoT applications and data on Internet of Things towards Applications, 2015.
inference. Machine learning is a key tool for data inference and [18] D. Azariadi, V. Tsoutsouras, S. Xydis, and D. Soudris, “ECG signal
decision making in IoT which can be integrated at different analysis and arrhythmia detection on IoT wearable medical devices,” in
processing layers. Int. Conf. on Modern Circuits and Systems Technologies, 2016.
[19] M. Hadjem, O. Salem, and F. Naı̈t-Abdesselam, “An ECG monitoring
This article reviews the usage of machine learning in system for prediction of cardiac anomalies using WBAN,” in Int. Conf.
different IoT application domains both for data processing and on e-Health Networking, Applications and Services, 2014.
[20] N. Keshan, P. Parimi, and I. Bichindaritz, “Machine learning for stress
management tasks. Moreover, we systematically investigate all detection from ECG signals in automobile drivers,” in International
the publications on the use of machine learning in IoT. We Conference on Big Data (Big Data), 2015.
used classification techniques to determine the topics of each [21] P. S. Pandey, “Machine Learning and IoT for prediction and detection
of stress,” in International Conference on Computational Science and
publication based on its Title, Abstract, etc. The findings show Its Applications (ICCSA), 2017.
that the number of publications on ‘deep neural network’, [22] F. Samie, S. Paul, L. Bauer, and J. Henkel, “Highly efficient and accurate
‘security’, ‘edge computing’ and ‘healthcare’ has boosted in seizure prediction on constrained IoT systems,” in DATE, 2018.
recent years, while ‘smart home’ shows a slight decrease. In [23] G. Li, B.-L. Lee, and W.-Y. Chung, “Smartwatch-based wearable EEG
system for driver drowsiness detection,” IEEE Sensors Journal, 2015.
addition, we highlight the main research directions towards [24] D. Borthakur, H. Dubey, N. Constant, L. Mahler, and K. Mankodiya,
efficient machine learning at the edge of IoT. “Smart Fog: Fog Computing Framework for Unsupervised Clustering
Things Journal
Management Cloud Legend

24 8 8 Smart City
Smart Phone Security 6
15 52 13 App.
6 47 19 41
DNN Domain
14 14 11
Reinforcement 97 Survey #papers
9
7 Smart Grid 8 18
46 75 130 Multimedia
9 Common
Energy 48 papers
6 15 9 39
23 Health 8 66 #papers
7 10 20 7
12 9
10
70 11 Big Data 6 Industry How?
19 What for?
Smart Home 10 13
Edge 34 Hardware 44 #papers
78
17 27 17
67 71
Robotic Agriculture Assisted Connected Cars Privacy Autonomous Other
15 11 22 25 18
34 30
Fig. 13: Accuracy of different classifiers for each label

60 cyber-physical system,” in IEEE 4th World Forum on Internet of Things
#publications
DNN Edge (WF-IoT), 2018, pp. 784–789.

40
Smart Home Industry [37] Q. Wei, D. Liu, and G. Shi, “A novel dual iterative Q-learning method
for optimal battery management in smart residential environments,”
20 Security Health IEEE Transactions on Industrial Electronics, 2015.
0 [38] D. O’Neill, M. Levorato, A. Goldsmith, and U. Mitra, “Residential
demand response using reinforcement learning,” in Int. Conf. on Smart
Grid Communications (SmartGridComm), 2010, pp. 409–414.
[39] R. Thapa, L. Jiao, B. J. Oommen, and A. Yazidi, “A learning automaton-
Year based scheme for scheduling domestic shiftable loads in smart grids,”
Fig. 14: Publication trend of ML usage in different IoT applica- IEEE Access, vol. 6, pp. 5348–5361, 2018.
tions domains [40] N. Sharma, P. Sharma, D. Irwin, and P. Shenoy, “Predicting solar
generation from weather forecasts using machine learning,” in Int. Conf.
Analytics in Wearable Internet of Things,” IEEE Global Conference on on Smart Grid Communications (SmartGridComm), 2011, pp. 528–533.
Signal and Information Processing (GlobalSIP), 2017. [41] K. Chen, J. Jiang, F. Zheng, and K. Chen, “A novel data-driven approach
[25] W. Li, T. Logenthiran, V.-T. Phan, and W. L. Woo, “Implemented IoT for residential electricity consumption prediction based on ensemble
based Self-learning Home Management System (SHMS) for Singapore,” learning,” Energy, 2018.
IEEE IoT-J, 2018. [42] T. Vantuch, A. G. Vidal, A. P. Ramallo-González, A. F. Skarmeta, and
[26] U. Bakar, H. Ghayvat, S. Hasanm, and S. Mukhopadhyay, “Activity and S. Misák, “Machine learning based electric load forecasting for short
anomaly detection in smart home: A survey,” in Next Generation Sensors and long-term period,” in IEEE 4th World Forum on Internet of Things
and Systems, 2016, pp. 191–220. (WF-IoT), 2018, pp. 511–516.
[27] X. Fafoutis, L. Marchegiani, A. Elsts, J. Pope, R. Piechocki, and [43] V. Ford, A. Siraj, and W. Eberle, “Smart grid energy fraud detection
I. Craddock, “Extending the battery lifetime of wearable sensors with using artificial neural networks,” in Computational Intelligence Appli-
embedded machine learning,” in IEEE 4th World Forum on Internet of cations in Smart Grid (CIASG), 2014, pp. 1–6.
Things (WF-IoT), 2018, pp. 269–274. [44] S.-Y. Lee, S.-r. Wi, E. Seo, J.-K. Jung, and T.-M. Chung, “ProFiOt:
[28] S. Idowu, S. Saguna, C. Åhlund, and O. Schelén, “Forecasting heat load Abnormal Behavior Profiling (ABP) of IoT devices based on a machine
for smart district heating systems: A machine learning approach,” in Int. learning approach,” in ITNAC, 2017, pp. 1–6.
Conf. on Smart Grid Communications (SmartGridComm), 2014. [45] S. Zhao, W. Li, T. Zia, and A. Y. Zomaya, “A dimension reduction
[29] Y. Yue, X. Cheng, D. Zhang, Y. Wu, Y. Zhao, Y. Chen, G. Fan, and model and classifier for anomaly-based intrusion detection in internet
Y. Zhang, “Deep recursive super resolution network with laplacian pyra- of things,” in DASC/PiCom/DataCom/CyberSciTech, 2017.
mid for better agricultural pest surveillance and detection,” Computers [46] J. Wei and G. J. Mendis, “A deep learning-based cyber-physical strategy
and Electronics in Agriculture, 2018. to mitigate false data injection attack in smart grids,” in Cyber-Physical
[30] E. Mwebaze and G. Owomugisha, “Machine learning for plant disease Security and Resilience in Smart Grids (CPSR-SG, 2016.
incidence and severity measurements from leaf images,” in International [47] S. Sharmeen, S. Huda, J. H. Abawajy, W. N. Ismail, and M. M. Hassan,
Conference on Machine Learning and Applications (ICMLA), 2016. “Malware threats and detection for industrial mobile-IoT networks,”
[31] S. S. Patil and S. A. Thorat, “Early detection of grapes diseases using IEEE Access, 2018.
machine learning and IoT,” in International Conference on Cognitive [48] P. Sun, J. Li, M. Z. A. Bhuiyan, L. Wang, and B. Li, “Modeling
Computing and Information Processing (CCIP), 2016. and clustering attacker activities in IoT through machine learning
[32] S. Yahata, T. Onishi, K. Yamaguchi, S. Ozawa, J. Kitazono, T. Ohkawa, techniques,” Information Sciences, 2018.
T. Yoshida, N. Murakami, and H. Tsuji, “A hybrid machine learning [49] H.-J. Jeong, H.-J. Lee, and S.-M. Moon, “Work-in-progress: cloud-based
approach to automatic plant phenotyping for smart agriculture,” in machine learning for IoT devices with better privacy,” in EMSOFT, 2017.
International Joint Conference on Neural Networks (IJCNN), 2017. [50] J. Chauhan, S. Seneviratne, Y. Hu, A. Misra, A. Seneviratne, and Y. Lee,
[33] A. Mangal and N. Kumar, “Using big data to enhance the Bosch “Breathing-Based Authentication on Resource-Constrained IoT Devices
production line performance: A kaggle challenge,” in IEEE International using Recurrent Neural Networks,” Computer, vol. 51, no. 5, 2018.
Conference on Big Data (Big Data), 2016, pp. 2029–2035. [51] G. M. Dias, M. Nurchis, and B. Bellalta, “Adapting sampling interval
[34] R. Carbonneau, K. Laframboise, and R. Vahidov, “Application of of sensor networks using on-line reinforcement learning,” in WF-IoT,
machine learning techniques for supply chain demand forecasting,” 2016.
European Journal of Operational Research, vol. 184, no. 3, pp. 1140– [52] M. S. Golanbari, S. Kiamehr, F. Oboril, A. Gebregiorgis, and M. B.
1154, 2008. Tahoori, “Post-fabrication calibration of near-threshold circuits for en-
[35] G. A. Susto, A. Schirru, S. Pampuri, S. McLoone, and A. Beghi, ergy efficiency,” in ISQED, 2017.
“Machine learning for predictive maintenance: A multiple classifier [53] M. Chafii, F. Bader, and J. Palicot, “Enhancing coverage in narrow band-
approach,” IEEE Transactions on Industrial Informatics, vol. 11, no. 3, IoT using machine learning,” in IEEE WCNC), 2018.
pp. 812–820, 2015. [54] F. A. Kraemer, D. Ammar, A. E. Braten, N. Tamkittikhun, and D. Palma,
[36] Z. Wu, H. Luo, Y. Yang, X. Zhu, and X. Qiu, “An unsupervised “Solar energy prediction for constrained IoT nodes based on public
degradation estimation framework for diagnostics and prognostics in weather forecasts,” in Int. Conf. on the Internet of Things, 2017.
Things Journal
[55] W. Cui, Y. Kim, and T. S. Rosing, “Cross-platform machine learning [77] V. K. Chippa, D. Mohapatra, A. Raghunathan, K. Roy, and S. T.
characterization for task allocation in IoT ecosystems,” in Computing Chakradhar, “Scalable effort hardware design: Exploiting algorithmic
and Communication Workshop and Conference (CCWC), 2017. resilience for energy efficiency,” in DAC, 2010.
[56] X. He, K. Wang, H. Huang, T. Miyazaki, Y. Wang, and S. Guo, “Green [79] Y. Huan, Y. Qin, Y. You, L. Zheng, and Z. Zou, “A multiplication
resource allocation based on deep reinforcement learning in content- reduction technique with near-zero approximation for embedded learning
centric IoT,” IEEE Trans. on Emerging Topics in Computing, 2018. in IoT devices,” in International System-on-Chip Conference (SOCC),
[57] A. Iranfar, W. A. Simon, M. Zapater, and D. Atienza, “A machine 2016, pp. 102–107.
learning-based strategy for efficient resource management of video [80] T. Sheng, C. Feng, S. Zhuo, X. Zhang, L. Shen, and M. Aleksic, “A
encoding on heterogeneous MPSoCs,” in ISCAS, 2018. quantization-friendly separable convolution for mobilenets,” in Work-
[58] A. I. Orhean, F. Pop, and I. Raicu, “New scheduling approach using shop on Energy Efficient Machine Learning and Cognitive Computing
reinforcement learning for heterogeneous distributed systems,” Journal for Embedded Applications (EMC2), 2018.
of Parallel and Distributed Computing, 2017. [81] S. Venkataramani, A. Raghunathan, J. Liu, and M. Shoaib, “Scalable-
[59] M. Min, X. Wan, L. Xiao, Y. Chen, M. Xia, D. Wu, and H. Dai, effort classifiers for energy-efficient machine learning,” in Design Au-
“Learning-based privacy-aware offloading for healthcare IoT with energy tomation Conference (DAC), 2015, p. 67.
harvesting,” IEEE Internet of Things Journal, 2018. [82] S. Wang, T. Tuor, T. Salonidis, K. K. Leung, C. Makaya, T. He, and
[60] B. Demirel, A. Ramaswamy, D. E. Quevedo, and H. Karl, “Deepcas: K. Chan, “When edge meets learning: Adaptive control for resource-
A deep reinforcement learning algorithm for control-aware scheduling,” constrained distributed machine learning,” IEEE INFOCOM, 2018.
IEEE CONTROL SYSTEMS LETTERS, 2018. [83] K. Portelli and C. Anagnostopoulos, “Leveraging edge computing
[61] E. Balevi and R. D. Gitlin, “Unsupervised machine learning in 5G net- through collaborative machine learning,” in International Conference on
works for low latency communications,” in International Performance Future Internet of Things and Cloud Workshops (FiCloudW),, 2017.
Computing and Communications Conference (IPCCC), 2017, pp. 1–2.
[62] A. E. Eshratifar and M. Pedram, “Energy and performance efficient
computation offloading for deep neural networks in a mobile cloud
computing environment,” in Great Lakes Symposium on VLSI, 2018,
pp. 111–116.
[63] E. A. Castillo and A. Ahmadinia, “Distributed Deep Convolutional
Farzad Samie is currently a postdoctoral researcher
Neural Network For Smart Camera Image Recognition,” in International
and lecturer at the Chair for Embedded Systems
Conference on Distributed Smart Cameras, 2017.
(CES) at KIT. He received his Ph.D. (Dr.-Ing.)
[64] H. Li, K. Ota, and M. Dong, “Learning IoT in Edge: Deep Learning for
in Computer Science from Karlsruhe Institute of
the Internet of Things with Edge Computing,” IEEE Network, 2018.
Technology (KIT) in Jan. 2018. He also received
[65] M. I. Khan, M. M. Alam, Y. Le Moullec, and E. Yaacoub, “Cooperative
his M.Sc. and B.Sc. in Computer Engineering from
reinforcement learning for adaptive power allocation in device-to-device
the Sharif University of Technology, Iran in 2008
communication,” in IEEE 4th World Forum on Internet of Things (WF-
and 2010, respectively. His main research work is
IoT), 2018, pp. 476–481.
focused on Internet-of-Things (IoT) with a special
[66] M. Mohammadi, A. Al-Fuqaha, M. Guizani, and J.-S. Oh, “Semisu-
interest in Edge Computing and healthcare applica-
pervised deep reinforcement learning in support of IoT and smart city
tions.
services,” IEEE Internet of Things Journal, 2018.
[67] X. Zhang, A. Ramachandran, C. Zhuge, D. He, W. Zuo, Z. Cheng,
K. Rupnow, and D. Chen, “Machine learning on FPGAs to face the IoT
revolution,” in ICCAD, 2017.
[68] A. A. Sarangdhar and V. Pawar, “Machine learning regression tech-
nique for cotton leaf disease detection and controlling using IoT,” in
International conference of Electronics, Communication and Aerospace Lars Bauer received the M.Sc. and Ph.D. degrees in
Technology (ICECA), 2017. computer science from the University of Karlsruhe,
[69] A. Marinescu, I. Dusparic, and S. Clarke, “Prediction-based multi-agent Germany, in 2004 and 2009. He is currently a
reinforcement learning in inherently non-stationary environments,” ACM research group leader and lecturer at the Chair for
Transactions on Autonomous and Adaptive Systems (TAAS), vol. 12, Embedded Systems (CES) at the Karlsruhe Institute
no. 2, p. 9, 2017. of Technology (KIT). Dr. Bauer received two dis-
[70] M. Gao, Q. Wang, M. T. Arafin, Y. Lyu, and G. Qu, “Approximate sertation awards (EDAA and FZI), two best paper
Computing for Low Power and Security in the Internet of Things,” awards (AHS11 and DATE08) and several nomina-
Computer, vol. 50, no. 6, pp. 27–34, 2017. tions. His research interests include architectures and
[71] Q. Zhang, T. Wang, Y. Tian, F. Yuan, and Q. Xu, “ApproxANN: an management for adaptive multi-/many-core systems.
approximate computing framework for artificial neural network,” in
DATE, 2015.
[72] Z. Tang, S. Guo, P. Li, T. Miyazaki, H. Jin, and X. Liao, “Energy-
efficient transmission scheduling in mobile phones using machine
learning and participatory sensing,” IEEE Transactions on Vehicular
Technology, 2015.
[73] Y. Liu, L. Liu, and W.-P. Chen, “Intelligent traffic light control using dis- Jörg Henkel is the Chair Professor for Embedded
tributed multi-agent Q learning,” International Conference on Intelligent Systems at Karlsruhe Institute of Technology. His
Transportation Systems (ITSC), 2017. research work is focused on co-design for embedded
[74] J. Teittinen, M. Hiienkari, I. Žliobaitė, J. Hollmen, H. Berg, J. Heiskala, hardware/software systems with respect to power,
T. Viitanen, J. Simonsson, and L. Koskinen, “A 5.3 pJ/op approximate thermal and reliability aspects. He served as the
TTA VLIW tailored for machine learning,” Microelectronics Journal, Editor-in-Chief for the ACM Transactions on Em-
2017. bedded Computing Systems and is currently the
[75] S. Venkataramani, K. Roy, and A. Raghunathan, “Efficient Embedded Editor-in-Chief of the IEEE Design&Test Magazine.
Learning for IoT Devices,” in Asia and South Pacific Design Automation He has led several conferences as a General Chair
Conference (ASP-DAC), 2016, pp. 308–311. incl. ICCAD, ESWeek and serves as a Steering Com-
[76] Y. Chen, T. Chen, Z. Xu, N. Sun, and O. Temam, “Diannao family: mittee chair/member for leading conferences and
energy-efficient hardware accelerators for machine learning,” Communi- journals for embedded and cyber-physical systems. Prof. Henkel coordinates
cations of the ACM, vol. 59, no. 11, pp. 105–112, 2016. the DFG program SPP 1500 “Dependable Embedded Systems” and is a
[78] S. Venkataramani, A. Ranjan, K. Roy, and A. Raghunathan, “AxNN: site coordinator of the DFG TR89 collaborative research center on “Invasive
energy-efficient neuromorphic systems using approximate computing,” Computing”. He is a Fellow of the IEEE.
in ISLPED, 2014.

From Cloud Down To Things An Overview of Machine Learning in Internet

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

From Cloud Down To Things An Overview of Machine Learning in Internet

Uploaded by

Copyright:

Available Formats

This article has been accepted for publication in a future issue of this journal, but has not been

From Cloud Down to Things: An Overview of

Abstract—With the numerous IoT devices, the cloud-centric

< Decision Making >

learning in IoT’ are retrieved and analyzed systematically using

TABLE I: Existing surveys on IoT and ML

ANN Ensemble trees

TABLE II: The characteristics of some ML techniques

pros. and transmitted to a remote processing unit. The classifier

deployment. Traversing trees can be parallelized.

Cloud Things Distributed

[46]§ detects the false data injection in smart grid using

expanded version of LoR,

[52] selects the optimal supply voltage for

[59] manages computation offloading for

[65] uses cooperative reinforcement learning for power communication,

[25]¶ self-learning home management system for price

[61] clusters the fog nodes in 5G network to reduce

- [24]‡ uses k-means clustering for physiolog- ADP

layers from IoT devices to the cloud in a deep learning

V. A NALYSIS OF P UBLISHED R ESEARCHES

A. Classification of Papers Using Machine Learning Autonomous

B. Feature Selection & Classification

Management Cloud Legend

Fig. 13: Accuracy of different classifiers for each label

DNN Edge (WF-IoT), 2018, pp. 784–789.

You might also like