
15320634, 2020, 17, Downloaded from https://onlinelibrary.wiley.com/doi/10.1002/cpe.5651 by Selcuk Universitesi, Wiley Online Library on [06/11/2022].

See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Received: 31 May 2018 Revised: 6 August 2019 Accepted: 12 December 2019
DOI: 10.1002/cpe.5651

SPECIAL ISSUE PAPER

A platform architecture for occupancy detection using stream processing and machine learning approaches

Hamza Elkhoukhi1,2; Youssef NaitMalek1,3; Mohamed Bakhouya1; Anass Berouine1,3; Abdelhak Kharbouch1,2; Fadwa Lachhab1,4;
Majdoulayne Hanifi1; Driss El Ouadghiri2; Mohamed Essaaidi3

1 Faculty of Computing and Logistics, LERMA, International University of Rabat, Rabat, Morocco
2 Faculté des Sciences, Université Moulay Ismaïl, Meknès, Morocco
3 ENSIAS, Mohammed V University, Rabat, Morocco
4 International University of Agadir, Agadir, Morocco

Correspondence
Hamza Elkhoukhi, Faculty of Computing and Logistics, LERMA, International University of Rabat, Rabat 11100, Morocco; or Université
Moulay Ismaïl, Faculté des sciences, 11201 Meknès, Morocco.
Email: hamza.elkhoukhi@gmail.com

Funding information
U.S. Agency for International Development, Grant/Award Number: 5-398; Centre National pour la Recherche Scientifique et Technique

Summary
Context-awareness in energy-efficient buildings has been considered a crucial factor for developing context-driven control approaches
in which sensing and actuation tasks are performed according to contextual changes. This could be done by including the presence of
occupants, their number, actions, and behaviors in an up-to-date context, taking into account the complex interlinked elements, situations,
processes, and their dynamics. However, many studies have shown that occupancy information is a leading source of uncertainty when
developing control approaches. Comprehensive, real-time, fine-grained occupancy information therefore has to be integrated in order to
improve the performance of occupancy-driven control approaches. The work presented in this paper is a step toward a holistic platform
that combines recent IoT and Big Data technologies for real-time occupancy detection in smart buildings. This work focuses mainly on the
presence of occupants and compares static and dynamic machine learning techniques. An open-access occupancy detection dataset was first
used to assess the usefulness of the platform and the effectiveness of static machine learning strategies for data processing. This dataset
suits applications that store data first and process it later. However, many smart building applications, such as HVAC and ventilation
control, require online processing of data streams. Therefore, a distributed real-time machine learning framework was integrated into the
platform and tested to show its effectiveness for this kind of application. Experiments have been conducted for ventilation systems in
the energy-efficient building laboratory (EEBLab), and preliminary results show the effectiveness of this platform in detecting the presence
of occupants on the fly, which is required to switch the system ON or OFF and then activate the corresponding embedded control
technique (eg, ON/OFF, PID, state-feedback).

KEYWORDS

context-awareness, Internet of Things, machine learning, real-time data processing, smart buildings

1 INTRODUCTION

Recent studies have shown that occupants' information (eg, number, presence, behavior, and activities) is a major input for control approaches
in smart and energy-efficient buildings.1-3 In fact, comprehensive fine-grained occupancy information could be integrated to improve the
performance of occupancy-driven control of active equipment, such as HVAC, lighting, and ventilation systems. Occupancy information can be
classified into two main categories, as illustrated in Figure 1. Spatial and temporal properties provide occupants' physical information, while
behavioral properties provide information about the activity of occupants.5 For instance, carbon dioxide sensors are commonly used in buildings
for demand-driven control of ventilation systems because they allow space occupancy to be estimated. Furthermore, the CO2 concentration is
often already available in monitored indoor environments, where it is used to assess indoor air quality. The amount of CO2 in a
space has therefore been considered a main indicator for occupancy prediction (eg, presence and number).6

Concurrency Computat Pract Exper. 2020;32:e5651. wileyonlinelibrary.com/journal/cpe © 2019 John Wiley & Sons, Ltd. 1 of 13
https://doi.org/10.1002/cpe.5651

FIGURE 1 Example of occupancy parameters4

Many research works have stated that accurate occupancy detection can be achieved by combining multisensor data, such as CO2,
temperature, humidity, sound, and motion sensors.1,7,8 In fact, accurate detection of the actual occupancy could help in developing context-driven
control approaches in which sensing and actuation tasks are performed according to contextual changes. Furthermore, with recent advances
in wireless sensor networks, many industries and researchers have confirmed the potential of IoT as an enabler for the development of intelligent
and context-aware services and applications.9 These services could dynamically react to environmental changes and users' preferences. The
main aim is to make occupants' lives more comfortable according to their locations, current requirements, and ongoing activities. For instance, in
the context of smart and energy-efficient buildings, occupancy information could be used for controlling window opening and shading, HVAC,10,11
and lighting12 with the aim of decreasing energy consumption while maintaining the visual, air quality, and thermal comfort of occupants.
However, handling dynamic and frequent context changes is a difficult task without a real-time event/data acquisition and processing
platform. In the past few years, data processing approaches have been proposed to handle this issue. They can be classified into two main categories,
as shown in Figure 2: static and dynamic approaches. Static approaches focus on analyzing events coming from multiple sources, which are
already stored and indexed in a database. The aim is to extract complex events (ie, context) and derive meaningful changes in the environment
or the system. Data fusion approaches that correlate many events from different types of sensors are examples of static approaches, which could be
used to enhance the accuracy of monitored environments. Complex event processing techniques, such as ETALIS, are other examples that have
been proposed to process and correlate simple events coming from different sources.4 Moreover, static approaches have been developed for
high-throughput processing of large-scale data. They are not suitable for real-time services and applications since the input data must be completely
stored before it is processed.
Dynamic approaches have been recently developed to process very large amounts of data in real time. Their main aim is to extract new
knowledge, or to anticipate future situations, in order to generate suitable mitigation actions (or anticipate the required actions) in real
time.13 For example, by integrating streams from sensors with other data, such as location, environmental context, and social media data, it is
now possible to develop context-aware applications that can, for example, provide better traffic routing throughout the city, real-time control
of small autonomous vehicles, real-time recommendations to a user, real-time building management, and equipment control. Dynamic methods
include both remote and in-field processing techniques. Remote real-time processing approaches focus on analyzing data remotely, for instance,
using a big data cluster (or the Cloud), in order to reduce the processing time. However, high-latency data transmission may occur because
of heavy and frequent data streams, which have to be submitted via the network medium. In-field real-time data processing approaches could
handle this issue by combining monitoring, data processing, and intelligent control to help in developing systems that adapt and evolve according
to their internal and external contexts,14 but their embedded processing may be limited by device constraints (eg, memory and CPU).
In previous work, a recent IoT tool, Kaa, was integrated with a real-time stream processing technology, Storm, into a holistic
platform for continuous and real-time data monitoring and processing.15 Experimental results obtained from several applications already
deployed in EEBLab (eg, HVAC, ventilation, and context monitoring) showed its usefulness for developing context-aware services. In this work,
we investigate the integration of machine learning algorithms for occupancy detection,16 mainly the presence of occupants, which is used for

FIGURE 2 A summarized classification of data processing approaches



controlling ventilation systems. In fact, the developed platform includes most classification, clustering, and regression algorithms and allows the
easy implementation of distributed streaming algorithms. The focus of this work is to shed more light on the usefulness of integrating recent IoT,
stream data processing technologies, and real-time machine learning algorithms for occupants' presence detection, in order to enable autonomous
control of active equipment, such as HVAC, lighting, and ventilation systems.5
The remainder of this paper is structured as follows. Section 2 presents recent work related to occupancy detection algorithms. In Section 3,
the architecture of the holistic platform is introduced. Section 4 presents experimental results regarding both static and real-time detection
of occupants' presence. An experimental scenario concerning the occupancy-driven control of ventilation systems is shown in Section 5.
Conclusions and some perspectives are given in Section 6.

2 RELATED WORK

IoT technologies have been recently developed in order to connect a variety of building systems along with other environmental and contextual
sensors, such as CO2, temperature, humidity, and motion sensors, to monitor and collect useful building information for occupancy detection
purposes.17,18 In fact, instantaneous indoor occupancy information has become a major factor in improving occupants' comfort and in
greatly reducing energy consumption through optimal control approaches for HVAC and lighting systems.15 Recent studies show
that combining these technologies with various existing machine learning approaches (eg, classification algorithms) can significantly improve
occupancy detection accuracy by building a data-driven occupancy model from sample inputs (ie, CO2, temperature, humidity,
and light). For example, Kleiminger et al8 used an electricity consumption dataset gathered from smart electricity meters in five households in order
to detect the presence of occupants. Using this dataset, the authors trained several classification models based on support vector machines (SVM),
K-nearest neighbor (KNN), thresholding (THR), and hidden Markov model (HMM). The reported accuracy was above 80% compared to the ground-truth
occupancy (100%), which was obtained using a tablet computer installed at the main entrance in order to record the true values of occupants'
presence.
Candanedo and Feldheim19 applied different statistical classification models, mainly LDA (Linear Discriminant Analysis), CART (Classification
And Regression Trees), and RF (Random Forest), to a dataset that contains light, temperature, humidity, and CO2 values. They have shown that
including information related to the time of the day and the week status (weekend, weekdays) increases the accuracy of occupancy detection by 32%,
with high accuracy (around 97%) reached when using only two predictors. The ground-truth values were obtained using a digital camera for supervised
classification model training. Tutuncu et al20 applied seven different artificial neural network (ANN) algorithms to the same dataset as the one used in the
work of Candanedo and Feldheim19 to train the classifier models. The results showed that the Limited Memory Quasi-Newton algorithm has the
highest accuracy rate, around 99%. Khan et al21 proposed an approach for accurate occupancy estimation based on a wireless sensor network,
which combines environmental sensors with uncertain contextual information. Their study also used SVM and K-nearest neighbor to train
the classifier models.
In the work of Yang et al,22 a model of occupancy detection was evaluated in a single-occupancy room using 12 ambient sensors in order
to train six machine learning classification algorithms: SVM, KNN, ANN, Naïve Bayesian (NB), Tree Augmented Naïve Bayes Network (TAN),
and Decision Tree (DT). The authors found that CO2, door status, and light variables make important contributions to the final modeling results.
The model accuracy ranged from 92.2% to 98.2% depending on the classifier algorithm used. A predictive model based on electricity and water
consumption data was evaluated using Monte Carlo simulations.23 The authors found that the Random Forest and Decision Tree classifiers in
their boosted versions had the best classification performance, with F-measures of 83.37% and 82.79%, respectively. The ground-truth occupancy was
obtained using a door counter sensor. Chen et al24 proposed an approach based on data from the thermal energy storage of electric water heaters to
train the classifier models.
Like the presence of occupants, the prediction of the number of occupants is also of great importance for different building services, such as
ventilation system control. For example, in the work of Ebadat et al,25 the occupancy level was estimated by developing a dynamic
model using data extracted from CO2, ventilation, and temperature sensors. The authors showed that the developed model performs
better than the predictions obtained by Support Vector Machine and Neural Network estimators. In the work of Dong et al,26 the reported
results show an average accuracy of 73% in detecting the number of occupants by using hidden Markov models. In the work of Jin et al,27 an
occupancy detection approach was tested in a conference room using environmental sensing based on CO2 concentration. The obtained results
demonstrate that this approach can reliably detect the number of occupants with an overall RMSE of 0.6044 (fractional person), while the best
alternative machine learning algorithm, Bayes Net, yields an RMSE of 1.2061.
However, despite the importance of these algorithms in detecting occupancy from both static and stream data, dynamic and real-time detection
approaches are required in order to trigger near real-time actions, for example, in HVAC and ventilation control. The integration of
IoT and data stream technologies into a holistic platform together with machine learning algorithms could open up new possibilities in smart
buildings for real-time occupancy detection. The work presented in this paper investigates this research direction by integrating recent IoT and
stream data processing technologies for occupancy detection, mainly the presence of occupants. In fact, recent studies show the effectiveness
of real-time computing using stream machine learning algorithms. For instance, Bifet et al28 presented and demonstrated StreamDM, which is a
platform developed at Huawei Noah's Ark Lab for real-time analytics. In fact, StreamDM is the first library that contains advanced stream mining

algorithms using Spark Streaming. Furthermore, Morales and Bifet16 introduced Scalable Advanced Massive Online Analysis (Samoa), which is a
platform for mining big data streams that includes distributed streaming algorithms.
The work presented in this paper is a step toward the development of a smart building energy management platform using IoT and stream data
processing technologies. More precisely, the aim is to develop a holistic platform for data stream acquisition and processing using real-time
machine learning algorithms. The platform services will then be developed and used for efficient optimization of energy consumption according to
actual and predicted energy production and electricity consumption. In fact, Big Data and IoT technologies (for advanced metering) together with
real-time machine learning have to be combined for timely analysis of data and event streams and for predicting actual demand (ie, consumption) and
renewable power generation (ie, production). The main question we are targeting is how recent IoT, Big Data technologies, and advanced real-time
machine learning algorithms could be combined to develop and deploy an efficient building energy management platform that autonomously
measures (senses), analyzes, plans, and acts (executes) according to the actual and predicted context. The rest of this document presents the platform
prototype architecture and the preliminary work done by integrating real-time machine learning for timely data processing, with a focus on
occupancy prediction.

3 THE METHODOLOGY

This section introduces the architecture of the platform prototype together with real-time processing techniques that could be used for occupancy
prediction in smart buildings.

3.1 Platform architecture for prototyping


This section presents the architecture of the platform prototype that has been deployed in EEBLab for real-time detection of occupants' presence.
As shown in Figure 3, it was designed to be as generic as possible so that it can be applied in different smart environments that require real-time
monitoring and processing, such as intelligent transportation and healthcare. This architecture comprises three main layers: the sensors and actuators layer,
the processing layer, and the services layer. The first layer is composed of different sensors and actuators that, when deployed, can be configured and
remotely controlled. Sensors are used to gather indoor data, which are submitted to the processing layer. This layer is composed of a
pre-processing unit, which ensures that the data to be submitted and stored are well structured, and a real-time processing unit, which integrates IoT and
stream data processing tools for processing these sensor data streams. For instance, many processing algorithms can be included to extract the
building's contexts, such as occupancy detection. The extracted contexts can then be used as inputs for the services layer, such as HVAC,
lighting, ventilation, and shading systems. For example, knowing the presence of occupants in buildings could be helpful in developing predictive
control methods.29
In order to integrate the occupants' presence scenario, a Kaa application is deployed on a Raspberry Pi 3, which gathers data either from the
dataset or directly from deployed sensors and sends them to the Kaa platform as events. A machine learning algorithm was integrated into
the platform and deployed in order to execute the real-time processing for occupants' presence detection based on the received data streams. As
illustrated in Figure 4, the scenario was deployed in EEBLab for real-setting experiments. The EEBLab also includes several sensors and actuators
(eg, CO2, temperature, humidity, current/voltage) used for conducting real experiments. Several pieces of equipment (eg, ventilators, HVAC, and lighting)
have also been deployed for developing context-driven control approaches.
For the experimental setup, four main types of indoor measurements have been used: power consumption of the lighting system, internal
temperature, humidity, and CO2. The data streams from these sensors have been used to predict the status of occupants' presence (ie, 0 for not
occupied, 1 for occupied). As depicted in Figure 4, the CO2 sensor was installed in the ceiling. The main reason is that the warm breath of
occupants acts as a bubble of gas, which rises to the ceiling, since it is more buoyant than the ambient air. This is also why ventilation fans are
mostly deployed in a high position, for extracting CO2, and in a low position, for bringing in fresh outdoor air.

FIGURE 3 Architecture of the platform prototype deployed for smart buildings applications and services: (A) horizontal and (B) vertical

FIGURE 4 The sensing components and equipment deployed into the EEBLab

Moreover, several components play an important role in this platform. For the sensors, an MG811 is used to measure the concentration of CO2,
a DHT22 sensor is deployed to measure the temperature and the humidity level inside the EEBLab, an ACS712 was installed to measure the electrical
consumption of lighting, and an infrared sensor was installed on the door to obtain the actual occupancy level. Further, a Raspberry Pi was used
as a pre-processing unit that structures the data coming from the sensors and sends the corrected data to the platform. The Kaa platform,
an open-source middleware platform designed for implementing complete end-to-end IoT solutions, was used to collect data from sensors.
Apache Kafka, a real-time data pipeline, was used for transferring data to the Samoa application. Finally, MongoDB, an open-source NoSQL
document-oriented database, is used for storing data. Sensor data are gathered using two embedded devices (eg,
Arduino). The first one collects environmental data consisting of CO2 concentration, humidity, temperature, and lighting consumption.
The second device gathers the accurate occupancy level, which is considered as the class label, ie, the discrete attribute that
should be predicted based on the values of the other attributes. Moreover, the date of the Raspberry Pi is attached to each instance as a timestamp in
order to track the occupancy timeline through the day. The Raspberry Pi communicates with the Kaa platform through a Java
application that includes the Kaa endpoint SDK and works in conjunction with the Kaa cluster, which serves as a cloud-based middleware for the IoT solution.
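
As an illustration only, the labeled instance described above can be pictured as a small record. The sketch below is in Python rather than the Java used on the Raspberry Pi, and every name in it (`OccupancyInstance`, `build_instance`, the field names) as well as the sample values are assumptions made for the example, not the schema used in the EEBLab deployment.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class OccupancyInstance:
    """One labeled sample; all field names are hypothetical."""
    timestamp: str      # date of the pre-processing unit, ISO 8601
    co2: float          # CO2 concentration (first device)
    humidity: float     # humidity level (first device)
    temperature: float  # indoor temperature (first device)
    light_power: float  # lighting consumption (first device)
    occupancy: int      # class label from the second device: 0 or 1


def build_instance(env: dict, occupied: bool, now: datetime) -> OccupancyInstance:
    """Merge environmental readings, the class label, and a timestamp."""
    return OccupancyInstance(
        timestamp=now.isoformat(),
        co2=float(env["co2"]),
        humidity=float(env["humidity"]),
        temperature=float(env["temperature"]),
        light_power=float(env["light_power"]),
        occupancy=1 if occupied else 0,
    )


# Made-up readings, only to show the shape of one instance.
sample = build_instance(
    {"co2": 612.0, "humidity": 41.5, "temperature": 22.3, "light_power": 36.0},
    occupied=True,
    now=datetime(2019, 1, 1, 9, 30, tzinfo=timezone.utc),
)
record = asdict(sample)  # plain dict, ready to be serialized and forwarded
```

Keeping the class label in the same record as the features is what later allows each instance to be used both for testing and for training the classifier.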

3.2 Stream data processing


Real-time stream processing requires a platform that can be deployed for timely data processing and analysis. In previous work,15 IoT techniques
were combined with stream data processing technologies into a holistic platform for continuous and real-time data monitoring and processing.
Open-source tools such as Apache Storm, Kaa, and Apache Flume were used to develop a platform for processing real-time streaming data from
sensors. The added value of the work presented in this paper is the integration of the Kaa platform with Samoa, which can be executed on
Storm in order to enable real-time stream processing of data using machine learning algorithms. The aim is to investigate and further develop this
platform for efficient building energy management, ie, to autonomously sense, analyze, predict, plan, and act according to the actual and future context.
Several machine learning algorithms, as stated above, have been proposed for data prediction. As depicted in Figure 5, these approaches can
be categorized into two main classes: static and dynamic approaches; each of them can be either distributed or not distributed. Distributed
approaches have been proposed to address the limited resources, such as memory and bandwidth, of a single machine. We therefore put more emphasis
on distributed dynamic processing approaches by further investigating their actual effectiveness for applications that require real-time processing
and control (eg, building management). In this direction, we have further enhanced the Samoa platform for developing context-driven services
and applications. This platform originally combines the Storm and Samoa frameworks in order to execute streaming machine learning algorithms.
A Storm topology consists of two main components: Spouts, which read data coming from an external source and emit them
into the topology, and Bolts, which carry out the processing tasks (ie, a bolt can do anything from filtering to functions and aggregations). Samoa
contains entrance processing items and processing items (PIs). The integration of Storm components into Samoa establishes relationships
between Storm classes and the Samoa components. Furthermore, Samoa allows the implementation of distributed machine learning on streams.
For instance, Samoa includes the Vertical Hoeffding Tree (VHT), a distributed streaming version of a decision tree, for classification and the
Horizontal Adaptive Model Rules Regressor (HAMR) for regression.
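
The spout/bolt division of labor can be pictured with a few lines of Python. This is a minimal sketch built on plain generators, not the Storm API: the function names `spout`, `filter_bolt`, and `mean_bolt` and the sample readings are assumptions introduced for the example.

```python
from typing import Callable, Iterable, Iterator


def spout(source: Iterable[dict]) -> Iterator[dict]:
    """Read tuples from an external source and emit them into the topology."""
    for item in source:
        yield item


def filter_bolt(stream: Iterable[dict], keep: Callable[[dict], bool]) -> Iterator[dict]:
    """Bolt that drops every tuple failing the predicate."""
    return (t for t in stream if keep(t))


def mean_bolt(stream: Iterable[dict], field: str) -> Iterator[float]:
    """Bolt that emits the running mean of one field (an aggregation)."""
    total, count = 0.0, 0
    for t in stream:
        total += t[field]
        count += 1
        yield total / count


# Made-up readings; the negative value stands for a faulty sample.
readings = [{"co2": 400.0}, {"co2": -1.0}, {"co2": 600.0}]
valid = filter_bolt(spout(readings), keep=lambda t: t["co2"] >= 0)
running_means = list(mean_bolt(valid, "co2"))
```

Each stage consumes the previous one lazily, which mirrors how tuples flow through a topology one at a time instead of being stored first.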
In parallel, as illustrated in Figure 6, the prequential evaluation task has been used in this work. It uses each instance coming from the sensing
nodes first to test and then to train the model. It consists of three processing units: the source processor, the classifier, and the evaluator
processor. The source processor reads data streams using Apache Kafka* (a publish/subscribe messaging system) and sends them to the classifier.
The latter sends its results to the evaluator processor, which evaluates the performance of the classifier by supporting the basic
classification performance evaluators. It mainly measures the accuracy of the classifier model since the beginning of the evaluation.
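
The test-then-train loop of the prequential evaluation can be sketched as follows. This is a simplified, single-process illustration and not the Samoa task itself; the `MajorityClass` learner is a deliberately trivial stand-in for the classifier, and the stream values are made up.

```python
from typing import Iterable, List, Sequence, Tuple


class MajorityClass:
    """Trivial stand-in learner: predicts the most frequent label seen so far."""

    def __init__(self) -> None:
        self.counts = {0: 0, 1: 0}

    def predict(self, x: Sequence[float]) -> int:
        return 1 if self.counts[1] > self.counts[0] else 0

    def learn(self, x: Sequence[float], y: int) -> None:
        self.counts[y] += 1


def prequential(model, stream: Iterable[Tuple[Sequence[float], int]]) -> List[float]:
    """Test on each instance first, then train on it.

    Returns the accuracy measured since the beginning of the evaluation,
    one value per processed instance.
    """
    correct, seen, history = 0, 0, []
    for x, y in stream:
        correct += int(model.predict(x) == y)  # test first
        model.learn(x, y)                      # then train
        seen += 1
        history.append(correct / seen)
    return history


# (features, label) pairs; the CO2-like values are illustrative only.
stream = [([450.0], 0), ([900.0], 1), ([950.0], 1), ([980.0], 1)]
accuracy = prequential(MajorityClass(), stream)
```

Because every instance is scored before the model has seen its label, no separate held-out test set is needed, which is what makes the scheme suitable for unbounded streams.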

* https://kafka.apache.org/

FIGURE 5 Classification of machine learning techniques

FIGURE 6 Vertical Hoeffding Tree classifier using prequential evaluation task

In this work, we put more emphasis on the VHT algorithm,30 which is a parallel streaming decision tree designed for distributed streaming
machine learning. Parallelism approaches can be divided into two main types: horizontal and vertical. Horizontal parallelism
distributes the arriving instances to other processing items (PIs), called local-statistic PIs, based on horizontal data partitioning. For example, if there are 5
local-statistic PIs and 100 arriving instances, then each of them receives 20 instances, which is not effective for incremental decision tree
algorithms. Vertical parallelism, in contrast, distributes the instances by their attributes, eg, if each instance has 100 attributes and there are 5
local-statistic PIs, then each PI receives 20 attributes from each instance.
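
The two partitioning schemes, with the numbers used in the example above, can be sketched in a few lines. The round-robin and modulo assignments are assumptions chosen for simplicity; the actual routing used by the framework may differ.

```python
from typing import List, Sequence


def horizontal_partition(instances: Sequence, n_pis: int) -> List[list]:
    """Round-robin whole instances across the local-statistic PIs."""
    parts: List[list] = [[] for _ in range(n_pis)]
    for i, inst in enumerate(instances):
        parts[i % n_pis].append(inst)
    return parts


def vertical_partition(instance: Sequence, n_pis: int) -> List[list]:
    """Send attribute j of a single instance to PI number j mod n_pis."""
    parts: List[list] = [[] for _ in range(n_pis)]
    for j, value in enumerate(instance):
        parts[j % n_pis].append((j, value))
    return parts


# 100 instances with 100 attributes each, spread over 5 local-statistic PIs.
instances = [[float(a) for a in range(100)] for _ in range(100)]
h = horizontal_partition(instances, 5)
v = vertical_partition(instances[0], 5)
per_pi_instances = [len(p) for p in h]   # whole instances held by each PI
per_pi_attributes = [len(p) for p in v]  # attributes of one instance per PI
```

With vertical partitioning every PI sees all instances but only its own slice of attributes, so the per-attribute statistics needed for a split stay in one place.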
As depicted in Figure 6, the VHT algorithm operates as follows: when the source processor sends data streams to the classifier, the first component
that receives them is the model aggregator. The latter holds the decision tree model. It connects to the local-statistic PIs through the attribute
stream and the control stream. The model-aggregator PI splits instances by attribute, and each local-statistic PI holds the local statistics for
its attributes. The model-aggregator PI sends the split instances through the attribute stream, and it sends control messages via the control stream
to ask the local-statistic PIs to perform computations. As depicted in Figure 6, there are n local-statistic PIs (ie, the parallelism level to be configured by the user) that can
be used for the vertical parallelism of the VHT algorithm. The computation result from each local-statistic PI is sent back to the model-aggregator PI,
which, in turn, sends the classification result through the result stream to the evaluator PI. The latter evaluates the algorithm in terms
of accuracy.
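
The text does not detail what the local-statistic PIs compute or how the model aggregator uses their replies. As background, Hoeffding trees in general decide when to split a leaf with the Hoeffding bound, ε = sqrt(R² ln(1/δ) / (2n)), where R is the range of the split criterion, δ the allowed error probability, and n the number of instances seen at the leaf. The sketch below shows that decision under stated assumptions: the gain values are made up, `local_best` and `aggregator_decision` are hypothetical names, and the message passing between PIs is reduced to plain function calls.

```python
import math
from typing import Dict, List, Optional, Tuple


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """epsilon = sqrt(R^2 * ln(1/delta) / (2 * n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def local_best(gains: Dict[str, float]) -> List[Tuple[str, float]]:
    """Reply of one local-statistic PI: its two best attributes by gain."""
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)[:2]


def aggregator_decision(replies: List[List[Tuple[str, float]]], n_seen: int,
                        delta: float = 1e-7, value_range: float = 1.0) -> Optional[str]:
    """Split only if the best gain beats the runner-up by more than epsilon;
    otherwise return None and keep collecting instances."""
    ranked = sorted((g for reply in replies for g in reply),
                    key=lambda kv: kv[1], reverse=True)
    (best_attr, best_gain), (_, second_gain) = ranked[0], ranked[1]
    eps = hoeffding_bound(value_range, delta, n_seen)
    return best_attr if best_gain - second_gain > eps else None


# Two local-statistic PIs, each holding two attributes (illustrative gains).
replies = [
    local_best({"co2": 0.42, "humidity": 0.05}),
    local_best({"light": 0.30, "temperature": 0.10}),
]
early = aggregator_decision(replies, n_seen=50)    # too few instances yet
later = aggregator_decision(replies, n_seen=5000)  # enough evidence to split
```

With the same gains, the decision changes only because the bound shrinks as more instances are observed, which is why the tree can be grown incrementally without storing the stream.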

4 EXPERIMENTS AND EVALUATION RESULTS

The main purpose of this study is to analyze and explore the usefulness of real-time machine learning for predicting occupants' presence in
smart buildings. A platform prototype was deployed in our EEBLab to show its operational modes as well as the performance and the accuracy of
real-time machine learning for occupancy detection. The usefulness of the platform is illustrated using an existing dataset, and its effectiveness
is shown when using actual sensor data streams. A real scenario was deployed, and the obtained results are described in the next section.

4.1 Experiments using dataset


In this section, we first used an occupancy detection dataset from the UCI repository.15,31 It includes indoor sensor data, mainly light, temperature,
humidity, and CO2 values, which can be used to predict the status of occupants' presence (ie, 0 for not occupied, 1 for occupied). Using this
dataset, we have compared offline and online machine learning techniques for occupants' presence detection. Offline machine
learning, or batch learning, is a technique that generates the best predictor at once by learning on the entire training data, while online machine
learning updates the best predictor each time new data arrive from the stream. For offline machine learning, LDA is used, since it showed better
accuracy on the training and the test sets compared to RF and CART. LDA is a classification algorithm that
finds a linear combination of the features that characterizes or separates two or more classes of objects or events. Regarding online machine
learning, the VHT algorithm, which has shown its potential for analyzing data streams, is integrated.16,29
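
The difference between the two learning modes can be made concrete with a deliberately simple model. The sketch below uses a nearest-centroid classifier as a stand-in; it is neither LDA nor VHT, and the class names and sample values are assumptions for the example. The batch version needs the whole training set at once, while the online version refines the same centroids one instance at a time.

```python
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]


def fit_batch(data: List[Sample]) -> Dict[int, List[float]]:
    """Offline: compute one centroid per class from the whole training set."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in data:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


class OnlineCentroids:
    """Online: update the class centroid each time a new sample arrives."""

    def __init__(self) -> None:
        self.centroids: Dict[int, List[float]] = {}
        self.counts: Dict[int, int] = {}

    def learn(self, x: Sequence[float], y: int) -> None:
        n = self.counts.get(y, 0) + 1
        c = self.centroids.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            c[i] += (v - c[i]) / n  # incremental mean update
        self.counts[y] = n

    def predict(self, x: Sequence[float]) -> int:
        def dist(c: List[float]) -> float:
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda y: dist(self.centroids[y]))


# (CO2, light) pairs with labels; values are illustrative only.
data = [([400.0, 0.0], 0), ([420.0, 0.0], 0), ([900.0, 35.0], 1), ([1000.0, 37.0], 1)]
batch_model = fit_batch(data)
online_model = OnlineCentroids()
for x, y in data:
    online_model.learn(x, y)
```

After the last sample both versions hold the same centroids, but only the online one could have produced predictions while the data were still arriving.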
Offline machine learning is used first to assess its potential in detecting the presence of occupants. As shown in Figure 7, LDA
is able to estimate the occupancy with high accuracy, reaching 98.76%. However, LDA is used for batch analysis. It needs
samples of cleaned sensor data together with a trained model (ie, labeled either manually or automatically using additional sensors) in order to be used
for context detection and prediction. More precisely, offline machine learning is mostly used by the community to study the accuracy of machine
learning algorithms (eg, SVM, KNN). These algorithms are not directly applicable to real-time processing of streaming data, which is needed to derive new insights
and select the most suitable action accordingly (eg, HVAC, ventilation control). Online occupancy detection using machine
learning is a promising technique that could be used in this context.
Experiments have been conducted in order to show the usefulness of online machine learning, using the same dataset but, in this case,
with the VHT algorithm. As a preliminary test, we chose to read data from the dataset as data streams with a fixed frequency (eg, 1, 10,
and 50 samples). These data streams are then transmitted to the platform, which executes the VHT classifier. The results depicted in Figure 8
show the potential of online machine learning, which accurately detects the occupants' presence. It is worth noting that similar experiments
were conducted with several data streaming frequencies; they are not included here since they show the same behavior. Moreover, this result is
confirmed in Figure 9, which depicts the classification correct rate (ie, the percentage of correctly classified instances). The results show
a sudden decrease in this rate over the first 10 instances, followed by a quick improvement to reach 95%. These results show the efficiency of
online occupancy detection when integrated with IoT and Big Data technologies into a holistic platform.
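The replay procedure described above can be sketched as a test-then-train (prequential) loop: each chunk read from the dataset is first classified and then used to update the model, and the running percentage of correct classifications is recorded. The sketch below is only an illustration under stated assumptions: the tiny nearest-mean learner is a stand-in and is not VHT, and the synthetic single-feature stream is hypothetical rather than taken from the UCI dataset.

```python
from itertools import islice


class NearestMeanClassifier:
    """Incremental stand-in learner (NOT VHT): one running mean per class."""

    def __init__(self):
        self.count = {0: 0, 1: 0}
        self.mean = {0: 0.0, 1: 0.0}

    def predict(self, x):
        seen = [c for c in (0, 1) if self.count[c] > 0]
        if not seen:
            return 0  # nothing learned yet: default to "not occupied"
        return min(seen, key=lambda c: abs(x - self.mean[c]))

    def learn(self, x, y):
        self.count[y] += 1
        self.mean[y] += (x - self.mean[y]) / self.count[y]


def chunks(rows, size):
    """Yield successive lists of `size` rows, emulating a fixed stream frequency."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def prequential_accuracy(rows, chunk_size):
    """Return the running percentage of correct classifications after each chunk."""
    model, correct, seen, curve = NearestMeanClassifier(), 0, 0, []
    for batch in chunks(rows, chunk_size):
        predictions = [model.predict(x) for x, _ in batch]  # test first
        for (x, y), predicted in zip(batch, predictions):
            correct += int(predicted == y)
            seen += 1
            model.learn(x, y)  # then train on the same instances
        curve.append(100.0 * correct / seen)
    return curve


# Hypothetical stream: a light-like feature, occupancy flipping every 50 samples.
rows = [(400.0 + i % 7, 1) if (i // 50) % 2 else (40.0 + i % 5, 0)
        for i in range(500)]

for size in (1, 10, 50):
    print(size, round(prequential_accuracy(rows, size)[-1], 1))
```

On this synthetic stream the curve dips when the second class first appears and then recovers, and larger chunks delay the recovery because the model is updated less often.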

FIGURE 7 Offline occupancy detection using LDA

FIGURE 8 Online occupancy detection using VHT

FIGURE 9 The classification corrects



These experiments were first conducted to study and analyze the effectiveness of the platform for data processing using real-time machine
learning for occupants' presence detection. The rest of this section presents a real-setting scenario using the equipment deployed in our
EEBLab. The aim was to deploy a real scenario that could be used to develop context-aware applications for smart HVAC, lighting, and ventilation
systems.

4.2 Experiments using sensors data streams


This section focuses on a real-setting scenario using data streams from actual sensors. EEBLab is a laboratory that hosts different scenarios
regarding energy efficiency in buildings (eg, HVAC and ventilation control, thermal monitoring, renewable energy production monitoring, and
weather monitoring). In the EEBLab, occupancy varies continuously according to the students' working time; it therefore constitutes a suitable
environment (ie, dynamic occupancy) for conducting experiments on real-time occupancy detection. It is worth noting that, as mentioned above,
occupancy information (eg, presence, number, and activities) could be used for controlling window opening and shading, HVAC, and lighting. The
aim is to decrease energy consumption while maintaining the visual, air quality, and thermal comfort of occupants in buildings.
In this section, the focus is on occupants' presence detection. The main objective is to show the effectiveness of predicting presence in
buildings based on CO2, temperature, and lighting power consumption using a real-time machine learning algorithm. An experiment was
conducted on Thursday 3rd May 2018 between 9:00 AM and 8:30 PM. During this day, the first students started working in the laboratory at 9:45 AM
and all students left at 7:00 PM, as shown in Figure 10B. Moreover, there were a few breaks during the day (between 10:45 AM and 11:20 AM,
between 12:45 PM and 1:45 PM, and around 4:45 PM as well). Figures 10A and 10B show that the predicted values of occupants' presence are broadly
in line with the ground truth values, except around 10:45 AM and 7:30 PM, when someone entered the EEBLab and left shortly afterwards, as shown
in Figure 10B; this was not captured by the predicted curve. Moreover, the prediction curve shows the laboratory as occupied only after 2:00 PM,
whereas the ground truth shows that it was already occupied before 2:00 PM. For the deployed applications, this delay will not affect the control
decision and will not have an impact on occupants' comfort or on energy consumption.
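The comparison between the predicted and ground truth curves can be quantified with two simple measures: the fraction of time steps on which both series agree, and the delay between each arrival in the ground truth and the first positive prediction. The sketch below is illustrative only; the function names, the one-minute sampling step, and the short example series are assumptions and are not data from the experiment.

```python
def agreement_rate(predicted, truth):
    """Fraction of time steps where prediction and ground truth coincide."""
    if len(predicted) != len(truth) or not truth:
        raise ValueError("series must be non-empty and of equal length")
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


def onset_delays(predicted, truth, step_minutes=1.0):
    """Detection delay, in minutes, for each occupied period in the ground truth.

    For every 0 -> 1 transition in `truth`, look for the first step of that
    occupied period where `predicted` is 1. A period that is never detected
    (eg, a very short visit) yields None.
    """
    delays, i, n = [], 0, len(truth)
    while i < n:
        rising = truth[i] == 1 and (i == 0 or truth[i - 1] == 0)
        if not rising:
            i += 1
            continue
        end = i
        while end < n and truth[end] == 1:
            end += 1  # end is the first index after the occupied period
        hit = next((j for j in range(i, end) if predicted[j] == 1), None)
        delays.append(None if hit is None else (hit - i) * step_minutes)
        i = end
    return delays


# Hypothetical series (1 = occupied, 0 = not occupied), one sample per minute.
truth = [0, 1, 1, 1, 0, 0, 1, 0, 1, 1]
predicted = [0, 0, 0, 1, 1, 0, 0, 0, 1, 1]
print(agreement_rate(predicted, truth))  # 0.6
print(onset_delays(predicted, truth))    # [2.0, None, 0.0]
```

In this toy example the first arrival is detected two minutes late, the one-minute visit is missed entirely, and the last arrival is detected immediately, which mirrors the kinds of discrepancy discussed above.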
It is worth noting that the ground truth values were obtained from infrared sensors installed at the door entry. In order to better
analyze these results and show the correlation between presence and the other environmental parameters, we measured the indoor CO2,
the temperature, the lighting power consumption values, as well as the number of occupants. As shown in Figure 11D, the EEBLab was empty
until 9:45 AM, ie, nobody was working before that time. Over the same period, the other parameters, such as CO2, temperature, and lighting
consumption, were almost constant with slight variations, while the number of occupants was always 0, as expected. After this time, only one
occupant entered and spent almost 1 hour and 30 minutes (ie, from 9:45 AM until 10:45 AM) inside, as also shown in Figure 11D. This occupant left
for 30 minutes and came back (ie, between 10:40 AM and 11:20 AM), and until 12:45 PM the number of occupants was either zero, one, or two.
During this period, a slight increase in temperature, CO2, and lighting power consumption was detected, as illustrated in
Figures 11A, 11B, and 11C.
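One simple way to express the relationship between the number of occupants and an environmental parameter is the Pearson correlation coefficient. The following sketch is a generic illustration, not the analysis performed in this work; the sample values are hypothetical and were chosen only to lie in a plausible range.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (norm_x * norm_y)


# Hypothetical samples: number of occupants and CO2 concentration (ppm).
occupants = [0, 1, 2, 3, 5]
co2_ppm = [800, 850, 900, 950, 1050]
print(pearson(occupants, co2_ppm))
```

A coefficient close to 1 indicates that the parameter rises with the number of occupants; on real sensor streams the value would be lower because CO2 and temperature respond to occupancy with a lag.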
For instance, between 9:45 AM and 12:40 PM, the CO2 values slightly increased, between 800 ppm and 1000 ppm, and then became stable during the
lunch period (ie, between 12:40 PM and 1:50 PM). On the other hand, the temperature increased from 20°C to 26°C, while power consumption
values were around 50 W before 9:45 AM and varied around 90 W from 9:45 AM to 11:00 AM and from 11:15 AM to 12:15 PM, except during some
periods in which the consumption was around 50 W when the occupants left the EEBLab for breaks. We can also see that, during lunch time, all
the values from these different sensors are nearly constant. There is also a slight decrease in the CO2 value during the time periods 10:45 AM
to 11:20 AM and 12:40 PM to 1:50 PM. Moreover, after 1:50 PM, all the team members went to the EEBLab to work and conduct experiments. Each
student has their own task, inside or outside the laboratory or both, according to the task to be performed. The number of occupants varied
between 2 and 5 during the 1:45 PM to 4:30 PM time period. In this period, almost all the values of CO2, temperature, and power consumption
increased, eg, the CO2

FIGURE 10 Occupancy detection: (A) predicted values and (B) ground truth value

FIGURE 11 Sensors data streams: (A) Temperature; (B) CO2; (C) Lighting power consumption; (D) Accurate number of occupants

values varied between 1000 ppm and 1300 ppm, the temperature values were between 25°C and 27°C, while the power consumption values were
around 200 W.
The number of occupants then decreased to a single occupant at around 3:50 PM. In parallel, there was a slight decrease in CO2, temperature,
and light consumption. Further, between 4:40 PM and 5:00 PM, the laboratory was empty. Hence, the CO2 values decreased to
800 ppm, the temperature reached 23°C, and the light power consumption decreased to 70 W. After this period, between 5:00 PM and 7:00 PM,
some members came back, as also shown in Figure 11D, and their number varied between 2 and 7. Therefore, the CO2 values showed a slight
increase (Figure 11B), the temperature values increased from 24°C to 26°C (Figure 11A), while the power consumption reached 100 W
(Figure 11C). As also shown in Figure 10, both the predicted and ground truth values indicate presence inside the laboratory. During this
period, there are intervals in which the number of occupants decreased (eg, three occupants were present between 5:50 PM and 6:10 PM). In
parallel, a decrease in temperature and CO2 is shown in Figures 11A and 11B. After 7:00 PM, the occupants left the EEBLab; as indicated in
Figure 10, there were no occupants, except around 7:20 PM, when a brief occupation was observed. During this period, CO2, temperature, and light
power consumption values showed a marked decrease. It is worth mentioning that the predicted value shows a small delay in predicting the presence
of occupants.

5 APPLICATION TO ENERGY EFFICIENCY

Many studies, as stated above, have shown that comprehensive fine-grained occupancy information could be integrated in order to improve the
performance of occupancy-driven control of HVAC, lighting, and ventilation systems in smart buildings. In the ongoing work, several scenarios
are deployed in the EEBLab for real testing.5,32-35 The main goals are (i) to demonstrate how ICT can contribute to reducing energy consumption,
(ii) to study the impact of occupancy on energy use in buildings, and (iii) to develop intelligent control approaches for efficiently matching
fluctuating power generation with building loads (demand/response). Mainly, we propose to use contextual data (eg, indoor/outdoor CO2
concentration) for context-driven monitoring and control of building services, such as HVAC and ventilation systems.
The work presented in this section focuses mainly on the influence of occupants' presence on reducing the power consumption of ventilation
systems.5 The prototype architecture for testing is depicted in Figure 12. It shows mainly how the occupants' presence feedback is used for controlling

FIGURE 12 The deployed control flow architecture of the ventilation system

the EEBLab ventilation system. This figure shows the feedback architecture for controlling the ventilation system based on occupants' presence
prediction. It consists of the real-time occupancy prediction and the indoor building control service. The real-time occupancy prediction consists
of occupancy detection services (eg, presence), which allow predicting the presence of occupants. The indoor building control has been deployed
for adjusting the ventilation rates. For instance, the current work explores ON/OFF approaches based on the predicted value (ie, if the EEBLab
is not occupied, the equipment receives a message from the platform in order to turn the ventilation OFF).
For testing purposes, the aim is to accurately detect the occupancy based on CO2 level, temperature, and light consumption using a real-time
machine learning algorithm. In this work, two scenarios have been considered. In the first one, the presence of occupants is not included (ie,
without estimated occupants' presence), and the control strategy operates continuously to maintain the indoor CO2 concentration at the comfort
set point. In the second scenario, the presence of occupants is included in the control strategy (ie, with estimated occupants' presence).
When the presence of occupants is detected, the control system switches ON the ventilation system, which then operates as in the first
scenario; otherwise, when nobody is detected, it is switched OFF. As shown in Figure 12, the occupancy application reads the
predicted value generated by the platform, which is either 1 (occupied) or 0 (not occupied). If the EEBLab is not occupied, the control strategy
checks the ventilation system status (ie, ON or OFF). The control flow works as follows: when detecting that nobody is present in the EEBLab,
the platform sends an MQTT message, which contains the predicted value (ie, 1 occupied or 0 not occupied), through the broker to the Raspberry
Pi in order to turn OFF the ventilation system.
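The decision logic of this control flow can be sketched as follows. This is a minimal illustration under stated assumptions, not the deployed service: the topic name, the JSON payload layout, and the function names are hypothetical, and the actual transmission through the broker (which would be done with an MQTT client library) is deliberately left out so that the sketch stays self-contained.

```python
import json

# Hypothetical topic name; the paper does not specify the one actually used.
TOPIC = "eeblab/ventilation/command"


def ventilation_command(predicted_occupancy, ventilation_is_on):
    """Return "ON" or "OFF" when a state change is needed, otherwise None."""
    if predicted_occupancy not in (0, 1):
        raise ValueError("predicted occupancy must be 0 or 1")
    want_on = predicted_occupancy == 1
    if want_on == ventilation_is_on:
        return None  # status already matches the prediction: nothing to send
    return "ON" if want_on else "OFF"


def build_message(predicted_occupancy, ventilation_is_on):
    """Build the (topic, payload) pair that would be published to the broker."""
    command = ventilation_command(predicted_occupancy, ventilation_is_on)
    if command is None:
        return None
    # Assumed payload layout: the predicted value plus the resulting command.
    payload = json.dumps({"occupancy": predicted_occupancy, "command": command})
    return TOPIC, payload


print(build_message(0, True))   # room empty, ventilation running: send OFF
print(build_message(0, False))  # room empty, already OFF: None
```

In this simplified view, "ON" stands for enabling the CO2 set-point control of the first scenario rather than running the fan unconditionally. Checking the current status before sending avoids publishing redundant messages on every prediction.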
The main aim of this experiment is to show the influence of occupants' presence on power consumption reduction for ventilation system control.
A service already developed in the previous platform prototype was included for measuring and visualizing power consumption in real time.32
Figure 13 presents the correlation between occupancy presence values and the power consumption of the ventilation system during a period of
the day. The experiments took place on 17th May from 9:25 AM to 4:00 PM. As shown in Figure 13A, the EEBLab was empty until
9:40 AM. After that time, the laboratory was occupied for 1 hour and 45 minutes. This result matches well the number of occupants who

FIGURE 13 (A) True/estimated values; (B) Number of occupants

FIGURE 14 Power consumption of the ventilation system with and without occupants' presence integration

were present at that time (eg, 1 to 4 occupants). Moreover, the EEBLab was not occupied between 11:20 AM and 12:40 PM, as also shown in
Figures 13A and 13B. After lunch time, the EEBLab was occupied again, from 12:40 PM until 2:15 PM, and the number of occupants varied
between 1 and 3. After 2:15 PM, all occupants left the laboratory; as shown in Figures 13A and 13B, the true and predicted values equal 0
(ie, not occupied). As shown in Figure 13A, the predicted value (ie, blue curve) shows a slight delay in predicting the occupancy, because of
the communication and remote data processing. Nevertheless, these results show that the platform is able to detect occupants' presence with
high accuracy (ie, around 80% on average).
As described above, the aim is to show the effectiveness of using real-time occupants' presence in terms of power consumption reduction.
Figure 14 compares the power consumption of the ventilation system with and without including occupancy.
The orange curve presents the power consumption from 9:25 AM to 4:00 PM. It was generated from experiments that took place on 17th
April, without including the presence of occupants in the ventilation control. The blue curve shows the power consumption when
the occupants' presence is taken into consideration in the ventilation control. In this case, the power consumption of the ventilation system
was around 20 W, but only during the presence periods. These results show an almost 62% reduction in power consumption when using occupants'
presence, since the ventilation system was operating only during two periods, ie, when the EEBLab was occupied.
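A reduction figure of this kind can be derived by integrating the two power curves over the same time window and comparing the resulting energies. The sketch below shows the arithmetic only; the sample values are hypothetical, chosen for illustration, and do not reproduce the measurements behind Figure 14.

```python
def energy_wh(power_w, step_minutes=1.0):
    """Energy in Wh from equally spaced power samples in W (rectangle rule)."""
    return sum(power_w) * step_minutes / 60.0


def reduction_percent(baseline_w, gated_w, step_minutes=1.0):
    """Percentage of energy saved by the occupancy-gated run versus the baseline."""
    baseline = energy_wh(baseline_w, step_minutes)
    if baseline <= 0.0:
        raise ValueError("baseline energy must be positive")
    return 100.0 * (baseline - energy_wh(gated_w, step_minutes)) / baseline


# Hypothetical one-minute samples: the baseline runs continuously at 20 W,
# while the gated run draws power only when the room is occupied.
occupancy = [0, 1, 1, 1, 0, 0, 1, 1, 0, 0]
baseline_w = [20.0] * len(occupancy)
gated_w = [20.0 * occupied for occupied in occupancy]
print(reduction_percent(baseline_w, gated_w))
```

With a constant power draw, the saving simply equals the unoccupied share of the window; with measured curves, the two runs may also differ in their power level while operating, so both effects contribute to the reported reduction.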

6 CONCLUSIONS AND FUTURE WORK

This paper presents a first prototype platform toward the integration of real-time machine learning with recent IoT and data stream processing
technologies for building energy management. The platform prototype was deployed in the EEBLab for occupants' presence detection and its
integration within building services, such as ventilation and HVAC control. We first conducted experiments using both an existing dataset
and data streams from actual deployed sensors. Experimental results showed the potential and accuracy of real-time machine learning for
occupancy detection in buildings. Furthermore, the integration of occupants' presence for controlling the ventilation system showed a substantial
power reduction compared to a traditional control approach, which uses schedule-based and time-triggered control. More experiments will be
conducted to shed more light on the usefulness of online machine learning in real settings using the developed context-driven approaches for
lighting, shading, and HVAC control. The platform will also be used to develop other prediction and forecasting algorithms. For instance, the
prediction of the number of occupants is ongoing work aiming at forecasting the CO2 concentration, which will be used to design an advanced
ventilation controller that makes predictions about its future behavior and then determines the optimal control actions.29 Furthermore, the
platform will also be used to develop other building services, eg, RES power production forecasting and predictive control.

ACKNOWLEDGMENTS

This work is supported by the MIGRID project (grant 5-398, 2017-2019), which is funded by USAID under the PEER program, and partially supported
by the CASANET project (2016-2018), which is funded by "le Ministere de l'Enseignement Superieur, de la Recherche Scientifique et de la Formation
des Cadres (MESRSFC)" and "le Centre National pour la Recherche Scientifique et Technique (CNRST)".

ORCID

Youssef NaitMalek https://orcid.org/0000-0001-9378-9984


Mohamed Bakhouya https://orcid.org/0000-0001-8558-5471

REFERENCES
1. Nguyen T-A, Aiello M. Energy intelligent buildings based on user activity: a survey. Energy Build. 2013;56:244-257.
2. Akbar A, Nati M, Carrez F, Moessner K. Contextual occupancy detection for smart office by pattern recognition of electricity consumption data. Paper
presented at: 2015 IEEE International Conference on Communications (ICC); 2015; London, UK.

3. Shih H-C. A robust occupancy detection and tracking algorithm for the automatic monitoring and commissioning of a building. Energy Build.
2014;77:270-280.
4. Lachhab F, Bakhouya M, Ouladsine R, Essaaidi M. Performance evaluation of linked stream data processing engines for situational awareness
applications. Concurrency Computat Pract Exper. 2018;30(12):e4380. https://doi.org/10.1002/cpe.4380
5. Lachhab F, Bakhouya M, Ouladsine R, Essaaidi M. Monitoring and controlling buildings indoor air quality using WSN-based technologies. Paper
presented at: 2017 4th International Conference on Control, Decision and Information Technologies (CoDIT); 2017; Barcelona, Spain.
6. Calì D, Matthes P, Huchtemann K, Streblow R, Müller D. CO2 based occupancy detection algorithm: experimental analysis and validation for office
and residential buildings. Build Environ. 2015;86:39-49.
7. Yang Z, Li N, Becerik-Gerber B, Orosz M. A multi-sensor based occupancy estimation model for supporting demand driven HVAC operations.
In: Proceedings of the 2012 Symposium on Simulation for Architecture and Urban Design, Society for Computer Simulation International; 2012; San
Diego, CA.
8. Kleiminger W, Beckel C, Staake T, Santini S. Occupancy detection from electricity consumption data. In: Proceedings of the 5th ACM Workshop on
Embedded Systems For Energy-Efficient Buildings; 2013; Roma, Italy. https://doi.org/10.1145/2528282.2528295
9. Akkaya K, Guvenc I, Aygun R, Pala N, Kadri A. IoT-based occupancy monitoring techniques for energy-efficient smart buildings. Paper presented at:
2015 IEEE Wireless Communications and Networking Conference Workshops (WCNCW); 2015; New Orleans, LA.
10. Wang SW, Burnett J, Chong H. Experimental validation of CO2-based occupancy detection for demand-controlled ventilation. Indoor Built Environ.
1999;8(6):377-391.
11. Oldewurtel F, Sturzenegger D, Morari M. Importance of occupancy information for building climate control. Applied Energy. 2013;101:521-532.
12. de Bakker C, Aries M, Kort H, Rosemann A. Occupancy-based lighting control in open-plan office spaces: a state-of-the-art review. Build Environ.
2017;112:308-321.
13. Lachhab F, Bakhouya M, Ouladsine R, Essaaidi M. Towards a context-aware platform for complex and stream event processing. Paper presented at:
2016 International Conference on High Performance Computing and Simulation (HPCS); 2016; Innsbruck, Austria.
14. Stolpe M. The internet of things: opportunities and challenges for distributed data analysis. ACM SIGKDD Explor Newsl. 2016;18(1):15-34.
15. Malek YN, Kharbouch A, El Khoukhi H, et al. On the use of IoT and big data technologies for real-time monitoring and data processing. In: Proceedings
of the 7th International Conference on Current and Future Trends of Information and Communication Technologies in Healthcare (ICTH2017); Lund,
Sweden.
16. Morales GDF, Bifet A. SAMOA: scalable advanced massive online analysis. J Mach Learn Res. 2015;16:149-153.
17. Pan J, Jain R, Paul S, Vu T, Saifullah A, Sha M. An internet of things framework for smart energy in buildings: designs, prototype, and experiments.
IEEE Internet Things J. 2015;2(6):527-537.
18. Marche C, Nitti M, Pilloni V. Energy efficiency in smart building: a comfort aware approach based on social internet of things. Paper presented at:
2017 Global Internet of Things Summit (GIoTS); 2017; Geneva, Switzerland.
19. Candanedo LM, Feldheim V. Accurate occupancy detection of an office room from light, temperature, humidity and CO2 measurements using
statistical learning models. Energy Build. 2016;112:28-39.
20. Tutuncu K, Cataltas O, Koklu M. Occupancy detection through light, temperature, humidity and CO2 sensors using ANN. In: Proceedings of ISER 45th
International Conference; 2016; Rabat, Morocco.
21. Khan A, Nicholson J, Mellor S, et al. Occupancy monitoring using environmental & context sensors and a hierarchical analysis framework.
In: Proceedings of the 1st ACM Conference on Embedded Systems for Energy-Efficient Buildings; 2014; Memphis, TN.
https://doi.org/10.1145/2674061.2674080
22. Yang Z, Li N, Becerik-Gerber B, Orosz M. A systematic approach to occupancy modeling in ambient sensor-rich buildings. Simulation.
2014;90(8):960-977.
23. Vafeiadis T, Stavropoulos G, Ioannidis D, et al. Machine learning based occupancy detection via the use of smart meters. Paper presented at: 2017
International Symposium on Computer Science and Intelligent Controls (ISCSIC); 2017; Budapest, Hungary.
24. Chen D, Kalra S, Irwin D, Shenoy P, Albrecht J. Preventing occupancy detection from smart meters. IEEE Trans Smart Grid. 2015;6(5):2426-2434.
25. Ebadat A, Bottegal G, Varagnolo D, Wahlberg B, Johansson KH. Estimation of building occupancy levels through environmental signals deconvolution.
In: Proceedings of the 5th ACM Workshop on Embedded Systems For Energy-Efficient Buildings; 2013; Rome, Italy.
26. Dong B, Andrews B, Lam KP, et al. An information technology enabled sustainability test-bed (ITEST) for occupancy detection through an environmental
sensing network. Energy Build. 2010;42(7):1038-1046.
27. Jin M, Bekiaris-Liberis N, Weekly K, Spanos CJ, Bayen AM. Occupancy detection via environmental sensing. IEEE Trans Autom Sci Eng.
2016;(99):1-13.
28. Bifet A, Maniu S, Qian J, Tian G, He C, Fan W. StreamDM: advanced data mining in spark streaming. Paper presented at: 2015 IEEE International
Conference on Data Mining Workshop (ICDMW); 2015; Atlantic City, NJ. https://doi.org/10.1109/ICDMW.2015.140
29. Berouine A, Ouladsine R, Bakhouya M, Lachhab F, Essaaidi M. A model predictive strategy for ventilation system control in energy efficient buildings.
Paper presented at: 2019 4th World Conference on Complex Systems (WCCS); 2019; Ouarzazate, Morocco.
30. Kourtellis N, De Francisci Morales G, Bifet A, Murdopo A. VHT: vertical hoeffding tree. Paper presented at: 2016 IEEE International Conference on
Big Data (Big Data); 2016; Washington, DC.
31. Candanedo L. University of California at Irvine Repository of Machine Learning Databases.
https://archive.ics.uci.edu/ml/datasets/Occupancy+Detection+. Accessed 2016.
32. Berouine A, Lachhab F, NaitMalek Y, Bakhouya M, Ouladsine R. A smart metering platform using big data and IoT technologies. Paper presented at:
2017 3rd International Conference of Cloud Computing Technologies and Applications (CloudTech); 2017; Rabat, Morocco.
33. Lachhab F, Bakhouya M, Ouledsine R, Essaaidi M. Energy-efficient buildings as complex socio-technical systems: approaches and challenges.
In: Advances in Complex Societal, Environmental and Engineered Systems. Cham, Switzerland: Springer; 2016:247-265. Nonlinear Systems and Complexity;
vol. 18. https://doi.org/10.1007/978-3-319-46164-9_12

34. Bakhouya M, NaitMalek Y, Elmouatamid A, et al. Towards a data-driven platform using IoT and big data technologies for energy efficient buildings.
Paper presented at: 2017 3rd International Conference of Cloud Computing Technologies and Applications (CloudTech); 2017; Rabat, Morocco.
35. Lachhab F, Bakhouya M, Ouladsine R, Essaaidi M. Context-driven monitoring and control of buildings ventilation systems using big data and Internet
of Things–based technologies. J Syst Control Eng. https://doi.org/10.1177/0959651818791406

How to cite this article: Elkhoukhi H, NaitMalek Y, Bakhouya M, et al. A platform architecture for occupancy detection using stream
processing and machine learning approaches. Concurrency Computat Pract Exper. 2020;32:e5651. https://doi.org/10.1002/cpe.5651
