Data Driven Occupancy Information For Energy Simulation

Energy 218 (2021) 119539
Energy
Data driven occupancy information for energy simulation and energy use assessment in residential buildings
Karthik Panchabikesan a, Fariborz Haghighat a,*, Mohamed El Mankibi b
a Energy and Environment Group, Department of Building, Civil and Environmental Engineering, Concordia University, Montreal, H3G 1M8, Canada
b Building and Civil Engineering Laboratory (LGCB), Ecole Nationale des Travaux Publics de l'Etat (ENTPE), Vaulx-en-Velin, Lyon, France
Article info

Article history:
Received 20 May 2020
Received in revised form 23 October 2020
Accepted 4 December 2020
Available online 13 December 2020

Keywords:
Knowledge discovery
Residential buildings
Occupant activity schedules
Presence probability
Statistical models

Abstract

Occupants' schedules and their energy-use behavior are substantial inputs for building energy simulations and energy management in buildings. In practice, most research studies consider default occupant schedules from the standards. Consequently, the temporal variations associated with occupancy are often missed, leading to uncertainties in simulation results. This study aims to address two research problems in terms of occupancy: 1) when occupancy data are available, how to systematically extract the different occupant schedules, and 2) when occupancy data are not available, which other commonly logged parameters (such as plug load, lighting energy consumption, indoor carbon dioxide (CO2) concentration, and indoor relative humidity data) can be used to represent the occupancy in buildings. Regarding the first objective, a generic data-driven framework combining shape-based clustering and a change-point detection method is proposed to extract the distinct occupancy in residential buildings in terms of occupant activity schedule and presence probability. To demonstrate the outcomes of the framework, it was applied to the dataset collected from eight apartments located in Lyon, France. The results show the existence of different occupant patterns in buildings with respect to day of the week and season of the year. To achieve the second objective, linear and logistic regression models were developed to represent the occupant activity level and occupant presence/absence state, respectively. The linear regression model results show that, among the examined variables, the lighting and plug load consumption data along with the hour of the day show better prediction results in terms of adjusted R² and mean absolute percentage error. For the occupant presence/absence state, the logistic regression model developed using the CO2 concentration and plug load energy consumption dataset shows better results in misclassification error, confusion matrix, and receiver operating characteristic curve.
© 2020 Elsevier Ltd. All rights reserved.
* Corresponding author.
E-mail address: Fariborz.Haghighat@concordia.ca (F. Haghighat).
https://doi.org/10.1016/j.energy.2020.119539

1. Introduction

1.1. Research background

The occupant-related inputs such as occupant schedule, temperature setpoint, and equipment usage profile substantially influence building energy simulation results and energy demand predictions [1]. In residential buildings, the use of electrical appliances and the thermostat setpoint temperatures are highly dependent on the occupant's energy use behaviour. Hence, the schedules of occupants, plug power, lighting usage and thermostat setpoint temperature are strongly correlated with each other. In most energy simulation software, the heating, ventilation and air conditioning loads are estimated based on the temperature setpoint schedules and internal gains. Thus, it is highly recommended to define realistic schedules related to occupancy, setpoints and equipment usage. Recognizing the importance of occupant behavior in evaluating building energy performance, especially at the urban scale, the International Energy Agency has Annexes 53 [2], 66 [3] and 79, which focus on the influence of occupants on building energy use, simulation methods for occupant behaviour, and the research issues related to occupant-centric building design and operation, respectively. This emphasizes the research potential of, and the need for, understanding and modelling occupancy in building performance simulation. Buttitta et al. [4] insisted that the occupant presence/absence state and activity schedule should be considered in building energy simulations, as both have influence in defining occupant schedules and estimating internal heat gains
due to occupants. In general, the occupant schedule is highly uncertain [5]. Thus, in most building energy simulations, the default schedule (based on building codes or national averages) is considered irrespective of whether occupancy in a building would change based on the day of the week and season [6]. This often leads to uncertainties, resulting in a significant difference between the simulation results and the operational data [1]. The main reason for considering the default/standardized schedules is the non-availability of data, and in most cases, the actual occupant behavior is unknown [7]. Recently, smart energy meters have been widely implemented in residential and office buildings, resulting in massive data collection related to occupancy, energy, and indoor environmental condition. One example of such a comprehensive data source is the 'Donate Your Data' program initiated by the thermostat manufacturer ecobee Inc [8]. In this context, when occupant-related data are available, it is vital to develop a generic data-driven framework that collectively analyses and extracts distinct occupant-related information from the data collected in residential buildings.

On the other hand, the occupant's role in realizing energy-saving activities is recognized as a cost-effective measure compared to other conventional energy-saving actions that usually require investments [9]. Prior studies [10,11,33] indicate that 5-25% energy savings in residential buildings are possible by providing: 1) energy-related feedback to occupants [12], and 2) insights about their standing on energy use behavior compared to other similar buildings [13]. Though some studies [14] focused on emphasizing the occupant's role in achieving energy savings, they did not provide clear direction for the occupants to make informed decisions on how, when, and what they should do to improve energy efficiency. The main reason for this is the lack of understanding of occupant behavior [15]. Most of the studies in the literature have not considered the diversity associated with the occupant schedule and level of activity while recommending energy-saving measures [16]. In this regard, systematic analysis of the data collected from building energy management systems (BEMS) and smart metering systems is beneficial for generating reliable occupant-related input data for building energy models [17] and energy performance analysis.

Abbreviations
AUROC     Area under the receiver operating characteristic
BEMS      Building energy management system
CO2       Carbon dioxide
CPD       Change point detection
DM        Data mining
FPR       False positive rate
MAPE      Mean absolute percentage error
MC error  Misclassification error
ppm       Parts per million
RH        Relative humidity
ROC       Receiver operating characteristic
SBD       Shape based distance

1.2. Studies related to occupant pattern extraction and prediction

D'Oca et al. [18], in their data mining based study, considered the time of the day, day of the week, season, and window change activities as the predictor attributes to predict the occupant presence in office buildings. The developed decision tree model predicted the occupant presence state with an accuracy of 90.53%. Causone et al. [17] generated the yearly occupant profiles for a group of flats using the electric load profiles to represent internal heat gains in the energy simulations. Initially, the self-organizing map algorithm and k-means clustering were used for daily pattern recognition, and later, k-nearest neighbors was used for generating the yearly occupant profiles. Mitra et al. [19] extracted the occupancy profiles for typical residential buildings in the United States using the American Time Use Survey data. They reported that, in specific cases, there is a 41% variation between the developed schedules and the residential schedules currently used in energy simulations. Razavi et al. [20] developed a dynamic genetic programming-based feature engineering process to detect the occupancy in residential buildings using the energy consumption data. The results show that real-time smart metering data can detect the household's presence or absence state with an area under the receiver operating characteristic (AUROC) of 0.98. Zhou et al. [21] developed the multi-grained cascade forest model to estimate the indoor occupancy using the CO2 sensor data. The wavelet denoising method was used to remove the noise in the CO2 data. It is reported that the developed model outperformed the other models (classification and regression trees, support vector machine, inhomogeneous hidden Markov model) in terms of accuracy and in capturing the first arrival and last departure time.

Recently, new data sources such as Wi-Fi signals, mobile global positioning system signals [22], location-based services data [7], and thermostat data [23] have been considered to analyze building occupancy. For instance, Happle et al. [7] proposed a workflow to generate representative occupant schedules from location-based data (such as Google Maps and Facebook) collected in retail shops and restaurants from 13 cities in the United States. Their results showed variation in occupancy between different days of the week, and that distinct occupancy profiles should be considered for weekdays, Saturday, and Sunday. Huchuk et al. [23] compared various machine learning models to predict the future occupant presence/absence state in residential buildings using thermostat data. The authors reported that the random forest model, logistic regression and Markov model showed better prediction performance (because of their robustness to a range of conditions and their computational efficiency in terms of training and prediction) compared to hidden Markov models and recurrent neural networks.

1.3. Studies related to occupant related energy behavior analysis in buildings

International Energy Agency Annex 53 [2] identified occupant behavior as one of the six factors influencing building energy usage. Occupants play a significant role in the energy performance of even highly energy-efficient buildings. Hence, it is essential not only to model occupancy in buildings but also to quantify occupants' impact on building energy use. Nord et al. [24] analyzed occupant behavior's influence on heating and cooling demand, indoor air quality, and electricity grid interaction of a zero-emission building. Their results emphasized the need for detailed occupant behavior analysis for the appropriate design of energy systems and smart energy management in buildings during peak hours. Yoshino et al. [2] reported that in the past, most research works focused on understanding the influence of the physical factors (weather, building envelope, and equipment), and that more studies must be performed on the influence of human factors (occupant behavior, operation & maintenance). In recent years, data mining (DM) and data-driven tools have been well acknowledged for pattern recognition [18,25], for exploring the role of occupant behaviour in achieving energy savings in residential [12,26] and institutional buildings [27], occupancy prediction [18,28] and
energy demand modelling [29]. Yu et al. [25] used the end-use load data to explore occupant behavior's influence on building energy use. The authors reported the possibility of energy savings by modifying the occupant's energy-inefficient activities. The same authors [30] used a combination of DM tools (clustering analysis, decision tree, and association rule mining) to identify the energy-inefficient occupant behavior and, accordingly, suggested modifications to improve the occupant behavior in residential buildings. Ashouri et al. [14], with a similar objective as [25,30], used clustering analysis and association rule mining to identify the energy-inefficient occupant behavior. Rules for energy savings and behavior modifications were extracted, and an artificial neural network was used to quantify the possible energy savings in residential buildings through the change of occupant behavior.

Though the studies mentioned above strongly emphasized the applicability of DM tools in exploring the impact of occupant behavior on residential buildings' energy use, the major limitation is the non-consideration of the actual occupant-related information because of the non-availability of data. Since the occupant activity schedule and presence state in one building might differ from the others, or even differ for the same building on different days, the actual reasons for the difference in energy consumption between the buildings are unknown. Hence, the estimated energy saving potential may possess uncertainty. Therefore, researchers [14,30] reported that the non-availability of occupant-related data is a limitation, and that inclusion of occupant activity schedule, presence state, age, and profession would further increase the accuracy of the DM frameworks in recommending more reliable energy-saving measures in line with the occupant preferences.

1.4. Inference from the literature and problem statement

Most urban-scale building energy modeling considers the default/standard occupant schedules (for all building archetypes) available in the energy simulation software tools, which do not reflect real-case scenarios. It is important to extract occupant patterns at a city/district level beyond individual buildings because it is essential for decision making and implementation of energy policies at an urban scale. Though in recent times data-driven methods and probabilistic models were developed to represent occupancy levels in buildings, most studies focused on office [18,31] and institutional buildings [27,32], where routineness in the occupant schedules and presence probability is highly possible because of the scheduled timing (e.g., 09:00 h-17:00 h). In residential buildings, by contrast, such a routine profile is not expected, and the occupant pattern varies based on several factors. On the other hand, the main hindrance to representing occupancy at an urban scale, especially in residential buildings, is the non-availability of data. Not all houses can be expected to have high-resolution motion detection sensors installed. In the case of office and commercial buildings, Wi-Fi signals, location-based signals, and details from web sources could represent occupancy. In contrast, in residential buildings, the occupants are reluctant to share the data due to privacy issues. In this regard, it is important to study the use of other variables to represent occupant-related information (in the absence of such data) in residential buildings. Though plug loads are widely used to represent the occupancy in residential buildings, the question remains how well the electrical appliance usage data could replicate both the occupant activity schedule and presence state compared to other attributes such as lighting energy consumption (please refer to Figure S2 in the supplementary document), CO2, and relative humidity (RH) data. In this regard, providing insights on the alternatives that can be statistically used to represent the occupancy in buildings will be beneficial.

1.5. Objectives and uniqueness of the study

The present work aims to: 1) develop a data-driven framework that systematically explores occupant activity schedule and presence probability in residential buildings and provides insights on their inherent temporal variations, and 2) statistically determine the significance of other commonly logged parameters in representing the occupant schedule and occupant presence probability in residential buildings (in the absence of such data). The uniqueness of the proposed framework is that it could be used to extract occupant schedules from 'n' number of buildings, and the output of the framework can be used to define the distinct hourly occupant schedules while performing building energy simulation and energy use assessment. Since change point analysis is used, the specific hour(s) of the day at which the occupancy changes substantially can be determined. Such details could be used as important information during energy simulation to define the hourly occupant schedules. Besides, the outcomes of the regression analysis will be useful to get insights on the statistical significance of the other commonly logged parameters in representing the occupancy in residential buildings when such data are not available.

1.6. Organization of the paper

The paper is organized into six sections, including the introduction and conclusion. In Section 2, the step-by-step methodology is explained in two subsections (Sections 2.1 and 2.2) considering the study's objectives. Section 3 describes the dataset used and the data preprocessing steps followed to reach the objectives. In Section 4, the results obtained concerning objectives 1 and 2 are presented, respectively. Section 5 discusses the paper's main outcomes and findings and clarifies its contribution. Finally, the conclusion of the study is summarized in Section 6.

2. Methodology

2.1. Data-driven framework to extract distinct occupant patterns and presence probability

Fig. 1 shows the step-by-step procedure involved in the proposed data-driven framework to extract temporal-based occupant schedules and occupant presence probability, respectively.

2.1.1. Step 1: data preparation
In knowledge discovery in databases, an essential step to be performed before the data analysis is processing the raw dataset collected from BEMS. In this study, data cleaning (handling of NAs and missing values), data aggregation, and data separation are performed before the extraction of daily occupant patterns. Since, in the authors' opinion, it is inappropriate to explain the data preparation methods before describing the dataset used in the study, the details of the data preprocessing steps are presented in section 3.2.

2.1.2. Step 2: clustering
Fig. 1 shows that in Step 2, clustering is performed at two levels (Level 1 and 2). It is to be mentioned that this study focuses on extracting different occupant schedules from 'n' number of buildings rather than the individual buildings. In this regard, it is meaningful first to discover the distinct occupant patterns within the clustered apartments or individual apartments (Level 2 clustering). By this two-level clustering procedure, extraction of the sequential occupancy profile at the urban/city level can be performed. The details of the data format used for Level 1 and Level 2 clustering are presented in section 3.2.4.
Fig. 1. The systematic data-driven framework to explore occupant patterns in residential buildings.
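Because both clustering levels compare hourly occupancy profiles by shape rather than magnitude, a small numerical sketch of the shape-based distance (SBD) of Equation (1) may help make the measure concrete. The Python code below is only an illustration under stated assumptions: the study itself used the R package 'dtwclust', the function names and the toy 24-hour profile are hypothetical, the cross-correlation is computed directly with zero padding rather than with a fast Fourier transform, and the handling of a flat (zero-variance) profile is an arbitrary choice.

```python
import math


def z_normalize(series):
    """Scale a series to zero mean and unit (population) standard deviation."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    if std == 0:
        return [0.0] * n  # assumption: a flat profile carries no shape information
    return [(v - mean) / std for v in series]


def cross_correlation(x, y, shift):
    """Sum of x[i + shift] * y[i] over the overlapping indices (zero padding elsewhere)."""
    n = len(x)
    total = 0.0
    for i in range(n):
        j = i + shift
        if 0 <= j < n:
            total += x[j] * y[i]
    return total


def shape_based_distance(x, y):
    """SBD = 1 - max over shifts w of CC_w(x, y) / sqrt(R0(x, x) * R0(y, y))."""
    if len(x) != len(y):
        raise ValueError("both series must have the same length")
    x, y = z_normalize(x), z_normalize(y)
    n = len(x)
    norm = math.sqrt(cross_correlation(x, x, 0) * cross_correlation(y, y, 0))
    if norm == 0:
        return 1.0  # assumption: treat a flat profile as uncorrelated with anything
    best = max(cross_correlation(x, y, w) for w in range(-(n - 1), n))
    return 1.0 - best / norm


# Hypothetical hourly movement counts with a morning and an evening peak.
profile = [0, 0, 0, 0, 0, 0, 2, 5, 3, 1, 1, 1, 1, 1, 1, 1, 1, 2, 4, 6, 5, 3, 1, 0]
scaled = [3 * v for v in profile]             # same shape, three times the activity
inverted = [max(profile) - v for v in profile]  # peaks where the first profile has troughs

print(shape_based_distance(profile, scaled))    # close to 0: identical shape
print(shape_based_distance(profile, inverted))  # clearly larger: different shape
```

In this sketch, two apartments with the same daily rhythm but different absolute numbers of detected movements obtain a distance close to zero, which is the property that allows apartments of different activity levels to fall into the same cluster.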
To perform the clustering analysis, k-Shape clustering [36], a domain-independent, accurate and scalable approach for time series clustering that outperforms partitional, hierarchical, and spectral methods in terms of accuracy, is used in the study. The k-Shape clustering method uses shape-based distance (SBD) as a distance measure, which is invariant to shifting and scaling. The cross-correlation (CCw) statistical measures in Equation (1) are used to compute each cluster's centroid. Since k-Shape is based on an iterative refinement procedure, an assignment step and a refinement step are performed in each iteration. To address shift invariance, in the assignment step, SBD, as shown in Equation (1), is used to update the cluster memberships by comparing each time series with the computed centroids and assigning it to the cluster with the closest centroid. Note that, to account for scaling invariance, the sequential or time-series data is z-normalized in the assignment step. In the refinement step, the updated members in each cluster lead to the update of the cluster centroids.

$$\mathrm{SBD}(\vec{x}, \vec{y}) = 1 - \max_{w}\left(\frac{CC_{w}(\vec{x}, \vec{y})}{\sqrt{R_{0}(\vec{x}, \vec{x}) \cdot R_{0}(\vec{y}, \vec{y})}}\right) \qquad (1)$$

Considering its competitive ability, higher efficiency, and accuracy compared to k-means, the k-Shape clustering is used in this study to perform the time series clustering using the R package 'dtwclust'. Similar to k-means, the number of clusters (k) must be predefined in the k-Shape clustering. Accordingly, the Dunn index, a commonly used internal cluster validation index, is used. The 'k' with the highest Dunn index value is taken as the best. Note that the dataset is z-normalized before clustering. The distance measure and centroid method used are 'sbd' and 'shape', respectively.

2.1.3. Step 3: changepoint detection
In time-series data, to understand/explore the occupant schedule and presence probability, it is first essential to precisely identify the time at which the occupant's level/state varies significantly. In this regard, changepoint detection (CPD), a statistical method that detects any distinct changes in the mean and variance values of time-series data, can be used. CPD calculates the optimal positioning and number of changepoints in time series data. The CPD method is used in this study for three reasons: 1) to find the change point (i.e., the specific hour of the day) at which the level of occupancy changes significantly, 2) to determine the average number of occupant movements between the detected changepoints (this helps in defining the internal heat gains due to occupants in building simulation by considering the occupant's activity level for a specific period instead of a specific hour), and 3) to find the average presence probability of occupants in a specific period. To perform CPD, the 'changepoint' package available in R is used. A detailed explanation of the change point detection with an example, and of the statistical measures used, is given in the document attached as the supplementary material.

2.2. Statistical analysis on the use of other parameters to represent occupancy in residential buildings

The objective here is to investigate the significance of commonly logged parameters such as plug power, lighting power, indoor CO2, and indoor RH data to represent the occupancy in residential buildings, and not to compare the performance of various machine learning models for occupancy prediction as carried out in Ref. [23]. Accordingly, in this study, linear and logistic regression models are developed using the parameters mentioned above to predict the occupant activity schedules and occupant presence probability. The
model accuracy is reported in terms of adjusted R² and mean absolute percentage error (MAPE) for the linear regression method, and misclassification error, confusion matrix, and receiver operating characteristic (ROC) curve for logistic regression. For the linear regression model, the hour of the day, lighting, plug load energy consumption data, indoor CO2, and RH data are used respectively to predict the occupant activity schedule in all the considered apartments. It is to be mentioned that a separate linear regression model was developed for each parameter, and the relationship between the dependent variable (occupant activity schedule) and the independent variable(s) was determined. For example, Model 1 (later mentioned in section 4.2.1) is developed by considering the lighting power consumption data as the independent variable. Likewise, Models 2, 3, and 4 represent the models developed using plug power data, CO2 data, and RH data, respectively, as the predictor variable. Similarly, Models 1, 2, 3, and 4 mentioned in section 4.2.2 represent the logistic regression models developed using lighting, plug load energy consumption, CO2, and RH datasets, respectively. In the logistic regression model, for calculating the misclassification error, the definition of an optimal prediction probability cutoff is essential to convert the predicted value into a binary value. By default, a cutoff probability score of 0.5 is used. However, adjusting the probability cutoff value improves the accuracy of the model. Hence, in this study, the R function 'optimalCutoff' from the R package 'InformationValue' is used to determine the optimal probability cutoff value.

3. Data description and processing

3.1. Dataset description

The dataset used in this study belongs to eight 3-bedroom apartments of a district located in Lyon, France. The one-year dataset collected between January and December 2016 was used for the analysis. Each apartment is equipped with a home energy management system (HEMS) that monitors and records the indoor environment and energy use (plug load and lighting) at a 1-min resolution. Further, there is a cloud BEMS to provide integrated management for the group of buildings as a whole through a secure internet network. The data regarding occupant movement detection (0 or 1), indoor CO2 (ppm), indoor RH (%), temperature (°C), and the number of occupants living in the apartment are available. The occupant movement detection sensors are installed in the living room, each bedroom, kitchen, corridors, bathroom, and toilet, whereas CO2, temperature, and RH sensors are installed only in the living room and bedrooms. The plug load and lighting energy consumption data (Wh) are recorded to represent the building energy use. Unlike the occupant movement detection sensor, the zones/rooms and connected appliance details for plug power consumption data are unknown. The details of the floor plan and the coverage area of the motion sensors installed in the apartment are shown in Figure S1 of the supplementary document. The details of sensors used to detect occupant movement and power consumption (both plug power and lighting) are given in Table 1.

Table 1
Specification of sensors installed in the considered apartments.
| Sensor | Manufacturer details | Type | Accuracy | Detection area/range | Measurement resolution |
| Presence detector | Theben | PlanoCentro A-KNX | - | 64 m2 if seated | Event-based (a) |
| Plug load and lighting | ABB | KNX Energy Module: EM/S 3.16.1 | ±2-6% | - | 1 min |
| Indoor CO2 and RH | Theben | AMUN 716 | - | CO2: 0-9999 ppm; RH: 1-100% | 1 min |
(a) Note that event-based sensors can be triggered at any time. The occupant movement detected in the actual registry time was transformed into structured and organized data at 1-min resolution. Specifically, if one or more movements are detected within 1 min, it is recognized as 1.

3.2. Data preprocessing

It is to be mentioned that for the first objective of the study (i.e., to extract occupant activity schedule and occupant presence/absence probability), the motion detection data ('0' or '1') recorded at a 1-min time interval by the presence detector sensor was used. For the second objective (i.e., to statistically analyze the use of other parameters to represent the occupancy in buildings), the plug load, lighting energy consumption data, indoor CO2 and RH data were used. The data cleaning and data aggregation methods adopted for the respective parameters are explained in the following sections.

3.2.1. Data cleaning
Since the occupant movement detection data is a binary value, there are no outliers/anomalies detected in the entire dataset. The data was intermittently filled with 'NA'. The 'NAs' are not continuous, and hence they were replaced by the previous data. On the other hand, the entire day is neglected if the data are filled with NAs and missing entirely for more than 1 h. Further, the unoccupied days (i.e., days when the apartment is entirely unoccupied) are removed. After removing the unoccupied days in all the apartments, it is found that among the eight apartments, one apartment was occupied only for 88 days in the whole year. Hence, it is not considered for the analysis. On the other hand, to detect the outliers/anomalies in the plug power, lighting energy consumption, indoor CO2, and RH data, the quantile method is used.

3.2.2. Data aggregation
Note that this study's interest lies in extracting the daily occupant patterns from a group of apartments rather than an individual apartment. In this regard, consideration of the occupant movement data for each room of the apartment separately would make the analysis complex. Therefore, the occupant movements detected in all the rooms in an apartment at a given minute are aggregated, and a new attribute (total movements detected in the entire apartment per minute) is generated. Subsequently, the 1-min data is aggregated into hourly data. Further, the indoor CO2 and RH data recorded in specific rooms at a given minute are averaged to obtain the mean value for the entire apartment, whereas, for plug power and lighting energy consumption data, the values are aggregated.

3.2.3. Data preparation for clustering
The objective of Level 1 clustering is to find the apartments with similar occupant schedules. In this regard, for each apartment, the time-series dataset that represents the average of occupant movements detected at each hour for the whole year is considered. Accordingly, a data frame containing seven observations with 24 variables is generated, as shown in Fig. 2 (left). Besides, the objective of Level 2 clustering is to extract distinct occupant schedules in the considered buildings. Therefore, based on the results of Level 1 clustering, the data corresponding to the apartments with similar
occupant profiles are grouped together, and their daily data in terms of the total number of occupant movement detected for each hour (shown as an example in Fig. 2 (right)) is considered to perform Level 2 clustering.

Fig. 2. Data considered for Level 1 clustering (left) and example of data considered for Level 2 clustering (right).

3.2.4. Data separation
Unlike office buildings, residential buildings are occupied every day of the week. Hence, it is essential to study whether the occupant patterns and presence status vary significantly during the weekends compared to weekdays. This is one of the critical inputs that should be considered while assessing energy use performance and electric load, space conditioning demand forecasting in residential buildings. Thus, to explore the variation in occupant activity and presence probability between weekdays and weekends, the dataset (after Level 1 clustering) is separated into weekdays, weekends. Subsequently, Level 2 clustering is performed.
Fig. 3. Level 1 clustering results.
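The aggregation, clustering-input and weekday/weekend separation steps of Section 3.2 can be illustrated with a short sketch. The Python code below is not the authors' implementation (the study was carried out in R); the function names, the dictionary-based data layout and the toy records are hypothetical, the input is assumed to be already cleaned 1-min binary flags per room, and hours without any record are assumed to count as zero movements.

```python
from collections import defaultdict
from datetime import datetime


def apartment_hourly_totals(room_flags):
    """Sum 1-min flags over all rooms, then aggregate to hourly totals.

    room_flags: {room name: {minute timestamp: 0 or 1}}
    Returns {(date, hour): total movements detected in the apartment}.
    """
    per_minute = defaultdict(int)
    for flags in room_flags.values():
        for timestamp, flag in flags.items():
            per_minute[timestamp] += flag  # total movements per minute
    hourly = defaultdict(int)
    for timestamp, total in per_minute.items():
        hourly[(timestamp.date(), timestamp.hour)] += total  # 1-min -> hourly
    return dict(hourly)


def daily_profiles(hourly):
    """Level 2 input: one 24-value row of hourly totals per day."""
    days = {}
    for (day, hour), total in hourly.items():
        days.setdefault(day, [0] * 24)[hour] = total  # assumption: no record = 0
    return days


def mean_hourly_profile(profiles):
    """Level 1 input: average movements per hour over all days of one apartment."""
    if not profiles:
        raise ValueError("at least one daily profile is required")
    n = len(profiles)
    return [sum(p[h] for p in profiles.values()) / n for h in range(24)]


def split_weekdays_weekends(profiles):
    """Separate daily profiles into Monday-Friday and Saturday-Sunday."""
    weekdays = {d: p for d, p in profiles.items() if d.weekday() < 5}
    weekends = {d: p for d, p in profiles.items() if d.weekday() >= 5}
    return weekdays, weekends


# Hypothetical records: 1 January 2016 is a Friday, 2 January 2016 a Saturday.
rooms = {
    "living_room": {
        datetime(2016, 1, 1, 7, 0): 1,
        datetime(2016, 1, 1, 7, 1): 1,
        datetime(2016, 1, 2, 10, 30): 1,
    },
    "bedroom": {
        datetime(2016, 1, 1, 7, 0): 1,
        datetime(2016, 1, 1, 7, 1): 0,
    },
}

profiles = daily_profiles(apartment_hourly_totals(rooms))
level1_row = mean_hourly_profile(profiles)  # one row per apartment for Level 1
weekdays, weekends = split_weekdays_weekends(profiles)
print(level1_row[7], level1_row[10], len(weekdays), len(weekends))
```

Stacking one such `level1_row` per apartment gives a table of the kind described for Level 1 clustering (one observation per apartment, 24 hourly variables), while the weekday and weekend dictionaries hold the daily rows that would be passed to Level 2 clustering.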
Table 2
Clusters description.
| Types of occupant pattern | Cluster # | Apartment | Characteristics |
| Type A | 5 | 3 and 7 | Dual (morning and evening) peak pattern 1 (a) |
| Type A-l | 3 | 4 | Dual peak pattern 2 |
| Type B | 4 | 2 | Morning peak |
| Type C | 1 | 1 and 6 | All-time active pattern 1 |
| Type C-l | 2 | 5 | All-time active pattern 2 |
(a) Dual peak denotes the high occupant movements in the morning and evening in the respective apartment(s). Morning peak represents high occupant movements detected only in the morning.
Fig. 4. Level 2 clustering results of weekdays for Type A occupant pattern (the x-axis of the heat map is 'Hours', and the contours represent the total number of occupant movements detected for the given day and hour).
Table 3
Distribution of days in each cluster (apartments with Type A occupancy pattern).
Clusters    Spring    Summer   Autumn   Winter
Cluster 1   15% (a)   10%      41%      34%
Cluster 2   31%       31%      17%      21%
Cluster 3   23%       19%      34%      24%
(a) The percentages in the table represent the distribution of days in each cluster concerning different seasons.

4. Results and discussion

4.1. Extraction of occupant activity schedule and presence probability

In this section, the distinct occupant patterns extracted using the developed data-driven framework are presented. Note that the occupant activity schedule could be used to estimate the internal heat gains, and the presence probability can be used to determine the number of occupants present in the building. In this section, the Level 1 clustering results are explained first, followed by the Level 2 clustering results.

4.1.1. Level 1 clustering
Fig. 3 shows the results of Level 1 clustering, where the apartments with similar occupant patterns are grouped in the same cluster. As shown in the figure, the apartments are grouped into 5 clusters based on the results of the internal cluster validation index (i.e., Dunn index). Each cluster’s characteristics can be described by its centroid, which represents the standard score of the dataset that belonged to the respective cluster. As seen in the figure, the apartments with similar occupant schedules are grouped in clusters 1 and 5, and apartments with distinct occupant schedules are represented as separate clusters. Since the dataset is z-normalized and the occupant movements in the entire apartment are considered (instead of individual rooms), ’n’ number of buildings can be considered for the analysis. Based on the Level 1 clustering results, each cluster is characterized as shown in Table 2. Since apartments grouped in cluster 1 have similar occupant schedules (refer Fig. 3), the attribute ’number of occupant movement detected’ belonged to
Fig. 5. Number of occupant movements detected for each hour of the day in each cluster.
Fig. 6. Hourly occupant activity schedule for each day for Type A occupancy pattern.
apartments 1 and 6 are considered as one dataset and are grouped as weekdays and weekends, respectively. Accordingly, Level 2 clustering is performed on the combined dataset. Similarly, the dataset of apartments 3 and 7 are grouped to perform the level 2 clustering. For apartments 2, 4, and 5, the dataset is separated as weekdays, weekends, and subsequently, clustering is performed for each respective apartment to explore the distinct occupant schedules.
For the illustration of results and considering a similar procedure followed for the extraction of occupant schedule, presence probability in Type B, Type C, and C-1 occupant patterns, the Level 2 clustering results of Type A and Type A-1 alone are explained in the paper.

4.1.2. Level 2 clustering

4.1.2.1. Type A occupant patterns during weekdays. As mentioned in section 3.2.3, for Level 2 clustering, the daily time-series occupant movement data in apartment 3 and 7 were merged. In specific, 356 weekdays (i.e. 172 days (for apartment 3) and 184 days (for apartment 7)), 137 weekend days (i.e. 50 days (for apartment 3) and 87 days (for apartment 7)) are considered, respectively. Note that Level 2 clustering is performed separately for weekdays and weekends. Based on the results of the Dunn index, the weekday’s dataset is grouped into 3 clusters. Among the considered weekdays, 29%, 39%, and 32% of the days belonged to cluster 1, 2, and 3, respectively. Fig. 4 shows the standard (z) score of each cluster’s centroid and the heat map of temporal occupant movement recorded for each day in the respective cluster. As seen from the figure, the apartments having Type-A occupant pattern (refer Table 2) possess three distinct patterns; 1) cluster 1, where the occupant movements are high throughout the day, 2) cluster 2, with high occupant movements observed in the morning, and 3) cluster 3, where the days
Fig. 7. Clusterwise presence probability of occupants for each hour of the day (Type A occupant pattern).
with inherent clustering characteristics (i.e., high occupant movement in morning and evening). The inference from Fig. 4 is that in apartments that possess a dual peak occupant schedule, some days (cluster 1) possessed different occupant schedules/activity levels, and this diversity should be considered during energy prediction and energy use assessment. The data distribution (in terms of the number of days) of clusters 1, 2, and 3 concerning seasons is shown in Table 3. Note that, similar to the season-wise distribution analysis, the data is also analyzed by day of the week (i.e., the number of days belonging to Monday to Friday, respectively). However, no unique variation (as observed in the season-wise distribution) is found in this regard, and hence the specific results are not presented.
It is inferred from Table 3 that in cluster 1, where the occupant movements are relatively higher throughout the day, 75% of the days belonged to autumn and winter. On the other hand, in cluster 2, the majority (62%) of the days belonged to spring and summer, whereas in cluster 3, no such significant distinct variation is observed. This indicates that on most days in autumn and winter, the occupant (especially in apartment 3) remains home. In summer and spring, the occupants follow a working schedule wherein low occupant movements are detected in the afternoon for all the weekdays. In this case, a separate heating/cooling schedule and separate energy-saving advisories must be followed for the days in clusters 1, 2, and 3, respectively. The consideration of the default occupant schedule for assessment of building energy use will lead to significant errors. Besides, comparing the building energy use between different days within the same apartment without considering the explored occupancy schedules will lead to over/underestimation of energy-saving potential. For the explicit representation of the results, the average of hourly occupant movements for each day of the week is presented in Fig. 5. As seen from the figure, the occupant movement in cluster 1 is relatively high for the entire day compared to clusters 2 and 3.
Though the patterns of daily occupant movements can be observed from Fig. 5, information on the specific time when the occupant schedule varies significantly is unclear. For instance, it is clear from Fig. 5 that around 06:00 h, the occupant activity changes considerably in all the clusters, but after that, such time specification seems complicated. In this regard, the CPD method is used to find the time at which the occupant activity level varies significantly for each cluster. Fig. 6 shows the results of the CPD method, where the occupant schedule presented in Fig. 5 is transformed using min-max normalization, and the activity level for each hour is specified. The information shown in Fig. 6 can be used to define the periods during which high (low) occupant activity is expected, and accordingly, the internal heat gains due to occupant activity schedule in residential buildings can be interpreted.
In Fig. 6, the extracted information related to occupant activity in each cluster is presented. Such information could be beneficial while estimating the heating/cooling demands and internal heat gains due to occupants. Besides, information related to the occupant’s presence probability is crucial in building energy simulations to know the actual number of occupants in the building. Accordingly, the daily occupant presence probability is derived for each cluster and is presented in Fig. 7. The figure shows the occupant’s presence probability on each day of the week for three clusters, respectively.
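The min-max normalization of a cluster's hourly profile, together with one possible way of deriving an hourly presence probability, can be sketched as below. The presence-probability definition used here (share of days in the cluster with at least one detected movement in the hour) is an assumption made only for illustration. The change-point detection step is not reproduced, and all names and data are hypothetical.

```python
def mean_hourly_profile(days):
    """Average the 24-value movement profiles of all days in one cluster."""
    return [sum(day[h] for day in days) / len(days) for h in range(24)]


def min_max_normalize(profile):
    """Rescale a profile to the 0-1 range; a constant profile maps to all zeros."""
    lo, hi = min(profile), max(profile)
    if hi == lo:
        return [0.0] * len(profile)
    return [(v - lo) / (hi - lo) for v in profile]


def presence_probability(days):
    """ASSUMED definition: share of days with at least one movement in each hour."""
    return [sum(1 for day in days if day[h] > 0) / len(days) for h in range(24)]


# Hypothetical cluster of two days with morning and evening movement.
day_a = [0] * 24
day_a[7], day_a[19] = 10, 20
day_b = [0] * 24
day_b[7] = 30
cluster_days = [day_a, day_b]

activity_level = min_max_normalize(mean_hourly_profile(cluster_days))
presence = presence_probability(cluster_days)
```

The normalized activity level describes how intense the movement is in each hour, whereas the presence probability only describes how often anyone was detected at all, which is why the two are reported separately.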
Fig. 8. Level 2 clustering results of Type A occupant pattern during weekends.
Fig. 9. Level 2 clustering results of Type A-1 occupant pattern (apartment 4).
As depicted in Fig. 7, the probability that the occupant is present in the building is high throughout the day for cluster 1. In contrast, clusters 2 and 3 follow the dual peak pattern, where the presence probability is higher during the morning and evening hours.

4.1.2.2. Type A occupant patterns during weekends. Fig. 8 (a) shows the clustering results of Type A pattern for the weekends. In total, 137 days were considered for weekends and clustered into two clusters based on the Dunn index results. As seen from the figure, days with higher occupant movements for the entire day are grouped in cluster 1, and the days with dual peak patterns are grouped in cluster 2. The inference from the clustering result is that 74% of the days belonged to cluster 2, which indicates that the occupants in apartments 3 and 7 follow a similar schedule (Type A
Table 4
Distribution of days in each cluster (apartments with Type A-1 occupant pattern).
Weekdays    Cluster 1   Cluster 2
Monday      0% (a)      44%
Tuesday     30%         10%
Wednesday   30%         7%
Thursday    28%         13%
Friday      12%         26%
(a) The numbers inside the table represent the distribution of days in each cluster with respect to weekdays.

occupant pattern) during both weekdays and weekends. The other inference is, in cluster 2, during Sunday, the number of occupant movements in the afternoon is minimum (similar to nighttime), which indicates the occupant absence at home. This can be affirmed by referring to Fig. 8 (b) and Fig. 8 (c), where the former figure depicts the normalized occupant movement/activity level, and the latter depicts the occupant’s presence probability for the days in each cluster during the weekends. Fig. 8 (c) clearly indicates that for the days belonged to cluster 2, the probability of occupant being present at home between 11:00 h and 18:00 h is only 20%. In this case, it can be considered as the occupant absence. This can be useful information to forecast the energy demand in the building and assess the occupants’ energy use behavior.

4.1.2.3. Type A-1 occupant patterns during weekdays. The Level 2 clustering results of apartment 4 and the distribution of days in each cluster are shown in Fig. 9 and Table 4, respectively. Based on the Dunn index results, the weekdays in apartment 4 are grouped into two clusters, and out of 214 weekdays, 115 days were grouped in cluster 1. As depicted in Fig. 9, days with consistent dual peak occupant schedule were grouped in cluster 1, and days with irregular occupancy were grouped in cluster 2. Interestingly, not even a single Monday is grouped in cluster 1, which indicates the different occupant activity schedule during Mondays compared to other weekdays. On the other hand, Fig. 9 (right) shows that the occupant activity during Fridays in cluster 2 is almost similar to cluster 1, except for the fact that the number of occupant movements during the evening in cluster 2 is relatively less. Furthermore, Table 4 indicates that in cluster 2, Monday (44%) and Friday (26%) dominated the cluster composition, which means the
Fig. 10. Hourly occupant activity level for each day for the Type A-1 occupant pattern.
Fig. 11. Clusterwise presence probability of occupants for each hour of the day (Type A-1 occupant pattern).
Fig. 12. Level 2 clustering results of Type A-1 occupant pattern during weekends.
Table 5
Results of linear regression models without considering the hour of the day as a predictor variable.
            Model 1              Model 2              Model 3              Model 4
Apartment   Adjusted R2   MAPE   Adjusted R2   MAPE   Adjusted R2   MAPE   Adjusted R2   MAPE
1           0.51          0.51   0.57          0.44   0.41          0.56   0.08          0.61
2           0.57          0.45   0.61          0.35   0.08          0.55   0.00          0.54
3           0.71          0.35   0.71          0.30   0.16          0.50   0.02          0.50
4           0.60          0.43   0.58          0.41   0.01          0.65   0.08          0.62
5           0.47          0.45   0.58          0.41   0.33          0.49   0.13          0.51
6           0.51          0.44   0.71          0.32   0.37          0.77   0.07          0.51
7           0.84          0.23   0.68          0.37   0.00          0.61   0.04          0.60
occupant schedule in the rest of the days in cluster 2 is irregular or happened occasionally.
After performing Level 2 clustering, the normalized occupant activity level and the daily weekday presence probability are obtained using the CPD method, and the results are plotted as Fig. 10 and Fig. 11, respectively. Note that similar to apartments 3 and 7, variations in the distribution of days with respect to seasons were analyzed for apartment 4. The results showed no distinctive difference between the distribution of days in terms of season. The understanding from the k-Shape clustering is that consistent occupant schedule is followed in apartment 4 (similar pattern from Tuesday to Friday and a unique pattern on Mondays) and these findings shall be an essential input while building energy use assessment, energy use simulation and for programming the thermostats and other electrical appliances.

4.1.2.4. Type A-1 occupant patterns during weekends. The Dunn index showed the optimal number of clusters as 7 for the Type A-1 occupant pattern for the weekend days. Accordingly, the level 2 clustering is performed, and the result is depicted in Fig. 12. The figure’s inference is that unlike apartments 3 and 7, the occupant does not follow a similar/repetitive occupant schedule as weekdays in apartment 4 (i.e., Type A-1). Note that there were 105 weekend days in total, and after data preprocessing, 97 days were considered initially. Moreover, as mentioned in section 3.2.1, completely unoccupied days were omitted for the Level 2 clustering analysis. Thus, out of 97 weekend days, only 59 days were considered for the analysis. The clustering analysis’s main inference is that out of 59 occupied weekend days, 30 days belonged to cluster 3 (12 days) and cluster 4 (18 days). Further analyzing the distribution of days based on the season, most of the days in cluster 3 (75% of the days) and cluster 4 (78% of the days) belonged to autumn and winter. Besides, it is understood that 29 weekend days (out of 97 days) in summer and spring were completely unoccupied. This shows that on the weekends during summer and spring apartment 4 was unoccupied for the majority of the time, and during autumn, winter, the occupant follows the patterns of clusters 3 and 4. Considering the lesser number of weekend days in each cluster, similar procedure, the results of normalized occupant activity schedule, and presence probability for apartment 4 are not presented in the paper.
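The number of clusters throughout the analysis is chosen with the Dunn index. A minimal sketch of the index is shown below for points in Euclidean space. The use of Euclidean distance and the toy data are assumptions, since the distance measure and index variant used in the study are not restated in this section.

```python
import math
from itertools import combinations


def dunn_index(clusters):
    """Smallest between-cluster distance divided by the largest within-cluster diameter."""
    min_separation = min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    max_diameter = max(
        (math.dist(p, q) for cluster in clusters for p, q in combinations(cluster, 2)),
        default=0.0,
    )
    if max_diameter == 0:
        return math.inf  # every cluster is a single point (or identical points)
    return min_separation / max_diameter


# Hypothetical candidate partitions of the same four points.
tight = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 0.0), (10.0, 1.0)]]
loose = [[(0.0, 0.0), (0.0, 1.0), (10.0, 0.0)], [(10.0, 1.0)]]
best_partition = max([tight, loose], key=dunn_index)
```

A higher value means compact, well-separated clusters, so the candidate partition with the largest index is preferred.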
Fig. 13. Pair plot showing the correlation between each of the considered parameters (panels: plug power (Wh), occupant movement, lighting energy (Wh), CO2 (PPM), RH (%)).
Table 6
Results of linear regression models with the hour of the day as one of the predictor variables.
            Model 1              Model 2              Model 3              Model 4              Model 5
Apartment   Adjusted R2   MAPE   Adjusted R2   MAPE   Adjusted R2   MAPE   Adjusted R2   MAPE   Adjusted R2   MAPE
1           0.94          0.12   0.94          0.13   0.94          0.46   0.93          0.13   0.95          0.26
2           0.92          0.21   0.92          0.22   0.91          0.23   0.90          0.23   0.95          0.21
3           0.72          0.33   0.71          0.30   0.23          0.48   0.25          0.50   0.82          0.25
4           0.66          0.36   0.58          0.42   0.28          0.56   0.30          0.56   0.82          0.27
5           0.88          0.17   0.85          0.19   0.84          0.25   0.80          0.25   0.93          0.15
6           0.87          0.16   0.87          0.17   0.83          0.23   0.83          0.21   0.90          0.18
7           0.90          0.18   0.87          0.23   0.89          0.50   0.85          0.24   0.93          0.36
Table 7
Prediction results of occupant presence/absence state.
            Model 1              Model 2              Model 3              Model 4
Apartment   MC error   AUROC     MC error   AUROC     MC error   AUROC     MC error   AUROC
1           0.29       0.81      0.13       0.88      0.10       0.90      0.18       0.64
2           0.53       0.67      0.21       0.76      0.20       0.62      0.28       0.49
3           0.28       0.79      0.17       0.90      0.17       0.84      0.32       0.52
4           0.25       0.81      0.28       0.76      0.23       0.80      0.28       0.48
5           0.36       0.78      0.10       0.92      0.10       0.88      0.12       0.57
6           0.40       0.75      0.14       0.87      0.17       0.69      0.35       0.59
7           0.22       0.82      0.32       0.71      0.30       0.64      0.45       0.49
4.2. Prediction results of the statistical models (regression analysis)

4.2.1. Prediction of occupant activity level
Table 5 shows the occupant activity schedule’s prediction results in terms of adjusted R2 and MAPE for each model. It can be seen from the adjusted R2, and MAPE (%) values that lighting (Model 1), plug load energy consumption data (Model 2) predicted the occupant activity level good enough compared to CO2 (Model 3) and RH data (Model 4). This can be explained by referring to Fig. 13, which is a pair plot that depicts the correlation between each parameter. For the illustration purpose, Fig. 13 is plotted by considering the parameters recorded in Apartment 1. The figure clearly shows the linear relation exists between the lighting, plug load energy consumption and the number of occupant movements.
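The two reported metrics can be sketched for a single-predictor linear model as follows. This is an illustration under stated assumptions, not the study's modelling code: the data are made up, MAPE is returned as a fraction, and observations with a zero target are skipped because the percentage error is undefined there.

```python
def fit_simple_ols(x, y):
    """Least-squares slope and intercept for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def adjusted_r2(y_true, y_pred, n_predictors):
    """R2 penalized for the number of predictors used in the model."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction; zero targets are skipped (assumption)."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != 0]
    return sum(abs((t - p) / t) for t, p in pairs) / len(pairs)


# Hypothetical hourly values: lighting energy (Wh) against occupant movement counts.
lighting = [10, 20, 30, 40, 50]
movement = [12, 19, 33, 41, 48]
slope, intercept = fit_simple_ols(lighting, movement)
predicted = [slope * v + intercept for v in lighting]
fit_quality = adjusted_r2(movement, predicted, n_predictors=1)
fit_error = mape(movement, predicted)
```

Adding a further predictor such as the hour of the day would require a multiple-regression solver, which is left out here to keep the sketch short.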
Fig. 14. Confusion matrix.
As shown in the red box, the plug load and lighting energy consumption increased as the occupant movement increased. However, such a linear relation is not found between CO2, RH, and the number of occupant movement data.
Though there is a linear relation between the energy consumption and occupant movement data, the values of adjusted R2 and MAPE for models 1 and 2 do not seem good enough to represent the occupant activity level in the considered apartments. Hence, along with the predictor variable(s), the hour of the day is included in the respective models to check whether there is an improvement in adjusted R2 and MAPE results, and the obtained results are presented in Table 6. As seen in Table 6, the prediction results improved for all the predictor variables in terms of adjusted R2 and MAPE values. The advantage of considering the hour of the day along with the logged parameter(s) while predicting the hourly average occupant activity level on a daily basis is further explained with an example (Figure S6) in the supplementary document. Overall, the lighting energy consumption data possessed better prediction results, followed by the plug load energy consumption dataset. An interesting observation is made from Tables 5 and 6: for apartments 3 and 4, the adjusted R2 and MAPE values remain relatively poor even after the inclusion of the hourly variable with the particular predictor variable. The possible reason for this could be a very consistent occupant schedule followed in these apartments compared to others. Because of the routine occupant activity schedule (as shown in Figs. 10 and 11), the impact of the hourly variable gets neglected, and the variation between the prediction results with and without the consideration of hour of the day is less. In Table 6, model 5 is developed by considering all the parameters (hour, lighting, plug load energy consumption data, CO2, RH data) together as the predictor variables. When comparing the prediction results of Model 1 and Model 5, the variation between the adjusted R2 values is small or even negligible except for apartments 3 and 4. Compared to model 5, model 1 possessed better results in terms of MAPE. Thus, it is construed from the results that in the absence of a heterogeneous dataset, lighting energy consumption along with the hour of the day variable can be used to predict the occupant activity level in residential buildings.

4.2.2. Prediction of occupant presence/absence state
The logistic regression model prediction results in terms of misclassification (MC) error and AUROC using different parameter(s) are presented in Table 7. The inference from the table is that in all the apartments, Model 2 (plug load energy consumption data) and Model 3 (CO2 data) predicted the occupant presence status well compared to Model 1 (lighting energy data) and Model 4 (RH data). This is evident from the lower misclassification error and higher AUROC values of Model 2 and 3. This can be further explained by the fact that if the occupants are present at home, they tend to use electrical appliances. This increases the total energy consumption, and this value deviates from the base energy consumption (i.e., energy consumption value recorded when there is no occupant at home). As an example, the confusion matrix and ROC curve obtained for each model concerning apartment 1 are shown in Figs. 14 and 15, respectively. The inference from Fig. 14 is that compared to all the other models, Model 3 predicted both the occupant presence and absence status well, which is evident from the true positive and true negative rates. Fig. 15 shows that Model 3 possesses a much steeper ROC curve and a higher AUROC value compared to other models. Though Model 4 had a low misclassification error (0.18), the value of AUROC is relatively low (0.64) compared to other models. This means the ability of the model to predict the occupant presence/absence state is low. The inference from Table 7 is that the above-said explanation applies to all the other apartments. Thus, consideration of the RH data set for occupant present/absent state prediction is not recommended.
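The two classification metrics can be sketched as below, along with a toy case showing how a low misclassification error can coexist with a poor AUROC. The 0.5 decision threshold and the sample data are assumptions, and class imbalance is offered only as one possible mechanism behind such a combination, not as a finding of the study.

```python
def misclassification_error(labels, scores, threshold=0.5):
    """Share of samples whose thresholded prediction differs from the true label."""
    wrong = sum(1 for y, s in zip(labels, scores) if (s >= threshold) != bool(y))
    return wrong / len(labels)


def auroc(labels, scores):
    """Probability that a random 'present' sample outscores a random 'absent' one (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced sample: the occupant is present in 8 of 10 hours.
labels = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
uninformative = [0.8] * 10  # always predicts "present"
informative = [0.9, 0.8, 0.9, 0.7, 0.8, 0.9, 0.6, 0.7, 0.3, 0.2]

mc_uninformative = misclassification_error(labels, uninformative)
auc_uninformative = auroc(labels, uninformative)
mc_informative = misclassification_error(labels, informative)
auc_informative = auroc(labels, informative)
```

In this toy case the uninformative model is wrong only on the two absent hours yet cannot rank presence above absence at all, which is why the two metrics are best read together.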
Fig. 15. ROC plot and AUROC values for different logistic regression models.
5. Main outcomes, findings and clarifications on the contribution of the study

This study aims to address two research questions in analyzing occupancy in residential buildings: (1) upon the availability of occupant data, how to systematically extract the temporal based occupant patterns, (2) when the occupant related data is not available, what are the other alternative parameters that could be possibly used to represent occupancy in residential buildings. The outcomes of the developed framework are hourly occupant activity schedule and occupant presence probability, respectively. The results presented may apply only to the apartments considered in this study. It is to be mentioned that the interest of the study is to exemplify the variations associated with the occupancy in residential buildings and to emphasize why the default occupant schedules should not be considered in building energy simulations. However, the main question is the generalizability of the developed data-driven framework. As mentioned in section 3.2.2, the study is focused on extracting the occupant schedules from the group of buildings at an urban scale rather than the individual buildings. Upon the availability of data at the urban scale, the proposed methodology can be used on ’n’ number of buildings, and the distinctive occupant schedules with temporal variations can be extracted. Especially when the data is available on the district scale, it is not recommended to analyze each building separately, and in such a scenario, the proposed framework can be used. The procedure remains the same irrespective of the number of buildings.
Besides, the main hindrance for occupant related data availability, especially in residential buildings, is the need for costly motion detection sensors. The installation of motion sensors with a wide area coverage range in all the buildings is impractical because of their higher cost, and in general, the occupant is reluctant to install them because of privacy concerns. On the other hand, the energy consumption data in almost all buildings are monitored by energy suppliers, and most of the newly built/retrofitted buildings are equipped with indoor CO2 and RH sensors to maintain a good indoor environment. This makes access to energy and indoor related data easier compared to the real-time occupancy data. Considering the wide availability, the parameters such as energy use data, indoor CO2, and RH data could represent occupancy in buildings. However, the occupancy representativeness of these parameters is unclear. Accordingly, regression analysis is performed, and the insights on the statistical significance of such parameters are presented. Along with the above-mentioned datasets, recently, new data sources such as WIFI signals, GPS/mobile location data, and location-based data are considered as occupancy information. However, most of the studies that considered these new occupant related data sources were related to restaurants and retail shops. The use of these data to generate the occupant schedules in residential buildings is yet to be explored.

6. Conclusion

In this study, a generic data-driven framework is proposed to
explore the distinct occupant schedule and the presence probability from a group of residential buildings. For the analysis, a high-resolution dataset collected from a group of apartments that belonged to a small district located in Lyon, France, was used. The k-Shape clustering and change point detection method are used to derive the occupancy related information from the raw data collected from BEMS. The clustering was done at two levels, where the level 1 clustering was performed to identify and group the apartments with similar occupant schedules over the year. The level 2 clustering was performed to explore the distinct occupant patterns with the clusters. In this way, occupant activity schedules and presence probability of ’n’ number of buildings at the city level can be explored on an hourly basis. The inference from the extracted patterns is that the occupant schedule in a building might vary daily and seasonal, and these variations should be considered in energy simulations. Ignoring the variations associated with the occupant schedules and consideration of the default occupant schedule irrespective of the day of the week and different seasons while evaluating the building energy use performance and energy simulations will lead to uncertainty in the building performance simulation.
Also, the significance of other parameters such as lighting, plug load energy consumption data, CO2, RH data on representing the occupant schedule, and presence/absence state were analyzed using regression models. The regression analysis indicates that the inclusion of the hour of the day as one of the predictor variables along with the other parameters appreciably increased the prediction results in terms of both adjusted R2 and MAPE values. However, for the apartments with a very consistent occupant schedule (for most of the days in a week), the impact of considering the hour of the day as the predictor variable gets diminished. For the prediction of occupant activity level, the lighting energy consumption data followed by plug load energy consumption data showed better results. On the other hand, CO2 and plug load energy consumption data showed better results in misclassification error and AUROC while predicting the occupant presence/absence state and are recommended over the lighting energy consumption, RH data.

Credit author statement

Karthik Panchabikesan: Conceptualization, Formal analysis, Writing - original draft, Reviewing and Validation. Fariborz Haghighat: Conceptualization, Supervision, Formal analysis, Writing - review & editing, Validation, Funding acquisition. Mohamed El Mankibi: Providing the data, Writing - review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

The authors express their gratitude to Concordia University for the support provided through the HORIZON postdoc fellowship program.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.energy.2020.119539.
