Electrical Power and Energy Systems: Wei Zhang, Fang Liu, Rui Fan

Electrical Power and Energy Systems 103 (2018) 634–643
Contents lists available at ScienceDirect
Electrical Power and Energy Systems

journal homepage: www.elsevier.com/locate/ijepes
Improved thermal comfort modeling for smart buildings: A data analytics study

Wei Zhang (a), Fang Liu (a, ⁎), Rui Fan (b)

(a) School of Computer Science and Engineering, Nanyang Technological University, Singapore
(b) School of Information Science and Technology, ShanghaiTech University, Shanghai, China

ARTICLE INFO

Keywords: Thermal comfort; Machine learning; Data analytics; Smart buildings; Smart city; Smart grid

ABSTRACT

Thermal comfort is a key consideration in the design and modeling of buildings and is one of the main steps to achieving smart building control and operation. Existing solutions model thermal comfort based on factors such as indoor temperature. However, these factors are not directly controllable by building operations; instead, they are a by-product of complex interactions between controllable parameters, such as the air conditioning setpoint, and other environmental conditions. In this paper, we use machine learning (ML) to bridge the gap between controllable building parameters and thermal comfort, by conducting an extensive study on the efficacy of different ML techniques for modeling comfort levels. We show that neural networks are especially effective, achieving 98.7% accuracy on average. We also show that these networks can lead to linear models in which the thermal comfort score scales linearly with the HVAC setpoint, and that the linear models can be used to quickly and accurately find the optimal setpoint for the desired comfort level.
⁎ Corresponding author.
E-mail addresses: wei.zhang@ieee.org (W. Zhang), fliu@ieee.org (F. Liu), fanrui@shanghaitech.edu.cn (R. Fan).
https://doi.org/10.1016/j.ijepes.2018.06.026
Received 27 February 2018; Received in revised form 27 April 2018; Accepted 7 June 2018
0142-0615/ © 2018 Elsevier Ltd. All rights reserved.

1. Introduction

Thermal comfort modeling is a crucial component of building design, operation and optimization. Buildings account for 40% of energy consumption and 60% of electricity usage worldwide. On average, over 50% of a building's energy is used by the heating, ventilation and air conditioning (HVAC) system [1,2], and in areas such as Australia and the Middle East the figure can be as high as 70% [3]. The primary product of an HVAC system is thermal comfort. People today spend over 90% of their time in buildings [4], and poor comfort in buildings increases the chances of sick building syndrome, absenteeism and cognitive degradation [5]. Thus, it is important to create a healthy and comfortable indoor space while at the same time minimizing building energy use. A key step towards this goal is creating accurate models of thermal comfort.

In the past decades, thermal comfort modeling has received much research attention. The most popular model is the predicted mean vote (PMV) model proposed by Fanger et al. [6] and adopted in ASHRAE Standard 55 [7]. PMV models thermal comfort using six factors: four environmental factors (indoor temperature, indoor humidity, mean radiant temperature (MRT) and air velocity) and two vital ones (metabolic rate and clothing insulation).

Although the PMV model works well for evaluating thermal comfort, it is not readily applicable to smart buildings. In a typical smart building, the building management system (BMS) predicts thermal comfort levels for different controllable building settings before deploying specific settings. However, while the PMV model establishes a comfort score from the PMV factors, it does not capture the relationship between controllable building settings and the comfort score. In particular, the three indoor PMV factors (indoor temperature, indoor humidity and MRT) cannot be directly controlled by the BMS; instead, they result from complex interactions between controllable building parameters such as HVAC settings, weather conditions and other factors. Thus, to improve thermal comfort for smart buildings, controllable building factors should be used for modeling in place of non-controllable ones.

In this paper, we aim to use machine learning (ML) [8–10] to capture the impact of controllable HVAC operations on thermal comfort. The basic idea is first to use different ML algorithms to model the relationship between controllable parameters and the indoor PMV factors, given the current date, time and weather information. Then, the PMV factors predicted by the ML models for different HVAC settings are fed into the BMS for decision making. To validate the soundness of this approach, we performed extensive data analytics using a variety of ML models. The key results we obtained are as follows:

1. Nonlinear ML algorithms, including support vector regression (SVR) with nonlinear RBF kernels and neural networks
(NN) perform on average 66% better than linear methods, including linear regression (LR) and SVR with linear kernels. This indicates a complex relationship between the data features and the PMV factors.
2. The performance of NNs depends on the network configuration. We found an optimal configuration with 7.2% higher PMV modeling accuracy than the default configuration. Furthermore, the optimal configuration can be trained very quickly, in only a few seconds.
3. The ML-based solution using NNs achieves high modeling accuracy, with an average error of only 1.3% in predicted PMV comfort levels. Additionally, using the NN models we show that the PMV score is linearly related to the HVAC setpoint, given knowledge of other non-controllable factors such as date, time and weather conditions. Using the linear model, we can efficiently find the optimal, least energy-intensive HVAC setpoint that achieves a desired PMV comfort level.

The rest of this paper is organized as follows. We review related work on comfort modeling in Section 2. We describe our dataset and present the model architecture in Section 3. In Section 4, different ML algorithms are used to model the relationship between the dataset and the PMV factors. In Section 5, we compare the models and show that NNs achieve the best performance. We also show how the NN model can be used for smart building control. Finally, Section 6 concludes the paper and suggests future work.

2. Related works

Thermal comfort modeling has been studied extensively in the past decades. The most well-known thermal comfort model is the PMV model proposed by Fanger et al. [6] and adopted in ASHRAE Standard 55 [7]. PMV assumes that thermal comfort is determined by six factors: four environmental factors (indoor temperature, MRT, indoor humidity and air velocity) and two vital factors (metabolic rate and clothing insulation). Based on these factors, a set of equations is given to derive a thermal comfort score ranging from −3 to 3. The seven integers within this range are interpreted, in ascending order, as cold, cool, slightly cool, neutral, slightly warm, warm and hot. From the PMV score, a second thermal comfort index, the predicted percentage of dissatisfied (PPD), can also be computed. PPD is expressed as a percentage, with a minimum of 5% at PMV = 0 and values approaching 100% as the PMV moves towards either end of the scale; smaller values indicate greater comfort. ASHRAE Standard 55 recommends that PMV and PPD should be within ±1.0 and ≤ 20%, respectively, to meet basic occupant thermal comfort needs.

In addition to the PMV model, ML offers another way to model thermal comfort. Megri et al. [11] applied the ε-SVR algorithm to the six PMV factors to predict the PMV comfort score. Using 793 data samples for training and 18 samples for testing, they achieved a modeling accuracy of up to 99%. Atthajariyakul and Leephakpreeda [12] also modeled PMV scores, but used an NN model with two hidden layers. Their results showed very high modeling accuracy.

The main difference between these past works and ours is that the earlier models were parameterized by the PMV factors, whose values are not directly controllable. To be of use for a smart building BMS, a thermal comfort model must be based on parameters that the BMS can directly control [13,14]. In this paper, we first model the relationship between the controllable parameters and the PMV factors, and then use the predicted factors to compute comfort scores for BMS control and deployment.

3. Dataset and system architecture

In this section, we first describe the dataset we use for training our models, and then describe the model architecture.

3.1. Dataset

The data resolution is one minute, meaning that a data sample was recorded every minute. Each data sample has four major feature sets: the date and time (datetime), weather conditions, HVAC settings and indoor data. The first three sets are used as the independent (input) features of the ML algorithms, while the indoor data values are the target features the algorithms try to learn. Details of each feature set are shown in Table 1 and described below.

Table 1
Valid data range and type.

Feature                      Min    Max     Type
HVAC1 On/Off                 0      1       bool
HVAC2 On/Off                 0      1       bool
HVAC1 Setpoint (°C)          15     50      float
HVAC2 Setpoint (°C)          15     50      float
Outdoor Temperature (°C)     15     50      float
Outdoor Humidity (%)         10     100     float
Outdoor Irradiance (W/m²)    −50    1,400   float
Outdoor Illuminance          0      200     float
Rain                         0      1       bool
Month                        1      12      integer
Weekday                      1      7       integer
Hour                         1      24      integer
Minute                       1      60      integer
Indoor Temperature (°C)      15     45      float
Indoor Humidity (%)          10     100     float
Indoor MRT (°C)              15     50      float

3.1.1. Datetime data
Each data sample is associated with a timestamp that includes the date and time. Since ML algorithms typically operate on numerical values rather than datetime strings, the timestamp is broken into four numbers: month, weekday, hour and minute. Weekday is a value from 1 to 7 representing the day of the week.

3.1.2. Weather data
The weather data has five features: the outdoor temperature in degrees Celsius (°C), the outdoor humidity as a percentage, the outdoor irradiance in watts per square meter (W/m²), the outdoor illuminance in lux (lumens per square meter), and the rain status, either true or false.

3.1.3. HVAC data
The testbed office room has two fan coil units. The status of each unit, either on or off, was monitored and recorded as a Boolean value. The setpoint temperature of each unit, in degrees Celsius, was recorded as a floating-point number. In total, each HVAC data point has four values: HVAC1 On/Off, HVAC2 On/Off, HVAC1 Setpoint and HVAC2 Setpoint. These four HVAC features are the only controllable features in our system.

3.1.4. Indoor data
Three indoor features are recorded in the dataset. The first two are the indoor temperature in degrees Celsius and the indoor humidity as a percentage. The last feature is the MRT, measured in degrees Celsius. MRT models the radiant heat exchange between an indoor object, such as the human body, and the surrounding indoor environment. Past studies have shown that MRT is one of the most critical factors for thermal comfort [6,15].

3.1.5. Data preprocessing
The collected data was often noisy due to problems such as faulty sensors or software bugs. The dataset was therefore cleaned so that all recorded values are valid, i.e., lie within the valid range of each feature shown in Table 1.
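The encoding and cleaning steps of Sections 3.1.1 and 3.1.5, together with the [0, 1] normalization and random 90/10 split described in Section 4.1, can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the feature key names, the "YYYY-MM-DD HH:MM" timestamp format and the Monday = 1 weekday encoding are our assumptions, and the sample values are made up. Python's datetime yields hours 0–23 and minutes 0–59, whereas Table 1 lists 1–24 and 1–60, so the paper's exact encoding may differ; for that reason the sketch range-checks only the measured (non-datetime) features.

```python
import random
from datetime import datetime
from typing import Dict, List, Tuple

# Valid (min, max) ranges for a subset of the Table 1 features.
# The key names are hypothetical, chosen for this sketch only.
VALID_RANGES: Dict[str, Tuple[float, float]] = {
    "hvac1_setpoint": (15.0, 50.0),
    "hvac2_setpoint": (15.0, 50.0),
    "outdoor_temperature": (15.0, 50.0),
    "outdoor_humidity": (10.0, 100.0),
    "indoor_temperature": (15.0, 45.0),
    "indoor_humidity": (10.0, 100.0),
    "indoor_mrt": (15.0, 50.0),
}


def split_timestamp(ts: str) -> Tuple[int, int, int, int]:
    """Break a 'YYYY-MM-DD HH:MM' timestamp into (month, weekday, hour, minute)."""
    t = datetime.strptime(ts, "%Y-%m-%d %H:%M")
    # isoweekday(): Monday = 1 ... Sunday = 7 (assumed weekday encoding)
    return t.month, t.isoweekday(), t.hour, t.minute


def is_valid(sample: dict) -> bool:
    """Return False if any range-checked feature present in the sample is out of range."""
    for name, (lo, hi) in VALID_RANGES.items():
        if name in sample and not lo <= sample[name] <= hi:
            return False
    return True


def min_max_normalize(values: List[float]) -> List[float]:
    """Rescale one feature column to [0, 1]; a constant column maps to 0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def train_test_split(samples: list, train_ratio: float = 0.9, seed: int = 0):
    """Randomly split samples into training and test sets (90/10 by default)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    raw = [  # made-up one-minute samples; the second has a faulty temperature reading
        {"timestamp": "2017-06-05 14:30", "indoor_temperature": 26.1, "indoor_humidity": 62.0},
        {"timestamp": "2017-06-05 14:31", "indoor_temperature": 99.0, "indoor_humidity": 61.5},
    ]
    clean = [s for s in raw if is_valid(s)]
    print(len(clean))                              # 1
    print(split_timestamp(clean[0]["timestamp"]))  # (6, 1, 14, 30)
```

In a production pipeline one would normally fit the min/max values on the training split only and reuse them for the test split; the sketch keeps the simpler whole-column form described in Section 4.1.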
3.2. System architecture

PMV is based on six thermal comfort factors. For a given building and season, three of these factors, namely metabolic rate, clothing insulation and air velocity, are largely constant or show little variation in their average value throughout the day. Thus, we use fixed values for these parameters. In this paper, we assume a typical summer office environment and set the metabolic rate to 1.2 met, corresponding to moderate office work [16–18]. The clothing insulation for summer is set at 0.57 clo [18,19], and the air velocity is assumed to be insignificant at 0.1 m/s. Thus, the PMV comfort score is determined by the three indoor PMV factors: indoor temperature, indoor humidity and MRT.

Fig. 1 shows the system architecture of the proposed ML-based thermal comfort model. The system has a database that stores the dataset described above. The data is fed into three separate ML algorithms to train models for the three PMV factors: indoor temperature, indoor humidity and MRT. Ground-truth values for these factors are given in the dataset and are used to tune the ML models. After training, the models can take in new data points, including both the controllable HVAC features and the non-controllable weather and datetime features, and predict values for the indoor temperature, indoor humidity and MRT. These predictions, along with the values of the three fixed PMV parameters, are then input into the PMV model to predict a thermal comfort level.

Fig. 1. System architecture of ML-based thermal comfort modeling.

4. ML-based models for indoor PMV factors

In this section, we discuss the performance of the ML models for the three indoor PMV factors, namely indoor temperature, indoor humidity and MRT. We perform experiments to study both the effectiveness of different types of ML algorithms and the impact of different features.

4.1. Experimental settings

We first introduce the experimental settings. The algorithms in this paper are all implemented in Python 3.6 using the scikit-learn library [20]. Unless stated otherwise, the algorithms use the library's default settings; for the NN, the default configuration is one hidden layer with 100 neurons. The algorithms are executed on a laptop running 64-bit Windows 10 Pro with an Intel Core i5-7300U CPU and 16 GB of memory.

The dataset is normalized so that the values of both the input and target features are within the range [0, 1]. 90% of the data samples are randomly selected for training the models, and the remaining 10% are used for testing. The algorithms' performance is evaluated using the following metrics.

• Mean absolute error (MAE). This metric measures the average absolute difference between the ground truth in the test data and the predicted values. A smaller value indicates better modeling performance.
• R². This metric, also known as the coefficient of determination, measures the goodness of fit of a model. R² is at most 1 and typically lies between 0 and 1 (it becomes negative when a model fits worse than always predicting the mean); higher values indicate better modeling performance.
• Time. We also measured the time to train our models. The time depends on factors such as the size of the dataset and the algorithm used.

The reported values are based on the average of ten independent runs.

4.2. NN-based models

We first consider models based on NNs. We will show that NNs model the PMV factors more accurately than other types of ML algorithms.

4.2.1. NN configuration

As stated earlier, we train three models, each dealing with a separate PMV factor. We first study the effect of different NN configurations on the accuracy of the models. The dataset contains 13 independent (input) features, namely the HVAC, weather and datetime features listed in Table 1, and thus the network has 13 neurons in the input layer. Each network has one output neuron corresponding to the PMV factor being modeled. The NN performance depends on the configuration of its hidden layers. The two main parameters are the number of hidden layers and the number of neurons in each layer. These values must be selected with care. For example, using too few hidden layers or too few neurons produces an overly simplistic model that cannot capture complex relationships between the input features and the output. On the other hand, using too many layers or neurons may lead to poor generalization due to overfitting [21–23].

4.2.2. Results for MAE and R²

Fig. 2 shows the performance results of different NN configurations. Note that the reported MAE is computed on the normalized data, so the actual MAE also depends on the value range of the predicted PMV factor. For example, the indoor temperature in our dataset ranges from 22.5 to 32.3 °C, so the reported MAE of 0.035 shown in Fig. 2(a) corresponds to an actual mean error of 0.035 × (32.3 − 22.5) ≈ 0.34 °C.

Several findings can be seen from the figures. First, regardless of whether one or two hidden layers are used, performance improves when the first hidden layer has more neurons. Second, the configurations with two hidden layers perform better than those with one hidden layer in most cases. As with the first hidden layer, more neurons in the second hidden layer also lead to improved performance.

These results show that our application and dataset are best modeled using an NN with two hidden layers and a large number of neurons, unlike some other applications where performance degrades when using vast numbers of neurons. Nevertheless, as more neurons are added, the improvement in performance eventually reaches a plateau. Consider, for example, indoor humidity modeled using one hidden layer. The MAE improves by 0.012 when increasing from 5 neurons to 50 neurons. However, further doubling the number of neurons only improves performance by another 0.003. For two hidden layers with 100 neurons in the first hidden layer, the MAE is 0.051 using ten neurons in the second hidden layer. Using 50 neurons in the second hidden layer, the mean MAE improves quite significantly. However, further increasing the number of neurons to 100 results in only a slight improvement.

For the R² metric, shown in Fig. 2(d)–(f), findings similar to those for the MAE metric can be observed.

4.2.3. Results for training time

The running time of the algorithms varies significantly due to several factors, such as the dataset and the algorithm settings. In this part, we report the execution time of different NN configurations. Although more neurons and two hidden layers lead to better performance, as presented above, it is also interesting to investigate the relationship between performance and time.
Fig. 3 shows the time results for different NN configurations for modeling indoor temperature. As seen from the figure, the training time scales almost linearly with the number of neurons. One reason may be that a larger model takes proportionately more time to execute, leading to a longer overall training time. Another possible reason is that the neural network takes longer to converge when its configuration is more complicated, i.e., when there are more neurons in the hidden layers.

Although training large models takes more time, the total time used is relatively short, even for the most complex configurations. For example, for the neural network with two hidden layers and 100 neurons in each layer, the total training time for the indoor temperature model was only 8.9 s. For the indoor humidity and MRT models, the times were 8.1 and 4.1 s, respectively. Once the models are trained, they can repeatedly be used to perform inference, and inference time is nearly instantaneous. Thus, the improved accuracy that comes with the more complex NN configurations outweighs the small increase in training time.

Fig. 2. Performance results of different NN hidden layer configurations. L1 and L2 in the legend denote one or two hidden layers, respectively. The number of neurons in the first hidden layer varies between 5 and 100, as shown on the x-axis. Results for fewer than 5 neurons are not reported, as the performance is poor. L2-n means that the second layer has n neurons.

Fig. 3. Time used to train the indoor temperature model using different hidden layer configurations.

In the rest of the paper, the configuration with two hidden layers of 100 neurons each is denoted as NN-OPT, and the default NN configuration used in scikit-learn is denoted as NN.

4.2.4. Results for training data ratio

In this part, we study the sensitivity of performance to the amount of training data. The tests in the previous sections assume that 90% of the data is used for training. We now discuss the performance of our methods as we vary the ratio of training data between 10% and 90%; the results are shown in Fig. 4. With additional training data, the MAE values generally decrease and the R² values increase. However, the benefit of additional data exhibits diminishing returns, and there is negligible improvement in performance from using more than 70% of the data for training. This indicates that our models are able to generalize successfully from limited training data.

4.3. Comparing different ML algorithms

The previous section showed that the most complex NN configuration, NN-OPT, achieved very high accuracy. In this section, we look at the use of other types of ML algorithms to model the PMV parameters. We consider the following algorithms.

• LR. Linear regression (LR) is one of the most widely used algorithms in data analysis, being both efficient and easy to use. The basic assumption of this model is that there is a nearly linear relationship between the input and target features. Thus, the performance of an LR model on our data is a good indication of the complexity of the PMV dataset.
• SVR. Another model we consider is support vector regression (SVR). A key parameter of SVR is the kernel. Here, both linear kernels and Gaussian radial basis function (RBF) kernels are tested. The latter can model nonlinear relationships. The two versions of SVR are denoted SVR-L and SVR-R, respectively.
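For reference, the two accuracy metrics of Section 4.1 and the conversion of a normalized MAE back to physical units used in Section 4.2.2 take only a few lines. The sketch below is plain Python for self-containment; the function names are ours, and scikit-learn's `mean_absolute_error` and `r2_score` compute the same two quantities.

```python
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error between ground truth and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (negative if worse than the mean)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def denormalize_mae(mae_normalized: float, lo: float, hi: float) -> float:
    """Convert an MAE computed on [0, 1]-scaled data back to the feature's original units."""
    return mae_normalized * (hi - lo)


# Section 4.2.2 example: indoor temperature spans 22.5-32.3 degrees C in the dataset
print(round(denormalize_mae(0.035, 22.5, 32.3), 3))  # 0.343
```

Because min-max scaling is linear, an MAE on the scaled data converts back to physical units by a single multiplication with the feature's range, whereas R² is unaffected by the scaling.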
Fig. 4. Performance results for training data ratios between 0.1 and 0.9.
For all the above-listed algorithms, we adopt the default settings in scikit-learn. While these settings are not necessarily optimal for our dataset, they have been shown to achieve good performance on a wide range of applications [20]. Also, using default settings for these models makes for a fair comparison against the NN model, which is also based on scikit-learn's default configuration.

Fig. 5 shows the performance results of the different algorithms. Of the five tested algorithms, LR and SVR-L are designed to deal with linear relationships, while the other models can also capture nonlinear relationships. Thus, we refer to LR and SVR-L as linear models, and to SVR-R, NN and NN-OPT as nonlinear models. Fig. 5 shows that the nonlinear models generally perform significantly better than the linear models. For indoor temperature and indoor humidity, the three best models are all nonlinear. For MRT, the two best algorithms are NN and NN-OPT; SVR-R was not as good as linear regression but was better than its linear counterpart SVR-L. Overall, the better performance of the nonlinear models indicates that indoor environmental conditions have a complicated relationship with other parameters such as the outdoor environment and HVAC settings.

The results also show that NN and NN-OPT perform significantly better than the other ML models. For example, for indoor temperature, the MAE of NN-OPT (0.035) is only about half of the 0.07 MAE of the linear algorithms. There is likewise a factor of two improvement for MRT, and a 1.5× improvement for indoor humidity. The nonlinear SVR-R, while better than the linear algorithms in most cases, is still poorer than NN and NN-OPT.

Finally, we note that there is a significant performance difference between the two neural network configurations. Figs. 5(a) and (d) show that even NN-OPT's worst performance when modeling temperature is better than NN's best performance. This is also true for indoor humidity and MRT. Specifically, the average MAEs of NN and NN-OPT for indoor temperature are 0.042 and 0.035, respectively, so NN-OPT is 20.0% better than NN. For indoor humidity and MRT, NN-OPT is 15.6% and 12.5% better than NN, respectively.
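The 20.0% figure for indoor temperature is consistent with computing "better" as the ratio of the two average MAEs minus one, i.e., the same form as the improvement metric in Eq. (1) of Section 5. The text does not state this formula explicitly here, so this reading is our assumption:

$$
\frac{\overline{\mathrm{MAE}}_{\mathrm{NN}}}{\overline{\mathrm{MAE}}_{\text{NN-OPT}}} - 1
= \frac{0.042}{0.035} - 1 = 1.2 - 1 = 0.20 = 20.0\%
$$

Measuring the reduction relative to NN instead, (0.042 − 0.035)/0.042 ≈ 16.7%, would not reproduce the reported value.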
Fig. 5. Performance results of different ML algorithms, including LR, SVR-L, SVR-R, NN and NN-OPT.
4.4. Feature impact

In previous sections, we looked at the performance of different ML models as well as different NN configurations. However, in addition to the algorithms, the data features also play an important role in the models. In this section, we look at the relative impact of the different features in the dataset.

We rank the 13 individual input features in the dataset based on a univariate linear regression test, specifically the f_regression function in scikit-learn. This function computes the correlation between each input feature and a target, such as indoor temperature, and converts it into an F-score. A higher F-score indicates a stronger linear relationship between the feature and the target.

4.4.1. Feature ranking for indoor temperature

Fig. 6 shows the feature ranking for each of the thermal comfort factors based on the computed feature scores. Fig. 6(a) shows the results for modeling indoor temperature. The HVAC status features, namely HVAC1 On/Off and HVAC2 On/Off, have the highest impact on the modeling performance. Meanwhile, the HVAC setpoints have only a moderate impact on indoor temperature, and their F-scores are lower than those of the HVAC status features and of several weather features. Results such as these are one of the motivations for this paper. In particular, they demonstrate that it is incorrect to simply assume that indoor temperature equals the HVAC setpoint temperature, and that many other factors have equal or greater impact.

Fig. 6. Ranking features based on the correlation of each feature to each PMV factor under a linear model. The weather-related, HVAC and datetime features are marked in black, dark gray and light gray, respectively.

The weather features, namely outdoor illuminance, outdoor irradiance, outdoor humidity and outdoor temperature, also have a strong correlation with the indoor temperature. These features directly affect the heat flow from the outdoor environment to the indoor environment. For example, high solar irradiance and outdoor temperature lead to greater heat penetration from outdoors to indoors. The rain status feature, however, does not exhibit a significant impact on indoor temperature.

Datetime features, including hour, weekday, month and minute, are correlated with the indoor temperature, but the correlation is not as strong as for the HVAC and weather features. Among the four datetime features, hour and weekday have a higher correlation. The likely reason is that the hour feature is correlated with weather features such as outdoor temperature, and the weekday value indicates whether a day falls on a weekday or the weekend, when the HVAC settings may differ.

4.4.2. Feature ranking for indoor humidity

Fig. 6(b) shows the feature rankings for indoor humidity. One main difference from the results for indoor temperature is that weather and datetime features are more critical for humidity, whereas HVAC and weather features have a more significant effect on temperature. As shown in the figure, the most correlated feature for indoor humidity is outdoor humidity. This is reasonable, as the HVAC system takes in outside air to condition the indoor environment. Other essential features include outdoor temperature, outdoor irradiance and outdoor illuminance. Although rain status is still relatively unimportant compared with the other weather features, it has a greater effect on indoor humidity than on indoor temperature, since air humidity can reach almost 100% when raining, and post-rain humidity can remain at high levels due to evaporation.

For datetime features, the hour feature ranks high due to the general daily pattern in which humidity rises during the night and falls during the day. The month feature ranks higher than the weekday and minute features, likely because rainfall follows monthly or seasonal patterns.

HVAC features have only a moderate effect on indoor humidity, and the HVAC setpoint features are among the least correlated. These results may indicate that the HVAC system has a relatively limited ability to control indoor humidity, partly because the system only tries to adjust humidity when it reaches an unsatisfactorily high level, e.g., 65%.

4.4.3. Feature ranking for MRT

The results for MRT, shown in Fig. 6(c), are similar to those for indoor temperature, shown in Fig. 6(a), so a detailed discussion is omitted.

5. Improved thermal comfort modeling and smart building control

In the last section, the algorithms were investigated and optimized at the level of each thermal comfort factor. In this section, we move forward to compute the PMV/PPD thermal comfort scores from the models of the thermal comfort factors, and carry out a comparative analysis of the resulting ML-based solutions.

In our experiments, 90% of the data samples are used as training data, and the remaining 10% are used as test data. For each data sample, the PMV ground truth is computed from the monitored indoor data, namely the values of indoor temperature, indoor humidity and MRT.

Accordingly, for each comparison solution and each data sample, the absolute error is computed as the absolute difference between the ground truth and the thermal comfort score obtained from the comparison solution. Given a solution A and a data sample x, we denote the corresponding absolute error as ε(A, x).

We also introduce a metric called improvement for the comparative analysis. Given a solution A and a data sample x, the improvement of A over a comparison solution B is defined as

$$
\mathrm{improvement}(A, B, x) = \frac{\varepsilon(B, x)}{\varepsilon(A, x)} - 1 \tag{1}
$$

Note that an improvement higher than 0 means solution A performs
better than B, and vice versa.

5.1. Modeling accuracy with different modeled factors

In this part, we investigate the PMV/PPD modeling accuracy of different comparison solutions, which are introduced as follows.

5.1.1. Comparison solutions

Three PMV factors are modeled in the proposed solution. Here, we investigate the performance impact of modeling none or only some of the three factors. First, we consider a comparison solution that does not model any factor. Such a solution suits environments that lack information technology support. It assumes that both indoor temperature and MRT are equal to the HVAC setpoint temperature, and that the indoor humidity is at a moderate level of 50% [24,25]. We refer to this solution as DEFAULT.

Although DEFAULT can compute the thermal comfort score knowing only the HVAC setpoint temperature, its assumptions may be far from the ground-truth values. Fig. 7 shows sample data from the first day in our dataset, which is a working day. The indoor temperature seldom goes below the setpoint temperature and differs from it by over 1 °C during certain periods. Assuming a 50% indoor humidity is also not reasonable. As shown in Fig. 7(b), the indoor humidity varies significantly throughout the day, between 77% in the morning and 56% in the afternoon, and the changes are not monotonic.

Fig. 7. Sample data from the first day of our dataset. (a) The HVAC setpoint temperature, which is 25 °C throughout the day, together with the monitored indoor temperature. (b) The monitored indoor humidity of the day, with a line marking 50% indoor humidity.

Unlike DEFAULT, the ML-based solutions model and predict the thermal comfort factors, rather than making assumptions, when calculating PMV/PPD. Here, NN-OPT is selected for modeling the factors. The solution that models all three PMV factors is denoted NN-ALL. In addition to NN-ALL, three variants of it are considered. In the first variant, denoted NN-HM, the indoor temperature is not modeled but is assumed to be equal to the HVAC setpoint temperature, as in DEFAULT; the remaining two factors, indoor humidity and MRT, are modeled using NN-OPT as in NN-ALL. The second variant, NN-TM, uses NN-OPT

[…]

with the default assumptions could be only 2.0 − 1.5 = 0.5, which indicates a slightly warm or even comfortable thermal sensation.

The four ML-based solutions using NN-OPT perform much better than DEFAULT. Among the four solutions, NN-HM and NN-TH perform relatively worse. The absolute errors for these two solutions are mostly around 0.5, while the error level is below 0.2 for the other two solutions, NN-TM and NN-ALL. The reason could be that PMV/PPD is more sensitive to the indoor temperature and MRT and less affected by indoor humidity. Thus, when either indoor temperature or MRT is not well modeled, the thermal comfort modeling performance suffers. Modeling indoor humidity does not change the results much compared to merely assuming a 50% indoor humidity. Nevertheless, the performance improves when indoor humidity is also modeled. For NN-TM and NN-ALL, the average absolute errors are 0.14 and 0.10, respectively, so the latter is better.

5.1.3. Results of NN-ALL's improvement

Fig. 9 shows the improvement of NN-ALL compared with the other four solutions discussed above. As shown in Fig. 9(a), NN-ALL achieves a positive PMV accuracy improvement over all the comparison solutions for almost all the data samples. The median improvement over DEFAULT is almost 18×. Consistent with the absolute error results above, NN-ALL has a larger advantage over NN-HM and NN-TH, with median improvements of 4.0× and 6.7×, respectively. NN-TM, which does not model indoor humidity, achieves performance closer to NN-ALL: the median improvement of NN-ALL over NN-TM is 68%. Although not as remarkable as for NN-HM and NN-TH, this improvement is still significant. Similar observations can be made from Fig. 9(b), which shows the improvement results for PPD modeling accuracy.

Overall, the results show that the ML-based solutions can significantly improve thermal comfort modeling performance compared to the typical calculation method. Even if only some of the thermal comfort factors are modeled, the improvement is remarkable. Nevertheless, ML-based modeling should be applied to all three thermal comfort factors to achieve the best PMV/PPD modeling accuracy.
for modeling indoor temperature and MRT, while the indoor humidity
is set to a fixed value of 50%. The third variant NN-TH models indoor 5.2. Modeling accuracy with different ML models
temperature and indoor humidity using NN-OPT and the MRT is set to
the same value of indoor temperature. The above part studies the performance impact of modeling dif-
ferent thermal comfort factors using NN-OPT. It turns out that using
5.1.2. Results of absolute error NN-OPT on all the three thermal comfort factors helps to achieve the
Fig. 8 shows the absolute error PMV/PPD results of the above-in- optimal performance for thermal comfort modeling. In this part, to-
troduced solutions. Seen from the figures, DEFAULT fails to perform gether with NN-OPT, other ML models, including LR, SVR-L, SVR-R and
well, where the error is over 1.5 for the majority of the test data sam- the default NN, are investigated regarding their performance for
ples. Such value is too much. For example, given a ground truth PMV thermal comfort modeling. Individually, all the three thermal comfort
value of 2.0, which means uncomfortably hot, the derived PMV value factors are modeled using the each of the above algorithms. Two
640
Fig. 8. The absolute error results of the PMV and PPD metric with five solutions, including DEFAULT, NN-HM, NN-TM, NN-TH and NN-ALL.
Fig. 9. The improvement results of NN-ALL compared with four different solutions, including DEFAULT, NN-HM, NN-TM and NN-TH.
Fig. 10. The absolute error results of five ML algorithms, including LR, SVR-L, SVR-R, NN and NN-OPT, for thermal comfort modeling.
performance metrics, including absolute error and NN-OPT’s improve- is as high as 1.7 for DEFAULT but does not exceed 0.2 for the ML-based
ment, are considered for investigated the modeling accuracy of the al- solutions. The results reveal that ML offers a promising way to improve
gorithms. The details are shown as follows. the thermal comfort modeling accuracy.
Similar to the observations for thermal comfort factor modeling, the
5.2.1. Results of absolute error linear algorithms usually cannot outperform the nonlinear ones. In
Fig. 10 shows the thermal comfort modeling accuracy regarding the Fig. 10, the linear ones LR and SVR-L have a higher median absolute
absolute error of using different ML algorithms. error, above 0.15, compared to the nonlinear ones SVR-R, NN and NN-
First, let us compare the results in Figs. 10 and 8. As we can see, OPT, where the values are below 0.10. The PPD results exhibit the si-
although performance difference exists for different algorithms, the ML- milar pattern, where the median errors are around 2.0 for the linear
based solutions all perform better than DEFAULT, which computes ones and 1.2, 1.1 and 1.0 for the nonlinear ones. Also, as we can see,
PMV/PPD based on setpoint temperature and a few assumptions rather NN-OPT has the best performance among the three nonlinear algo-
than modeling the thermal comfort factors. The median absolute error rithms.
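The five comparison solutions of Section 5.1.1 differ only in which of the three PMV factors are predicted and which are filled in by assumption. The following Python sketch (Python being the language we used for the setpoint search in Section 5.3) illustrates that bookkeeping. The `PmvInputs` type, the `build_inputs` function and the `predicted` keys are hypothetical names introduced for illustration; in practice the predicted values would come from a trained NN-OPT model, which is not shown.

```python
from dataclasses import dataclass
from typing import Mapping

# Humidity assumed whenever indoor humidity is not modeled (Section 5.1.1).
DEFAULT_HUMIDITY_PCT = 50.0


@dataclass(frozen=True)
class PmvInputs:
    """The three PMV factors compared in Section 5.1; the other PMV inputs
    (air velocity, metabolic rate, clothing insulation) are held fixed."""
    air_temp_c: float           # indoor air temperature, degC
    mean_radiant_temp_c: float  # MRT, degC
    rel_humidity_pct: float     # indoor relative humidity, %


def build_inputs(solution: str, setpoint_c: float,
                 predicted: Mapping[str, float]) -> PmvInputs:
    """Assemble PMV inputs for one comparison solution.

    `predicted` is a hypothetical mapping of model outputs with keys
    'temp', 'mrt' and 'humidity'; only the keys a solution models are read.
    """
    if solution == "DEFAULT":
        # Nothing modeled: temperature and MRT equal the setpoint, humidity is 50%.
        return PmvInputs(setpoint_c, setpoint_c, DEFAULT_HUMIDITY_PCT)
    if solution == "NN-HM":
        # Humidity and MRT modeled; indoor temperature taken as the setpoint.
        return PmvInputs(setpoint_c, predicted["mrt"], predicted["humidity"])
    if solution == "NN-TM":
        # Temperature and MRT modeled; humidity fixed at 50%.
        return PmvInputs(predicted["temp"], predicted["mrt"], DEFAULT_HUMIDITY_PCT)
    if solution == "NN-TH":
        # Temperature and humidity modeled; MRT set to the modeled temperature.
        return PmvInputs(predicted["temp"], predicted["temp"], predicted["humidity"])
    if solution == "NN-ALL":
        # All three factors modeled.
        return PmvInputs(predicted["temp"], predicted["mrt"], predicted["humidity"])
    raise ValueError(f"unknown comparison solution: {solution!r}")
```

For example, with illustrative (not measured) predictions `{"temp": 25.8, "mrt": 26.1, "humidity": 68.0}` and a 25.0 °C setpoint, `build_inputs("NN-TH", 25.0, ...)` yields an MRT of 25.8, copied from the modeled temperature, while `build_inputs("DEFAULT", 25.0, ...)` ignores the predictions entirely. The resulting inputs would then be passed to a standard PMV/PPD calculation.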
Fig. 11. Improvement of NN-OPT over four ML algorithms, LR, SVR-L, SVR-R and NN, for thermal comfort modeling.
Fig. 12. PMV scores for different HVAC setpoint temperatures at 10:00 and 15:00 on two working days. The black dot marks the setpoint with a PMV score of 1.0.
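The improvement figures reported in Sections 5.1.3 and 5.2.2 are medians of per-sample comparisons between two models. The exact per-sample definition is not restated in these sections, so the Python sketch below assumes one plausible choice, the relative reduction in absolute error, (|e_other| - |e_ref|) / |e_ref|. Under this assumed definition a value of 0.68 would be reported as 68%. The function name and the `eps` guard are hypothetical.

```python
from statistics import median
from typing import Sequence


def median_improvement(errors_other: Sequence[float],
                       errors_ref: Sequence[float],
                       eps: float = 1e-9) -> float:
    """Median per-sample improvement of a reference model over another model.

    Assumed definition (hypothetical, not taken from the paper):
        improvement_i = (|e_other_i| - |e_ref_i|) / |e_ref_i|
    so 1.0 means the other model's absolute error is twice the reference's,
    and a negative value means the reference model did worse on that sample.
    `eps` guards against division by zero when the reference error is 0.
    """
    if len(errors_other) != len(errors_ref) or len(errors_ref) == 0:
        raise ValueError("expected two non-empty error sequences of equal length")
    per_sample = [
        (abs(o) - abs(r)) / max(abs(r), eps)
        for o, r in zip(errors_other, errors_ref)
    ]
    return median(per_sample)
```

For instance, `median_improvement([0.75, 0.5, 1.0], [0.25, 0.25, 0.25])` computes per-sample improvements of 2.0, 1.0 and 3.0 and returns their median, 2.0. Taking the median rather than the mean keeps a few samples with near-zero reference error from dominating the summary.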
5.2.2. Results of NN-OPT’s improvement

Fig. 11 compares the performance of NN-OPT with that of the four other algorithms. NN-OPT is significantly better than LR, SVR-L and SVR-R, with median improvements of 86%, 67% and 36%, respectively. Over NN with the default configuration, NN-OPT’s advantage is 7%, which is relatively small.

Overall, the results reveal the following. First, ML-based solutions can significantly improve thermal comfort modeling performance. Second, nonlinear algorithms are better suited to our application and dataset than their linear counterparts. Third, there is room to improve the performance of the algorithms, so algorithm optimization is necessary.

5.3. ML-based modeling for smart building control

In this section, we look at how our PMV models can be used for smart building control. Given the outdoor environmental conditions and the date and time, we can test the effect of various settings of the controllable HVAC features on the PMV factors, namely indoor temperature, indoor humidity and MRT. From these predicted values, we can then compute the PMV comfort score. This gives a building management system a way to find the least energy-intensive, and consequently least expensive, HVAC settings that achieve the desired comfort level. In this section, we assume that both HVAC units are turned on and have the same setpoint temperature.

Fig. 12 shows the PMV score for different HVAC setpoint
temperatures between 23 and 27 °C under different datetime and weather conditions. Fig. 12(a) shows the results at 10:00 on the first day in our dataset. As we can see, the PMV score increases with the HVAC setpoint temperature. In tropical areas, higher setpoints also reduce the air conditioning load. To improve building energy efficiency, the setpoint should therefore be configured so that the thermal sensation is slightly warm, which corresponds to a PMV score of 1.0 according to ASHRAE Standard 55.

Our tests assume that the indoor air velocity, occupant metabolic rate and clothing insulation are unchanged. Under these settings, for a given datetime and weather condition, the PMV score varies almost linearly with the HVAC setpoint. We can exploit this linearity to compute the maximum HVAC setpoint that achieves a particular PMV score. Specifically, for a given datetime and weather condition, we evaluate just two HVAC setpoints to derive the linear relationship between setpoint and PMV, and then solve this linear equation for the setpoint that yields a PMV of 1.0. We implemented this scheme in Python, and the calculations are nearly instantaneous.

As an example of this approach, consider two times, 10:00 and 15:00, on a working day, as shown in Fig. 12(a) and (b). To achieve a PMV of 1.0, the HVAC setpoints should be 25.2 and 25.1 °C, respectively. The setpoint is higher at 10:00 than at 15:00 because the outdoor temperature at 10:00 is 28.6 °C, while at 15:00 it is 29.4 °C. We also performed this test on another working day and found that the setpoints at 10:00 and 15:00 should be 23.9 and 24.6 °C, respectively, as shown in Fig. 12(c) and (d). Here, although the outdoor temperature and humidity were quite similar at 10:00 and 15:00, the outdoor irradiance and outdoor illuminance varied significantly: they were 273.8 and 29.4 at 10:00, but 240.8 and 27.2 at 15:00. As a result, less heat penetrated through the office windows at 15:00, leading to a higher permissible setpoint.

6. Conclusion

In this paper, we revisit thermal comfort modeling from the ML perspective. Unlike earlier models based on non-controllable environmental factors, our work captures the effect of controllable building parameters on PMV comfort levels. We evaluated several different models and configurations and showed that nonlinear models performed significantly better than linear ones. We also found that NNs had the best performance, and that a network with two hidden layers of 100 neurons each was substantially more accurate than the default NN configuration. The average test error was only 1.3%. Training time was also minimal, indicating the practicality of our approach. Finally, we showed that our NN model exhibits a linear relationship between HVAC setpoints and the PMV value, leading to a simple and efficient algorithm for finding the optimal setpoint that meets a given comfort requirement.

In the future, we plan to extend this work by including factors other than HVAC settings that affect building energy consumption. We would also like to incorporate other data features, such as occupancy and building location. Finally, we want to explore correlated factors, e.g., building lighting, to reduce building energy costs.

References

[1] Farmani F, Parvizimosaed M, Monsef H, Rahimi-Kian A. A conceptual model of a smart energy management system for a residential building equipped with CCHP system. Int J Electr Power Energy Syst 2018;95:523–36.
[2] Shaikh PH, Nor NBM, Nallagownden P, Elamvazuthi I, Ibrahim T. Intelligent multi-objective control and management for smart energy efficient buildings. Int J Electr Power Energy Syst 2016;74:403–9.
[3] Vakiloroaya V, Samali B, Fakhar A, Pishghadam K. A review of different strategies for HVAC energy saving. Energy Convers Manage 2014;77:738–54.
[4] Höppe P. Different aspects of assessing indoor and outdoor thermal comfort. Energy Build 2002;34(6):661–5.
[5] de Dear R, Akimoto T, Arens E, Brager G, Candido C, Cheong K, Li B, Nishihara N, Sekhar S, Tanabe S, et al. Progress in thermal comfort research over the last twenty years. Indoor Air 2013;23(6):442–61.
[6] Fanger PO, et al. Thermal comfort. Analysis and applications in environmental engineering.
[7] ASHRAE Standard. Standard 55-2010: thermal environmental conditions for human occupancy. ASHRAE, Atlanta, USA.
[8] Luo Y, Liu T, Tao D, Xu C. Decomposition-based transfer distance metric learning for image classification. IEEE Trans Image Process 2014;23(9):3789–801.
[9] Luo Y, Wen Y, Tao D. Heterogeneous multitask metric learning across multiple domains. IEEE Trans Neural Netw Learn Syst 2018;PP(99):1–14.
[10] Luo Y, Wen Y, Liu T, Tao D. General heterogeneous transfer distance metric learning via knowledge fragments transfer. Proceedings of the 26th International Joint Conference on Artificial Intelligence. AAAI Press; 2017. p. 2450–6.
[11] Megri AC, Naqa IE, Haghighat F. A learning machine approach for predicting thermal comfort indices. Int J Ventilation 2005;3(4):363–76.
[12] Atthajariyakul S, Leephakpreeda T. Neural computing thermal comfort index for HVAC systems. Energy Convers Manage 2005;46(15–16):2553–65.
[13] Lai CS, Lai LL. Application of big data in smart grid. Systems, Man, and Cybernetics (SMC), 2015 IEEE International Conference on. IEEE; 2015. p. 665–70.
[14] Leung M, Norman C, Lai LL, Chow TT. The use of occupancy space electrical power demand in building cooling load prediction. Energy Build 2012;55:151–63.
[15] Palmer JM, Chapman KS, Watson RD. Handbook of radiant heating and cooling [Tech. rep.]; 2017.
[16] Brager GS, de Dear R. A standard for natural ventilation. ASHRAE J 2000;42(10):21.
[17] The Engineering Toolbox. Met – metabolic rate; 2018. URL: https://www.engineeringtoolbox.com.
[18] Schiavon S, Lee KH. Dynamic predictive clothing insulation models based on outdoor air and indoor operative temperatures. Build Environ 2013;59:250–60.
[19] McCullough EA, Jones BW, Huck J. A comprehensive data base for estimating clothing insulation. ASHRAE Trans 1985;91(2):29–47.
[20] Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, et al. Scikit-learn: machine learning in Python. J Mach Learn Res 2011;12:2825–30.
[21] Thomas AJ, Walters SD, Gheytassi SM, Morgan RE, Petridis M. On the optimal node ratio between hidden layers: a probabilistic study. Int J Mach Learn Comput 2016;6(5):241.
[22] Hinton GE, Srivastava N, Krizhevsky A, Sutskever I, Salakhutdinov RR. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580.
[23] Lawrence S, Giles CL, Tsoi AC. What size neural network gives optimal generalization? Convergence properties of backpropagation [Tech. rep.]; 1998.
[24] Lin T-P, Matzarakis A, Hwang R-L. Shading effect on long-term outdoor thermal comfort. Build Environ 2010;45(1):213–21.
[25] Kramer RP, Schellen HL, van Schijndel J. Towards temperature limits for museums: a building simulation study for four museum zones with different quality of envelopes. In: Proceedings of Healthy Buildings Europe (Eindhoven, 2015).
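The two-point setpoint search of Section 5.3 can be sketched in a few lines of Python. With probe setpoints s1 < s2 and predicted scores p1 = PMV(s1) and p2 = PMV(s2), the linear fit gives the setpoint s* = s1 + (target - p1)(s2 - s1)/(p2 - p1). The sketch assumes a hypothetical callable `pmv_at` that wraps the trained models for one fixed datetime and weather condition (setpoint to predicted indoor temperature, humidity and MRT, then to PMV). The default probes of 23 and 27 °C simply reuse the range plotted in Fig. 12 and are not prescribed by the method.

```python
from typing import Callable


def setpoint_for_target_pmv(pmv_at: Callable[[float], float],
                            s_low: float = 23.0,
                            s_high: float = 27.0,
                            target_pmv: float = 1.0) -> float:
    """Two-point linear solve for the HVAC setpoint that yields `target_pmv`.

    `pmv_at` is a hypothetical callable wrapping the trained models for one
    fixed datetime and weather condition: setpoint -> predicted indoor
    temperature, humidity and MRT -> PMV. The solve relies on PMV being
    (almost) linear in the setpoint, as observed in Section 5.3.
    """
    if s_high == s_low:
        raise ValueError("the two probe setpoints must differ")
    # Only two model evaluations are needed.
    p_low, p_high = pmv_at(s_low), pmv_at(s_high)
    if p_high == p_low:
        raise ValueError("PMV does not change with setpoint; no unique solution")
    slope = (p_high - p_low) / (s_high - s_low)
    # Solve p_low + slope * (s - s_low) = target_pmv for s.
    return s_low + (target_pmv - p_low) / slope
```

With a toy stand-in model such as `lambda s: 0.4 * s - 9.0` (illustrative coefficients, not fitted to our data), the function returns approximately 25.0. Because PMV rises with the setpoint, the result is the highest setpoint whose predicted PMV does not exceed the target; a result outside [s_low, s_high] is an extrapolation and relies on the linearity holding beyond the probed range.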
