TsoEtal2007 - Predicting Electricity Energy Consumption
Abstract
This study presents three modeling techniques for the prediction of electricity energy consumption. In addition to the traditional
regression analysis, decision tree and neural networks are considered. Model selection is based on the square root of average squared
error. In an empirical application to an electricity energy consumption study, the decision tree and neural network models appear to be
viable alternatives to the stepwise regression model in understanding energy consumption patterns and predicting energy consumption
levels. With the emergence of the data mining approach for predictive modeling, different types of models can be built in a unified
platform: to implement various modeling techniques, assess the performance of different models and select the most appropriate model
for future prediction.
© 2006 Elsevier Ltd. All rights reserved.
Keywords: Data mining; Decision tree; Electricity energy consumption; Neural networks; Regression analysis
doi:10.1016/j.energy.2006.11.010
1762 G.K.F. Tso, K.K.W. Yau / Energy 32 (2007) 1761–1768
described in the following section. Methodologies and results are described, respectively, in Sections 3 and 4.
Comparative advantages of the different data analysis approaches in application to electricity energy consumption
and discussion of results are given in the last section. Results arising from this study provide important reference
materials for the utility companies in assessing electricity energy consumption patterns and selecting a more accurate
approach to estimate future energy demand.

2. Background to the study

2.1. Two-phase energy consumption survey

A two-phase survey was carried out in the summer and winter of 1999–2000. The survey covers domestic households
with average monthly electricity consumption of 100 kWh or above. A questionnaire–diary survey method was applied
to collect details of appliances' ownership levels and power ratings amongst participating households in both the
summer (June–August, mean high and low temperature: 31.3 and 26.7 °C; mean relative humidity: 80.7%) and winter
(February–early March, mean high and low temperature: 17.4 and 13.6 °C; mean relative humidity: 80.5%) phase surveys.

During an on-site interview, interviewers first recorded the number of different appliances in the household, including
their models and power ratings. A diary was then used to record the usage pattern of selected major appliances over
every half-hour interval for one week. Among the 1516 households interviewed, 1201 and 1000 diaries were collected
from the summer and winter phases, respectively. Of the 1201 records collected from the summer phase interview, only
1166 were complete records as there were values missing in either the AGE or INCOME variables of the other records.

After collecting the completed questionnaires and diaries, a database was developed containing power ratings of
appliances and consumption time for each end-use at different time intervals. The data collected were scrutinized
using quality control procedures such as telephone follow-up calls to verify the operating hours on any extreme
values. The approximate energy consumption for air-conditioning, lighting, washing and drying clothes, dish washing,
water boiling, refrigerating, cooking, water heating and dehumidifying was calculated by applying relevant
computational formulae [2].

2.2. Stratification according to housing type

It is believed that electricity consumption varies between different housing types. The target households were
therefore stratified into 4 different housing types in the survey: public rental (38%), Government subsidized home
ownership scheme (15%), private development (41%) and village housing (6%). The stratification proportion was
determined in accordance with the electricity consumption level and population size. A proportionate stratification
sampling method was adopted to select households for higher statistical efficiency.

2.3. Housing type, household characteristics and appliance ownership factors

The following housing type, household characteristics and appliance ownerships are captured from our databases.

Housing type:
As discussed above, the housing type was stratified into four groups. Public rental is defined as the base category,
and three dummy variables are created to indicate the effect of housing type in contrast with the base type.
HOS     Government subsidized home ownership scheme
PD      Private development
VH      Village house

Household characteristics:
AGE     Age of the flat in years
SIZE    Size of the flat in ft²
RENT    A dummy variable to indicate rental flat
INCOME  Monthly household income in HK$1000
MEMBER  Number of household members

Appliance ownership:
AC      Ownership of air-conditioner
FAN     Ownership of fan
CD      Ownership of clothes dryer
CW      Ownership of clothes washing machine
DEH     Ownership of dehumidifier
EWH     Ownership of electric water heater
EK      Ownership of electric kettle
RH      Ownership of rangehood
VFAN    Ownership of ventilation fan

3. Data analysis methodologies

Predictive modeling tries to find good rules for predicting the values of one or more variables in a data set
(outputs) from the values of other variables in the data set (inputs). The three common predictive modeling
techniques are multiple regression, neural network and decision tree models. The algorithms developed in these
modeling techniques arise from methodological research in various disciplines, including statistics, pattern
recognition, and machine learning. These three techniques are applied to analyze the energy consumption data
described in Section 2. SAS Enterprise Miner is used, in a unified setting, for multiple regression, neural network
and decision tree model building. A schematic presentation of the data analysis flow is shown in Fig. 1.
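To make the model inputs concrete, the dummy coding of housing type from Section 2.3 (public rental as the base category) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the label "PR" for public rental and the function name are assumptions, since the paper names only the HOS, PD and VH dummies.

```python
# Dummy variables named in Section 2.3; public rental is the base category
# and therefore has no dummy of its own.
HOUSING_DUMMIES = ("HOS", "PD", "VH")
BASE_CATEGORY = "PR"  # hypothetical label for public rental (not in the paper)


def encode_housing(housing_type: str) -> dict:
    """Return the three 0/1 dummy variables for one household's housing type."""
    if housing_type != BASE_CATEGORY and housing_type not in HOUSING_DUMMIES:
        raise ValueError(f"unknown housing type: {housing_type!r}")
    # The base category maps to all zeros; any other type sets exactly one dummy.
    return {name: int(housing_type == name) for name in HOUSING_DUMMIES}


if __name__ == "__main__":
    print(encode_housing("PR"))
    print(encode_housing("PD"))
```

With this coding, the coefficient of each dummy in a regression is read as a contrast against public rental, which is how the text describes the effect of housing type.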
[Figure: schematic of a neural network unit. Labels: Inputs; weights W1, W2, W3; combination function (Σ); transfer function (S); active function; Output.]
differences: (1) the branch-merging option for nominal splits is the default; (2) misclassification costs can be
specified; (3) boosting and cross-validation are available; and (4) the algorithm for creating rule sets from trees is
much improved.

A major advantage of the decision tree over other modeling techniques is that it produces a model which may
represent interpretable rules or logic statements. The explanation capability that exists for trees producing
axis-parallel decision surfaces is an important feature [11]. Besides, classification can be performed without
complicated computations and the technique can be used for both continuous and categorical variables. Furthermore,
decision tree model results provide clear information on the importance of significant factors for prediction or
classification.

However, decision tree induction generally does not perform as well as neural networks for nonlinear data, and
it is susceptible to noisy data [10]. In general, the technique is more suitable for predicting categorical outcomes
and, unless visible trends and sequential patterns are available, decision trees are less appropriate for application
to time series data.

In our energy consumption model, an F-test is applied for tree classification. The F-test criterion with a
significance level of 0.2, a default in SAS Enterprise Miner [13], is selected. In order to prevent the tree from
growing too large, the number of observations required for a split search is set at 50. In addition, missing values
are not counted as an acceptable value in our decision tree models.

3.4. Model selection criteria

4.1.1. Summer phase data

The stepwise regression model results are given in Table 1 (number of observations = 1116; number of significant
variables entered in the model = 6). In an ANOVA table, the F-statistic = 68.90 (p-value < 0.0001), indicating the
results of the regression model are satisfactory. The flat size, number of family members and the ownership of
air-conditioner, clothes dryer, rangehood and ventilation fan are important factors influencing the electricity
consumption. Note that the first three factors all relate to the electricity consumption due to air-conditioning.
The sub-tropical climate in Hong Kong generally contributes to this phenomenon. The average temperature and humidity
during summer in Hong Kong are both high and air-conditioning end-use represents 59% of the total electricity
consumption in a typical summer month [2].

4.1.2. Winter phase data

The stepwise regression model results are given in Table 2 (number of observations = 1000; number of significant
variables entered in the model = 5). In an ANOVA table, the F-statistic = 66.21 (p-value < 0.0001), indicating the
results of the regression model are satisfactory. The number of family members, private development, village house,
ownership of electric water heater and rangehood are found to be important factors influencing electricity
consumption. While many households do not operate their air-conditioner during winter, water-heating end-use takes
up a major proportion of total electricity consumption. In Hong Kong, the type of housing determines whether
electric or gas water heaters are used.
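The overall F-statistics quoted for the stepwise regressions can be related to the share of variance explained. The following is a hedged sketch, assuming each reported F is the usual overall F-test of an ordinary least squares model with an intercept and k predictors (degrees of freedom k and n − k − 1); the text does not state this explicitly. Under that assumption the identity R² = kF / (kF + n − k − 1) gives the implied R². The function name is illustrative, and the inputs are the observation counts and variable counts as printed in the text.

```python
def implied_r_squared(f_stat: float, n_obs: int, n_predictors: int) -> float:
    """Back out R^2 from an overall regression F-statistic.

    Assumes an OLS model with an intercept, so that
    F = (R^2 / k) / ((1 - R^2) / (n - k - 1)).
    """
    residual_df = n_obs - n_predictors - 1
    if residual_df <= 0:
        raise ValueError("need more observations than parameters")
    explained = n_predictors * f_stat
    return explained / (explained + residual_df)


if __name__ == "__main__":
    # Values as printed in Sections 4.1.1 and 4.1.2.
    summer = implied_r_squared(68.90, n_obs=1116, n_predictors=6)
    winter = implied_r_squared(66.21, n_obs=1000, n_predictors=5)
    print(f"summer implied R^2: {summer:.3f}")
    print(f"winter implied R^2: {winter:.3f}")
```

If the assumption holds, a highly significant F-statistic here is compatible with a model that leaves much of the household-to-household variation unexplained, which is worth keeping in mind when reading "satisfactory".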
Table 2
Regression model for electricity energy consumption in winter (stepwise)
Table 3
Neural network model for electricity energy consumption in summer
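[Tree diagram; node statistics shown, consumption in kWh]
Root: N = 1166, average = 76.92; split on Member
  Member < 3.5: N = 536, average = 60.49; split on a binary (1/0) variable, label not shown
    1: N = 510, average = 62.97
    0: N = 26, average = 11.69
  Member >= 3.5: N = 630, average = 90.91; split at 817, variable label not shown
    < 817: N = 569, average = 86.36
    >= 817: N = 61, average = 133.33
Lower levels: further splits on Member (< 2.5, >= 2.5) and AC own (1, 0); node statistics not shown.
Fig. 3. Decision tree model for electricity energy consumption (in kWh) in summer.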
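A quick consistency check on the node statistics in Fig. 3: within each split, the child averages weighted by their N should reproduce the parent average up to rounding. The sketch below uses only the N and average values shown in the figure; the function name is illustrative.

```python
def pooled_average(children):
    """Combine (n, average) pairs of child nodes into the parent's average."""
    total_n = sum(n for n, _ in children)
    if total_n == 0:
        raise ValueError("child nodes contain no observations")
    return sum(n * avg for n, avg in children) / total_n


if __name__ == "__main__":
    # (N, average kWh) pairs read from Fig. 3.
    root = pooled_average([(536, 60.49), (630, 90.91)])    # reported: 76.92
    small = pooled_average([(510, 62.97), (26, 11.69)])    # reported: 60.49
    large = pooled_average([(569, 86.36), (61, 133.33)])   # reported: 90.91
    for label, value in [("root", root), ("Member < 3.5", small), ("Member >= 3.5", large)]:
        print(f"{label}: {value:.2f}")
```

The recomputed values agree with the reported parent averages to within rounding of the two-decimal figures, which also confirms which child nodes belong to which branch.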
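[Tree diagram; node statistics shown, consumption in kWh]
Root: N = 1000, average = 41.06; split on EWHown
  EWHown = 1: N = 314, average = 72.71
  EWHown = 0: N = 686, average = 26.57
Lower levels: further splits on Member and Flatsize; thresholds and node statistics not shown.
Fig. 4. Decision tree model for electricity energy consumption (in kWh) in winter.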
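Because a regression tree reads as a set of rules, its first split can be written down directly. The sketch below encodes only the top split of Fig. 4 (EWHown) and predicts the node average; the deeper splits on Member and Flatsize are omitted because their thresholds are not shown here, so this truncated tree is an illustration rather than the fitted model. The dictionary layout and function name are assumptions.

```python
# Top split of the winter tree in Fig. 4; leaves hold the node average in kWh.
WINTER_TREE = {
    "split": "EWHown",
    "branches": {
        1: {"n": 314, "average_kwh": 72.71},
        0: {"n": 686, "average_kwh": 26.57},
    },
}


def predict_kwh(tree: dict, household: dict) -> float:
    """Follow split rules down to a leaf and return that leaf's average."""
    node = tree
    while "split" in node:
        value = household[node["split"]]
        node = node["branches"][value]
    return node["average_kwh"]


if __name__ == "__main__":
    print(predict_kwh(WINTER_TREE, {"EWHown": 1}))
    print(predict_kwh(WINTER_TREE, {"EWHown": 0}))
```

Read as a rule: a household that owns an electric water heater is predicted at the average of the 314 such households, and all others at the average of the remaining 686.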
Table 5
RASE of decision tree, neural network and regression models
RASE (kWh)
Decision tree Neural network Regression (stepwise) Regression (full) Regression (intercept)
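Table 5 compares models by RASE, which the abstract defines as the square root of average squared error. A minimal sketch of that criterion and of picking the model with the smallest value is given below. The consumption figures and model names are hypothetical and serve only to exercise the functions; they are not results from the study.

```python
import math


def rase(actual, predicted):
    """Root average squared error between observed and predicted values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def select_model(actual, predictions_by_model):
    """Return the name of the model with the lowest RASE and all scores."""
    scores = {name: rase(actual, preds) for name, preds in predictions_by_model.items()}
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Hypothetical monthly consumption (kWh) and predictions from two models.
    observed = [50.0, 80.0, 120.0]
    candidates = {
        "model_a": [60.0, 80.0, 110.0],
        "model_b": [55.0, 90.0, 100.0],
    }
    best, scores = select_model(observed, candidates)
    print(best, {name: round(score, 2) for name, score in scores.items()})
```

Because RASE is expressed in the units of the target (kWh), differences between models can be read directly as typical prediction error per household.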
Table 6
Significant factors influencing energy consumption
Summer Winter
Decision tree Neural network Regression (stepwise) Decision tree Neural network Regression (stepwise)
HOS *
PD * *
VH * *
AGE
SIZE * * * *
RENT
INCOME *
MEMBER * * * * * *
AC * * *
FAN
CD * *
CW *
DEH
EWH * * *
EK
RH * * * * *
VFAN * *
network model performs slightly better than the other two alternatives. For comparative purposes, a summary of the
significant factors identified in these models is given in Table 6. In this empirical application to an electricity
energy consumption study, the decision tree and neural network models appear to be viable alternatives to the
stepwise regression model in understanding energy consumption patterns and in predicting energy consumption levels.

5. Discussion

In the summer phase, the decision tree model resulted in fewer significant factors influencing energy consumption
than neural network and stepwise regression. In other words, the decision tree model, with its simpler structure, is
more accurate than other models. Three factors, namely flat size, number of members in the household, and ownership
of air-conditioner, are found to be significant in all three models using the decision tree, neural network, and
stepwise regression. This phenomenon is reasonable because air-conditioning consumes on average 59% of the
electricity in a typical household in Hong Kong during the summer [2], and the number of air-conditioners is
associated with flat size and number of members in the household. In the winter phase, the housing type plays a role
in the electricity energy consumption level. Two other factors, ownership of electric water heater and rangehood,
are found to be significant in influencing electricity consumption in all three models. The number of household
members is significant in all models, but flat size and ownership of air-conditioner are not significant factors in
the winter phase. As a remark, inclusion of meteorological variables such as temperature and wind velocity should
improve the model fitting results. However, such information cannot be matched accurately with the existing database
retrospectively.

Predicting energy consumption plays an important role in decision making and planning for utility companies. In
the past, regression models were mainly adopted to predict energy consumption [2–6]. The use of alternative
analytical methods has not been popular in the energy consumption literature. While the regression analysis method
is supported by statistical theories as producing good estimates according to certain statistical properties, for
instance, being the best linear unbiased estimator, other approaches such as decision tree and neural network are
found useful in developing predictive models in other fields. In the past decade, advancements in database
management and improvements in computing speed have led to new ways of conducting data analysis. Data mining is now
receiving plenty of attention and is being recognized as a newly emerging analysis tool. When searching for a
predictive model, common practice in data mining is to develop various models using different approaches, then
select a final model after comparing their accuracies according to some model selection criteria. This study
illustrates how this concept can be used to predict electricity energy consumption in Hong Kong. When comparing
accuracy in predicting electricity energy consumption, it is found that the decision tree model and neural network
model perform slightly better than other models in the summer and winter phases, respectively. In general, the
differences in RASE between the three types of model are quite small, indicating the three modeling techniques are
generally comparable in predicting energy consumption.

Acknowledgments

The authors would like to thank the referees for helpful comments on an earlier version of the paper.

References

[1] Lam JC. Climatic and economic influences on residential electricity consumption. Energy Convers Manage 1998;39:623–9.
[2] Tso GKF, Yau KKW. A study of domestic energy usage pattern in Hong Kong. Energy 2003;28:1671–82.
[3] Al-Garni AZ, Zubair SM, Nizami JS. A regression model for electric energy consumption forecasting in Eastern Saudi Arabia. Energy 1994;19:1043–9.
[4] Yan YY. Climate and residential electricity consumption in Hong Kong. Energy 1998;23:17–20.
[5] Ranjan M, Jain VK. Modelling of electrical energy consumption in Delhi. Energy 1999;24:351–61.
[6] Egelioglu F, Mohamada AA, Guven H. Economic variables and electricity consumption in Northern Cyprus. Energy 2001;26:355–62.
[7] Kalogirou SA, Bojic M. Artificial neural networks for the prediction of the energy consumption of a passive solar building. Energy 2000;25:479–91.
[8] Müller W, Wiederhold E. Applying decision tree methodology for rules extraction under cognitive constraints. Eur J Oper Res 2002;136:282–9.
[9] Berry MJA, Linoff G. Data mining techniques for marketing, sales, and customer support. New York: Wiley; 1997.
[10] Curram SP, Mingers J. Neural networks, decision tree induction and discriminant analysis: an empirical comparison. J Oper Res Soc 1994;45:440–50.
[11] Perner P, Zscherpel U, Jacobsen C. A comparison between neural networks and decision trees based on data from industrial radiographic testing. Pattern Recognition Lett 2001;22:47–54.
[12] Quinlan JR. C4.5 programs for machine learning. San Mateo: Morgan Kaufmann; 1993.
[13] SAS Institute Inc. SAS/STAT user's guide. Cary: SAS Institute Inc.; 2003.