
ARTICLE IN PRESS

Energy 32 (2007) 1761–1768


www.elsevier.com/locate/energy

Predicting electricity energy consumption: A comparison of regression analysis, decision tree and neural networks
Geoffrey K.F. Tso, Kelvin K.W. Yau
Department of Management Sciences, City University of Hong Kong, Tat Chee Ave., Kowloon, Hong Kong, China
Received 15 August 2005

Abstract

This study presents three modeling techniques for the prediction of electricity energy consumption. In addition to the traditional
regression analysis, decision tree and neural networks are considered. Model selection is based on the square root of average squared
error. In an empirical application to an electricity energy consumption study, the decision tree and neural network models appear to be
viable alternatives to the stepwise regression model in understanding energy consumption patterns and predicting energy consumption
levels. With the emergence of the data mining approach for predictive modeling, different types of models can be built in a unified
platform: to implement various modeling techniques, assess the performance of different models and select the most appropriate model
for future prediction.
© 2006 Elsevier Ltd. All rights reserved.

Keywords: Data mining; Decision tree; Electricity energy consumption; Neural networks; Regression analysis

Corresponding author. Tel.: +852 2788 8568; fax: +852 2788 8560.
E-mail address: msgtso@cityu.edu.hk (G.K.F. Tso).

0360-5442/$ - see front matter © 2006 Elsevier Ltd. All rights reserved.
doi:10.1016/j.energy.2006.11.010

1. Introduction

Energy consumption in Hong Kong has risen remarkably in the past few decades due to the city’s increasing population and economic development. In particular, there is a substantial increase in the total annual electricity energy consumption in the domestic sector, from 1059 GWh in 1971 to 5718 GWh in 1991 and to 9111 GWh in 2001 [1–2]. This reflects the high level of electricity energy consumption and the increased demand for domestic energy consumption.

Having a sufficient energy supply is vital for the community. The utility companies are responsible for maintaining and improving their service continuously. To facilitate better planning, utility companies maintain databases that capture the energy consumption and usage patterns of major appliances. These databases are then used to identify the trend and the usage of domestic energy. Currently, the utility companies use various statistical forecasting and regression approaches to estimate their future energy demand. However, with the recent rapid development of data mining, alternative estimation approaches such as decision trees and neural networks have become more popular and easier to operate.

Traditionally, regression analysis has been the most popular modeling technique in predicting energy consumption [2–6]. Artificial neural networks have also been used for the prediction of energy consumption, such as in the study by Kalogirou and Bojic [7]. Simulated data were used to train an artificial neural network in order to generate a mapping between the input and output, and the model was subsequently used to predict energy consumption. In fact, neural networks have established their role as data analysis tools in different areas. Decision tree methodology has also been found to be an efficient decision support for a production system [8]. Comparison of these different data analysis and modeling techniques has been considered in various applications, but rarely in energy consumption prediction.

This study compares the accuracy in predicting electricity energy consumption in Hong Kong among three different approaches: regression analysis, decision trees, and neural networks. The background of the study is
described in the following section. Methodologies and results are described, respectively, in Sections 3 and 4. Comparative advantages of the different data analysis approaches in application to electricity energy consumption and discussion of results are given in the last section. Results arising from this study provide important reference materials for the utility companies in assessing electricity energy consumption patterns and selecting a more accurate approach to estimate future energy demand.

2. Background to the study

2.1. Two-phase energy consumption survey

A two-phase survey was carried out in the summer and winter of 1999–2000. The survey covers domestic households with average monthly electricity consumption of 100 kWh or above. A questionnaire–diary survey method was applied to collect details of appliances’ ownership levels and power ratings amongst participating households in both the summer (June–August, mean high and low temperature: 31.3 and 26.7 °C; mean relative humidity: 80.7%) and winter (February–early March, mean high and low temperature: 17.4 and 13.6 °C; mean relative humidity: 80.5%) phase surveys.

During an on-site interview, interviewers first recorded the number of different appliances in the household, including their models and power ratings. A diary was then used to record the usage pattern of selected major appliances over every half-hour interval for one week. Among the 1516 households interviewed, 1201 and 1000 diaries were collected from the summer and winter phases, respectively. Of the 1201 records collected from the summer phase interview, only 1166 were complete records, as values were missing in either the AGE or INCOME variables of the other records.

After collecting the completed questionnaires and diaries, a database was developed containing power ratings of appliances and consumption time for each end-use at different time intervals. The data collected were scrutinized using quality control procedures such as telephone follow-up calls to verify the operating hours on any extreme values. The approximate energy consumption for air-conditioning, lighting, washing and drying clothes, dish washing, water boiling, refrigerating, cooking, water heating and dehumidifying was calculated by applying relevant computational formulae [2].

2.2. Stratification according to housing type

It is believed that electricity consumption varies between different housing types. The target households were therefore stratified into 4 different housing types in the survey: public rental (38%), Government subsidized home ownership scheme (15%), private development (41%) and village housing (6%). The stratification proportion was determined in accordance with the electricity consumption level and population size. A proportionate stratification sampling method was adopted to select households for higher statistical efficiency.

2.3. Housing type, household characteristics and appliance ownership factors

The following housing type, household characteristics and appliance ownerships are captured from our databases.

Housing type:
As discussed above, the housing type was stratified into four groups. Public rental is defined as the base category, and three dummy variables are created to indicate the effect of housing type in contrast with the base type.
HOS      Government subsidized home ownership scheme
PD       Private development
VH       Village house

Household characteristics:
AGE      Age of the flat in years
SIZE     Size of the flat in ft²
RENT     A dummy variable to indicate rental flat
INCOME   Monthly household income in HK$1000
MEMBER   Number of household members

Appliance ownership:
AC       Ownership of air-conditioner
FAN      Ownership of fan
CD       Ownership of clothes dryer
CW       Ownership of clothes washing machine
DEH      Ownership of dehumidifier
EWH      Ownership of electric water heater
EK       Ownership of electric kettle
RH       Ownership of rangehood
VFAN     Ownership of ventilation fan

3. Data analysis methodologies

Predictive modeling tries to find good rules for predicting the values of one or more variables in a data set (outputs) from the values of other variables in the data set (inputs). The three common predictive modeling techniques are multiple regression, neural network and decision tree models. The algorithms developed in these modeling techniques arise from methodological research in various disciplines, including statistics, pattern recognition, and machine learning. These three techniques are applied to analyze the energy consumption data described in Section 2. SAS Enterprise Miner is used, in a unified setting, for multiple regression, neural network and decision tree model building. A schematic presentation of the data analysis flow is shown in Fig. 1.
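The dummy coding of housing type described in Section 2.3 can be sketched as follows. This is only an illustration: the function name `encode_housing` is hypothetical, and the code "PR" for the public rental base category is borrowed from the label used in Fig. 4 rather than from the variable list.

```python
# Housing-type dummies from Section 2.3; public rental ("PR") is the base
# category, so it is represented by all three dummies being 0.
HOUSING_DUMMIES = ("HOS", "PD", "VH")
BASE_CATEGORY = "PR"  # assumed code for public rental (label taken from Fig. 4)


def encode_housing(housing_type):
    """Return the 0/1 dummy variables HOS, PD and VH for one household."""
    if housing_type != BASE_CATEGORY and housing_type not in HOUSING_DUMMIES:
        raise ValueError(f"unknown housing type: {housing_type!r}")
    return {name: int(housing_type == name) for name in HOUSING_DUMMIES}


if __name__ == "__main__":
    for housing_type in (BASE_CATEGORY,) + HOUSING_DUMMIES:
        print(housing_type, encode_housing(housing_type))
```

With this coding, a coefficient on PD or VH in the regression tables is read as the difference from a public rental household with otherwise identical inputs.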

Fig. 1. Schematic presentation of the data analysis flow using SAS Enterprise Miner.

3.1. Regression model building

Regression analysis is one of the most popular techniques for predictive modeling. A multiple regression model with more than one explanatory variable may be written as

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon,$$

where $y$ is the output variable, $\beta_i$ the regression parameters ($i = 0, 1, 2, \dots, p$), $x_i$ the input variables ($i = 1, 2, \dots, p$) and $\varepsilon$ the random error term.

The least-squares method is generally used for estimation purposes in the multiple-regression model. Once regression coefficients are obtained, a prediction equation can then be used to predict the value of a continuous output (target) as a linear function of one or more independent inputs. The popularity of the regression models may be attributed to the interpretability of model parameters and ease of use. However, the major conceptual limitation of all regression techniques is that one can only ascertain relationships but can never be sure about the underlying causal mechanism. In our application to the prediction of electricity energy consumption, stepwise regression models are used, with both the entry and stay points for the models set to 0.05.

3.2. Neural network model building

Neural network models were originally developed by researchers trying to mimic the neurophysiology of the human brain. The models are analytic techniques modeled after the (hypothesized) processes of learning in the cognitive system and the neurological functions of the brain. They are capable of predicting new observations (on specific variables) from other observations (on the same or other variables) after executing a process of so-called learning from existing data. Fig. 2 shows the flow of a neural network.

[Fig. 2. Typical units in a neural network: the inputs, with weights W1, W2 and W3, feed a combination function (Σ), followed by a transfer (activation) function (S) that produces the output.]

The feedforward network is the simplest and most popular type of network. Training a neural network is the process of setting the best weights on the inputs of each of the units [9], and backpropagation (backprop) is the most common method for computing the error gradient for a feedforward network.

Neural networks perform well in applications when the functional form is nonlinear [10]. They are especially useful for prediction problems where mathematical formulae and prior knowledge on the relationship between inputs and outputs are unknown.

A disadvantage in using a neural network for a regression analysis is that it does not provide p-values for testing the significance of the parameter estimates. Moreover, a preliminary step of feature selection before learning is needed [11]. Artificial neural networks with hidden layers are better as classifiers for problems involving nonlinear decision hyper-surfaces, but are much harder to interpret.

Multilayer perceptron (MLP) is applied to predict the electricity consumption for our data sets. Linear combination functions and sigmoid transfer functions are used. The S-shaped sigmoid function is by far the most common transfer function. The formula for the sigmoid is

$$\mathrm{Sigmoid}(x) = \frac{1}{1 + e^{-x}}.$$

The number of hidden layers applied is determined by estimating the generalization error of each network. For the current application to electricity energy consumption data, it is found that a network with one hidden layer is adequate.

3.3. Decision tree model building

In decision tree modeling, an empirical tree represents a segmentation of the data that is created by applying a series of simple rules. These models generate a set of rules which can be used for prediction through the repetitive process of splitting. The most common tree methods include chi-squared automatic interaction detection (CHAID), classification and regression trees (CART), C4.5 [12] and C5.0. In C4.5, the target is nominal and the inputs may be nominal or interval. The recommended splitting criterion is the gain ratio = reduction in entropy/entropy of split. C5.0 is an improved version of C4.5 with the following

differences: (1) the branch-merging option for nominal splits is the default; (2) misclassification costs can be specified; (3) boosting and cross-validation are available; and (4) the algorithm for creating rule sets from trees is much improved.

A major advantage of the decision tree over other modeling techniques is that it produces a model which may represent interpretable rules or logic statements. The explanation capability that exists for trees producing axis parallel decision surfaces is an important feature [11]. Besides, classification can be performed without complicated computations and the technique can be used for both continuous and categorical variables. Furthermore, decision tree model results provide clear information on the importance of significant factors for prediction or classification.

However, decision tree induction generally does not perform as well as neural networks for nonlinear data, and it is susceptible to noisy data [10]. In general, the technique is more suitable for predicting categorical outcomes and, unless visible trends and sequential patterns are available, decision trees are less appropriate for application to time series data.

In our energy consumption model, an F-test is applied for tree classification. The F-test criterion with a significance level of 0.2, a default in SAS Enterprise Miner [13], is selected. In order to prevent the tree from growing too large, the number of observations required for a split search is set at 50. In addition, missing values are not counted as an acceptable value in our decision tree models.

3.4. Model selection criteria

The square root of average squared error (RASE) is used as a performance measure in our model comparison. The RASE is defined as

$$\mathrm{RASE} = \sqrt{\mathrm{ASE}} = \sqrt{\mathrm{SSE}/n},$$

where SSE is the sum of squared error, $n$ the number of observations and ASE the average squared error.

4. Results

In order to find the factors influencing electricity energy consumption, multiple regression, neural network and decision tree models are built for the energy consumption data obtained in summer and winter phases of the survey. The target variable is the total weekly electricity energy consumption (in kWh). Housing type, household characteristics and appliance ownership are considered as potential factors influencing the electricity energy consumption.

4.1. Multiple regression analysis

The stepwise selection method is applied in the regression model. The entry and stay points are set at 0.05.

4.1.1. Summer phase data

The stepwise regression model results are given in Table 1 (number of observations = 1116; number of significant variables entered in the model = 6). In an ANOVA table, the F-statistic = 68.90 (p-value < 0.0001), indicating the results of the regression model are satisfactory. The flat size, number of family members and the ownership of air-conditioner, clothes dryer, rangehood and ventilation fan are important factors influencing the electricity consumption. Notice that all the first 3 factors relate to the electricity consumption due to air-conditioning. The sub-tropical climate in Hong Kong generally contributes to this phenomenon. The average temperature and humidity during summer in Hong Kong are both high and air-conditioning end-use represents 59% of the total electricity consumption in a typical summer month [2].

Table 1
Regression model for electricity energy consumption in summer (stepwise)

Variable    Coefficient (standard error)    t-statistic
Housing type
HOS         NS                              NS
PD          NS                              NS
VH          NS                              NS
Household characteristics
AGE         NS                              NS
SIZE        0.030 (0.005)                   5.994
RENT        NS                              NS
INCOME      NS                              NS
MEMBER      12.495 (0.915)                  13.661
Appliance ownership
AC          42.648 (6.699)                  6.366
FAN         NS                              NS
CD          15.421 (3.718)                  4.148
CW          NS                              NS
DEH         NS                              NS
EWH         NS                              NS
EK          NS                              NS
RH          9.857 (2.507)                   3.932
VFAN        7.794 (2.881)                   2.705

NS: not significant in the stepwise multiple regression model.
Coefficients shown are significant at the 5% level.

4.1.2. Winter phase data

The stepwise regression model results are given in Table 2 (number of observations = 1000; number of significant variables entered in the model = 5). In an ANOVA table, the F-statistic = 66.21 (p-value < 0.0001), indicating the results of the regression model are satisfactory. The number of family members, private development, village house, ownership of electric water heater and rangehood are found to be important factors influencing electricity consumption. While many households do not operate their air-conditioner during winter, water-heating end-use takes up a major proportion of total electricity consumption. In Hong Kong, the type of housing determines whether electric or gas water heaters are used.

Households living in public rental housing mainly use electric water heaters, whereas households living in private developments and village houses mainly use gas water heaters.

Table 2
Regression model for electricity energy consumption in winter (stepwise)

Variable    Coefficient (standard error)    t-statistic
Housing type
HOS         NS                              NS
PD          17.635 (3.081)                  5.723
VH          13.891 (6.720)                  2.067
Household characteristics
AGE         NS                              NS
SIZE        NS                              NS
RENT        NS                              NS
INCOME      NS                              NS
MEMBER      9.162 (1.150)                   7.964
Appliance ownership
AC          NS                              NS
FAN         NS                              NS
CD          NS                              NS
CW          NS                              NS
DEH         NS                              NS
EWH         47.094 (3.228)                  14.589
EK          NS                              NS
RH          9.073 (3.077)                   2.949
VFAN        NS                              NS

NS: not significant in the stepwise multiple regression model.
Coefficients shown are significant at the 5% level.

4.2. Neural network analysis

4.2.1. Summer phase data

Applying the neural network analysis to the summer phase data, it is found that the number of family members, flat size, and the ownership of air-conditioner, clothes dryer, clothes washing machine and rangehood are important factors underlying electricity consumption. One neuron is used in the hidden layer. Increasing the number of neurons does not improve the fit of the model significantly. The direct and total effects of each of the variables and the equation for predicting electricity energy consumption are given in Table 3. Most of these important factors coincide with the significant factors found by the regression model, except only that the effect of clothes washing machine ownership replaces the effect of ventilation fan ownership.

Table 3
Neural network model for electricity energy consumption in summer

Variable    Direct effect (H11)    Total effect
HOS         0.030                  3.490
PD          0.023                  2.650
VH          0.029                  3.338
AGE         0.014                  1.623
SIZE        0.062                  7.284
RENT        0.006                  0.689
INCOME      0.025                  2.930
MEMBER      0.151                  17.649
AC          0.293                  34.155
FAN         0.030                  3.467
CD          0.065                  7.564
CW          0.072                  8.404
DEH         0.018                  2.136
EWH         0.006                  0.724
EK          0.007                  0.805
RH          0.052                  6.012
VFAN        0.041                  4.739

Prediction of electricity energy consumption can be achieved as follows:
H11 is the hidden layer for the neural network model.
H11 = 0.567 – 0.03HOS – 0.023PD + 0.029VH – 0.014AGE – 0.062SIZE + 0.006RENT 0.025INCOME – 0.151MEMBER + 0.293AC – 0.03FAN + 0.065CD + 0.072CW + 0.018DEH – 0.006EWH – 0.007EK + 0.052RH + 0.041VFAN.
H11′ = 1/(1+EXP(H11)) [sigmoid activation function].
Then, based on the prediction equation obtained from the neural network model, predicted electricity energy consumption = 106.4 – 116.72 H11′.

4.2.2. Winter phase data

Applying the neural network analysis to the winter phase data, it is found that the number of family members, living in a government subsidized home, and ownership of electric water heater, rangehood and ventilation fan are important factors influencing electricity consumption. One neuron is used in the hidden layer. Increasing the number of neurons does not improve the fit of the model significantly. The direct and total effects of each of the variables and the equation for predicting electricity energy consumption are given in Table 4.

When compared with the result found by the regression model, the government subsidized home type is an important factor and replaces private development and village house types. However, these are all dummy variables for differentiating participants’ housing types.

4.3. Decision tree analysis

4.3.1. Summer phase data

According to the decision tree analysis, the average weekly household electricity consumption during the summer is 76.92 kWh. The highest electricity consumption group households (mean = 133.33 kWh) have 4 or more family members and live in a flat that is larger than 817 ft². The lowest electricity consumption group households (mean = 11.69 kWh) have fewer than 4 family members and live without air-conditioners. A detailed decision tree classification is given in Fig. 3.

4.3.2. Winter phase data

In winter, the average weekly household electricity consumption is 41.06 kWh. The highest electricity consumption group consumes 168.86 kWh on average. The households in this group have 5 or more family members, own an electric water heater and have a monthly family

income exceeding HK$40,000 (US$5000). The lowest electricity consumption group consumes 17.14 kWh on average. This group does not have an electric water heater, has 5 family members or fewer and lives in a flat that is smaller than 451.5 ft². A detailed decision tree classification is given in Fig. 4.

Table 4
Neural network model for electricity energy consumption in winter

Variable    Direct effect (H11)    Total effect
HOS         0.166                  118.116
PD          0.027                  19.473
VH          0.028                  19.970
AGE         0.019                  13.574
SIZE        0.011                  7.746
RENT        0.017                  11.939
INCOME      0.013                  9.239
MEMBER      0.164                  116.694
AC          0.006                  3.980
FAN         0.030                  21.534
CD          0.019                  13.645
CW          0.004                  2.772
DEH         0.036                  25.656
EWH         0.308                  218.749
EK          0.007                  4.904
RH          0.071                  50.672
VFAN        0.069                  48.753

Prediction of electricity energy consumption can be achieved as follows:
H11 is the hidden layer for the neural network model.
H11 = 1.99 – 0.166HOS + 0.027PD + 0.028VH – 0.019AGE – 0.011SIZE + 0.017RENT 0.013INCOME – 0.164MEMBER + 0.006AC 0.03FAN + 0.019CD + 0.004CW + 0.036DEH + 0.308EWH + 0.007EK + 0.071RH + 0.069VFAN.
H11′ = 1/(1+EXP(-H11)) [sigmoid activation function].
Then, based on the prediction equation obtained from the neural network model, predicted electricity energy consumption = 731.62 – 724.53 H11′.

4.4. Model selection

The RASE values of the various models are given in Table 5. For reference, the RASE values of the full regression model and the null regression model are also provided in Table 5. In the summer phase, the decision tree model reveals a few important factors, which essentially relate to energy consumption due to the air-conditioning factor. When the dominating effect of air-conditioning disappears in the winter phase, other important factors are identified by the three models. In short, all three types of model pick up the most important factors in each season. In the summer phase, these are the ownership of air-conditioner and related factors affecting the energy consumption level. In the winter phase, these are the housing type, number of members in the household, and the ownership of electric water heater and rangehood. According to the RASE values, all the three proposed models, namely stepwise regression, decision tree and neural networks, are comparable. In the summer phase, the decision tree model performs slightly better than the other two methods; while in the winter phase, neural

All households: N = 1166, average = 76.92
    Member < 3.5: N = 536, average = 60.49
        AC own = 1: N = 510, average = 62.97
            Member < 2.5: N = 251, average = 53.47
            Member >= 2.5: N = 259, average = 72.19
        AC own = 0: N = 26, average = 11.69
    Member >= 3.5: N = 630, average = 90.91
        Flat size < 817: N = 569, average = 86.36
            AC own = 1: N = 555, average = 87.87
            AC own = 0: N = 14, average = 26.29
        Flat size >= 817: N = 61, average = 133.33

Fig. 3. Decision tree model for electricity energy consumption (in kWh) in summer.
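Read as a set of rules, the summer tree in Fig. 3 maps a household to the mean weekly consumption of the leaf it falls into. The sketch below transcribes those splits and leaf averages; the function name `predict_summer_kwh` and its argument names are hypothetical, and the code is a reading aid rather than the SAS Enterprise Miner model itself.

```python
def predict_summer_kwh(members, owns_ac, flat_size_ft2):
    """Return the Fig. 3 leaf average (weekly kWh) for one household.

    members        number of household members (MEMBER)
    owns_ac        True if the household owns an air-conditioner (AC)
    flat_size_ft2  flat size in square feet (SIZE)
    """
    if members < 3.5:
        # Smaller households split on air-conditioner ownership first.
        if not owns_ac:
            return 11.69
        return 53.47 if members < 2.5 else 72.19
    # Larger households split on flat size first.
    if flat_size_ft2 >= 817:
        return 133.33
    return 87.87 if owns_ac else 26.29


if __name__ == "__main__":
    # A five-person household in a 900 ft2 flat lands in the highest-use leaf.
    print(predict_summer_kwh(members=5, owns_ac=True, flat_size_ft2=900))
```

Because every leaf returns a single average, the tree gives the same prediction for all households in a segment, which is what makes the rules easy to read but coarse.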

All households: N = 1000, average = 41.06
    EWH own = 1: N = 314, average = 72.71
        Member < 4.5: N = 248, average = 63.04
            RH own = 1: N = 123, average = 74.72
            RH own = 0: N = 125, average = 51.54
        Member >= 4.5: N = 66, average = 109.05
            Income < 40: N = 52, average = 92.94
            Income >= 40: N = 14, average = 168.86
    EWH own = 0: N = 686, average = 26.57
        Flat size < 451.5: N = 269, average = 18.49
            Member < 5.5: N = 257, average = 17.14
            Member >= 5.5: N = 12, average = 47.42
        Flat size >= 451.5: N = 417, average = 31.79
            HSE type = VH & HOS: N = 119, average = 20.18
            HSE type = PD & PR: N = 298, average = 36.42

Fig. 4. Decision tree model for electricity energy consumption (in kWh) in winter.
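In a regression tree each node reports the mean of the households it contains, so a parent's average should equal the size-weighted average of its children. The sketch below applies that check to the node counts and averages read off Fig. 4; the names `SPLITS`, `weighted_mean` and `check_splits` are hypothetical, and the tolerance of 0.02 kWh is an assumption that allows for the two-decimal rounding in the figure.

```python
# (parent N, parent average, [(child N, child average), ...]) read off Fig. 4.
SPLITS = {
    "root / EWH own": (1000, 41.06, [(314, 72.71), (686, 26.57)]),
    "EWH own = 1 / Member": (314, 72.71, [(248, 63.04), (66, 109.05)]),
    "EWH own = 0 / Flat size": (686, 26.57, [(269, 18.49), (417, 31.79)]),
    "Member < 4.5 / RH own": (248, 63.04, [(123, 74.72), (125, 51.54)]),
    "Member >= 4.5 / Income": (66, 109.05, [(52, 92.94), (14, 168.86)]),
    "Flat size < 451.5 / Member": (269, 18.49, [(257, 17.14), (12, 47.42)]),
    "Flat size >= 451.5 / HSE type": (417, 31.79, [(119, 20.18), (298, 36.42)]),
}


def weighted_mean(children):
    """Size-weighted mean of (n, mean) pairs."""
    total_n = sum(n for n, _ in children)
    return sum(n * mean for n, mean in children) / total_n


def check_splits(splits, tol=0.02):
    """Return the rebuilt parent mean per split; raise if a split is inconsistent."""
    rebuilt = {}
    for name, (parent_n, parent_mean, children) in splits.items():
        if sum(n for n, _ in children) != parent_n:
            raise ValueError(f"{name}: child counts do not add up to {parent_n}")
        mean = weighted_mean(children)
        if abs(mean - parent_mean) > tol:
            raise ValueError(f"{name}: rebuilt mean {mean:.3f} vs {parent_mean}")
        rebuilt[name] = mean
    return rebuilt


if __name__ == "__main__":
    for name, mean in check_splits(SPLITS).items():
        print(f"{name}: {mean:.2f}")
```

The same check is a quick way to confirm that a tree transcribed from a figure has been read correctly before its rules are reused.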

Table 5
RASE of decision tree, neural network and regression models

RASE (kWh)

          Decision tree    Neural network    Regression (stepwise)    Regression (full)    Regression (intercept)
Summer    39.363           39.527            39.424                   39.627               46.300
Winter    44.397           44.142            45.184                   44.973               52.096
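The RASE criterion from Section 3.4, RASE = sqrt(SSE/n), and the selection of the lowest-RASE model per season can be sketched as follows. The function names `rase` and `best_model` are hypothetical; the dictionary simply restates the Table 5 values, and the original analysis was carried out in SAS Enterprise Miner, not with this code.

```python
import math


def rase(actual, predicted):
    """Square root of the average squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(sse / len(actual))


# RASE values (kWh) restated from Table 5.
TABLE_5 = {
    "summer": {
        "decision tree": 39.363,
        "neural network": 39.527,
        "regression (stepwise)": 39.424,
        "regression (full)": 39.627,
        "regression (intercept)": 46.300,
    },
    "winter": {
        "decision tree": 44.397,
        "neural network": 44.142,
        "regression (stepwise)": 45.184,
        "regression (full)": 44.973,
        "regression (intercept)": 52.096,
    },
}


def best_model(season):
    """Name of the model with the smallest RASE for the given season."""
    scores = TABLE_5[season]
    return min(scores, key=scores.get)


if __name__ == "__main__":
    for season in TABLE_5:
        print(season, best_model(season))
```

Since RASE is expressed in the units of the target (weekly kWh), the gaps between the three fitted models in Table 5 can be read directly as differences in typical prediction error.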

Table 6
Significant factors influencing energy consumption

            Summer                                                   Winter
            Decision tree   Neural network   Regression (stepwise)   Decision tree   Neural network   Regression (stepwise)
HOS                                                                                  *
PD                                                                   *                                *
VH                                                                   *                                *
AGE
SIZE        *               *                *                       *
RENT
INCOME                                                               *
MEMBER      *               *                *                       *               *                *
AC          *               *                *
FAN
CD                          *                *
CW                          *
DEH
EWH                                                                  *               *                *
EK
RH                          *                *                       *               *                *
VFAN                                         *                                       *
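The comparison summarized in Table 6 amounts to set operations on the factors each model flags. The sketch below restates the factor sets from Sections 4.1 to 4.3 and Figs. 3 and 4 and intersects them per season. The names `SIGNIFICANT` and `common_factors` are hypothetical, and recording the winter tree's housing-type split under the single label "HSE_TYPE" is an assumption made here, since Fig. 4 splits on housing type as a whole rather than on one dummy variable.

```python
# Factors flagged by each model, restated from Sections 4.1-4.3 and Figs. 3-4.
SIGNIFICANT = {
    "summer": {
        "decision tree": {"SIZE", "MEMBER", "AC"},
        "neural network": {"SIZE", "MEMBER", "AC", "CD", "CW", "RH"},
        "regression (stepwise)": {"SIZE", "MEMBER", "AC", "CD", "RH", "VFAN"},
    },
    "winter": {
        # "HSE_TYPE" is an assumed label for the tree's housing-type split.
        "decision tree": {"HSE_TYPE", "SIZE", "INCOME", "MEMBER", "EWH", "RH"},
        "neural network": {"HOS", "MEMBER", "EWH", "RH", "VFAN"},
        "regression (stepwise)": {"PD", "VH", "MEMBER", "EWH", "RH"},
    },
}


def common_factors(season):
    """Factors flagged by every model in the given season."""
    return set.intersection(*SIGNIFICANT[season].values())


if __name__ == "__main__":
    for season in SIGNIFICANT:
        print(season, sorted(common_factors(season)))
```

Counting the sets also shows the summer tree using three factors against six for each of the other two models.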

network model performs slightly better than the other two alternatives. For comparative purposes, a summary of the significant factors identified in these models is given in Table 6. In this empirical application to an electricity energy consumption study, the decision tree and neural network models appear to be viable alternatives to the stepwise regression model in understanding energy consumption patterns and in predicting energy consumption levels.

5. Discussion

In the summer phase, the decision tree model identified fewer significant factors influencing energy consumption than the neural network and stepwise regression models. In other words, the decision tree model, with its simpler structure, is slightly more accurate than the other models in that phase. Three factors, namely flat size, number of members in the household, and ownership of air-conditioner, are found to be significant in all three models using the decision tree, neural network, and stepwise regression. This phenomenon is reasonable because air-conditioning consumes on average 59% of the electricity in a typical household in Hong Kong during the summer [2], and the number of air-conditioners is associated with flat size and number of members in the household. In the winter phase, the housing type plays a role in the electricity energy consumption level. Two other factors, ownership of electric water heater and rangehood, are found to be significant in influencing electricity consumption in all three models. The number of household members is significant in all models, but flat size and ownership of air-conditioner are not significant factors in the winter phase. As a remark, inclusion of meteorological variables such as temperature and wind velocity should improve the model fitting results. However, such information cannot be matched accurately with the existing database retrospectively.

Predicting energy consumption plays an important role in decision making and planning for utility companies. In the past, regression models were mainly adopted to predict energy consumption [2–6]. The use of alternative analytical methods has not been popular in the energy consumption literature. While the regression analysis method is supported by statistical theories as producing good estimates according to certain statistical properties, for instance, being the best linear unbiased estimator, other approaches such as decision tree and neural network are found useful in developing predictive models in other fields. In the past decade, advancements in database management and improvements in computing speed have led to new ways of conducting data analysis. Data mining is now receiving plenty of attention and is being recognized as a newly emerging analysis tool. When searching for a predictive model, common practice in data mining is to develop various models using different approaches, then select a final model after comparing their accuracies according to some model selection criteria. This study illustrates how this concept can be used to predict electricity energy consumption in Hong Kong. When comparing accuracy in predicting electricity energy consumption, it is found that the decision tree model and neural network model perform slightly better than other models in the summer and winter phases, respectively. In general, the differences in RASE between the three types of model are quite small, indicating the three modeling techniques are generally comparable in predicting energy consumption.

Acknowledgments

The authors would like to thank the referees for helpful comments on an earlier version of the paper.

References

[1] Lam JC. Climatic and economic influences on residential electricity consumption. Energy Convers Manage 1998;39:623–9.
[2] Tso GKF, Yau KKW. A study of domestic energy usage pattern in Hong Kong. Energy 2003;28:1671–82.
[3] Al-Garni AZ, Zubair SM, Nizami JS. A regression model for electric energy consumption forecasting in Eastern Saudi Arabia. Energy 1994;19:1043–9.
[4] Yan YY. Climate and residential electricity consumption in Hong Kong. Energy 1998;23:17–20.
[5] Ranjan M, Jain VK. Modelling of electrical energy consumption in Delhi. Energy 1999;24:351–61.
[6] Egelioglu F, Mohamada AA, Guven H. Economic variables and electricity consumption in Northern Cyprus. Energy 2001;26:355–62.
[7] Kalogirou SA, Bojic M. Artificial neural networks for the prediction of the energy consumption of a passive solar building. Energy 2000;25:479–91.
[8] Müller W, Wiederhold E. Applying decision tree methodology for rules extraction under cognitive constraints. Eur J Oper Res 2002;136:282–9.
[9] Berry MJA, Linoff G. Data mining techniques for marketing, sales, and customer support. New York: Wiley; 1997.
[10] Curram SP, Mingers J. Neural networks, decision tree induction and discriminant analysis: an empirical comparison. J Oper Res Soc 1994;45:440–50.
[11] Perner P, Zscherpel U, Jacobsen C. A comparison between neural networks and decision trees based on data from industrial radiographic testing. Pattern Recognition Lett 2001;22:47–54.
[12] Quinlan JR. C4.5: programs for machine learning. San Mateo: Morgan Kaufmann; 1993.
[13] SAS Institute Inc. SAS/STAT user’s guide. Cary: SAS Institute Inc.; 2003.
