
Energy 36 (2011) 6981–6992

Contents lists available at ScienceDirect

Energy
journal homepage: www.elsevier.com/locate/energy

A hybrid fuzzy mathematical programming-design of experiment framework for improvement of energy consumption estimation with small data sets and uncertainty: The cases of USA, Canada, Singapore, Pakistan and Iran
A. Azadeh a,b,*, M. Saberi c,d, S.M. Asadzadeh a,b, M. Khakestani a,b

a Department of Industrial Engineering, Center of Excellence for Intelligent Based Experimental Mechanics, College of Engineering, University of Tehran, P.O. Box 11365-4563, Iran
b Department of Engineering Optimization Research, College of Engineering, University of Tehran, P.O. Box 11365-4563, Iran
c Department of Industrial Engineering, University of Tafresh, Iran
d Institute for Digital Ecosystems & Business Intelligence, Curtin University of Technology, Perth, Australia

Article info

Article history: Received 6 January 2011; received in revised form 6 June 2011; accepted 10 July 2011; available online 5 November 2011.

Keywords: Hybrid framework; Fuzzy regression; Small data sets; Uncertainty; Energy consumption; Design of experiment

Abstract

Using small data sets for energy consumption forecasting is a major problem because it can introduce large noise. This study presents a hybrid framework for improving energy consumption estimation with small data sets. The framework is based on fuzzy regression, conventional regression and design of experiment (DOE). It uses analysis of variance (ANOVA) and mean absolute percentage error (MAPE) to select between fuzzy and conventional regression. The significance of the proposed framework is threefold. First, it is flexible and identifies the best model based on the results of ANOVA and MAPE. Second, because of its dynamic structure, the framework may identify conventional regression as the best model for future energy consumption forecasting, whereas previous studies assume that fuzzy regression provides better solutions and estimates under uncertainty and ambiguity. Third, it is an ideal candidate for short data sets. To show the applicability of the hybrid framework, energy consumption data for Canada, the United States, Singapore, Pakistan and Iran from 1995 to 2005 are considered and tested. This is the first study to introduce a hybrid fuzzy regression-design of experiment framework for improving energy consumption estimation and forecasting with relatively small data sets.
© 2011 Elsevier Ltd. All rights reserved.

1. Introduction

Data collection is a major issue in several countries, including developing countries such as Iran and Pakistan, because of scarce data, missing values and the lack of a robust, standard data collection system. Furthermore, small data sets result not only from limited data availability but, more importantly, from limited reliability of the available data. For example, Iran experienced an 8-year war against Iraq, and today's economic and societal conditions are very different from those of the war period. Consequently, one cannot rely on war-period data to construct forecasting models that relate energy consumption to economic and societal variables such as GDP and population. This means that access to reliable data is limited, and it makes the

* Corresponding author. Department of Industrial Engineering, Center of Excellence for Intelligent Based Experimental Mechanics, College of Engineering, University of Tehran, P.O. Box 11365-4563, Tehran 14178-43111, Iran. Tel.: +98 21 82084164; fax: +98 21 8208 4162. E-mail addresses: ali@azadeh.com, aazadeh@ut.ac.ir (A. Azadeh).
0360-5442/$ - see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.energy.2011.07.016

available data set small. Economic recession, energy crises and political changes are other important examples that cast doubt on the justification for using all available data for forecasting purposes. Hence, the usable data are limited to a small set. Many forecasting studies focus on developing more sophisticated models for energy forecasting and take the availability of data for granted. The author of [1] used 180 rows of electricity and oil consumption data to show the results of a combined linear-ANN model for forecasting in a variance-unstable environment. Zhang et al. [2] forecasted China's transport energy demand for 2010, 2015 and 2020 based on partial least-squares regression with data from 1990 to 2006. The real annual values of Turkey's hydroelectric generation for 1970–2006 were used to predict generation in 2007–2012 [3]. A singular spectral analysis method was proposed for short-term load forecasting [4]; it uses 844 daily observations of electricity consumption to estimate its parameters. However, in some real-world cases, the available and reliable data set is small. More formally, a small data set can be defined as follows: a small data set is a set of available and reliable data that is not large enough to be used for training/


estimation of artificial intelligence/econometric models. In other words, relative to the number of unknown parameters in the model, the number of available and reliable observations is small compared with the data set that would allow a robust and unique estimation of the model parameters. Here the focus is on data-driven forecasting approaches such as ANN, neuro-fuzzy, and linear or nonlinear conventional regression.

Regression analysis is one of the most widely used statistical tools for explaining the variation of a dependent variable Y in terms of the variation of explanatory variables X as Y = f(X), where f(X) is not required to be linear. It refers to a set of methods by which the model parameters are estimated from the values of a given input–output data set. The goals of regression analysis are:
(a) to find an appropriate mathematical model, and
(b) to determine the best-fitting coefficients of the model from the given data.

The use of classical regression is bounded by some strict assumptions about the given data. The model can be applied only if the given data are distributed according to a statistical model and the relation between X and Y is crisp. To overcome such limitations, fuzzy regression was introduced as an extension of classical regression. It is used to estimate the relationships between variables where the available data are very limited and imprecise and the variables interact in an uncertain, qualitative and fuzzy way [5]. The fundamental differences between fuzzy regression and classical regression are as follows:
- Fuzzy regression can fit both fuzzy data and crisp data into a regression model, whereas classical regression can only fit crisp data.
- Classical regression analysis rests on several assumptions; for example, the unobserved error terms should be mutually independent and identically distributed. Violating such assumptions reduces the effectiveness of the analysis. In such cases, fuzzy regression can be used instead.
- In contrast to classical regression, which is based on probability theory, fuzzy regression is based on possibility theory and fuzzy set theory.
- In classical regression, the unfitted errors between a regression model and the observed data are assumed to be observation errors, i.e. a random variable. In fuzzy regression, the same unfitted errors are viewed as the fuzziness of the model structure [6,7].

The goal of fuzzy regression analysis is to find a regression model that fits all observed fuzzy data within a specified fitting criterion. Different fuzzy regression models are obtained depending on the fitting criterion used. Fuzzy regression is based on minimizing the total squared errors of the spread value as the fitting criterion. Moreover, with fuzzy regression, a mathematical programming approach is developed such that predictability can be improved and computational complexity decreased. In general, there are two approaches to fuzzy regression, which differ in their fitting criteria [8–10]. The first approach is based on minimizing fuzziness as an optimality criterion and was first proposed by Tanaka [8]. Different researchers have used Tanaka's approach to minimize the total spread of the output [9,11,12]. As pointed out by Wang and Tsaur [13], the advantage of this approach is its simplicity in programming and computation, but it has been criticized for providing estimation ranges that are too wide to be of much help in application [5] and for not utilizing the concept of least squares [14]. The second approach uses least squares of errors as a fitting criterion to minimize the total squared error of the output. Different aspects of this approach were investigated by Celmins [15,16], Diamond [17], Savic and Pedrycz [18] and Chang and Ayyub [19]. Celmins [15] defines a compatibility measure between fuzzy data and a model and uses this measure as a model-fitting criterion. Diamond [17] developed a fuzzy least-squares method. Savic and Pedrycz [18] proposed a combined approach for fuzzy least-squares regression analysis (FLSRA) by integrating the minimum fuzziness criterion into ordinary least-squares regression. Chang and Ayyub [19] discussed reliability issues of FLSRA, such as the standard error and the correlation coefficient. This approach, though providing a narrower range, costs too much computation time [13]. Therefore, a natural extension of fuzzy regression would be the integration of the least-squares concept into fuzzy regression.

Approaches used for energy forecasting may be divided into two main categories: the econometric approach and the artificial intelligence approach. Classical and fuzzy regressions are categorized under econometric approaches. The artificial neural network (ANN), as an intelligent approach, has received considerable attention in energy forecasting (see for example [20–22]). Although ANN does not need a predetermined production function between outputs and inputs, its main disadvantage is that it usually requires a large set of data for training and validation, so it is hard to apply to small data sets. The authors of [23] compared the results of fuzzy regression and ANN in predicting the monthly electricity consumption of Iran and showed that fuzzy regression forecasts the monthly demand with a lower MAPE. Fuzzy regression models have been successfully applied to various engineering problems such as ergonomics [24] and quality control [25]. The application of this method in the context of energy demand forecasting is well treated in the literature.
In [12], computer and peripheral equipment sales in the United States were studied in an uncertain environment using the fuzzy linear regression introduced by Tanaka et al. [7]. The well-known fuzzy rule-based Takagi–Sugeno–Kang (TSK) model combined with a set of fuzzy regressions was proposed to investigate the impact of climate change on short-term electricity consumption in Iran [26]. That paper introduces a type III TSK fuzzy inference machine combined with a set of linear and nonlinear fuzzy regressors in the consequent part to model the effects of climate change on electricity demand. An integrated fuzzy system, data mining and time series framework to estimate and predict electricity demand for seasonal and monthly changes was presented in [27]. The authors argued that their framework is capable of handling situations with non-stationary data and used it to forecast electricity consumption in Iran and China. In [28], a new approach was introduced to find the parameters of a linear fuzzy regression with fuzzy outputs and crisp input data. The proposed method was used to forecast the annual end-use energy in the Residential–Commercial sector of Iran.

Although previous studies have provided satisfactory results in forecasting energy demand, none of them has dealt with small data sets. Moreover, intelligent approaches such as ANN or neuro-fuzzy require large data sets for training and testing, and simple models cannot treat the problems associated with small data sets. To the best knowledge of the authors, this is the first study that introduces a flexible approach for energy consumption estimation and forecasting with small data sets. Small data sets are often the case in developing countries. Furthermore, small data sets are usually associated with fuzziness and uncertainty. On the other hand, there is a small chance that they are associated with certainty and crispness. This is why a hybrid and flexible framework composed of both fuzzy and classical regressions together with design of experiment seems to be ideal in such situations.


1.1. Manufacturing technology and energy consumption

The literature on energy consumption shows that in several countries there is a positive relationship between gross domestic product (GDP) and energy consumption. Improving some of the main features of manufacturing technology is directly related to energy consumption. Also, limited energy resources and a strictly increasing energy consumption trend show the need to design accurate devices for energy consumption in the manufacturing sector in particular and the industrial sector in general. Hence, there is a need to focus on the future trend of energy consumption, particularly in the manufacturing sector. For example, if future energy consumption is estimated and forecasted with high and acceptable confidence, then the needed power or natural gas can be estimated, and the facilities needed to fulfill these demands can be designed and constructed in a timely manner for higher efficiency and productivity. Energy consumption in some countries, such as the USA and Canada, is more heterogeneous than in Pakistan, for example. Moreover, as shown in Fig. 1, energy consumption in these countries has an uncertain structure, and usually only a short history of steady past trend is at hand to forecast the future. This makes econometric models (fuzzy or classical regressions) the ideal candidates for forecasting purposes. Note that Fig. 1 represents annual energy consumption in the case studies, so the heterogeneity in energy consumption is discussed in terms of annual variation, not daily, weekly or monthly variation. Moreover, the trend for each country is based on annual data from 1995 to 2005.

The industrial sector and, as its main part, the manufacturing sector account for a considerable share of total energy use in almost every industrialized country. For example, the industrial sector in the United States has the greatest share of total energy use among all sectors (Fig. 2). Overall, accurate forecasting of energy consumption in the manufacturing sector would be quite beneficial given the strictly increasing technological and industrial growth of the 21st century.

The remainder of the paper is organized as follows. Section 2 introduces the hybrid fuzzy regression DOE framework for total energy consumption estimation with small data sets, together with its methodology. Section 3 presents an experiment showing the applicability of the proposed framework for the cases of the USA, Canada, Singapore, Pakistan and Iran. Results and discussion of the experiment are given in Section 4. Finally, Section 5 presents the conclusions of the study.

2. Method: the hybrid framework

The economic indicators used in this paper are population and gross domestic product (GDP) in previous periods. The proposed framework uses ANOVA to select either the fuzzy regression or the classical regression model for future energy estimation. Fuzzy regression and classical regression are tested through a randomized complete block design at a 5% significance level. The null hypothesis of this design is that the forecasts of fuzzy regression, the forecasts of classical regression, and the actual energy consumption, taken as treatments, are not statistically different. If the null hypothesis is accepted, the preferred model is the one with the lower MAPE. Otherwise, if

Fig. 1. Energy consumption in the selected countries ($10^{15}$ Btu).


the null hypothesis is rejected, Tukey simultaneous tests are used to compare the treatment means and to select the preferred model. The fuzzy regression models are solved with the Lingo software. Flexibility in using ANOVA or MAPE to select between regression models is the first unique feature of this study. In addition, this study does not presume a preferred model; it selects one dynamically according to the data of the case at hand. The significance of the proposed framework is threefold. First, it is flexible and identifies the best model based on the results of ANOVA and MAPE. Second, because of its dynamic structure, the proposed framework may identify classical regression as the best model for future energy consumption forecasting, whereas previous studies assume that fuzzy regression provides better solutions and estimates under uncertainty and ambiguity. Third, it is an ideal candidate for short data sets. Fig. 3 depicts the hybrid framework of this study. The reader should note that all steps of the hybrid framework are based on standard scientific methodologies: fuzzy regression, classical regression, analysis of variance (ANOVA), Tukey simultaneous tests and MAPE. Furthermore, the fuzzy regression model is built on the regression model selected for the data set. The best model is identified by modeling, running and testing various regression models and selecting the one with the lowest error.

Fig. 2. Total energy consumption by end-use sectors in the USA. (Source: http://www.eia.doe.gov/emeu/aer/ep/ep_frame.html.)

2.1. Model variables

The following variables are defined to estimate annual energy consumption:
- Y: Energy consumption in each country
- X1: Annual population of each country
- X2: Gross domestic product (GDP)
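The selection rule of the framework (lower MAPE when the ANOVA null hypothesis is not rejected, Tukey comparisons otherwise) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `select_model`, the `tukey_choice` argument and the tie-breaking rule are assumptions made for this sketch, and the Tukey test itself is not computed here.

```python
from typing import Optional


def select_model(p_value: float,
                 mape_fuzzy: float,
                 mape_classical: float,
                 tukey_choice: Optional[str] = None,
                 alpha: float = 0.05) -> str:
    """Return 'fuzzy' or 'classical' following the framework's decision rule."""
    if p_value >= alpha:
        # H0 is not rejected: the treatment means are statistically
        # indistinguishable, so the model with the lower MAPE is preferred.
        # Ties go to fuzzy regression (an arbitrary choice for this sketch).
        return "fuzzy" if mape_fuzzy <= mape_classical else "classical"
    # H0 is rejected: the choice comes from Tukey's pairwise comparisons,
    # which are not computed here and must be supplied by the caller.
    if tukey_choice not in ("fuzzy", "classical"):
        raise ValueError("H0 rejected: supply the model preferred by the Tukey tests")
    return tukey_choice


# Hypothetical inputs chosen only to exercise both branches.
print(select_model(0.30, 0.02, 0.06))                            # fuzzy
print(select_model(0.01, 0.02, 0.06, tukey_choice="classical"))  # classical
```

Keeping the Tukey outcome as an explicit input makes the dependency on the post-hoc test visible instead of hiding it behind a default.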

[Fig. 3 flowchart: collect the input variables X1 and X2 and the output variable Y for the targeted short period; fit the best regression model to the data; develop the fuzzy regression model based on the best regression model; forecast energy consumption with the best-fitted fuzzy regression and classical regression models; perform the ANOVA F-test for fuzzy regression, classical regression and actual (test) data. If the null hypothesis is accepted, select fuzzy or classical regression by the lowest MAPE; otherwise, perform Tukey simultaneous tests to identify the preferred model.]

Fig. 3. The hybrid fuzzy regression DOE framework for total energy consumption estimation with small data sets.

2.2. Error estimation methods

There are four basic error estimation methods:
- Mean Absolute Error (MAE)
- Mean Square Error (MSE)
- Root Mean Square Error (RMSE)
- Mean Absolute Percentage Error (MAPE)

They can be calculated by the following equations:


$$
\begin{aligned}
\mathrm{MAE} &= \frac{\sum_{t=1}^{n} \left| x_t - x'_t \right|}{n}, &
\mathrm{MSE} &= \frac{\sum_{t=1}^{n} \left( x_t - x'_t \right)^2}{n}, \\
\mathrm{RMSE} &= \sqrt{\frac{\sum_{t=1}^{n} \left( x_t - x'_t \right)^2}{n}}, &
\mathrm{MAPE} &= \frac{\sum_{t=1}^{n} \left| \dfrac{x_t - x'_t}{x_t} \right|}{n}
\end{aligned}
\tag{1}
$$

Here $x_t$ denotes the actual value and $x'_t$ the estimated value in period $t$, and $n$ is the number of observations.

All methods except MAPE produce scale-dependent output. Because the input data used for model estimation (preprocessed and raw data) have different scales, MAPE is the most suitable method for estimating the errors.

2.3. Design of experiment

To examine the differences between the estimates of fuzzy regression and classical regression and the actual data, the examiner needs first to determine all sources of variability in the response (here, energy consumption) and second to design an experiment to study the significance of these sources. The experiment should be designed such that variability arising from extraneous sources can be systematically controlled. Time is the common source of variability in the experiment that can be systematically controlled through blocking. The experiment designed in this study is a randomized complete block design (RCBD). Moreover, the blocks provide sufficient replications in the experiment. In this case, the interactions between treatments and blocks are treated as the random error component. The hypotheses are:

$$
\begin{aligned}
H_0 &: \mu_1 = \mu_2 = \mu_3 \\
H_1 &: \mu_i \neq \mu_j; \quad i, j = 1, 2, 3; \; i \neq j
\end{aligned}
\tag{2}
$$

where $\mu_1$, $\mu_2$ and $\mu_3$ are the means of the three treatments (the actual data and the estimates of the two regression models). A detailed description of the ANOVA procedure for RCBD, mainly adopted from [29], is given here. The procedure is usually summarized in an ANOVA table (Table 1). In the RCBD of Table 1, there are $a$ treatments, $b$ blocks and $ab$ observations in total because there are no replications in the design. $F_{\alpha,(a-1),(a-1)(b-1)}$ is a value from the F-distribution with $(a-1)$ degrees of freedom in the numerator and $(a-1)(b-1)$ degrees of freedom in the denominator, such that the probability that an F variable is greater than $F_{\alpha,(a-1),(a-1)(b-1)}$ is $\alpha$. The formulas for computing the sums of squares (SS) in Table 1 are presented in Equations (3)–(6).

Table 1 Analysis of variance for a randomized complete block design.

| Source of variation | Sum of squares | Degrees of freedom | Mean squares | F-value | P-value |
|---|---|---|---|---|---|
| Treatments | $SS_{Treatments}$ | $a-1$ | $SS_{Treatments}/(a-1)$ | $MS_{Treatments}/MS_{Error}$ | $P(F\text{-value} > F_{\alpha,(a-1),(a-1)(b-1)})$ |
| Blocks | $SS_{Blocks}$ | $b-1$ | $SS_{Blocks}/(b-1)$ | | |
| Error | $SS_{Error}$ | $(a-1)(b-1)$ | $SS_{Error}/((a-1)(b-1))$ | | |
| Total | $SS_{Total}$ | $ab-1$ | | | |

$$
SS_{Total} = \sum_{i=1}^{a} \sum_{j=1}^{b} y_{ij}^2 - \frac{y_{\cdot\cdot}^2}{ab}
\tag{3}
$$

$$
SS_{Treatments} = \frac{1}{b} \sum_{i=1}^{a} y_{i\cdot}^2 - \frac{y_{\cdot\cdot}^2}{ab}
\tag{4}
$$

$$
SS_{Blocks} = \frac{1}{a} \sum_{j=1}^{b} y_{\cdot j}^2 - \frac{y_{\cdot\cdot}^2}{ab}
\tag{5}
$$

$$
SS_{Error} = SS_{Total} - SS_{Treatments} - SS_{Blocks}
\tag{6}
$$

where $y_{ij}$ is the observation in the $i$th treatment and $j$th block. A dot ($\cdot$) denotes summation over the corresponding index: $y_{\cdot\cdot}$ is the sum of all $y_{ij}$, $y_{i\cdot}$ is the sum of the $y_{ij}$ in the $i$th treatment, and $y_{\cdot j}$ is the sum of the $y_{ij}$ in the $j$th block.

2.4. Fuzzy regression models

There are two main approaches in fuzzy regression model development: fuzzy linear regression (FLR) and fuzzy least-squares regression (FLSR) [30,31]. Fuzzy linear regression was first introduced by Tanaka et al. in 1982 [7], and variations were suggested by Tanaka [32], Sakawa and Yano [10,33], Peters [9], and Kim and Bishu [34]. Sakawa and Yano also introduced fuzzy data into the formulation. They considered the possibility and necessity conditions for fuzzy equality as defined by Dubois and Prade. Fuzzy least-squares regression (FLSR) was first introduced by Diamond [17] and Celmins [15,16]. The models developed under this approach are similar to those of Savic and Pedrycz [18], Bardossy and Duckstein [35], Chang and Lee [11] and Tanaka and Lee [36]. The basic Tanaka model assumes a fuzzy linear function as shown in Model (7):

$$
\tilde{Y} = \tilde{A}_0 X_0 + \tilde{A}_1 X_1 + \cdots + \tilde{A}_N X_N = \tilde{A} X
\tag{7}
$$

where $X = [X_0, X_1, \ldots, X_N]^T$ is a vector of independent variables and $\tilde{A} = [\tilde{A}_0, \tilde{A}_1, \ldots, \tilde{A}_N]$ is a vector of fuzzy coefficients presented in the form of symmetric triangular fuzzy numbers denoted by $\tilde{A}_j = (a_j, c_j)$, where $a_j$ is the central value and $c_j$ is the spread value. Thus, Model (7) can be rewritten as Model (8):

$$
\tilde{Y} = (a_0, c_0) + (a_1, c_1) X_1 + \cdots + (a_N, c_N) X_N
\tag{8}
$$

The above fuzzy regression analysis assumes crisp input and output data, while the relation between the input and output data is defined by a fuzzy function. By applying the Extension Principle, it derives the membership function of the estimated value. Each value of the dependent variable can be estimated as a fuzzy number

Table 2 Raw data for the United States.

| Year | 1995 | 1996 | 1997 | 1998 | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Consumption ($10^{15}$ Btu) | 91.17 | 94.18 | 94.77 | 95.18 | 96.82 | 98.98 | 96.33 | 97.86 | 98.21 | 100.35 | 100.69 |
| Population (millions) | 266.56 | 269.67 | 272.91 | 276.12 | 279.29 | 282.34 | 285.02 | 287.68 | 290.34 | 293.03 | 295.73 |
| GDP | 2.5 | 3.7 | 4.5 | 4.2 | 4.4 | 3.7 | 0.8 | 1.6 | 2.5 | 3.6 | 3.1 |

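$\tilde{Y}_i = (Y_i^L, Y_i^{h=1}, Y_i^U)$, $i = 1, 2, \ldots, M$, where the lower bound, central value and upper bound are given by Model (9):

$$
\begin{aligned}
Y_i^L &= \sum_{j=0}^{N} (a_j - c_j)\, x_{ij} \\
Y_i^{h=1} &= \sum_{j=0}^{N} a_j\, x_{ij} \\
Y_i^U &= \sum_{j=0}^{N} (a_j + c_j)\, x_{ij}
\end{aligned}
\tag{9}
$$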
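As an illustration of Model (9), the sketch below computes the lower bound, central value and upper bound of a fuzzy estimate from symmetric triangular coefficients $(a_j, c_j)$. The function name `fuzzy_prediction` and all coefficient and input values are hypothetical and chosen only for illustration; they are not estimates from the paper. The sketch assumes non-negative inputs, as Model (9) is written without absolute values.

```python
def fuzzy_prediction(x, a, c):
    """Return (lower, center, upper) of the fuzzy estimate in Model (9).

    x: inputs with x[0] = 1 for the intercept term
    a: central values a_j of the fuzzy coefficients
    c: spreads c_j (>= 0) of the fuzzy coefficients
    """
    lower = sum((aj - cj) * xj for aj, cj, xj in zip(a, c, x))
    center = sum(aj * xj for aj, xj in zip(a, x))
    upper = sum((aj + cj) * xj for aj, cj, xj in zip(a, c, x))
    return lower, center, upper


# Hypothetical coefficients for intercept, X1 (population) and X2 (GDP).
a = [2.0, 0.04, 0.10]
c = [0.5, 0.00, 0.02]
x = [1.0, 30.0, 3.0]

lower, center, upper = fuzzy_prediction(x, a, c)
print(round(lower, 2), round(center, 2), round(upper, 2))  # 2.94 3.5 4.06
```

The interval is symmetric around the central value because each coefficient is a symmetric triangular fuzzy number.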

Thus, the proposed model of Tanaka becomes Model (10) as follows:

$$
\begin{aligned}
\min \; & Z = \sum_{j=0}^{N} c_j \\
\text{subject to} \; & \sum_{j=0}^{N} a_j x_{ij} + (1-h) \sum_{j=0}^{N} c_j \left| x_{ij} \right| \ge y_i + (1-h)\, e_i, \quad \forall i = 1, 2, \ldots, M \\
& \sum_{j=0}^{N} a_j x_{ij} - (1-h) \sum_{j=0}^{N} c_j \left| x_{ij} \right| \le y_i - (1-h)\, e_i, \quad \forall i = 1, 2, \ldots, M \\
& c_j \ge 0, \; a_j \in \mathbb{R}, \; x_{i0} = 1, \; 0 \le h \le 1
\end{aligned}
\tag{10}
$$

where the term $h$ is referred to as a measure of goodness of fit, or a measure of compatibility between the data and a regression model, $y_i$ is the center and $e_i$ is the spread of the $i$th collected data point. Fuzzy linear regression (FLR) has been criticized, especially in the original formulation of Tanaka et al. [7]. As Jozsef [37] pointed out, the solution of Tanaka's model is $x_j$-scale dependent and many $c_j$ may be zero. To rectify this problem, Tanaka et al. [38] and Redden and Woodall [39] proposed the following objective function, shown as Model (11):

$$
\min \; Z = \sum_{j=0}^{N} \left( c_j \sum_{i=1}^{M} \left| x_{ij} \right| \right)
\tag{11}
$$

Other developments of the original Tanaka models have been discussed in Tanaka and Lee [36]. Savic and Pedrycz [18] noted that not all data points are allowed to influence the estimated parameters in FLR. Fuzzy linear regression may tend to become multicollinear as more independent variables are collected [13,40]. Furthermore, the original Tanaka model was extremely sensitive to outliers [9], and its prediction intervals become wider as more data are collected [39,41]. On the other hand, fuzzy least-squares regression (FLSR) has drawn very few criticisms because of its similarity to traditional least-squares regression. However, FLSR is sensitive to outliers and should be used only when enough data are available, which forfeits one of the advantages of fuzzy regression. Özelkan [31], Özelkan et al. [42] and Özelkan and Duckstein [43] developed a bi-objective fuzzy regression (BOFR) model which is capable of solving the problems of FLR mentioned above, especially the problem of data outliers, as shown by Model (12):

$$
\begin{aligned}
\text{Min-dominated} \; & [V, E_p] \\
\text{subject to} \; & \sum_{j=0}^{N} a_j x_{ij} + (1-h) \sum_{j=0}^{N} c_j \left| x_{ij} \right| + \varepsilon_{R,i} \ge y_i + (1-h)\, e_i, \quad \forall i = 1, 2, \ldots, M \\
& \sum_{j=0}^{N} a_j x_{ij} - (1-h) \sum_{j=0}^{N} c_j \left| x_{ij} \right| - \varepsilon_{L,i} \le y_i - (1-h)\, e_i, \quad \forall i = 1, 2, \ldots, M \\
& \varepsilon_{L,i}, \; \varepsilon_{R,i} \ge 0, \quad \forall i = 1, 2, \ldots, M
\end{aligned}
\tag{12}
$$

where $V$ denotes the vagueness measure, defined as the spread of the prediction to be minimized, and $E_p$ is the deviation from outliers, both given in Model (13); $\varepsilon_{L,i}$ and $\varepsilon_{R,i}$ are relaxation variables, and $0 \le p \le \infty$ is the compensation level. The expression "min-dominated" indicates the non-inferior solution-finding process.

$$
\begin{aligned}
V &= \sum_{i=1}^{M} \left[ \left( \sum_{j=0}^{N} a_j x_{ij} + (1-h) \sum_{j=0}^{N} c_j \left| x_{ij} \right| \right) - \left( \sum_{j=0}^{N} a_j x_{ij} - (1-h) \sum_{j=0}^{N} c_j \left| x_{ij} \right| \right) \right] \\
E_p &= \sum_{i=1}^{M} \left( \varepsilon_{L,i}^{\,p} + \varepsilon_{R,i}^{\,p} \right)
\end{aligned}
\tag{13}
$$

Moreover, Özelkan [31] and Özelkan and Duckstein [43] proved that the FLR models of Tanaka et al. [7], Tanaka [32] and Peters [9], as well as the classic crisp (non-fuzzy) regression model, are specific cases of their model. Although it has several advantages over other FLR models, Özelkan's model still has some drawbacks; for example, the central tendency property does not exist fully and explicitly in the model [36].
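To make the Tanaka formulation concrete, the following sketch checks whether a given set of coefficients satisfies the inclusion constraints of Model (10) at level $h$, and evaluates the total-spread objective of Model (11). It does not solve the linear program (the paper uses Lingo for that); the function names and the toy data are assumptions made for this illustration.

```python
def tanaka_feasible(X, y, e, a, c, h=0.0, tol=1e-9):
    """Check the two inclusion constraints of Model (10) for every observation.

    X: rows of inputs with x_i0 = 1; y: centers; e: spreads of the observed data
    a: coefficient centers; c: coefficient spreads (>= 0); h: fit level in [0, 1]
    """
    for xi, yi, ei in zip(X, y, e):
        center = sum(aj * xij for aj, xij in zip(a, xi))
        spread = sum(cj * abs(xij) for cj, xij in zip(c, xi))
        upper_ok = center + (1 - h) * spread >= yi + (1 - h) * ei - tol
        lower_ok = center - (1 - h) * spread <= yi - (1 - h) * ei + tol
        if not (upper_ok and lower_ok):
            return False
    return True


def total_spread(X, c):
    """Objective of Model (11): sum over j of c_j times the sum over i of |x_ij|."""
    return sum(cj * sum(abs(xi[j]) for xi in X) for j, cj in enumerate(c))


# Toy crisp data (e_i = 0), hypothetical values for illustration only.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [2.1, 2.9, 4.2]
e = [0.0, 0.0, 0.0]

print(tanaka_feasible(X, y, e, a=[1.0, 1.0], c=[0.4, 0.0]))  # True
print(tanaka_feasible(X, y, e, a=[1.0, 1.0], c=[0.1, 0.0]))  # False
```

With the narrower intercept spread (0.1) the third observation falls outside the predicted interval, which is exactly what the constraints of Model (10) forbid; an LP solver would search for the smallest spreads that keep every observation covered.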

Table 3 Raw data for Canada.

| Year | 1995 | 1996 | 1997 | 1998 | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Consumption ($10^{15}$ Btu) | 12.20 | 12.54 | 12.66 | 12.36 | 12.94 | 12.93 | 12.97 | 13.33 | 13.74 | 14.03 | 14.31 |
| Population (millions) | 29.62 | 29.98 | 30.31 | 30.63 | 30.96 | 31.28 | 31.59 | 31.90 | 32.21 | 32.51 | 32.81 |
| GDP | 2.5 | 3.7 | 4.5 | 4.2 | 4.4 | 3.7 | 0.8 | 1.6 | 2.5 | 3.6 | 3.1 |

Table 4 Raw data for Singapore.

| Year | 1995 | 1996 | 1997 | 1998 | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Consumption ($10^{15}$ Btu) | 1.185 | 1.352 | 1.451 | 1.499 | 1.485 | 1.522 | 1.611 | 1.592 | 1.682 | 1.893 | 2.023 |
| Population (millions) | 3.54 | 3.67 | 3.80 | 3.90 | 3.97 | 4.04 | 4.12 | 4.20 | 4.28 | 4.35 | 4.43 |
| GDP | 8.2 | 7.8 | 8.3 | 1.4 | 7.2 | 10.1 | 2.4 | 4.2 | 3.1 | 8.8 | 6.6 |

Table 5 Raw data for Pakistan.

| Year | 1995 | 1996 | 1997 | 1998 | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Consumption ($10^{15}$ Btu) | 1.581 | 1.669 | 1.692 | 1.738 | 1.814 | 1.856 | 1.810 | 1.875 | 1.978 | 2.051 | 2.252 |
| Population (millions) | 127.62 | 130.82 | 133.99 | 137.18 | 140.36 | 143.96 | 147.65 | 150.42 | 152.94 | 155.85 | 158.78 |
| GDP | 5 | 4.8 | 1 | 2.6 | 3.7 | 4.3 | 2 | 3.2 | 4.8 | 7.4 | 7.7 |

Table 6 Raw data for Iran.

| Year | 1995 | 1996 | 1997 | 1998 | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Consumption ($10^{15}$ Btu) | 1.131 | 1.125 | 1.041 | 1.056 | 1.074 | 1.084 | 1.134 | 1.144 | 1.012 | 1.125 | 1.244 |
| Population (millions) | 60.78 | 61.34 | 61.91 | 62.41 | 62.83 | 63.27 | 63.75 | 63.94 | 63.99 | 64.33 | 64.74 |
| GDP | 2.7 | 7.1 | 3.4 | 2.7 | 1.9 | 5.1 | 3.7 | 7.5 | 7.2 | 5.1 | 4.4 |

Using the idea of Özelkan and Duckstein [44] for dealing with outliers, Nasrabadi et al. developed a multi-objective fuzzy linear regression model to overcome the shortcomings of existing fuzzy regression approaches. As mentioned by Peters [9] and Özelkan and Duckstein [43], outliers can be modeled by introducing soft boundaries to the fuzzy linear regression (FLR) model. The Chang and Ayyub model [14], Model (14), is used for the purpose of this study. This model is an extension of Tanaka's model and resolves the outlier problem associated with previous models.

$$
\begin{aligned}
\min \; & Z = \sum_{i=1}^{n} \sum_{j=1}^{m} c_i x_{ij} \\
\text{subject to} \; & y_j \ge \sum_{i=1}^{n} P_i x_{ij} - (1-h) \sum_{i=1}^{n} c_i x_{ij} \\
& y_j \le \sum_{i=1}^{n} P_i x_{ij} + (1-h) \sum_{i=1}^{n} c_i x_{ij}
\end{aligned}
\tag{14}
$$

3. Experiment

The proposed framework is applied to energy consumption estimation and forecasting in the United States, Canada, Pakistan, Iran and Singapore from 1995 to 2005. It is furthermore used to identify the preferred model for forecasting and estimating energy consumption in these countries through the hybrid mechanism of the framework, which is based on fuzzy regression, classical regression, the F-test, Tukey simultaneous tests and MAPE. The raw data for the two independent variables and the response for these countries are shown in Tables 2–6. The data from 1995 to 2001 are used as training data and the data from 2002 to 2005 as test data. The reader should note that both the training and test sets are relatively small. Model (14), the extension of the Tanaka model, is chosen as the best-fitted fuzzy regression model because it addresses the outlier problem, which is especially important in the treatment of small data sets. The fuzzy regression models were developed and tested with the Lingo software (Appendix 1). The link to this model is also shown in Appendix 2.

4. Results and analysis

The estimated results of fuzzy regression, classical regression and the actual data are compared by the ANOVA F-test. The experiment was designed such that variability arising from time can be systematically controlled through blocking. Therefore, a blocked ANOVA design is applied according to the hypotheses in (2), and the results are shown in Tables 7–16.

4.1. Canada

Table 7 Error estimation in Canada.

| Year | Actual data | Fuzzy regression | Classical regression |
|---|---|---|---|
| 2002 | 13.325 | 13.330 | 13.090 |
| 2003 | 13.736 | 13.600 | 13.230 |
| 2004 | 14.029 | 14.030 | 13.385 |
| 2005 | 14.308 | 14.110 | 13.480 |
| MAPE | | 0.006 | 0.0395 |
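The error measures of Section 2.2 can be checked against Table 7. The sketch below implements the four measures of Eq. (1) in plain Python and recomputes the MAPE of both models for Canada from the tabulated test-period values; the function names are chosen for this illustration only.

```python
import math


def mae(actual, estimate):
    """Mean absolute error."""
    return sum(abs(x - xe) for x, xe in zip(actual, estimate)) / len(actual)


def mse(actual, estimate):
    """Mean square error."""
    return sum((x - xe) ** 2 for x, xe in zip(actual, estimate)) / len(actual)


def rmse(actual, estimate):
    """Root mean square error."""
    return math.sqrt(mse(actual, estimate))


def mape(actual, estimate):
    """Mean absolute percentage error, expressed as a fraction (not in %)."""
    return sum(abs((x - xe) / x) for x, xe in zip(actual, estimate)) / len(actual)


# Test-period values for Canada, 2002-2005 (Table 7).
actual = [13.325, 13.736, 14.029, 14.308]
fuzzy = [13.330, 13.600, 14.030, 14.110]
classical = [13.090, 13.230, 13.385, 13.480]

print(round(mape(actual, fuzzy), 4))      # 0.006
print(round(mape(actual, classical), 4))  # 0.0396
```

The recomputed values agree with the MAPE row of Table 7 up to rounding (0.006 for fuzzy regression and about 0.0395 for classical regression).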

As can be seen from Table 7, the results of both fuzzy and classical regression methods with respect to energy consumption estimation are relatively close to actual data. Hence, the results of both methods are veried by actual data for Canada. According to ANOVA results in Table 8, with a 0.05 the null hypothesis is rejected for Canada because p-value of treatment 0.003 < 0.05 and therefore, further analysis needs to be performed to foresee which treatment pairs caused the rejection

Table 8 Analysis of variance for Canada. Source of variation Sum Degrees Mean F-value P-value of squares of freedom squares 2 3 6 11 0.3566 0.3041 0.0187 19.04 0.003

Treatments (regressions) 0.7132 Blocks (years) 0.9123 Interaction (error) 0.1124 Total 1.7379

6988 Table 9 Error estimation in the USA. Year 2002 2003 2004 2005 MAPE Actual data 97.858 98.21 100.351 100.691 Fuzzy regression 98.98 99.46 100.42 100.91 0.0067640

A. Azadeh et al. / Energy 36 (2011) 6981e6992 Table 14 Analysis of variance for Pakistan. Classical regression 97.79 98.63 99.52 99.96 0.0051281 Source of variation Treatments (regressions) Blocks (years) Interaction (error) Total Sum of squares 0.05774 0.09660 0.03083 0.18517 Degrees of freedom 2 3 6 11 Mean squares 0.02887 0.03220 0.00514 F-value 5.62 P-value 0.042

Table 10
Analysis of variance for the USA.

Source of variation       Sum of squares  Degrees of freedom  Mean squares  F-value  P-value
Treatments (regressions)  1.960           2                   0.980         7.95     0.021
Blocks (years)            10.679          3                   3.560
Interaction (error)       0.739           6                   0.123
Total                     13.378          11

Table 15
Error estimation in Iran.

Year                  2002   2003   2004   2005   MAPE
Actual data           1.144  1.012  1.125  1.244
Fuzzy regression      1.17   1.173  1.152  1.259  0.054
Classical regression  1.114  1.112  1.095  1.089  0.069

Table 11
Error estimation in Singapore.

Year                  2002   2003   2004   2005   MAPE
Actual data           1.592  1.682  1.893  2.023
Fuzzy regression      1.640  1.681  1.850  1.973  0.0195441
Classical regression  1.652  1.704  1.729  1.785  0.0637625

Table 16
Analysis of variance for Iran.

Source of variation       Sum of squares  Degrees of freedom  Mean squares  F-value  P-value
Treatments (regressions)  0.01533         2                   0.00767       2.45     0.167
Blocks (years)            0.01569         3                   0.00523
Interaction (error)       0.01879         6                   0.00313
Total                     0.04981         11

Table 12
Analysis of variance for Singapore.

Source of variation       Sum of squares  Degrees of freedom  Mean squares  F-value  P-value
Treatments (regressions)  0.01497         2                   0.00748       1.35     0.329
Blocks (years)            0.16276         3                   0.05425
Interaction (error)       0.03335         6                   0.00556
Total                     0.21107         11

Table 17
Summary of ANOVA and MAPE results.

Country    P-value  MAPE, fuzzy regression  MAPE, classical regression
Canada     0.003    0.00600                 0.03950
The USA    0.021    0.00670                 0.00510
Singapore  0.329    0.01954                 0.06376
Pakistan   0.042    0.01130                 0.06700
Iran       0.167    0.05400                 0.06900

of the null hypothesis. Furthermore, we use the Tukey Simultaneous Test as follows and conclude that the fuzzy regression method provides the best estimation for Canada:
Comparing treatments 1 and 2: p-value = 0.6899 > 0.05, then μ1 = μ2
Comparing treatments 1 and 3: p-value = 0.0030 < 0.05, then μ1 ≠ μ3
Comparing treatments 2 and 3: p-value = 0.0067 < 0.05, then μ2 ≠ μ3
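The ANOVA entries reported for Canada can be cross-checked directly from the Table 7 values. The sketch below assumes a randomized complete block layout with three treatments (actual data, fuzzy regression, classical regression) and four blocks (the years 2002–2005), which matches the degrees of freedom in Table 8. The function names are illustrative, and the closed-form tail probability applies only because the treatment factor has 2 degrees of freedom.

```python
# Rows are treatments (assumed order: actual, fuzzy, classical);
# columns are blocks (years 2002-2005). Values transcribed from Table 7.
data = [
    [13.325, 13.736, 14.029, 14.308],  # actual data
    [13.330, 13.600, 14.030, 14.110],  # fuzzy regression
    [13.090, 13.230, 13.385, 13.480],  # classical regression
]


def block_anova(table):
    """Two-way ANOVA without replication (randomized complete block design)."""
    t = len(table)       # number of treatments
    b = len(table[0])    # number of blocks
    grand_mean = sum(sum(row) for row in table) / (t * b)
    treat_means = [sum(row) / b for row in table]
    block_means = [sum(row[j] for row in table) / t for j in range(b)]
    ss_total = sum((x - grand_mean) ** 2 for row in table for x in row)
    ss_treat = b * sum((m - grand_mean) ** 2 for m in treat_means)
    ss_block = t * sum((m - grand_mean) ** 2 for m in block_means)
    ss_error = ss_total - ss_treat - ss_block
    df_treat = t - 1
    df_error = (t - 1) * (b - 1)
    f_value = (ss_treat / df_treat) / (ss_error / df_error)
    return {
        "ss_treat": ss_treat,
        "ss_block": ss_block,
        "ss_error": ss_error,
        "ss_total": ss_total,
        "df_error": df_error,
        "f_value": f_value,
    }


def p_value_two_numerator_df(f_value, df_error):
    """Upper-tail F probability; this closed form holds only for 2 numerator df."""
    return (1.0 + 2.0 * f_value / df_error) ** (-df_error / 2.0)


result = block_anova(data)
p_value = p_value_two_numerator_df(result["f_value"], result["df_error"])
print(result, p_value)
```

With these inputs the sums of squares, F-value and p-value agree with Table 8 to the precision reported there.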

4.2. USA

As can be seen from Table 9, the energy consumption estimates of both the fuzzy and classical regression methods are relatively close to the actual data. Hence, the results of both methods are verified by actual data for the USA. According to the ANOVA results in Table 10, with α = 0.05 the null hypothesis is rejected for the USA because the treatment p-value = 0.021 < 0.05; therefore, further analysis is needed to determine which treatment pairs caused the rejection of the null hypothesis. Furthermore, we use the Tukey Simultaneous Test and conclude that both fuzzy and classical regression methods provide good estimates for the USA. However, classical regression is selected because it provides a slightly smaller relative error (Table 9).
Comparing treatments 1 and 2: p-value = 0.0813 > 0.05, then μ1 = μ2
Comparing treatments 1 and 3: p-value = 0.4859 > 0.05, then μ1 = μ3
Comparing treatments 2 and 3: p-value = 0.0188 < 0.05, then μ2 ≠ μ3

Table 13
Error estimation in Pakistan.

Year                  2002   2003   2004   2005   MAPE
Actual data           1.875  1.978  2.051  2.252
Fuzzy regression      1.925  1.972  2.052  2.218  0.0113
Classical regression  1.870  1.884  1.894  1.925  0.067

Table 18
Comparison of the hybrid framework versus other methods.

Methods/approaches: Classical regression models; Fuzzy regression models; Genetic algorithm; Artificial neural network; Particle swarm optimization; The hybrid fuzzy regression-design of experiment framework

Features
Crisp data: O O O O O O
Non crisp data: O
Handling data linearity: O O O O O O
Handling data nonlinearity: O O O O O O
Conducting experiments:
Relative error estimation:
Identifying the best model and solution:
Flexibility:
Small data set: O O

4.3. Singapore

As can be seen from Table 11, the energy consumption estimates of both the fuzzy and classical regression methods are relatively close to the actual data. Hence, the results of both methods are verified by actual data for Singapore. According to the ANOVA results in Table 12, with α = 0.05 the null hypothesis is accepted for Singapore because the treatment p-value = 0.329 > 0.05, and the selection of the fuzzy or classical regression model is based on the lower value of MAPE. Therefore, the fuzzy regression method has resulted in lower relative error and provides a better fit when compared with actual data (Table 11).

4.4. Pakistan

As can be seen from Table 13, the energy consumption estimates of both the fuzzy and classical regression methods are relatively close to the actual data. Hence, the results of both methods are verified by actual data for Pakistan. According to the ANOVA results in Table 14, with α = 0.05 the null hypothesis is rejected for Pakistan because the treatment p-value = 0.042 < 0.05; therefore, further analysis is needed to determine which treatment pairs caused the rejection of the null hypothesis. Furthermore, we use the Tukey Simultaneous Test and conclude that both fuzzy and classical regression methods provide good estimates for Pakistan. However, fuzzy regression is selected because it provides a smaller relative error (Table 13).
Comparing treatments 1 and 2: p-value = 0.9984 > 0.05, then μ1 = μ2
Comparing treatments 1 and 3: p-value = 0.0635 > 0.05, then μ1 = μ3
Comparing treatments 2 and 3: p-value = 0.0594 > 0.05, then μ2 = μ3
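The selection logic applied in Sections 4.2 to 4.4 and in the Canada analysis can be summarized as a small decision function. This is only a sketch under stated assumptions: treatments 1, 2 and 3 are taken to be the actual data, fuzzy regression and classical regression respectively (the text here does not restate the numbering), and the fall-back to the lower MAPE when both or neither model matches the actual data mirrors what Sections 4.2 and 4.4 do. The function and parameter names are hypothetical.

```python
def select_model(anova_p, mape_fuzzy, mape_classical,
                 tukey_p_fuzzy=None, tukey_p_classical=None, alpha=0.05):
    """Return "fuzzy" or "classical" following the selection logic in the text.

    tukey_p_fuzzy and tukey_p_classical are the Tukey p-values for the pairs
    (actual, fuzzy) and (actual, classical). They are only needed when the
    ANOVA rejects the null hypothesis.
    """
    lower_mape = "fuzzy" if mape_fuzzy <= mape_classical else "classical"
    if anova_p >= alpha:
        # H0 not rejected: no significant difference, so decide by MAPE.
        return lower_mape
    if tukey_p_fuzzy is None or tukey_p_classical is None:
        raise ValueError("Tukey p-values are required when ANOVA rejects H0")
    fuzzy_matches = tukey_p_fuzzy > alpha          # fuzzy not different from actual
    classical_matches = tukey_p_classical > alpha  # classical not different from actual
    if fuzzy_matches != classical_matches:
        return "fuzzy" if fuzzy_matches else "classical"
    # Both (or neither) are indistinguishable from actual: fall back to MAPE.
    return lower_mape


# P-values and MAPE values transcribed from Tables 7-14 and the Tukey comparisons.
cases = {
    "Canada": select_model(0.003, 0.006, 0.0395, 0.6899, 0.0030),
    "USA": select_model(0.021, 0.0067640, 0.0051281, 0.0813, 0.4859),
    "Singapore": select_model(0.329, 0.0195441, 0.0637625),
    "Pakistan": select_model(0.042, 0.0113, 0.067, 0.9984, 0.0635),
}
print(cases)
```

Under these assumptions the function returns the same choice the text reports for each of the four countries.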

4.5. Iran

As can be seen from Table 15, the energy consumption estimates of both the fuzzy and classical regression methods are relatively close to the actual data. Hence, the results of both methods are verified by actual data for Iran. According to the ANOVA results in Table 16, with α = 0.05 the null hypothesis is accepted for Iran because the treatment p-value = 0.167 > 0.05, and the selection of the fuzzy or classical regression model is based on the lower value of MAPE. Therefore, the fuzzy regression method has resulted in lower relative error and provides a better fit when compared with actual data (Table 15).

4.6. Technical note

Definitely, there is no need to forecast what we already know. However, this study selects the data for the years 2002–2005 to test the forecasting accuracy of the models and to compare them with the preferred model. Moreover, ANOVA is used to examine whether there are differences between the values forecast by the regressions and the actual data. In the case of a rejected H0, it is concluded that the differences between the results of classical regression and fuzzy regression are statistically significant and therefore we cannot rely on MAPE to select the preferred model. In this case, the Tukey Simultaneous Tests method is used to identify which model is closer to the actual data at the given level of significance. On the other hand, when H0 cannot be rejected, it is inferred that there are no significant differences between the forecasting results of the regression models. As a result, we rely on MAPE to decide the preferred forecasting model.

5. Conclusion

This research presented a hybrid framework to estimate and predict energy consumption with small data sets. To show the applicability of the proposed framework, annual energy consumption in Canada, the United States, Pakistan, Singapore and Iran from 1995 to 2005 was used. Then, ANOVA was applied to compare the proposed fuzzy regression, classical regression and actual data. It was found that the null hypothesis is accepted for Iran and Singapore, so MAPE was used to identify which model is closer to the actual data; consequently, the fuzzy regression estimates are closer to the actual data. The null hypothesis was rejected for Canada, Pakistan and the USA and therefore Tukey Simultaneous Tests were used to identify which model is closer to the actual data. It was shown that fuzzy regression gives better estimates of energy consumption in Canada and Pakistan. However, classical regression gives better estimates of energy consumption in the USA. Table 17 presents the summary of ANOVA and MAPE results for Canada, the USA, Singapore, Pakistan and Iran. As seen, the MAPE results of the selected fuzzy regressions are very close to those of the classical regressions and are therefore verified by classical regression. The proposed framework is highly flexible because it can handle data nonlinearity, fuzziness, data complexity, crisp data and both classical and fuzzy regression models. It can further handle noise and outliers. Table 18 presents a comparison


between the hybrid framework and existing methods. As shown, it is superior and has several advantages over existing methods. Furthermore, it applies both design of experiment and the minimum absolute percentage error (MAPE) to show which model (fuzzy or classical) is superior. It also identifies the best classical and fuzzy regression models prior to design of experiment. Finally, the framework is superior to previous approaches because it can handle relatively small data sets such as the five actual examples of this paper.

In summary, this study presented a hybrid framework for forecasting energy consumption based on fuzzy regression, classical regression and DOE for small data sets. The economic indicators used in this paper are population and Gross Domestic Product (GDP) in the last periods. The proposed framework uses ANOVA to select either fuzzy regression or classical regression for future demand estimation. Furthermore, if the null hypothesis in the ANOVA F-test is rejected, the Tukey Simultaneous Tests method is used to identify which model is closer to the actual data at the given level of significance. When the null hypothesis in ANOVA is accepted, it uses MAPE to select between the fuzzy regression and classical regression models. The significance of the proposed framework is threefold. First, it is flexible and identifies the best model based on the results of ANOVA and MAPE. Second, the proposed model may identify classical regression as the best model for future energy consumption forecasting because of its dynamic structure, whereas in the case of uncertainty and ambiguity, previous studies assume that fuzzy regression

provides better solutions and estimation. Third, it is ideal for relatively small data sets. Future research could extend the present framework with both conventional and intelligent multivariate nonlinear time series. The significance of the nonlinear test statistics has been determined by a recent study [45] on surrogate data. Thus, artificial neural network (ANN) and adaptive network based fuzzy inference system (ANFIS) may be used to develop frameworks for energy consumption estimation with nonlinear pattern and uncertainty.

Acknowledgment

The authors are grateful for the valuable comments and suggestions from the respected reviewers, which have enhanced the strength and significance of our paper. This study was supported by a Grant from University of Tehran (Grant No. 8106013/1/07). The authors acknowledge the support provided by the University College of Engineering, University of Tehran, Iran.

Appendix 1. Sample Lingo codes for Canada

!Canada;
!objective function;
min = c0 + 343.7871*c1 + 34.60*c2;
!constraints;
!CT 1; p0 + 29.61*p1 + 2.50*p2 - (1-h)*(c0 + 29.619*c1 + 2.5*c2) < 12.202;
!CT 2; p0 + 29.61*p1 + 2.50*p2 + (1-h)*(c0 + 29.619*c1 + 2.5*c2) > 12.202;
!CT 3; p0 + 29.983*p1 + 3.70*p2 - (1-h)*(c0 + 29.983*c1 + 3.70*c2) < 12.539;
!CT 4; p0 + 29.983*p1 + 3.70*p2 + (1-h)*(c0 + 29.983*c1 + 3.70*c2) > 12.539;
!CT 5; p0 + 30.305*p1 + 4.5*p2 - (1-h)*(c0 + 30.305*c1 + 4.5*c2) < 12.655;
!CT 6; p0 + 30.305*p1 + 4.5*p2 + (1-h)*(c0 + 30.305*c1 + 4.5*c2) > 12.655;
!CT 7; p0 + 30.628*p1 + 4.2*p2 - (1-h)*(c0 + 30.628*c1 + 4.2*c2) < 12.361;
!CT 8; p0 + 30.628*p1 + 4.2*p2 + (1-h)*(c0 + 30.628*c1 + 4.2*c2) > 12.361;
!CT 9; p0 + 30.956*p1 + 4.4*p2 - (1-h)*(c0 + 30.957*c1 + 4.4*c2) < 12.939;
!CT 10; p0 + 30.956*p1 + 4.4*p2 + (1-h)*(c0 + 30.957*c1 + 4.4*c2) > 12.939;
!CT 11; p0 + 31.2781*p1 + 3.7*p2 - (1-h)*(c0 + 31.2781*c1 + 3.7*c2) < 12.932;
!CT 12; p0 + 31.2781*p1 + 3.7*p2 + (1-h)*(c0 + 31.2781*c1 + 3.7*c2) > 12.932;
!CT 13; p0 + 31.5928*p1 + 0.8*p2 - (1-h)*(c0 + 31.5928*c1 + 0.8*c2) < 12.967;
!CT 14; p0 + 31.5928*p1 + 0.8*p2 + (1-h)*(c0 + 31.5928*c1 + 0.8*c2) > 12.967;
!CT 15; p0 + 31.9022*p1 + 1.6*p2 - (1-h)*(c0 + 31.9022*c1 + 1.6*c2) < 13.325;
!CT 16; p0 + 31.9022*p1 + 1.6*p2 + (1-h)*(c0 + 31.9022*c1 + 1.6*c2) > 13.325;
!CT 17; p0 + 32.2071*p1 + 2.5*p2 - (1-h)*(c0 + 32.2071*c1 + 2.5*c2) < 13.736;
!CT 18; p0 + 32.2071*p1 + 2.5*p2 + (1-h)*(c0 + 32.2071*c1 + 2.5*c2) > 13.736;


Appendix 2

Model:
!(Extended Tanaka Model of Fuzzy Regression);
Sets:
!Import Sets from Excel;
Input: Response;
Component: Centers, Spreads;
Links(Input, Component): Train;
Endsets
Min = @SUM(Input(I): @SUM(Component(J): Spreads(J)*Train(I,J)));
@FOR(Input(I): @SUM(Component(J): Centers(J)*Train(I,J) + (1-H)*Spreads(J)*Train(I,J)) > Response(I));
@FOR(Input(I): @SUM(Component(J): Centers(J)*Train(I,J) - (1-H)*Spreads(J)*Train(I,J)) < Response(I));
Data:
!Import the data from Excel;
Input, Component, Response, Train, H = @OLE('D:\Lingo8\chang.xls');
Enddata
End
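To make the role of the paired constraints in Appendices 1 and 2 concrete, the sketch below evaluates them for a candidate solution: each observation must lie inside the interval whose centre is the fitted value and whose half-width is (1 - h) times the fitted spread, while the objective is the total spread. The observations are taken from the Appendix 1 constraints; the candidate coefficients and the helper names are purely hypothetical illustrations, not results from the paper, and no optimization is performed here.

```python
# Canada observations as they appear in the Appendix 1 constraints: (x1, x2, y).
observations = [
    (29.61, 2.5, 12.202),
    (29.983, 3.7, 12.539),
    (30.305, 4.5, 12.655),
    (30.628, 4.2, 12.361),
    (30.956, 4.4, 12.939),
    (31.2781, 3.7, 12.932),
    (31.5928, 0.8, 12.967),
    (31.9022, 1.6, 13.325),
    (32.2071, 2.5, 13.736),
]


def interval(centers, spreads, x1, x2, h):
    """Lower and upper bounds of the fuzzy estimate at membership level h."""
    center = centers[0] + centers[1] * x1 + centers[2] * x2
    half_width = (1.0 - h) * (spreads[0] + spreads[1] * x1 + spreads[2] * x2)
    return center - half_width, center + half_width


def is_feasible(centers, spreads, h, data):
    """True if every observed response lies inside its interval."""
    for x1, x2, y in data:
        lower, upper = interval(centers, spreads, x1, x2, h)
        if not (lower <= y <= upper):
            return False
    return True


def total_spread(spreads, data):
    """Objective of the possibilistic model: sum of the fitted spreads."""
    return sum(spreads[0] + spreads[1] * x1 + spreads[2] * x2 for x1, x2, _ in data)


# Hypothetical candidate coefficients, chosen only to exercise the constraints.
candidate_centers = (-5.0, 0.58, 0.0)
candidate_spreads = (0.5, 0.0, 0.0)

print(is_feasible(candidate_centers, candidate_spreads, 0.0, observations))
print(is_feasible(candidate_centers, candidate_spreads, 0.5, observations))
print(total_spread(candidate_spreads, observations))
```

Raising h shrinks every interval, so a candidate that satisfies the constraints at h = 0 may violate them at a higher h unless the spreads grow. The linear program in the appendices searches for the smallest total spread that keeps all observations covered.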

References
[1] Pao HT. Forecasting energy consumption in Taiwan using hybrid nonlinear models. Energy 2009;34(10):1438–46.
[2] Zhang M, Mu H, Li G, Ning Y. Forecasting the transport energy demand based on PLSR method in China. Energy 2009;34(9):1396–400.
[3] Cinar D, Kayakutlu G, Daim T. Development of future energy scenarios with intelligent algorithms: case of hydro in Turkey. Energy 2010;35(4):1724–9.
[4] Afshar K, Bigdeli N. Data analysis and short term load forecasting in Iran electricity market using singular spectral analysis (SSA). Energy 2011;36(5):2620–7.
[5] Wang HF, Tsaur RC. Insight of a fuzzy regression model. Fuzzy Sets and Systems 2000;112:355–69.
[6] Ross TJ. Fuzzy logic with engineering applications. New York: McGraw-Hill; 1995.
[7] Tanaka H, Uejima S, Asai K. Linear regression analysis with fuzzy model. IEEE Transactions on Systems, Man, and Cybernetics 1982;12(6):903–7.
[8] Tanaka H, Watada J. Possibilistic linear systems and their application to the linear regression model. Fuzzy Sets and Systems 1988;27:275–89.
[9] Peters G. Fuzzy linear regression with fuzzy intervals. Fuzzy Sets and Systems 1994;63:45–55.
[10] Sakawa M, Yano H. Multiobjective fuzzy linear regression analysis and its application. Electronics and Communications in Japan 1990;73:1–9.
[11] Chang PT, Lee ES. A generalized fuzzy weighted least-squares regression. Fuzzy Sets and Systems 1996;82:289–98.
[12] Heshmaty B, Kandel A. Fuzzy linear regression and its applications to forecasting in uncertain environment. Fuzzy Sets and Systems 1985;15:159–91.
[13] Wang HF, Tsaur RC. Resolution of fuzzy regression model. European Journal of Operational Research 2000;126:637–50.
[14] Chang Y-HO, Ayyub BM. Fuzzy regression methods – a comparative assessment. Fuzzy Sets and Systems 2001;119:187–203.
[15] Celmins A. Least squares model fitting to fuzzy vector data. Fuzzy Sets and Systems 1987;22:245–69.
[16] Celmins A. Multidimensional least-squares model fitting of fuzzy models. Mathematical Modeling 1987;9:669–90.
[17] Diamond P. Fuzzy least squares. Information Sciences 1988;46:141–57.
[18] Savic D, Pedrycz W. Evaluation of fuzzy regression models. Fuzzy Sets and Systems 1991;39:51–63.
[19] Chang Y-HO, Ayyub BM. Reliability analysis in fuzzy regression. In: Proc. annual conf. of the North American fuzzy information processing society (NAFIPS'93), Allentown, PA, USA; 1993. p. 93–7.
[20] Azadeh A, Ghaderi SF, Sohrabkhani S. Annual electricity consumption forecasting by neural network in high energy consuming industrial sectors. Energy Conversion and Management 2008;49(8):2272–8.
[21] Nasr GE, Badr EA, Younes MR. Neural networks in forecasting electrical energy consumption: univariate and multivariate approaches. International Journal of Energy Research 2002;26(1):67–78.
[22] Al-Shehri A. Artificial neural network for forecasting residential electrical energy. International Journal of Energy Research 1999;23(8):649–61.
[23] Azadeh A, Saberi M, Seraj O. An integrated fuzzy regression algorithm for energy consumption estimation with non-stationary data: a case study of Iran. Energy 2010;35:2351–66.
[24] Chang PT, Konz SA, Lee ES. Applying fuzzy linear regression to VDT legibility. Fuzzy Sets and Systems 1996;80:197–204.
[25] Lai YJ, Chang SI. A fuzzy approach for multi response optimization: an off-line quality engineering problem. Fuzzy Sets and Systems 1994;63:117–29.
[26] Shakouri H, Nadimi R, Ghaderi F. A hybrid TSK-FR model to study short-term variations of the electricity demand versus the temperature changes. Expert Systems with Applications 2009;36(2):1765–72.
[27] Azadeh A, Saberi M, Ghaderi SF, Gitiforouz A, Ebrahimipour V. Improved estimation of electricity demand function by integration of fuzzy system and data mining approach. Energy Conversion and Management 2008;49(8):2165–77.
[28] Shakouri H, Nadimi R. A novel fuzzy linear regression model based on a non-equality possibility index and optimum uncertainty. Applied Soft Computing 2009;9:590–8.
[29] Montgomery DC. Design and analysis of experiments. 5th ed. New York: John Wiley & Sons; 2001. p. 126–30.
[30] Diamond P, Tanaka H. Fuzzy regression analysis. In: Slowinski R, editor. Fuzzy sets in decision analysis, operations research and statistics. Boston: Kluwer Academic Publishers; 1998.
[31] Özelkan EC. Multi-objective fuzzy regression applied to the calibration rainfall–runoff models. Unpublished Ph.D. dissertation, Department of Systems and Industrial Engineering, The University of Arizona; 1997.
[32] Tanaka H. Fuzzy data analysis by possibilistic linear models. Fuzzy Sets and Systems 1987;24:363–75.
[33] Sakawa M, Yano H. Multiobjective fuzzy linear regression analysis for fuzzy input–output data. Fuzzy Sets and Systems 1992;47:173–81.
[34] Kim B, Bishu RR. Evaluation of fuzzy linear regression models by comparing membership functions. Fuzzy Sets and Systems 1998;100:342–52.
[35] Bardossy A, Duckstein RL. Fuzzy least squares regression: theory and application. In: Kacprzyk J, Fedrizzi M, editors. Fuzzy regression analysis. Heidelberg: Omnitech Press, Warsaw and Physica-Verlag; 1992. p. 181–93.
[36] Tanaka H, Lee H. Fuzzy linear regression combining central tendency and possibilistic properties. Proceedings of Fuzzy Conference-IEEE; 1997:63–8.
[37] Jozsef S. On the effect of linear data transformations in possibilistic fuzzy linear regression. Fuzzy Sets and Systems 1992;45:185–8.
[38] Tanaka H, Hayashi I, Watada J. Possibilistic linear regression analysis for fuzzy data. European Journal of Operational Research 1989;40:389–96.
[39] Redden DT, Woodall WH. Properties of certain fuzzy linear regression models. Fuzzy Sets and Systems 1994;64:361–75.
[40] Kim KJ, Moskowitz H, Koksalan M. Fuzzy versus statistical linear regression. European Journal of Operational Research 1996;92:417–34.
[41] Redden DT, Woodall WH. Further examination of fuzzy linear regression. Fuzzy Sets and Systems 1996;79:203–11.
[42] Özelkan EC, Duckstein L. Fuzzy regression analysis of tradeoff between data outliers and prediction vagueness in fuzzy regression using a bi-objective framework. In: Proc. EUFIT'98-6th European congress on intelligent techniques and soft computing; 1998. p. 1048–51.
[43] Özelkan EC, Duckstein L. Multi-objective fuzzy regression: a general framework. Computer and Operations Research 2000;27:635–40.
[44] Tran L, Duckstein L. Multi-objective fuzzy regression with central tendency and possibilistic properties. Fuzzy Sets and Systems 2002;130:21–31.
[45] Gao W, Tian Z. Learning Granger causality graphs for multivariate nonlinear time series. Journal of Systems Science and Systems Engineering 2009;18(1):38–52.
