
2022 22nd National Power Systems Conference (NPSC)

A Comparative Analysis of Hold Out, Cross and Re-Substitution Validation in Hyper-Parameter Tuned Stochastic Short Term Load Forecasting

B V Surya Vardhan, Mohan Khedkar, Prajwal Thakre
Electrical Department, Visvesvaraya National Institute of Technology, Nagpur, India
suryavardhan@students.vnit.ac.in, mohnakhedkar@eee.vnit.ac.in, prajwalthakre007@gmail.com

978-1-6654-6202-0/22/$31.00 ©2022 IEEE | DOI: 10.1109/NPSC57038.2022.10069288

Abstract—Analysis of load plays an important role in the operation of modern power systems due to its highly intermittent nature. This manuscript proposes the best approach by comparing results of Hold Out, Cross and Re-Substitution validation for hyperparameter tuned Short Term Load Forecasting (STLF). Tree, Neural Network and GPR (Gaussian Process Regression) are the three stochastic regression methods used. Each validation procedure is compared with every considered regression method, leading to 9 such combinations. Each combination is analysed with statistical parameters like RMSE (Root Mean Square Error), R Squared, MSE (Mean Square Error), MAE (Mean Absolute Error) and training time. The best approach is further optimised by modifying hyperparameters using Bayesian, Grid Search and Random Search optimization, and the most suitable method is proposed. The simulations are performed on Python and MATLAB platforms. The best combination for computation of STLF is "K-fold validation with Tree Regression". The statistical parameters obtained from this combination are RMSE, R Squared, MSE, MAE and training time of 0.077, 0.88, 0.0059, 0.046 and 1.2 s respectively. The best method for hyper-parameter tuning is found to be Grid Search, with a reduced MSE of 0.0023.

Index Terms—Short Term Load Forecasting, Tree, Neural Network, Gaussian Process Regression, Cross Validation, Hold Out Validation, Re-Substitution

I. INTRODUCTION

Load plays an important role in power system analysis and design. Accurate load prediction leads to planned power scheduling [1]. It can be observed from Fig. 1 that there is a difference between Peak Demand and Demand Met in India [2]. This gap can be narrowed using accurate load forecasting. In countries like India, load forecasting is usually divided into three parts: the Short Term (STLF), the Medium Term (MTLF) and the Long Term (LTLF). STLF covers a time span of one hour to one week, MTLF from one week to one year, while LTLF spans one year to twenty years. The forecasting covered in this manuscript is STLF.

Fig. 1. Yearly Peak Demand vs. Peak Demand Met in India [2]

Validation is an important aspect of machine learning, as it is key to determining the efficiency of an algorithm. Hold-out refers to the process of dividing the assumed data set into a "train" and a "test" set. The model is trained on the training set, and the test set is used to obtain a fair assessment of the performance of the final model. In Cross Validation the data set is randomly divided into 'k' groups, which is why it is also referred to as 'k-fold' cross-validation. One group is used as the test set while the rest of the groups are used as the training set; the model is developed on the training set and evaluated on the test set. This procedure is repeated until every distinct group has been used as a test set. If all of the data are utilised to train the model and the error rate is assessed on the predicted vs. actual values from the same training data set, the resulting error is known as the re-substitution error and the technique is called re-substitution (Resub) validation. A detailed comparative analysis of all the validation techniques is given in [3]. There is no protection against over fitting in re-substitution.

Implementation of STLF using decision trees is illustrated in [4]. Compared to other approaches, the preprocessing for a decision tree requires less effort and does not require data normalisation. One of the limitations of decision trees is their vulnerability to data changes in the system. Neural networks


(NN) are capable of solving issues properly when data is


constantly changing and can interact successfully with Data
Base Management Systems (DBMSs) [5]. Implementation of
Gaussian Process Regression (GPR) in forecasting applications
is illustrated in [6]. GPR can efficiently handle uncertainties,
but it cannot detect correlations between input parameters.
A hyperparameter is a parameter whose value regulates the
learning process. In machine learning, determining the opti-
mal choice of hyper parameters for a training algorithm is
referred to as hyperparameter optimization or tuning. In [7], a
comprehensive analysis of hyperparameter tuning is presented.

Salient aspects of this manuscript are:

• Nine different combinations are formed from three validation and three regression analysis techniques. The three techniques used for validation are Cross, Hold Out and Re-substitution. The three methods used for regression analysis are Decision Trees, Neural Networks and GPR.
• Each combination is analysed and compared using statistical tools like RMSE (Root Mean Square Error), R2, MSE (Mean Square Error), MAE (Mean Absolute Error) and training time, and the best method is proposed.
• The best approach is further optimised by modifying hyperparameters using Bayesian, Grid Search and Random Search optimization, and the most suitable method is proposed.

II. SHORT TERM LOAD FORECASTING METHODOLOGY

STLF has a wide range of applications in power system operation. These applications include power scheduling, trading, pricing, etc. A comprehensive view of the methodology used in this manuscript is shown in Fig. 2. Outliers are removed from the data using the IQR (Inter Quartile Range) methodology. The following equations are used to compute the Lower Bound (LB) and Upper Bound (UB) using the IQR method.

IQR = Quart3 − Quart1 (1)
LB = Quart1 − 1.5 × IQR (2)
UB = Quart3 + 1.5 × IQR (3)

The difference between the third quartile and the first quartile is computed in (1); IQR represents this difference. The lower and upper bounds of the data are obtained from the IQR in (2) and (3). A data point is considered an outlier if it lies below the lower limit or above the upper limit. For large data sets outliers are removed, while for small data sets outliers are replaced by the Average, Median, RMS (Root Mean Square) value, etc. In the proposed methodology outliers are removed.

All assumed validation and regression approaches are combined. The mapping illustrated in Fig. 3 can be used to create these combinations; there are nine such pairings. All nine combinations are used to train the cleaned data. Statistical measures such as RMSE (Root Mean Square Error), R2, MSE (Mean Square Error), MAE (Mean Absolute Error) and training duration are used to find the ideal combination. The best combination is further optimised by adjusting hyperparameters to reduce MSE; Grid Search, Random Search and Bayesian optimization are the approaches applied.

Fig. 2. STLF Implementation Flowchart
Fig. 3. Mapping of Validation-Regression Methods for STLF

III. ANALYSES OF ALGORITHMS

This section gives a brief introduction to all the algorithms used in validation, training, and tuning.

A. Validation Methods

1) Cross Validation: Cross-validation is a method of evaluating machine learning models that involves training them on distinct subsets of the available input data and then testing them on the remaining group. Cross-validation can detect over fitting, which occurs when a model fails to generalise. To do cross-validation in machine learning (ML), the k-fold cross-validation approach can be used. In k-fold cross-validation, the input data is separated into k subsets (also known as folds). A model is trained on all but one (k−1) of the subsets, and then it is assessed on the subset that was not used for training. This process is repeated k times, with a new subset put aside for assessment each time (and not used for training).
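The k-fold partitioning just described can be sketched in a few lines of Python. This is an illustrative index-splitting routine for a generic data set of n samples, not the paper's actual implementation:

```python
# Sketch of k-fold index partitioning: every sample lands in the test
# set of exactly one fold, and in the training set of the other k-1 folds.

def k_fold_splits(n_samples, k=5):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    fold_size = n_samples // k
    for fold in range(k):
        start = fold * fold_size
        # The last fold absorbs any remainder so every sample is tested once.
        stop = n_samples if fold == k - 1 else start + fold_size
        test = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, test

# Example: 10 samples, 5 folds -> each fold tests 2 samples, trains on 8.
for train, test in k_fold_splits(10, k=5):
    assert len(test) == 2 and len(train) == 8
```

In practice the indices would first be shuffled, since the paper notes the data set is divided into the k groups randomly.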


2) Hold Out Validation: The hold-out method is a way to train machine learning models that involves splitting the data into different sets: one set for training, and other sets for validation and testing. The hold-out method is used to test how well a machine learning model will work on new data. It is commonly employed when the data set is small and there is insufficient data to split into three parts (training, validation, and testing). This method has the benefit of being simple to apply, but it is sensitive to how the data is separated into two sets; the findings may be affected if the split is not random. Overall, the hold-out technique for model assessment is a decent starting point for training machine learning models, but it should be used with caution.

3) Re-substitution Validation: The re-substitution validation approach utilises all of the data to train the model and evaluates the error rate on the predicted vs. actual values from the same training data set, which is known as the re-substitution error. Re-substitution is the most basic approach to validation, and there is no protection against over fitting in this approach.

B. Training Methods

1) Decision Trees: A decision tree employs a flowchart-like tree structure to depict the predictions that arise from a sequence of feature-based splits. Starting from a root node, the outcome is determined by a leaf node. Decision tree regression analyses an object's attributes and develops a model in the form of a tree to predict future data and produce meaningful continuous output. The goal of using decision trees is to create a training model that predicts the class or value of a response variable based on fundamental decision rules learned from previous data (training data). Prediction for a record begins at the tree's root: the values of the root attribute are compared to the attributes of the record, the branch that corresponds to that value is followed, and the process continues to the next node. The decision of where to split a tree strategically has a significant impact on its accuracy, and classification and regression trees use different decision criteria. Decision trees use numerous techniques to determine whether to divide a node into two or more sub-nodes, each split aiming to improve the purity of the nodes with respect to the data points. The decision tree considers splits on all process parameters, then selects the split that results in the most homogeneous sub-nodes [8].

2) Neural Networks: A neural network is a sort of computational learning system that uses a network of functions to interpret and transform data inputs of one form into a desired output, which is usually in another form. The neural network concept was inspired by human biology and the way neurons in the human brain interact to interpret sensory information. In general, neural network based machine learning algorithms do not require specific rules specifying what to expect from the input. Instead, the neural net learning algorithm learns by analysing a large number of labelled examples presented during training, using this answer key to determine which input traits are required to produce the correct output. After a sufficient number of cases has been processed, the neural network may begin to receive additional, previously unknown inputs and produce accurate results. Because such systems learn from experience, the more examples the network meets, the more accurate the result becomes [1].

3) Gaussian Process Regression: A Gaussian process regression (GPR) model can forecast using past knowledge (kernels) and offer uncertainty estimates for all of its predictions. To do regression, the GaussianProcessRegressor employs Gaussian processes (GP), for which the GP prior must be specified. The prior mean is assumed to be constant and zero (for normalize_y=False) or equal to the mean of the training data (for normalize_y=True). The covariance of the prior is specified using a kernel object. The kernel hyperparameters are optimised during GaussianProcessRegressor fitting by maximising the log-marginal-likelihood (LML) with the chosen optimizer. Because the LML may have numerous local optima, the optimizer can be restarted by specifying n_restarts_optimizer. The first run always starts from the kernel's initial hyperparameter values; subsequent runs start from hyperparameter values chosen at random from the range of allowed values [8].

C. Methods Related to Hyper-Parameter Tuning

1) Grid Search: Grid search is the most popular approach for adjusting hyperparameters because of its simplicity and convenience of use. It is an uninformed search method, meaning that it does not take into account past iterations of the search. To identify the optimal combination of hyperparameters, this approach examines every potential combination in the search space. This approach cannot handle an expanding hyperparameter search space, as that leads to an exponential rise in run time and processing.

2) Random Search: Another method that treats iterations independently is random search. Rather than searching the whole space, it examines a number of randomly chosen hyperparameter sets; this number is set by the user. Since there are fewer hyperparameter tuning trials, this strategy uses less computing time and runs faster than grid search. However, because random search considers hyperparameter sets at random, it runs the risk of overlooking the ideal collection of hyperparameters and decreasing model performance.

3) Bayesian Optimization: Bayesian optimization, in contrast to grid and random search techniques, is an informed search methodology, meaning it learns from previous iterations. This technique also allows the user to set the number of trials. The Bayes theorem is used as the basis for this strategy. The following equation is used to calculate the probability of a hyperparameter value:

p(hpm|score) = p(score|hpm) × p(hpm) / p(score) (4)

The Bayes theorem in the modified form used for hyperparameter tuning is shown in (4), where hpm signifies a hyperparameter value and score is the result of the current trial. p(score) is the probability of occurrence of the score, p(hpm) is the probability of occurrence of the hyperparameter value, and p(score|hpm) is the probability of the score given the hyperparameter value. This technique can find the ideal hyperparameters without manually attempting every hyperparameter configuration or randomly testing all of the possible hyperparameters. As a result, it is possible to obtain the optimal hyperparameters without having to examine the whole sample space.
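The contrast between the exhaustive and random strategies above can be sketched on a toy objective. The `validation_mse` function below is an assumed stand-in for a full train-and-validate cycle, not the paper's model:

```python
import itertools
import random

# Toy stand-in for the validation MSE surface over two hyperparameters
# (say, tree depth and leaf size). A real objective would train and
# validate a model; this quadratic bowl, with its minimum at
# depth=4, leaf_size=3, is an assumption for illustration only.
def validation_mse(depth, leaf_size):
    return (depth - 4) ** 2 * 0.01 + (leaf_size - 3) ** 2 * 0.005

depths = range(1, 9)
leaf_sizes = range(1, 9)

# Grid search: evaluate every combination in the space (8 x 8 = 64 trials).
grid_best = min(itertools.product(depths, leaf_sizes),
                key=lambda hp: validation_mse(*hp))

# Random search: evaluate only a user-set number of randomly drawn trials.
rng = random.Random(0)
trials = [(rng.choice(depths), rng.choice(leaf_sizes)) for _ in range(10)]
random_best = min(trials, key=lambda hp: validation_mse(*hp))

print(grid_best)    # grid search is guaranteed to find (4, 3) on this grid
print(random_best)  # random search may or may not land on the optimum
```

The sketch makes the trade-off concrete: grid search pays 64 evaluations for a guaranteed optimum, while random search pays only 10 but may miss it; Bayesian optimization would instead use the scores of past trials to choose where to evaluate next.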

IV. RESULTS AND ANALYSIS


Results from the different validation-regression combinations and from the hyperparameter optimization are analysed in this part. The data set from [9] is assumed for the purposes of analysing the proposed methodology. The data is scaled from 0 to 1, with the peak value serving as the reference point. The IQR technique is used to address outliers in the data; it is found that 1% of the values are outliers, and these have been eliminated. As discussed, 9 different combinations have been formed from the mapping of validation and regression methods, and all of them have been trained. For Hold Out validation, 80% of the data is considered as train and 20% as test data. A five-fold technique is considered for K-fold validation, where at any time 4 folds are considered as train data and 1 fold as test data. For re-substitution, all the data is used for both training and validation, as the procedure suggests, so there is no protection against over fitting.

Fig. 4. Actual vs. Predicted using Hold Out Validation and Tree Forecaster
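The re-substitution pitfall noted above can be made concrete with a deliberately overfit toy model. The 1-nearest-neighbour regressor and data below are illustrative assumptions, not the paper's forecaster or load data:

```python
# Why re-substitution offers no protection against over fitting:
# a 1-nearest-neighbour regressor has zero re-substitution error,
# because every training point is its own nearest neighbour,
# regardless of how noisy the targets are.

def one_nn_predict(x, train_x, train_y):
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

train_x = [0.1, 0.3, 0.5, 0.7, 0.9]
train_y = [0.2, 0.8, 0.4, 0.9, 0.3]   # deliberately noisy targets

# Re-substitution: evaluate on the same data used for training.
resub_errors = [abs(one_nn_predict(x, train_x, train_y) - y)
                for x, y in zip(train_x, train_y)]
print(sum(resub_errors))  # 0.0 -> looks perfect, yet says nothing about new data
```

A hold-out or k-fold estimate on the same model would reveal a much larger error, which is exactly why re-substitution is not considered the best method despite its high apparent accuracy.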
Actual versus predicted values for all the combinations are presented in Fig. 4 to Fig. 12, and a detailed analysis of the statistical parameters obtained from Fig. 4 to Fig. 12 is given in Table I. It can be observed from Table I that the best combination for the required output is "K-fold validation with Tree Regression". The statistical parameters obtained from this combination are RMSE, R2, MSE, MAE and training time of 0.077, 0.88, 0.0059, 0.046 and 1.2 s respectively. This is because it has the highest R2 with an allowable training time. The best combination is further optimized with the objective of reducing MSE using Grid Search, Random Search and Bayesian optimization.

Fig. 5. Actual vs. Predicted using K-Fold Validation and Tree Forecaster
The analysis is presented in Table II. The best method is found to be Grid Search, with a reduced MSE of 0.0023, as this method produces the least MSE compared to all the computed methods. A graphical analysis of the hyperparameter tuning is presented in Fig. 13. The linearity analysis for hyperparameter tuned K-fold validation with Tree Regression is shown in Fig. 14.
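The statistical parameters reported in Tables I and II can be computed from the actual and predicted series as follows; the toy arrays are illustrative, not the paper's load data:

```python
import math

# The four error measures used throughout the paper, computed from
# actual vs. predicted values on the 0-to-1 normalised scale.
def regression_metrics(actual, predicted):
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n            # Mean Square Error
    rmse = math.sqrt(mse)                           # Root Mean Square Error
    mae = sum(abs(e) for e in errors) / n           # Mean Absolute Error
    mean_a = sum(actual) / n
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    r2 = 1 - (mse * n) / ss_tot                     # coefficient of determination
    return {"RMSE": rmse, "MSE": mse, "MAE": mae, "R2": r2}

m = regression_metrics([0.2, 0.4, 0.6, 0.8], [0.25, 0.35, 0.65, 0.75])
print(m)  # every residual is +/-0.05, so RMSE = MAE = 0.05 and MSE = 0.0025
```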

TABLE I
COMPARATIVE ANALYSIS OF ALL VALIDATION-FORECASTING MAPPINGS

Validation       Regression       RMSE   R2    MSE     MAE    Training Time (s)
Hold Out         Tree             0.09   0.77  0.0081  0.061  0.3095
Hold Out         Neural Network   0.089  0.78  0.008   0.061  20.83
Hold Out         GPR              0.081  0.82  0.006   0.056  23.6
K-Fold           Tree             0.077  0.88  0.0059  0.046  1.2
K-Fold           Neural Network   0.089  0.78  0.008   0.062  21.2
K-Fold           GPR              0.083  0.81  0.0069  0.058  89.9
Re-substitution  Tree             0.052  0.93  0.0027  0.03   1.75
Re-substitution  Neural Network   0.083  0.82  0.006   0.056  3.54
Re-substitution  GPR              0.081  0.82  0.0065  0.056  15.56

Fig. 6. Actual vs. Predicted using Resub Validation and Tree Forecaster

451
2022 22nd National Power Systems Conference (NPSC)

Fig. 7. Actual vs. Predicted using Hold Out Validation and NN Forecaster
Fig. 8. Actual vs. Predicted using K-Fold Validation and NN Forecaster
Fig. 9. Actual vs. Predicted using Resub Validation and NN Forecaster
Fig. 10. Actual vs. Predicted using Hold Out Validation and GPR Forecaster
Fig. 11. Actual vs. Predicted using K-Fold Validation and GPR Forecaster
Fig. 12. Actual vs. Predicted using Resub Validation and GPR Forecaster


TABLE II they don’t have protection against over fitting, since no


H YPER -PARAMETER T UNING (K F OLD VALIDATION AND T REE separate data-set is used for testing, hence not considered
F ORECASTER )
as best method.
Method MSE Leaf Size • The best approach is further optimised by modifying hy-
per parameters using Bayesian, Grid Search, and Random
Bayesian Optimization
Grid Search
0.0026
0.0023
3
3
Search and most suitable method is proposed. The best
Random Search 0.0033 5 method found out to be Grid search with a reduced MSE
of 0.0023 as this method is producing least MSE as
compared to all the computed methods.
REFERENCES
[1] B. V. S. Vardhan, M. Khedkar, and I. Srivastava, "Cost effective day-ahead scheduling with stochastic load and intermittency forecasting for distribution system considering distributed energy resources," Energy Sources, Part A: Recovery, Utilization, and Environmental Effects, vol. 0, no. 0, pp. 1–26, 2021. [Online]. Available: https://doi.org/10.1080/15567036.2021.1983669
[2] Ministry of Power, "Annual report 2021-22," https://powermin.gov.in, 2021.
[3] S. Yadav and S. Shukla, "Analysis of k-fold cross-validation over hold-out validation on colossal datasets for quality classification," in 2016 IEEE 6th International Conference on Advanced Computing (IACC), 2016, pp. 78–83.
[4] Z. Xie, R. Wang, Z. Wu, and T. Liu, "Short-term power load forecasting model based on fuzzy neural network using improved decision tree," in 2019 IEEE Sustainable Power and Energy Conference (iSPEC), 2019, pp. 482–486.
[5] S. Hosein and P. Hosein, "Load forecasting using deep neural networks," in 2017 IEEE Power & Energy Society Innovative Smart Grid Technologies Conference (ISGT), 2017, pp. 1–5.
[6] M. Nejati and N. Amjady, "A new solar power prediction method based on feature clustering and hybrid-classification-regression forecasting," IEEE Transactions on Sustainable Energy, vol. 13, no. 2, pp. 1188–1198, 2022.
[7] T. Yu and H. Zhu, "Hyper-parameter optimization: A review of algorithms and applications," arXiv preprint arXiv:2003.05689, 2020.
[8] B. V. S. Vardhan, M. Khedkar, and K. Shahare, "A comparative analysis of various stochastic approaches for short term load forecasting," in 2022 International Conference for Advancement in Technology (ICONAT), 2022, pp. 1–6.
[9] M. Mohd Hussain, "Load data in Malaysia," 2020. [Online]. Available: https://dx.doi.org/10.21227/67vy-bs34

Fig. 13. Hyper-parameter tuning graph of Grid Search

Fig. 14. Linearity analysis of Tuned K Fold validation-Tree Forecaster

V. CONCLUSION

• Nine different combinations are formed from three validation and three regression analysis techniques. The techniques used for validation are Cross, Hold Out and Re-substitution; the methods used for regression analysis are Decision Trees, Neural Networks and GPR. It is found that the best combination for the required output is "K-fold validation with Tree Regression". The statistical parameters obtained from this combination are RMSE, R2, MSE, MAE and training time of 0.077, 0.88, 0.0059, 0.046 and 1.2 s respectively. Although the Tree forecaster with Re-Substitution validation gives higher apparent accuracy, it has no protection against over fitting, since no separate data set is used for testing, and hence it is not considered the best method.
• The best approach is further optimised by modifying hyperparameters using Bayesian, Grid Search and Random Search optimization, and the most suitable method is proposed. The best method is found to be Grid Search, with a reduced MSE of 0.0023, as this method produces the least MSE compared to all the computed methods.