A Comparative Analysis of Hold Out - Cross and Re-Substitution Validation in Hyper-Parameter Tuned Stochastic Short Term Load Forecasting
2022 22nd National Power Systems Conference (NPSC)
2) Hold Out Validation: The hold-out method is a way to train machine learning models that involves splitting the data into different sets: one set for training, and other sets for validation and testing. The hold-out method is used to test how well a machine learning model will work with new data. It is commonly employed when the data set is small and there is insufficient data to split into three parts (training, validation, and testing). The method has the benefit of being simple to apply, but it is sensitive to how the data is separated into the two sets: the findings may be biased if the split is not random. Overall, the hold-out technique is a decent starting point for model assessment, but it should be used with caution.

3) Re-substitution Validation: The re-substitution validation approach uses all of the data to train the model and evaluates the error rate from the predicted versus actual values on the same training data set; this error is known as the re-substitution error. Re-substitution is the most basic validation approach, and it offers no protection against overfitting.

B. Training Methods

1) Decision Trees: A decision tree employs a flowchart-like tree structure to depict the predictions that arise from a sequence of feature-based splits: prediction begins at a root node and ends at a leaf node. Decision tree regression analyses an object's attributes and develops a tree-shaped model to predict future data and produce meaningful continuous output. The goal of using decision trees is to create a training model that predicts the class or value of a response variable based on simple decision rules learned from previous (training) data. Prediction for a record begins at the tree's root: the value of the root attribute is compared to the corresponding attribute of the record, the branch matching that value is followed, and the process continues at the next node. The choice of where to split a tree has a significant impact on its accuracy, and classification and regression trees use different decision criteria. Decision trees use numerous techniques to decide whether to divide a node into two or more sub-nodes; each split should increase the purity of the resulting nodes with respect to the data points they contain. The algorithm considers splits on all process parameters, then selects the split that results in the most homogeneous sub-nodes [8].

2) Neural Networks: A neural network is a computational learning system that uses a network of functions to interpret and transform data inputs of one form into a desired output, usually in another form. The concept was inspired by human biology and the way neurons in the brain interact to interpret sensory information. In general, neural-network-based machine learning algorithms do not require specific rules specifying what to expect from the input. Instead, the learning algorithm learns by analysing a large number of labelled examples presented during training and using this answer key to determine which input traits are required to produce the correct output. After a sufficient number of cases have been processed, the neural network can receive additional, previously unseen inputs and still produce accurate results. Because computers learn by experience, the more examples the network meets, the more accurate its results become [1].

3) Gaussian Process Regression: A Gaussian process regression (GPR) model can forecast using prior knowledge (kernels) and offers uncertainty estimates for all of its predictions. The GaussianProcessRegressor performs regression with Gaussian processes (GPs), for which a GP prior must be specified. The prior mean is assumed to be constant and zero (for normalize_y=False) or equal to the mean of the training data (for normalize_y=True). The covariance of the prior is specified using a kernel object. The kernel hyperparameters are optimised during GaussianProcessRegressor fitting by maximising the log-marginal likelihood (LML) with the chosen optimizer. Because the LML may have multiple local optima, the optimizer can be restarted by specifying n_restarts_optimizer. The first run always starts from the kernel's initial hyperparameter values; subsequent runs start from hyperparameter values drawn at random from the range of allowed values [8].

C. Methods Related to Hyper-Parameter Tuning

1) Grid Search: Grid search is the most popular approach for adjusting hyperparameters because of its simplicity and convenience of use. It is an uninformed search method: it does not take past iterations of the search into account. To identify the optimal combination of hyperparameters, this approach evaluates every potential combination in the search space. It does not scale to an expanding hyperparameter search space, as the number of combinations, and hence run time and processing, grows exponentially.

2) Random Search: Random search is another method that treats iterations independently. Rather than searching the whole space, it evaluates a user-specified number of randomly chosen hyperparameter sets. Since fewer hyperparameter tuning trials are run, this strategy uses less computing time and runs faster than grid search. However, because the hyperparameter sets are chosen at random, it runs the risk of overlooking the ideal combination of hyperparameters and decreasing model performance.

3) Bayesian Optimization: In contrast to grid and random search, Bayesian optimization is an informed search methodology, meaning it learns from previous iterations. This technique also allows the user to set the number of trials. The strategy is based on Bayes' theorem; the probability of a score for a given hyperparameter value is calculated as

p(score | hpm) = [ p(hpm | score) × p(score) ] / p(hpm)    (4)

Bayes' theorem in the modified form used for hyperparameter tuning is shown in (4), where hpm denotes a hyperparameter set and score is the score of the current trial; p(score) is the probability of occurrence of the score and p(hpm) is the probability of occurrence of the hyperparameter set.
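The three validation schemes discussed above (hold-out, k-fold cross-validation, and re-substitution) can be sketched with scikit-learn. The synthetic data, tree model, and split sizes below are illustrative assumptions, not the paper's actual load data or settings:

```python
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Illustrative synthetic data standing in for a load time series.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.1, size=200)

model = DecisionTreeRegressor(max_depth=4, random_state=0)

# Hold-out: one random train/test split.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
holdout_r2 = model.fit(X_tr, y_tr).score(X_te, y_te)

# K-fold cross-validation: average R^2 over 5 folds.
kfold_r2 = cross_val_score(model, X, y, cv=5).mean()

# Re-substitution: train and score on the same data (no overfitting protection).
resub_r2 = model.fit(X, y).score(X, y)

print(holdout_r2, kfold_r2, resub_r2)
```

As the text warns, the re-substitution score is optimistic because the model is evaluated on the very data it memorised.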
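The three forecasters above can likewise be sketched side by side; the data set, network architecture, and kernel choice here are illustrative assumptions rather than the paper's configuration:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(100, 2))
y = np.sin(2 * np.pi * X[:, 0]) + 0.5 * X[:, 1]

# Decision tree: recursive feature-based splits toward homogeneous leaves.
tree = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X, y)

# Neural network: learns weights from labelled examples by iterative updates.
nn = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                  random_state=0).fit(X, y)

# GPR: prior covariance from a kernel; hyperparameters tuned by maximising
# the log-marginal likelihood, with restarts to escape local optima.
kernel = ConstantKernel(1.0) * RBF(length_scale=1.0)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True,
                               alpha=1e-6,  # small jitter for stability
                               n_restarts_optimizer=5).fit(X, y)

# Unlike the other two, GPR also returns an uncertainty for each prediction.
mean, std = gpr.predict(X[:5], return_std=True)
print(mean.shape, std.shape)
```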
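Grid and random search as described above map directly onto scikit-learn's GridSearchCV and RandomizedSearchCV; the estimator and parameter grid below are illustrative assumptions:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(150, 3))
y = 2 * X[:, 0] - X[:, 1] + rng.normal(0, 0.05, size=150)

param_grid = {"max_depth": [2, 4, 6, 8], "min_samples_leaf": [1, 5, 10]}

# Grid search: exhaustively evaluates all 4 x 3 = 12 combinations.
grid = GridSearchCV(DecisionTreeRegressor(random_state=0), param_grid, cv=3)
grid.fit(X, y)

# Random search: tries only n_iter user-chosen combinations, so it is
# cheaper than grid search but may miss the best setting.
rand = RandomizedSearchCV(DecisionTreeRegressor(random_state=0), param_grid,
                          n_iter=5, cv=3, random_state=0)
rand.fit(X, y)

print(grid.best_params_, rand.best_params_)
```

Because random search samples a subset of the same grid, its best cross-validated score can never exceed the exhaustive grid search's, which is exactly the trade-off the text describes.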
TABLE I: Comparative Analysis of All Validation-Forecasting Mapping
Fig. 7. Actual vs Predicted using Hold Out Validation and NN Forecaster
Fig. 8. Actual vs Predicted using K-Fold Validation and NN Forecaster
Fig. 9. Actual vs Predicted using Resub Validation and NN Forecaster
Fig. 10. Actual vs Predicted using Hold Out Validation and GPR Forecaster
Fig. 11. Actual vs Predicted using K-Fold Validation and GPR Forecaster
Fig. 12. Actual vs Predicted using Resub Validation and GPR Forecaster
V. CONCLUSION

• Nine different combinations are formed from three validation and three regression analysis techniques. The techniques used for validation are Cross, Hold-out and Re-substitution; the methods used for regression analysis are Decision Trees, Neural Networks and GPR. The best combination for the required output is found to be "K-fold validation with Tree Regression", with RMSE, R², MSE, MAE and training time of 0.077, 0.88, 0.0059, 0.046 and 1.2, respectively. Although the Tree forecaster with Re-Substitution validation also gives high accuracy, but