Hyperparameters Optimization of Convolutional Neural Network by Harmony Search Algorithm
Journal of Computational Design and Engineering, 2023, 10(4), 1280–1297
DOI: 10.1093/jcde/qwad050
Advance access publication date: 15 June 2023
Research Article
Abstract
Because of its good performance, the convolutional neural network (CNN) has been extensively used in many fields, such as image, speech, and text processing. However, it is easily affected by hyperparameters, and how to effectively configure hyperparameters in a reasonable time to improve the performance of CNNs has always been a complex problem. To solve this problem, this paper proposes a method to automatically optimize CNN hyperparameters based on the local autonomous competitive harmony search (LACHS) algorithm. To avoid the influence of the complicated parameter adjustment of the LACHS algorithm on its performance, a dynamic parameter adjustment strategy is adopted, which makes the pitch adjustment probability PAR and the step factor BW adjust dynamically according to the actual situation. To strengthen the fine search of the neighborhood space and reduce the possibility of falling into local optima for a long time, an autonomous decision-making search strategy based on the optimal state is designed. To help the algorithm jump out of local fitting, this paper proposes a local competition mechanism that makes the new harmony compete with the locally selected worst harmony. In addition, an evaluation function is proposed that integrates the training times and the recognition accuracy: to save calculation cost without affecting the search result, it makes the training times for each model depend on the learning rate and batch size. To prove the feasibility of the LACHS algorithm in configuring CNN hyperparameters, classification on the Fashion-MNIST dataset and the CIFAR10 dataset is tested, comparing CNNs based on empirical configuration with CNNs whose hyperparameters are automatically optimized by classical algorithms. The results show that the performance of the CNN based on the LACHS algorithm is effectively improved, so this algorithm has certain advantages in hyperparameter optimization. In addition, this paper applies the LACHS algorithm to expression recognition. Experiments show that the performance of the CNN optimized by the LACHS algorithm is better than that of artificially designed CNNs of the same type. Therefore, the method proposed in this paper is feasible in practical application.
Keywords: harmony search algorithm, convolutional neural network, optimization speed, hyperparameters optimization
1. Introduction

Convolutional neural network (CNN), as a representative of machine learning, is widely used in various fields because of its advantages in extracting local features of the input data (especially input images) through its convolution kernels (Khan et al., 2020). Looking back at the development of CNN, LeCun et al. first proposed the concept of CNN and built the LeNet-5 model to apply it to image processing (Khan et al., 2020). However, due to the limitations of the historical conditions at that time, it did not attract much attention. With the development of science and technology, Krizhevsky et al. (2012) proposed the AlexNet model, which made a significant breakthrough in image processing. This caused an upsurge in the study of the structure of CNN. The subsequent network models, such as VGGNet (Simonyan & Zisserman, 2014), GoogLeNet (Szegedy et al., 2014), ResNet (He et al., 2016), and DenseNets (Huang et al., 2016), are all improvements based on the network structure. With the maturity of the CNN structure, and because of its good network performance, CNN not only performs well in image recognition (Yan et al., 2015) but is also widely used in speech recognition (Yu et al., 2017), text recognition (Wang et al., 2016), self-driving (Chen et al., 2021), target recognition (Tan & Le, 2019), and other fields. Therefore, the optimization of CNN is of great research value.

The traditional direction of optimizing CNN performance is to improve CNN in terms of the network structure (He et al., 2016; Huang et al., 2016; Khan et al., 2020; Krizhevsky et al., 2012; Simonyan & Zisserman, 2014; Szegedy et al., 2014), parameter initialization (Zhang et al., 2018), loss function (Zhang et al., 2018), and optimization algorithm (Zhang et al., 2018). For example, VGGNet (Simonyan & Zisserman, 2014), GoogLeNet (Szegedy et al., 2014), ResNet (He et al., 2016), and DenseNets (Huang et al., 2016) introduced a series of different CNN network structures, and a series of loss functions (Zhang et al., 2018) have been designed for neural networks, such as the zero-one loss function, the logarithmic loss function, and the square loss function (mean-square error, MSE). However, the development of the CNN network structure is by now very mature, so it is not easy to improve CNN performance further by optimizing the network structure. Furthermore, the many kinds of existing loss functions have met the needs of
neural networks for different situations. With the development a GA based on block enhancement (Raymond & Beng, 2007) to
of the CNN network structure and loss function, the problem of build CNN architecture and improve the network performance
parameter initialization of CNN is more and more worthy of at- automatically. Furthermore, Yang et al. proposed using a multi-
tention. It is not only because of the sensitivity of CNN to hyper- objective GA to obtain more precise and smaller CNN (Karpathy,
parameters, e.g., the size of the convolution kernel can affect the 2016). Although the GA has achieved good results in hyperparam-
effect of extracting image features by CNN, but also because the eters optimization, it cannot avoid the problems of slow search
structure of CNN networks tends to widen and deepen, and the speed and high time cost due to the inability to use the feedback
variety of loss functions makes the problem of parameter initial- information on the network timely. In addition, the optimization
ization more complicated. Predecessors call the parameters to be of GAs depends to a certain extent on the initialization of the pop-
initialized as hyperparameters (Larochelle et al., 2007), and the pa- ulation, which cannot guarantee the effectiveness of each opti-
rameter initialization problem is a superparameter optimization mization. In addition to GA, other powerful EC algorithms can also
problem. Most of the efficient CNN models are adjusted manu- be used, such as particle swarm optimization, which has been ap-
ally. Still, it wastes a lot of time and computational cost. Thus, plied to CNN hyperparameters optimization many times. For ex-
it is challenging to meet the needs of increasingly complex CNN. ample, Guo et al. proposed a distributed particle swarm optimiza-
Therefore, how to quickly design a set of corresponding superpa- tion method to improve the efficiency of CNN hyperparameters
… generated by randomly selecting one harmony in HM (generated according to formula 1). The fine-tuning method is as follows:

$$x_{\mathrm{new}}(j) = x_{\mathrm{new}}(j) \pm \operatorname{rand}(0,1) \times BW \tag{3}$$

Here, $BW(t)$ is the local amplitude modulation of the $t$-th generation, $BW_{\min}$ is the minimum amplitude modulation, $BW_{\max}$ is the maximum amplitude modulation, $t$ is the current iteration number, and $NI$ is the total iteration number.
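To make the fine-tuning step concrete, the following is a minimal Python sketch of formula (3), together with one common IHS-style exponential schedule for BW(t). The paper's exact BW update formula is not reproduced in this excerpt, so the schedule shown here is an assumption, and all function names are illustrative.

```python
import math
import random

def bandwidth(t, ni, bw_min, bw_max):
    """One common IHS-style exponential schedule for BW(t), decaying from
    BW_max to BW_min over NI iterations (an assumption on our part; the
    paper's exact update rule is not shown in this excerpt)."""
    return bw_max * math.exp(math.log(bw_min / bw_max) * t / ni)

def fine_tune(x_new, j, bw, lower, upper):
    """Pitch adjustment of formula (3): shift the j-th decision variable of
    the improvised harmony by a random fraction of BW, in either direction,
    then clamp it back into its feasible range."""
    step = random.uniform(0, 1) * bw
    x_new[j] += step if random.random() < 0.5 else -step
    x_new[j] = max(lower, min(upper, x_new[j]))
    return x_new
```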
1: Input the iteration thresholds t1 and t2 and the maximum number of iterations tmax.
2: Define the variables x = (x1, x2, x3, …, xn) and their ranges.
3: Define the fitness function fitness(t) = f(x).
4: Input the initialization variables, calculate the corresponding fitness values, and set t = 0, q1 = 0, and q2 = 0.
5: while t < tmax − 1 do
6:   if f(x)max is not updated then
7:     q1 = q1 + 1, q2 = q2 + 1, t = t + 1
8:     if q1 == t1 then
9:       Take the variable xi with the locally optimal solution for fine-tuning, and set q1 = 0.
10:    end if
11:    if q2 == t2 then
12:      Take the variable xi with the locally worst solution for fine-tuning, and set q2 = 0.
13:    end if
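Read as code, the strategy amounts to two stagnation counters that trigger targeted fine-tuning. The sketch below is a minimal Python rendering of the pseudo-code above (which is truncated in this excerpt) under our own naming; `improvise`, `fitness`, and `fine_tune` are placeholders for the corresponding LACHS operators.

```python
def autonomous_search(improvise, fitness, fine_tune, harmonies, t1, t2, t_max):
    """Counters q1 and q2 track how long the best fitness has stagnated:
    after t1 idle iterations the locally best harmony is fine-tuned, and
    after t2 idle iterations the locally worst harmony is fine-tuned."""
    scores = [fitness(h) for h in harmonies]
    q1 = q2 = 0
    for t in range(t_max):
        prev_best = max(scores)
        new = improvise(harmonies)
        new_score = fitness(new)
        worst = min(range(len(scores)), key=scores.__getitem__)
        if new_score > scores[worst]:              # usual HM replacement
            harmonies[worst], scores[worst] = new, new_score
        if max(scores) > prev_best:                # best fitness was updated
            q1 = q2 = 0
            continue
        q1 += 1
        q2 += 1
        if q1 == t1:                               # fine-tune the local best
            i = max(range(len(scores)), key=scores.__getitem__)
            harmonies[i] = fine_tune(harmonies[i])
            scores[i], q1 = fitness(harmonies[i]), 0
        if q2 == t2:                               # fine-tune the local worst
            i = min(range(len(scores)), key=scores.__getitem__)
            harmonies[i] = fine_tune(harmonies[i])
            scores[i], q2 = fitness(harmonies[i]), 0
    return harmonies[max(range(len(scores)), key=scores.__getitem__)]
```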
Table 3: Pseudo-code of local competitive update strategy.

Algorithm 3 Pseudo-code of local competitive update strategy
1: if f(Xnew) > f(Xlw) then
2:   f(Xlw) = f(Xnew)
3:   Xlw = Xnew
4: end if

Table 4: Training times T for different learning rates of the same network model to enter fitting on the CIFAR10 dataset.

Learning rate                                   0.001   0.0015   0.002   0.0025
Training times T of network entering fitting        7        6       5        4

$$f(X_{lw}) = \max\{f(X_{\mathrm{new}}),\ f(X_{lw})\} \tag{6}$$

Here, $f(X_{lw})$ on the left-hand side is the fitness value of the updated local worst harmony, $X_{lw}$ is the harmony corresponding to $f(X_{lw})$, $f(X_{\mathrm{new}})$ is the fitness value of the new harmony, and $f(X_{lw})$ on the right-hand side is the fitness value of the local worst harmony. The pseudo-code of the local competitive update strategy is shown in Table 3.

3.4. Evaluation method of fusing training times and recognition accuracy

Predecessors usually take the accuracy of a network model on the test set after a certain number of training times T as the performance index to evaluate the model. However, determining the training times T is the critical factor affecting the subsequent workload and accuracy. If T is too large, the subsequent calculation cost increases exponentially; if T is too small, it is impossible to evaluate the performance of the network model objectively. Moreover, once the training times T are initialized, they do not change during the iterative process, which makes it challenging to meet the needs of the search process and makes the algorithm's search blind and inefficient. Predecessors usually determine the training times T as the number of training times at which the network model enters fitting. Through experiments, it is found that the training times at which the network model enters fitting are related to the learning rate and the batch size, as shown in Tables 4 and 5.

The experiments show that the higher the learning rate, the fewer the training times before the network model enters fitting, and the lower the learning rate, the more training times are needed. Likewise, the bigger the batch, the more training times the network model needs to enter fitting, and the smaller the batch, the fewer. The relationship between the training times of the network model, the learning rate, and the batch size is summarized from these experimental rules in formula (7):

$$T = \operatorname{int}\!\left(\frac{lr_{\max} - lr_{\min}}{lr}\cdot a + \frac{batch}{batch_{\max} - batch_{\min}}\cdot b + lr\cdot\frac{10000}{batch}\cdot c\right) \tag{7}$$

Here, $T$ is the training times of the current network model, $lr_{\max}$ is the maximum learning rate within the parameter range, $lr_{\min}$ is the minimum learning rate within the parameter range, $lr$ is the learning rate of the current network model, $batch_{\max}$ is the maximum batch size within the parameter range, $batch_{\min}$ is the minimum batch size within the parameter range, and $batch$ is the batch size of the current network model. The coefficients a, b, and c are adjusted according to the complexity of the dataset.
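As a sanity check on formula (7) as reconstructed above, a direct Python transcription follows; the variable names are ours.

```python
def training_times(lr, batch, lr_min, lr_max, batch_min, batch_max, a, b, c):
    """Training budget T of formula (7): runs with a higher learning rate or
    a smaller batch enter fitting sooner (Tables 4 and 5), so they receive a
    smaller T; a, b, and c grow with dataset complexity."""
    return int((lr_max - lr_min) / lr * a
               + batch / (batch_max - batch_min) * b
               + lr * 10000 / batch * c)

# e.g. with the CIFAR-10 ranges of Table 8 and a, b, c = 2, 1, 2:
# training_times(0.001, 64, 0.001, 0.03, 32, 256, 2, 1, 2) -> 58
```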
3.5. LACHS framework and pseudo-code

The overall flow chart of the LACHS algorithm is shown in Fig. 3. The steps of the LACHS algorithm are basically the same as those of the HS algorithm, with the main difference being improvisation. The specific process of the LACHS algorithm is as follows:

Step 1: Initialize the relevant variables of the algorithm and of the optimization problem.
Step 2: Initialize the harmony memory, and take as the fitness value the accuracy on the test set of the network model trained for the T training times generated according to formula (7).
Step 3: Update the pitch adjustment probability PAR and the step factor BW through the dynamic parameter adjustment strategy, and judge whether fine adjustment is needed.
Step 4: If fine-tuning is needed, the harmony to be fine-tuned is selected by the autonomous decision-making search strategy based on the optimal state. …

… which makes it more difficult for CNN to identify them. Finally, the CIFAR-10 dataset depicts everyday objects of the real world, which are relatively irregular; this increases the difficulty of recognition but also makes the dataset more representative in machine learning.

In the experiment, the accuracy of the CNN on the test set after a certain number of training times T is taken as the fitness value of the LACHS algorithm. For the Fashion-MNIST dataset, the coefficients a, b, and c in formula (7) are set to 1, 0.5, and 1, respectively. For the CIFAR-10 dataset, the coefficients a, b, and c in formula (7) are set to 2, 1, and 2, respectively.
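Under the setup just described, the fitness evaluation can be sketched as follows. This is a minimal Keras-style sketch under our own naming, reusing the `training_times` helper sketched after formula (7); `build_model` stands for the decoder that turns a harmony into a compiled CNN, and the gene positions of the learning rate and batch size are assumptions.

```python
def evaluate_harmony(build_model, harmony, data, a, b, c, lr_range, batch_range):
    """Fitness of one harmony: train the decoded CNN for the T training
    times given by formula (7), then score it by test-set accuracy."""
    (x_train, y_train), (x_test, y_test) = data
    lr, batch = float(harmony[-3]), int(harmony[-1])   # assumed gene positions
    T = training_times(lr, batch, min(lr_range), max(lr_range),
                       min(batch_range), max(batch_range), a, b, c)
    model = build_model(harmony)
    model.fit(x_train, y_train, batch_size=batch, epochs=T, verbose=0)
    _, accuracy = model.evaluate(x_test, y_test, verbose=0)
    return accuracy
```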
4.2. Compare the developed methods to the most advanced ones

… hyperparameters include: CNN based on the standard HS algorithm (Geem et al., 2001) (HSCNN), CNN based on the IHS algorithm (Mahdavi et al., 2007) (IHSCNN), and CNN based on the GHS algorithm (Omran & Mahdavi, 2008) (GHSCNN).

Because these optimization methods based on intelligent algorithms have different characteristics, they are ideal for evaluating the optimization advantages of the improved HS algorithm proposed in this paper. To understand the benefits of the LACHS algorithm in hyperparametric optimization more intuitively, except for RSCNN and BASCNN, the CNNs built by the other hyperparameter optimization algorithms are optimized from the same initial population and the same parameter range. Because of the algorithmic characteristics of RSCNN and BASCNN, there is no need to initialize a population, so these two CNNs are optimized over the same parameter range only.

Table 7: Parameters of network model structure (the range of i is 1–4).

Hyperparameters                              Range
Convolution layer number                     [3, 4]
Number of layers of convolution block i      [2, 3, 4]
Convolution kernel i size                    [3, 4, 5]
Number of filters 1                          [16, 32, 64, 96]
Number of filters 2                          [48, 64, 96, 128]
Number of filters 3                          [64, 96, 128]
Number of filters 4                          [96, 128]
Activation function 1                        [“relu”, “elu”]
Activation function 2                        [“relu”, “elu”]
Hidden layer 1                               [60, 100, 125]
Hidden layer 2                               [60, 100, 125]

4.3. Algorithm settings

In the experiment, the algorithm in this paper uses VGGNet as the basic network for experimental research. The range of the network parameters to be optimized is shown in Tables 6 and 7, with the total number of optimized parameters Leng being 20.

Table 8: Optimization parameter range of network model.

Hyperparameters      Range
Learning rate        [0.001, 0.003, 0.01, 0.03]
Batch size           [32, 64, 128, 256]
Momentum             [0.9, 0.95, 0.99]

As for the parameters of the improved HS algorithm, the size of the harmony memory HMS is set to 10, the maximum number of improvisations Tmax is set to 30, the memory consideration probability HMCR is set to 0.8, the minimum adjustment probability PARmin is set to 0.1, the maximum adjustment probability PARmax is set to 1, the minimum amplitude modulation BWmin is set to 1, and the maximum amplitude modulation BWmax is set to Leng − 1.

The traditional data augmentation method (Qi et al., 2022) is adopted to process the dataset: the rotation angle range is 10, the width offset is 0.1, the height offset is 0.1, the perspective transformation range is 0.1, the zoom range is 0.1, and horizontal flipping is applied. The fill mode is "nearest", and the rest are left at their defaults.

For the parameter combination optimized by the algorithm, the training times are set to 50 when training on the Fashion-MNIST dataset and to 100 when training on the CIFAR-10 dataset.
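The augmentation settings listed above map naturally onto the Keras ImageDataGenerator API; the sketch below assumes that mapping, reading the "perspective transformation range" as shear_range, which is our interpretation.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=10,       # rotation angle range of 10
    width_shift_range=0.1,   # width offset of 0.1
    height_shift_range=0.1,  # height offset of 0.1
    shear_range=0.1,         # "perspective transformation range" (our reading)
    zoom_range=0.1,          # zoom range of 0.1
    horizontal_flip=True,    # horizontal inversion
    fill_mode='nearest',     # filling mode "nearest"; all else left default
)
# Typical use: model.fit(datagen.flow(x_train, y_train, batch_size=64), epochs=50)
```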
4.4. Experimental results

In the experiment, the optimization process and the final optimized population of the LACHS algorithm on the Fashion-MNIST dataset are shown in Fig. 6.

On the Fashion-MNIST dataset, the hyperparameter combination of the CNN optimized by the LACHS algorithm (LACHSCNN) is ['64', '3', '128', '5', '96', '4', '128', '4', '60', '100', 'elu', 'elu', '2', '3', '3', '4', '0.001', '0.99', '64']. The accuracy of LACHSCNN after data augmentation and training is 93.34%. The CNN model training process and confusion matrix are shown in Fig. 7.

It can be concluded from Fig. 7 that the accuracy of LACHSCNN after training is basically stable at about 93%. In the classification of the Fashion-MNIST dataset, the recognition effect for tag 6 is relatively poor, while the recognition effect for the other tag types is good.

In the experiment, the optimization process and the final optimized population of the LACHS algorithm on the CIFAR10 dataset are shown in Fig. 8.

On the CIFAR10 dataset, the hyperparameter combination of LACHSCNN is ['96', '4', '128', '3', '128', '4', '128', '5', '60', '100', 'elu', 'elu', '3', '3', '4', '2', '3', '0.001', '0.95', '32']. The accuracy of LACHSCNN after data augmentation and training is 90.25%. The CNN model training process and confusion matrix are shown in Fig. 9.

It can be concluded from Fig. 9 that the accuracy of LACHSCNN after training is basically stable at about 90%. In the classification of the CIFAR10 dataset, the recognition effect for tag 3 and tag 5 is relatively poor, while the recognition effect for the other tag types is good.
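For illustration, the following sketch decodes a 19-gene combination like the Fashion-MNIST one above into a compiled VGG-style Keras model. The gene ordering (filters and kernel sizes, hidden sizes, activations, block depths, then learning rate, momentum, and batch size) is our assumption; the paper's exact encoding is not reproduced in this excerpt.

```python
from tensorflow.keras import layers, models, optimizers

def decode_and_build(h, num_classes=10, input_shape=(28, 28, 1)):
    """Decode a harmony such as ['64','3','128','5','96','4','128','4',
    '60','100','elu','elu','2','3','3','4','0.001','0.99','64'] into a
    VGG-style CNN (gene order assumed, not taken from the paper)."""
    filters = [int(h[i]) for i in (0, 2, 4, 6)]     # filters of blocks 1-4
    kernels = [int(h[i]) for i in (1, 3, 5, 7)]     # kernel sizes of blocks 1-4
    hidden  = [int(h[8]), int(h[9])]                # fully connected widths
    act1, act2 = h[10], h[11]                       # conv / dense activations
    depths  = [int(g) for g in h[12:16]]            # layers per conv block
    lr, momentum = float(h[16]), float(h[17])       # SGD settings

    model = models.Sequential([layers.InputLayer(input_shape=input_shape)])
    for f, k, d in zip(filters, kernels, depths):
        for _ in range(d):
            model.add(layers.Conv2D(f, k, padding='same', activation=act1))
        model.add(layers.MaxPooling2D())            # downsample between blocks
    model.add(layers.Flatten())
    for units in hidden:
        model.add(layers.Dense(units, activation=act2))
    model.add(layers.Dense(num_classes, activation='softmax'))
    model.compile(optimizer=optimizers.SGD(learning_rate=lr, momentum=momentum),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model
```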
Figure 8: Optimization process and final optimization population of LACHS algorithm on CIFAR10 dataset.
4.5. Compared with the most advanced methods

First of all, as shown in Table 9, as a network architecture of the same type, the accuracy of VGGNet16 on the Fashion-MNIST dataset is 92.86%, and that on the CIFAR10 dataset is 88.74%. In contrast, the accuracy of LACHSCNN, which shares the same basic architecture as VGG, is 93.34% on the Fashion-MNIST dataset and 90.25% on the CIFAR10 dataset. Regarding classification accuracy, LACHSCNN improves by 0.48% and 1.51% on the Fashion-MNIST dataset and the CIFAR10 dataset, respectively. Therefore, compared with artificially designed CNNs of the same type, the performance of the CNN optimized by LACHS has more advantages. Compared with other kinds of artificially designed CNNs, LACHSCNN outperforms Maxout (Goodfellow et al., 2013), Deeply-supervised (Lee et al., 2015), and Network in Network (Lin et al., 2013) on the CIFAR10 dataset by 1.93%, 0.03%, and 0.65%, respectively. Although ALL-CNN (Springenberg et al., 2014) performs better on the CIFAR10 dataset, it uses a deeper and more complex network structure: in classification accuracy, ALL-CNN is 1.75% higher than LACHSCNN, but in terms of the resulting calculation cost, LACHSCNN is cheaper. Therefore, compared with other types of artificially designed CNNs, the CNN optimized by LACHS has advantages when comprehensive performance and calculation cost are considered together.

Secondly, the results of RSCNN, BASCNN, GACNN, PSOCNN, and DECNN are all optimized based on the same initial population and hyperparameter range. As shown by the experimental results in Table 10, the accuracy of RSCNN, BASCNN, GACNN, PSOCNN, and DECNN on the Fashion-MNIST dataset reaches 92.94%, 93.09%, 93.09%, 93.05%, and 93.26%, respectively. The accuracy of RSCNN, BASCNN, GACNN, PSOCNN, and DECNN on the CIFAR10 dataset reaches 83%, 89.36%, 88.81%, 88.81%, and 88.81%, respectively. It can be seen that network models based on other types of evolutionary algorithms for hyperparameter optimization can achieve good results on the Fashion-MNIST dataset, but the effect on the more complex CIFAR10 dataset is not ideal. In contrast, on the Fashion-MNIST dataset, LACHSCNN is 0.4%, 0.25%, 0.25%, 0.29%, and 0.08% higher than RSCNN, BASCNN, GACNN, PSOCNN, and DECNN, respectively. Because the data structure of the Fashion-MNIST dataset is relatively simple, there is no large gap among the experimental results. On the more complicated CIFAR10 dataset, the accuracy of LACHSCNN is 6.75%, 0.91%, 1.44%, 1.44%, and 1.44% higher than that of RSCNN, BASCNN, GACNN, PSOCNN, and DECNN, respectively. The experimental results show that the LACHS algorithm has more advantages in hyperparameter optimization than the other evolutionary algorithms.

EvoCNN (Real et al., 2017), CNN-GA (Aszemi & Dominic, 2019), and CNN-DPSO (Guo et al., 2020) all refer to previous experimental results. On the Fashion-MNIST dataset, LACHSCNN is 0.62% and 0.43% higher than EvoCNN (Real et al., 2017) and CNN-DPSO (Guo et al., 2020), respectively. On the Fashion-MNIST dataset, LACHSCNN is 9.63% more accurate than CNN-GA. Although the performance of LACHSCNN is better than that of EvoCNN, CNN-GA, and CNN-DPSO, the optimization parameter ranges and basic architectures differ, so this does not show that the previous methods are not excellent; it can only demonstrate that the optimization of neural network architecture by LACHS is to some extent at the frontier.

HSCNN, IHSCNN, and GHSCNN are all optimized based on the same initial population and hyperparameter range. The experimental results in Table 11 show that the accuracy of HSCNN, IHSCNN, and GHSCNN on the Fashion-MNIST dataset is 92.96%, 93.23%, and 93.29%, respectively. The accuracy of HSCNN, IHSCNN, and GHSCNN on the CIFAR10 dataset reaches 88.81%, 88.81%, and 88.81%, respectively. It can be seen that CNNs based on different types of HS algorithms for hyperparameter optimization can achieve good results on the Fashion-MNIST dataset, but the effect on the more complex CIFAR10 dataset is not ideal. In contrast, the
classification accuracy of LACHSCNN on the Fashion-MNIST dataset is 0.38%, 0.11%, and 0.05% higher than that of HSCNN, IHSCNN, and GHSCNN, respectively. Because the data structure of the Fashion-MNIST dataset is relatively simple, there is no large gap among the experimental results. On the more complex CIFAR10 dataset, the accuracy of LACHSCNN is 1.44%, 1.44%, and 1.44% higher than that of HSCNN, IHSCNN, and GHSCNN, respectively. The experimental results show that compared with the other types of HS algorithms, the LACHS algorithm has more advantages in hyperparameter optimization.

The performance of the intelligent algorithms can be understood by analyzing their optimization processes and their final optimized populations. The intelligent algorithms are divided into other types of intelligent optimization algorithms and classic HS algorithms of the same type for comparison and discussion.

The optimization process and final optimized population of the LACHS algorithm and the other intelligent algorithms on the Fashion-MNIST dataset and the CIFAR10 dataset are shown in Figs 10 and 11.

On the Fashion-MNIST dataset, the various algorithms all achieve good results in the optimization process in Fig. 10. Although the optimization speed of the LACHS algorithm is slightly slower than that of the DE algorithm, its final result is better. For the GA and PSO algorithms, both the search speed and the search ability are not as good as those of the LACHS algorithm. From the analysis of the final optimized population, compared with the initial population, the last optimized population of every algorithm except the GA belongs to the elite population. The reason why the final optimized population of the GA is not ideal is related mainly to its elimination strategy and population size: the GA lets the next generation directly replace the previous generation, which easily fails to guarantee the population quality of each generation when the population size is not large enough. The difference between the PSO algorithm and the DE and LACHS algorithms is that the PSO population contains no abnormal combination (a combination whose fitness value differs greatly from that of the rest of the population). Retaining some suitable abnormal combinations can effectively prevent the algorithm from falling into the local optimization dilemma for a long time; this is also the reason why the PSO algorithm performs worse than the DE and LACHS algorithms. However, because the data structure of the Fashion-MNIST dataset is simple and the optimization results of the various intelligent algorithms are close, the more complex CIFAR10 dataset is selected for the experiment.

As can be seen from Fig. 11, on the more complex CIFAR10 dataset, the optimization results of the other algorithms are not ideal: the best results of the GA, DE, and PSO algorithms are still the best harmony of the original harmony memory, so no real optimization takes place. Although the LACHS algorithm was caught in the dilemma of local fitting at first, as the iterations increased, it used the autonomous decision-making search strategy based on the optimal state to jump out of the dilemma in time and find better search results. From the analysis of the final optimized population, it is evident that the optimization effect of the LACHS algorithm is better: after the same number of searches, the combinations of the last optimized population of the LACHS algorithm all belong to the elite combination, and the population reaches the convergence state, whereas the last optimized populations of the other algorithms do not. Therefore, the search speed and search ability of the LACHS algorithm are better than those of the other algorithms.

Therefore, compared with other types of intelligent optimization algorithms, the LACHS algorithm has more advantages in terms of hyperparameter optimization.

The optimization process and last optimized population of the LACHS algorithm and the classical HS algorithms on the Fashion-MNIST dataset and the CIFAR10 dataset are shown in Figs 12 and 13.

On the Fashion-MNIST dataset, we can see from Fig. 12 that the various HS algorithms achieve good results in the optimization process. Although the search speed of the LACHS algorithm is a little slower than that of the other types of HS algorithms, its final search results are better. From the analysis of the last optimized population, compared with
Figure 11: Optimization process and final optimization population of LACHS algorithm and intelligent algorithm in CIFAR10 dataset.
Figure 12: Optimization process and final optimization population of various HS algorithms in Fashion-MNIST dataset.
the initial population, the final optimized populations of the various algorithms belong to the elite population. However, the HS algorithm and the IHS algorithm do not retain abnormal combinations, which is not conducive to jumping out of the local optimization dilemma. Therefore, in terms of search ability, the GHS algorithm and the LACHS algorithm are stronger. However, due to the simple data structure of the Fashion-MNIST dataset, the optimization results of the various intelligent algorithms cannot be separated, so the more complex CIFAR10 dataset was selected for experimental comparison.

As can be seen from Fig. 13, the optimization results of the algorithms other than the LACHS algorithm are not ideal: their best results are still the best harmony of the original harmony memory, so these algorithms achieve no real optimization. Although the local autonomous competitive HS algorithm fell into the dilemma of local fitting at the beginning, as the iterations accumulated, it used the autonomous decision-making search strategy based on the optimal state to jump out of the dilemma in time and find a better search result. The analysis of the last optimized population shows that the LACHS algorithm and the GHS algorithm have better optimization effects: after the same number of searches, the combinations of their final optimized populations generally belong to the elite combination and reach the convergence state, while the HS algorithm and the IHS algorithm do not reach the convergence state. Therefore, the search speed of the LACHS and GHS algorithms is better than that of the HS and IHS algorithms. Although the GHS algorithm has good search speed, its search ability is still not as good as that of the LACHS algorithm, and the GHS algorithm still fails to jump out of the dilemma of local fitting by the end of the search.

Therefore, compared with the classical HS algorithms of the same type, the LACHS algorithm has more advantages in terms of hyperparameter optimization.

In short, considering the difficulty of the hyperparameter optimization of a given CNN, the contribution of this paper is reasonable.

4.6. Expression recognition case study

To evaluate the performance of LACHS-CNN in practical application, a case study of expression recognition is conducted in this part. Expression recognition can significantly promote the integration and development of many disciplines, such as graphic image processing, artificial intelligence, human-computer interaction, and psychology. A related video emotion database is the foundation of expression recognition research, so this paper uses the SAVEE dataset (Liu et al., 2022) to provide data for the emotion research that needs training and testing. The SAVEE dataset consists of videos of four actors displaying seven emotions, and each video is about 3 seconds long. The dataset standardizes 68 key points of the face of each subject. A specific sample of the SAVEE database is shown in Fig. 14.

In the experiment, a video frame was extracted every 50 frames, and a total of 1957 pictures were obtained, each with a size of 48 × 48. The generated image dataset is then divided into a training set and a testing set in the proportion of 90% and 10%. In the LACHS algorithm, the coefficients a, b, and c in formula (7) are set to 10, 5, and 10, respectively, and the rest of the configuration is unchanged. Then, the CNN obtained by the LACHS algorithm is trained 500 times. Because of the high computational cost, only a VGG16 network model with the same 500 training times is used as the control group. The training results are shown in Table 12.

Table 12 shows that the CNN based on LACHS produces higher accuracy than VGG16, which verifies the effectiveness of the LACHS algorithm and shows the potential of LACHS in solving practical applications.
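A minimal sketch of the frame-extraction preprocessing described above, assuming OpenCV; the function and variable names are ours.

```python
import cv2  # OpenCV, assumed here for reading the SAVEE clips

def extract_frames(video_path, every=50, size=(48, 48)):
    """Sample one frame every `every` frames and resize it to 48x48,
    as in the preprocessing step described above."""
    frames, idx = [], 0
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every == 0:
            frames.append(cv2.resize(frame, size))
        idx += 1
    cap.release()
    return frames
```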
Figure 14: Seven different expressions of the same person in the SAVEE dataset.
4.7. Further discussion

The experiments in Section 4 prove the superiority of the LACHS algorithm in CNN hyperparameter optimization. In the experiments, we use VGGNet as the basic network for hyperparameter optimization, and the LACHS algorithm obtains better results than VGGNet16 of the same type and other state-of-the-art CNNs. However, the LACHS algorithm also has its limitations. Because the LACHS algorithm optimizes the hyperparameters of a given basic network, if the basic network is not suitable for the optimization problem, the improvement brought by the algorithm is not apparent. In this paper, there are two reasons for choosing VGGNet as the basic network architecture. First, it is a typical CNN in deep learning, and it is a representative example for hyperparameter optimization through this algorithm. Secondly, for the same effect, its structure is simpler and its calculation cost is lower. However, choosing the appropriate CNN model to solve the corresponding task is also a challenging problem; this needs further study, which is beyond the scope of this paper. Therefore, the contribution of this paper is reasonable for the hyperparameter optimization problem of a given CNN.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.
References

Geem Z. W., Kim J. H., & Loganathan G. V. (2001). A new heuristic optimization algorithm: Harmony search. Simulation, 76(2), 60–68.
Goodfellow I., Warde-Farley D., Mirza M., Courville A., & Bengio Y. (2013). Maxout networks. In Proceedings of the International Conference on Machine Learning (pp. 1319–1327). PMLR.
Guo Y., Li J.-Y., & Zhan Z.-H. (2020). Efficient hyperparameter optimization for convolution neural networks in deep learning: A distributed particle swarm optimization approach. Cybernetics and Systems, 52(2), 1–22. https://doi.org/10.1080/01969722.2020.1827797.
He K., Zhang X., Ren S., & Sun J. (2016). Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 770–778). IEEE.
Huang G., Liu Z., Van Der Maaten L., & Weinberger K. Q. (2016). Densely connected convolutional networks. preprint (arXiv:1608.06993). https://doi.org/10.48550/arXiv.1608.06993.
… mixed-variable hyperparameters optimization in convolutional neural networks. IEEE Transactions on Neural Networks and Learning Systems, 34, 2338–2352. https://doi.org/10.1109/TNNLS.2021.3106399.
Li J.-Y., Zhan Z.-H., Tan K. C., & Zhang J. (2023b). Dual differential grouping: A more general decomposition method for large-scale optimization. IEEE Transactions on Cybernetics, 53, 3624–3638. https://doi.org/10.1109/TCYB.2022.3158391.
Li J.-Y., Zhan Z.-H., & Zhang J. (2022). Evolutionary computation for expensive optimization: A survey. Machine Intelligence Research, 19(1), 3–23. https://doi.org/10.1007/s11633-022-1317-4.
Lin M., Chen Q., & Yan S. (2013). Network in network. preprint (arXiv:1312.4400). https://doi.org/10.48550/arXiv.1312.4400.
Liu R., Sisman B., Schuller B., Gao G., & Li H. (2022). Ac…
… for large-scale optimization. IEEE Transactions on Cybernetics, 51(3), 1175–1188. https://doi.org/10.1109/TCYB.2020.2977956.
Szegedy C., Liu W., Jia Y., Sermanet P., Reed S., Anguelov D., Erhan D., Vanhoucke V., & Rabinovich A. (2015). Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 1–9). IEEE. https://doi.org/10.1109/CVPR.2015.7298594.
Tan M., & Le Q. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning (pp. 6105–6114). PMLR.
Turky A. M., Abdullah S., & Sabar N. R. (2014). A hybrid harmony search algorithm for solving dynamic optimisation problems. Procedia Computer Science, 29, 1926–1936. https://doi.org/10.1016/j.procs.2014.05.177.
Wang S., Chen L., Xu L., Fan W., Sun J., & Naoi S. (2016). Deep knowledge training and heterogeneous CNN for handwritten Chinese text recognition. In Proceedings of the 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR) (pp. 84–89). IEEE.
Wang Y. Q., Li J. Y., Chen C. H., Zhang J., & Zhan Z. H. (2022a). Scale …
Wu S.-H., Zhan Z.-H., & Zhang J. (2021). SAFE: Scale-adaptive fitness evaluation method for expensive optimization problems. IEEE Transactions on Evolutionary Computation, 25(3), 478–491. https://doi.org/10.1109/TEVC.2021.3051608.
Xiao H., Rasul K., & Vollgraf R. (2017). Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. preprint (arXiv:1708.07747). https://doi.org/10.48550/arXiv.1708.07747.
Yu Z., Chan W., & Jaitly N. (2017). Very deep convolutional networks for end-to-end speech recognition. In Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 4845–4849). IEEE. https://doi.org/10.1109/ICASSP.2017.7953077.
Zhan Z. H., Shi L., Tan K. C., & Zhang J. (2022a). A survey on evolution…
Received: March 21, 2023. Revised: May 25, 2023. Accepted: May 30, 2023
© The Author(s) 2023. Published by Oxford University Press on behalf of the Society for Computational Design and Engineering. This is an Open Access article distributed
under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and
reproduction in any medium, provided the original work is properly cited.