
Journal of Computational Design and Engineering, 2023, 10, 1280–1297
DOI: 10.1093/jcde/qwad050
Advance access publication date: 15 June 2023
Research Article

Hyperparameters optimization of convolutional neural network based on local autonomous competition harmony search algorithm

Dongmei Liu1, Haibin Ouyang1,2,*, Steven Li3, Chunliang Zhang1 and Zhi-Hui Zhan2
1 School of Mechanical and Electrical Engineering, Guangzhou University, Guangzhou, 510006, China
2 School of Computer Science and Engineering, South China University of Technology, Guangzhou, 510006, China
3 Graduate School of Business and Law, RMIT University, Melbourne, 3000, Australia
*Correspondence: oyhb1987@163.com
Abstract
Because of its good performance, the convolutional neural network (CNN) has been extensively used in many fields, such as image, speech, and text. However, CNN performance is highly sensitive to its hyperparameters, and configuring them effectively within a reasonable time has always been a difficult problem. To solve this problem, this paper proposes a method to automatically optimize CNN hyperparameters based on the local autonomous competitive harmony search (LACHS) algorithm. To avoid the influence of complicated parameter tuning of the LACHS algorithm on its performance, a dynamic parameter adjustment strategy is adopted, which makes the pitch adjustment probability PAR and the step factor BW adjust dynamically according to the actual situation. To strengthen the fine search of the neighborhood space and reduce the possibility of staying in local optima for a long time, an autonomous decision-making search strategy based on the optimal state is designed. To help the algorithm jump out of local fitting, this paper proposes a local competition mechanism that makes a new harmony compete with the worst harmony of a local selection. In addition, an evaluation function that integrates the number of training epochs and the recognition accuracy is proposed: it makes the training time for each model depend on the learning rate and batch size, so that computational cost is saved without affecting the search result. To prove the feasibility of the LACHS algorithm for configuring CNN hyperparameters, classification on the Fashion-MNIST dataset and the CIFAR10 dataset is tested, comparing CNNs based on empirical configurations with CNNs whose hyperparameters are optimized automatically by classical algorithms. The results show that the performance of the CNN based on the LACHS algorithm is improved effectively, so this algorithm has certain advantages in hyperparameter optimization. In addition, this paper applies the LACHS algorithm to expression recognition. Experiments show that the performance of the CNN optimized by the LACHS algorithm is better than that of artificially designed CNNs of the same type. Therefore, the method proposed in this paper is feasible in practical applications.

Keywords: harmony search algorithm, convolutional neural network, optimization speed, hyperparameters optimization

1. Introduction
Convolutional neural network (CNN), as a representative of machine learning, is widely used in various fields because of its advantages in extracting local features of the input data (especially input images) through its convolution kernels (Khan et al., 2020). Looking back at the development of CNN, LeCun et al. first proposed the concept of CNN and built the LeNet-5 model for image processing (Khan et al., 2020). However, due to the limitations of historical conditions at that time, it did not attract much attention. With the development of science and technology, Krizhevsky et al. (2012) proposed the AlexNet model, which made a significant breakthrough in image processing. This caused an upsurge in studying the structure of CNN. The subsequent network models, such as VGGNet (Simonyan & Zisserman, 2014), GoogLeNet (Szegedy et al., 2014), ResNet (He et al., 2016), and DenseNets (Huang et al., 2016), are all improvements of the network structure. With the maturity of the CNN structure and its good network performance, CNN not only performs well in image recognition (Yan et al., 2015) but is also widely used in speech recognition (Yu et al., 2017), text recognition (Wang et al., 2016), self-driving (Chen et al., 2021), target recognition (Tan & Le, 2019), and other fields. Therefore, the optimization of CNN is of great research value.

The traditional direction for improving CNN performance is to improve the network structure (He et al., 2016; Huang et al., 2016; Khan et al., 2020; Krizhevsky et al., 2012; Simonyan & Zisserman, 2014; Szegedy et al., 2014), parameter initialization (Zhang et al., 2018), loss function (Zhang et al., 2018), and optimization algorithm (Zhang et al., 2018). For example, VGGNet (Simonyan & Zisserman, 2014), GoogLeNet (Szegedy et al., 2014), ResNet (He et al., 2016), and DenseNets (Huang et al., 2016) introduced a series of different CNN network structures; a series of loss functions (Zhang et al., 2018) have been designed for neural networks, such as the zero-one loss, the logarithmic loss, and the square loss/mean-square error (MSE). However, the development of CNN network structures is already very mature, so it is not easy to improve CNN performance by optimizing the network structure. Furthermore, there are many kinds of existing loss functions, which have met the needs of

Received: March 21, 2023. Revised: May 25, 2023. Accepted: May 30, 2023
© The Author(s) 2023. Published by Oxford University Press on behalf of the Society for Computational Design and Engineering. This is an Open Access article
distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse,
distribution, and reproduction in any medium, provided the original work is properly cited.
neural networks for different situations. With the development of the CNN network structure and loss function, the problem of parameter initialization of CNN has become more and more worthy of attention. This is not only because of the sensitivity of CNN to hyperparameters (e.g., the size of the convolution kernel affects how well a CNN extracts image features), but also because the structure of CNN networks tends to widen and deepen, and the variety of loss functions makes the problem of parameter initialization more complicated. Predecessors call the parameters to be initialized hyperparameters (Larochelle et al., 2007), and the parameter initialization problem is a hyperparameter optimization problem. Most of the efficient CNN models are tuned manually, which wastes a lot of time and computational cost. Thus, it is challenging to meet the needs of increasingly complex CNNs. Therefore, how to quickly design a combination of CNN hyperparameters suitable for a specific problem is still a challenging problem.

With the progress of science and technology, it is feasible to optimize the hyperparameters automatically (Feurer & Hutter, 2019). The so-called automatic optimization of hyperparameters is to treat finding the best combination of CNN hyperparameters as an optimization problem and then solve it with intelligent algorithms. Good results have been achieved in this respect (Bergstra & Bengio, 2012; Kandasamy et al., 2018; Zoph & Le, 2016), for example with the random search algorithm (Bergstra & Bengio, 2012), the grid search algorithm, reinforcement learning (Zoph & Le, 2016), Bayesian optimization (Kandasamy et al., 2018), and evolutionary computing (EC)-based methods. At present, the commonly used methods for optimizing hyperparameters automatically have their own defects. For example, the grid search algorithm makes full use of the advantages of parallel computing by searching the values of each hyperparameter over a specific range, which makes the optimization very fast. However, the characteristics of parallel computing lead to a situation where, if one task fails, other tasks will also fail accordingly, and the computational complexity increases with the number of hyperparameters to be optimized. Therefore, the grid search method is not suitable for situations in which a large number of hyperparameters need to be optimized. The random search algorithm (Bergstra et al., 2011) is faster than grid search because it randomly samples the search range, but due to the randomness of the algorithm, the accuracy of the results cannot be guaranteed. Therefore, the random search method is unsuitable for hyperparameter optimization with high precision requirements. EC-based approaches (Zhan et al., 2022a) imitate the process of how a population learns to adapt to the environment and optimize the species, so they have natural advantages in solving large-scale optimization problems. The EC-based methods have made good progress in solving CNN hyperparameter optimization problems (Aszemi & Dominic, 2019; Li et al., 2023a; Wang et al., 2022a). For example, Real et al. proposed a large-scale neural evolutionary algorithm that finds the best CNN model by optimizing the network structure (Real et al., 2017). Fernandes and Yen (2021) proposed a multi-objective evolutionary strategy algorithm to optimize the structure of deep CNNs. However, these EC-based methods still have slow convergence speed and are prone to falling into local optima in the face of an enormous search space (Jian et al., 2021; Li et al., 2023b; Wang et al., 2020, 2022b). As a representative EC algorithm for hyperparameter optimization, the genetic algorithm (GA) has been applied to CNN hyperparameter optimization many times. For example, Aszemi and Dominic (2019) proposed using the GA to optimize CNN hyperparameters. Taking advantage of the unique strengths of the network blocks in ResNet and DenseNet for feature extraction, Raymond and Beng proposed a GA based on block enhancement (Raymond & Beng, 2007) to build CNN architectures and improve network performance automatically. Furthermore, Yang et al. proposed using a multi-objective GA to obtain more precise and smaller CNNs (Karpathy, 2016). Although the GA has achieved good results in hyperparameter optimization, it cannot avoid the problems of slow search speed and high time cost due to its inability to use the feedback information of the network in a timely manner. In addition, the optimization of GAs depends to a certain extent on the initialization of the population, which cannot guarantee the effectiveness of each optimization run. In addition to GA, other powerful EC algorithms can also be used, such as particle swarm optimization (PSO), which has been applied to CNN hyperparameter optimization many times. For example, Guo et al. proposed a distributed particle swarm optimization method to improve the efficiency of CNN hyperparameter optimization (Guo et al., 2020). A two-stage variable-length particle swarm optimization method (Huang et al., 2022) is used to search the microstructure and macrostructure of the neural network. To address hyperparameter optimization with high computing cost, Wang et al. (2022a) proposed a particle swarm optimization method based on lightweight scale-adaptive fitness evaluation (SAFE). In addition, the differential evolution algorithm (Awad et al., 2020), the distribution estimation algorithm (Li et al., 2023a), and their combinations have also been applied to CNN hyperparameter optimization many times. However, just like GAs, they all face the dilemma of local fitting and slow convergence speed (Jian et al., 2020). In addition, expensive optimization problems (Zhan et al., 2022b) are inevitable when combining deep learning with EC algorithms. Previous researchers have rich experience in solving expensive optimization problems (Li et al., 2022; Lu et al., 2020; Suganuma et al., 2020; Sun et al., 2019; Wang et al., 2021). For example, Wu et al. (2021) proposed a novel SAFE method to address expensive optimization problems. Li et al. (2020) proposed solving expensive optimization problems by building a surrogate model.

The difficulties faced by predecessors in optimizing hyperparameters can be summarized in two points. Firstly, the algorithm easily falls into the dilemma of local fitting in the process of optimizing CNN hyperparameters. Because the search space corresponding to configuring CNN hyperparameters can be very vast, it is difficult for the algorithm to search the whole space thoroughly; in this case, the algorithm is prone to slow convergence and falling into local optima. Secondly, it is difficult to determine the evaluation indicator used to appraise CNN performance. Previous evaluations of CNN performance often use the accuracy of the CNN model on the test set after a certain number of training epochs. The number of training epochs therefore has a great influence on the measured performance of the CNN. If the CNN model is trained too few times, the performance index used to evaluate the network model is not representative, and the accuracy of the algorithm's optimization decreases. If the CNN model is trained too many times, the computational complexity of evaluating CNN performance increases, leading to an expensive optimization problem. Therefore, this paper proposes a local autonomous competitive harmony search (LACHS) algorithm to solve these two problems in the process of CNN hyperparameter optimization.

The main contributions of the proposed LACHS algorithm are as follows:

(i) From the perspective of algorithm parameter tuning, a dynamic adjustment strategy is adopted to dynamically adjust the key parameters, the pitch adjustment probability PAR and the step factor BW of the HS algorithm, with the number of iterations. This strategy improves the adaptability of the
Figure 1: The main principle and structure diagram of CNN.

algorithm to different optimization problems and avoids complex parameter adjustment.
(ii) From the perspective of CNN hyperparameter optimization, an autonomous decision-making search strategy based on the optimal state is designed. This strategy selects the search strategy independently according to the update of the optimal harmony, which enhances the search precision of the algorithm in various regions and improves its ability to jump out of local fitting. In addition, a local competition mechanism is designed to make the newly generated harmony compete with the worst harmony of a local selection. This strategy improves the ability of the algorithm to jump out of local fitting while avoiding slow convergence.
(iii) From the perspective of solving expensive optimization problems, this paper designs an evaluation function that fuses the number of training epochs and the recognition accuracy. This strategy makes the training time of each model change with the learning rate and batch size, which avoids the expensive optimization problem without affecting the search results.
(iv) In the experiments, two classic image classification datasets are used: the Fashion-MNIST dataset (Xiao et al., 2017) and the CIFAR10 dataset (Doon et al., 2018). The method in this paper is compared with CNNs based on empirical configuration and CNNs configured automatically by classical intelligent algorithms. The results show that the proposed method has the most competitive performance under a low computation budget. In addition, the proposed method is applied to expression recognition, and the experiments prove that it is feasible in practical applications.

The rest of the paper is organized as follows: the basic principles of CNN and the HS algorithm are briefly introduced in Section 2; the LACHS algorithm is introduced in Section 3; Section 4 presents the experimental studies that demonstrate the effectiveness of the LACHS algorithm; Section 5 summarizes the work done in this paper and puts forward future work directions.

2. Foundation Knowledge
This section introduces the related basic knowledge, including CNN and the harmony search (HS) algorithm, in detail.

2.1. Convolutional neural network
CNNs are a kind of feed-forward neural network with convolution calculation and a deep structure, mainly composed of an input layer, convolution layers (CONV), pooling layers (POOL), fully connected layers (FC), and an output layer. When input data enter a simple CNN, the main flow is as follows: first, the input layer reads the input data and keeps their original structure; then the data enter the CONV, which is used to extract local features; negative values are then converted to 0 by the rectified linear unit (ReLU); next, the POOL layer reduces the feature maps produced by the CONV to prevent overfitting; finally, the FC maps the learned "distributed feature representation" to the sample label space to realize data classification. The main principle and structure of a CNN are shown in Fig. 1.

However, not all CNNs have the structure of Fig. 1: the number of CONV layers, the kernel size of each CONV, and the pooling method can all be changed. Therefore, many kinds of CNNs have been derived, such as LeNet (Khan et al., 2020), AlexNet (Krizhevsky et al., 2012), VGGNet (Simonyan & Zisserman, 2014), GoogLeNet (Szegedy et al., 2014), ResNet (He et al., 2016), and DenseNets (Huang et al., 2016). As a basic network, VGGNet has excellent classification performance, and predecessors have rich experience in research on VGGNet, so it is beneficial to choose VGGNet as the basic network. Because there are many kinds of hyperparameters, it takes a lot of manpower and time to choose a suitable set of them; the LACHS algorithm proposed in this paper is used to solve this problem.
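As an illustration of what such a hyperparameter combination controls, the sketch below assembles a small VGG-style CNN from a dictionary of hyperparameters of the kind optimized in this paper (number of convolution blocks, layers per block, kernel sizes, filter counts, activation functions, hidden-layer sizes, learning rate, momentum). The framework (tf.keras), the helper name build_vgg_like, and the example values are illustrative assumptions, not the authors' implementation.

# A minimal sketch (not the authors' code) of a VGG-style CNN assembled from a
# hyperparameter dictionary; all names and example values are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_vgg_like(hp, input_shape=(32, 32, 3), num_classes=10):
    """Build a small VGG-style CNN from a hyperparameter dictionary `hp`."""
    model = models.Sequential()
    model.add(tf.keras.Input(shape=input_shape))
    # Stack of convolution blocks: several CONV layers per block, then one POOL.
    for block in range(hp["num_blocks"]):
        for _ in range(hp["layers_per_block"][block]):
            model.add(layers.Conv2D(hp["filters"][block],
                                    hp["kernel_size"][block],
                                    padding="same",
                                    activation=hp["activation"]))
        model.add(layers.MaxPooling2D(pool_size=2))
    # Fully connected head mapping the learned features to the label space.
    model.add(layers.Flatten())
    model.add(layers.Dense(hp["hidden1"], activation=hp["activation"]))
    model.add(layers.Dense(hp["hidden2"], activation=hp["activation"]))
    model.add(layers.Dense(num_classes, activation="softmax"))
    opt = tf.keras.optimizers.SGD(learning_rate=hp["lr"], momentum=hp["momentum"])
    model.compile(optimizer=opt, loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Example hyperparameter vector (values chosen for illustration only).
example_hp = {"num_blocks": 3, "layers_per_block": [2, 2, 3],
              "filters": [64, 128, 128], "kernel_size": [3, 3, 5],
              "activation": "elu", "hidden1": 100, "hidden2": 60,
              "lr": 0.001, "momentum": 0.95}
model = build_vgg_like(example_hp)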
2.2. HS algorithm
Different from other algorithms, the HS algorithm is a meta-heuristic search algorithm that simulates the principle of band harmony in music performance. This gives the HS algorithm strong parallel and global search capabilities (Geem et al., 2001). It has natural advantages for the problems arising in hyperparameter optimization, such as a large number of optimization parameters, the need for fast optimization, and high precision requirements, which is one of the reasons why the HS algorithm is chosen here.

The basic working idea of the HS algorithm is as follows: firstly, HMS initial solutions are generated and put into the harmony memory. Then, each component of a new solution is taken from the harmony memory with probability HMCR and generated outside the memory with probability 1-HMCR, so as to obtain the corresponding component of the new solution. When a component is taken from the memory, it is decided with pitch adjustment probability PAR whether it needs to be fine-tuned; if so, it is fine-tuned according to the step factor BW to form a new solution, otherwise no fine-tuning is performed. If the new solution is better than the worst solution in the memory, the worst solution in the memory is replaced with the new solution, and so on until the termination condition is met.

According to the literature (Geem et al., 2001), the flow of the HS algorithm can be divided into the following six steps:
Step 1: Initialize the related variables of the algorithm. The parameters include the harmony memory size HMS, the memory value probability HMCR, the pitch adjustment probability PAR, the step factor BW, and the maximum creation times Tmax.
Step 2: Determine the solution space of the algorithm. When there are n variables, there are n musical instruments. Let the upper limit of musical instrument x(j) be U(j) and the lower limit be L(j); [L(j), U(j)] is the playable area of musical instrument x(j). The combination of the playable areas of all instruments is the solution space of the algorithm.
Step 3: Initialize the harmony memory. The harmony memory HM consists of HMS harmonies, and Xi = {xi(1), xi(2), ..., xi(D)} represents the ith harmony, which is obtained by the following formula:

Xi(j) = L(j) + rand(0, 1) × (U(j) − L(j))    (1)

where rand(0, 1) is a random number between 0 and 1. The HMS initial solutions obtained in this way are stored in the matrix (Fig. 2):

HM = [X1, X2, ..., XHMS]^T    (2)

Figure 2: Initialize the operation diagram of the sound memory library.

Table 1: Pseudo-code of standard HS algorithm.

Algorithm 1 Harmonic search algorithm pseudo-code

1: Define fitness value function fitness(t) = f(x), x = (x1, x2, x3, ..., xn).


2: Define the generation range of harmony and the generating function of harmony.
3: Set the algorithm parameters: harmony library size (HMS), harmony library value probability (HMCR), pitch adjustment probability (PAR), step size
factor (BW), and maximum creation times (MAXGEN).
4: Initialize the harmony library.
5: Evaluate the fitness value of the sound library.
6: Take the harmony best with the largest fitness value in the harmony library and set t = 0.
7: while t < MAXGEN − 1 do
8: if random.random() < HMCR then
9: if random.random() < PAR then
10: Take a random harmony inv from the harmony library
11: aa = np.random.randint(0,self.len,size = BW)
12: for i = 0 → BW − 1 do
13: Fine-tuning the variables corresponding to the harmony to obtain a new harmony invx.
14: end for
15: end if
16: else
17: Generate new harmony invx
18: end if
19: The new harmony invx is compared with the harmony invy with the worst solution in the harmony library, and updated if it is better than it.
20: Update the harmony best with the maximum fitness value in the harmony library again.
21: t = t + 1
22: end while
Step 4: Generate a new harmony. According to the three rules of memory consideration (with probability HMCR), fine-tuning, and random selection, a new harmony vector xnew is generated: first, a random number lr between 0 and 1 is generated; if lr is less than HMCR, the decision variable xnew(j) is selected from the memory and is then fine-tuned with probability PAR; otherwise, xnew(j) is generated by randomly sampling the solution space according to formula (1). The fine-tuning method is as follows:

xnew(j) = xnew(j) ± rand(0, 1) × BW    (3)

Step 5: Update the harmony memory. If the new harmony is better than the worst solution, the worst solution is replaced; otherwise, the memory is not updated.
Step 6: Judge whether to terminate: if the current creation times have not reached the maximum number, repeat steps 4–5 until the maximum creation times are reached.
Finally, the pseudo-code of the standard HS algorithm is summarized in Table 1.

In addition, the development of the HS algorithm is quite mature, with variants such as IHS (Mahdavi et al., 2007), GHS (Omran & Mahdavi, 2008), and the self-adaptive SGHS algorithm (Pan et al., 2010). GSHS (Castelli et al., 2014) proposed a geometric selection strategy in the direction of improving the selection strategy, and the MHSA-EXTR Archive (Turky et al., 2014) adjusted the structure of the algorithm. It can be seen that predecessors have provided rich reference experience in improving the HS algorithm, which makes the HS algorithm a good choice for solving the hyperparameter optimization problem.
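For reference, the six steps above (and Algorithm 1 in Table 1) can be condensed into the following sketch of a standard HS loop. It assumes a generic continuous maximization problem and is not tied to the authors' code; the objective function and parameter values are placeholders.

# A minimal sketch of the standard HS algorithm (maximization form, matching Algorithm 1).
import random

def harmony_search(fitness, lower, upper, HMS=10, HMCR=0.8, PAR=0.3, BW=0.1, Tmax=1000):
    dim = len(lower)
    # Step 3: initialize the harmony memory with HMS random harmonies (formula (1)).
    HM = [[lower[j] + random.random() * (upper[j] - lower[j]) for j in range(dim)]
          for _ in range(HMS)]
    scores = [fitness(x) for x in HM]
    for _ in range(Tmax):
        # Step 4: improvise a new harmony component by component.
        new = []
        for j in range(dim):
            if random.random() < HMCR:                 # memory consideration
                value = HM[random.randrange(HMS)][j]
                if random.random() < PAR:              # pitch adjustment by step BW (formula (3))
                    value += random.uniform(-1, 1) * BW
                    value = min(max(value, lower[j]), upper[j])
            else:                                      # random selection (formula (1))
                value = lower[j] + random.random() * (upper[j] - lower[j])
            new.append(value)
        # Step 5: replace the worst harmony if the new one is better.
        worst = min(range(HMS), key=lambda i: scores[i])
        f_new = fitness(new)
        if f_new > scores[worst]:
            HM[worst], scores[worst] = new, f_new
    best = max(range(HMS), key=lambda i: scores[i])
    return HM[best], scores[best]

# Example: maximize -(x^2 + y^2), whose optimum is at the origin.
best_x, best_f = harmony_search(lambda x: -(x[0]**2 + x[1]**2), [-5, -5], [5, 5])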
Therefore, this strategy is to select a variety of sampling strategies
according to the optimal harmony update state. The detailed pro-
3. LACHS Algorithm cess of self-selecting search strategy based on update status is as
This section will introduce the local autonomous competition and follows:
HS algorithm in detail, mainly including the strategies involved in If the optimal harmony has not been updated in a short time,
the improvement of the algorithm. In addition, it also introduces a sample is randomly taken from the current population, and
the improvement of the evaluation method for superparameter the harmonic progression with the optimal solution is fine-tuned
optimization and the process, pseudo-code, and steps of the local in the sample. If the optimal harmony has not been updated
autonomous competition and HS algorithm in detail. for a long time, a sample is randomly selected from the cur-
rent population, and the harmonic progression with the worst
3.1. Dynamic adjustment strategy for solution is chosen in the sample for fine-tuning. In other cases,
parameters harmonic progression with the global optimal solution is used
The parameter setting of an algorithm is an important factor af- for fine-tuning. The detailed process pseudo-code of the self-
fecting the performance of the algorithm, and the parameter set- decision search strategy based on the optimal state is shown in
ting is affected by the actual optimization problem. Although the Table 2.
parameters can be set according to the specific optimization prob- This strategy effectively avoids the dilemma that the algorithm
lem, it will cost a lot of time and calculation cost, and once the falls into a local optimum for a long time. And make the algorithm
parameters are initialized, they will not change in the iterative explore the potential area of the whole search space as much as
process, which makes it difficult to meet the needs of the search possible in a limited time. The optimization speed and ability of
process, easy to make the algorithm search blind and inefficient. the final algorithm are improved.
Inspired by IHS (Mahdavi et al., 2007) algorithm, dynamic adjust-
ment strategy for parameters is adopted to set the key parame- 3.3. Local competition update strategy
ters of response algorithm performance, such as pitch adjustment On the competitive update strategy, the HS algorithm chooses the
probability PAR and step factor BW. Their settings are dynamically global competitive update strategy. That is, the new harmony gen-
adjusted with the number of iterations, so as to quickly adapt to erated by each iteration of the HS algorithm is compared with the
the current optimization problem. worst harmonic progression in the harmony library. If the new
In this strategy, the pitch adjustment probability PAR is shown harmony is better, the new harmony replaces the global worst
in formula (4): harmony. Because the worst harmony in the whole population is
 eliminated every time, although it is helpful to the optimization
t
PAR (t ) = PARmin + × (PARmax − PARmin ) (4) efficiency to a certain extent, it also means that once it falls into
NI
the dilemma of local fitting, it is challenging to jump out of the
PAR (t) is the local adjustment probability of the t generation, dilemma. It is also possible to produce the global best harmony
PARmin is the minimum adjustment probability, PARmax is the by fine-tuning the global worst harmony. This is helpful for the
maximum adjustment probability, t is the current iteration num- algorithm to jump out of the dilemma of the local fitting. There-
ber and NI is the total iteration number. fore, a local competitive selection mechanism is established in
In addition, the step factor BW is as follows (5): this paper. That is, a sample is randomly selected from the sound
⎛⎛  ⎞ ⎞ memory in each iteration, and the new sound is compared with
BWmin
ln
⎝⎝ BWmax
NI
⎠×t ⎠ the harmonic progression with the worst solution in the sample,
BW (t ) = BWmax × e (5) and if the new harmony is better, it will be replaced. This strategy
Table 2: Pseudo-code of self-determination search strategy based on optimal state.

Algorithm 2 Pseudo-code of self-selected search strategy based on updated state

1: Enter the number of iterations 1 t1 , the number of iterations 2 t2 , and the maximum number of iterations tmax .
2: Define variables x; x = (x1 ,x2 ,x3 ,…,xn ) and their ranges.
3: The fitness value function fitness(t) = f(x) is defined.
4: Input initialization variables and calculate the corresponding fitness values, and set t = 0, q1 = 0, and q2 = 0.
5: while t < tmax − 1 do
6: if f(x)max is not updated then
7: q1 = q1 + 1, q2 = q2 + 1, t = t + 1
8: if q1 = = t1 then
9: Take the variable xi with local optimal solution for fine tuning, and q1 = 0.
10: end if
11: if q2 = = t2 then
12: Take the variable xi with the local worst solution for fine tuning, and q2 = 0.
13: end if

Downloaded from https://academic.oup.com/jcde/article/10/4/1280/7199167 by guest on 08 July 2023


14: if q1 ! = t1 and q2 ! = t2 then
15: Take the globally optimal variable xi for fine-tuning.
16: end if
17: else
18: Take the globally optimal variable xi for fine-tuning, and q1 , q2 = 0, t = t + 1.
19: end if
20: end while
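A condensed sketch of the selection rule in Algorithm 2 is given below: which harmony is chosen for fine-tuning depends on how long the best fitness value has gone without improving. The counters q1 and q2 and thresholds t1 and t2 follow Algorithm 2 (the caller resets them when a threshold triggers or when the best harmony improves); the sample size and helper names are illustrative assumptions.

# A minimal sketch of the optimal-state self-decision rule of Algorithm 2.
import random

def choose_base_harmony(HM, scores, q1, q2, t1, t2, sample_size=3):
    """Return the harmony to fine-tune, given stagnation counters q1 and q2."""
    if q1 == t1:   # best not updated for a short time: best harmony of a random sample
        idx = random.sample(range(len(HM)), sample_size)
        return HM[max(idx, key=lambda i: scores[i])]
    if q2 == t2:   # best not updated for a long time: worst harmony of a random sample
        idx = random.sample(range(len(HM)), sample_size)
        return HM[min(idx, key=lambda i: scores[i])]
    # otherwise: the globally optimal harmony
    return HM[max(range(len(HM)), key=lambda i: scores[i])]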

Table 3: Pseudo-code of local competition update strategy.

Algorithm 3 Pseudo-code of local competitive update strategy
1: if f(Xnew) > f(Xlw) then
2:   f(Xlw) = f(Xnew)
3:   Xlw = Xnew
4: end if

This strategy keeps the possibility of generating the global optimal harmony by fine-tuning the global worst harmony, which helps the algorithm jump out of the dilemma of local fitting to some extent.

In this strategy, the evaluation standard of the competition is shown in formula (6):

f(X'lw) = max{ f(Xnew), f(Xlw) }    (6)

where f(X'lw) is the fitness value of the updated local worst harmony, X'lw is the harmony corresponding to f(X'lw), f(Xnew) is the fitness value of the new harmony, and f(Xlw) is the fitness value of the local worst harmony. The pseudo-code of the local competitive update strategy is shown in Table 3.
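The local competition of formula (6) and Table 3 replaces the worst harmony of a small random sample rather than the global worst one. A minimal sketch follows; the sample size and function name are illustrative assumptions.

# A minimal sketch of the local competition update (formula (6) / Algorithm 3):
# the new harmony only has to beat the worst member of a random local sample.
import random

def local_competition_update(HM, scores, x_new, f_new, sample_size=3):
    idx = random.sample(range(len(HM)), sample_size)     # local selection
    worst_local = min(idx, key=lambda i: scores[i])      # worst harmony in the sample
    if f_new > scores[worst_local]:                      # formula (6)
        HM[worst_local], scores[worst_local] = x_new, f_new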
3.4. Evaluation method of fusing training times and recognition accuracy
Predecessors usually take the accuracy of the network model on the test set after a certain number of training epochs T as the performance index to evaluate a network model. However, determining the training times T is a critical factor affecting the subsequent workload and accuracy. If T is too large, the subsequent calculation cost increases exponentially; if T is too small, it is impossible to evaluate the performance of the network model objectively. Moreover, once the training times T is initialized, it does not change during the iterative process, which makes it difficult to meet the needs of the search process and makes the algorithm search blind and inefficient. Predecessors usually determine the training times T by the number of training epochs at which the network model enters fitting. Through experiments, it is found that the number of training epochs at which the network model enters fitting is related to the learning rate and the batch size, as shown in Tables 4 and 5.

Table 4: Training times T for different learning rates of the same network model to enter fitting on the CIFAR10 dataset.

Learning rate                                   0.001   0.0015   0.002   0.0025
Training times T of network entering fitting        7        6       5        4

Table 5: Training times T of different batch sizes in the same network model when entering the fitting on the CIFAR10 dataset.

Batch size                                         32    64    96    128
Training times T of network entering fitting        8     9    10     12

Through the experiments, it is found that the higher the learning rate, the fewer training epochs the network model needs to enter fitting, and the lower the learning rate, the more training epochs it needs. The bigger the batch size, the more training epochs the network model needs to enter fitting; the smaller the batch size, the fewer it needs. The relationship between the training times of the network model, the learning rate, and the batch size is summarized from these experimental observations as formula (7):

T = int( ((lrmax − lrmin) / lr) × a + (batch / (batchmax − batchmin)) × b + (lr × 10000 / batch) × c )    (7)

Here, T is the training times of the current network model, lrmax is the maximum learning rate within the parameter range, lrmin is the minimum learning rate within the parameter range, lr is the learning rate of the current network model, batchmax is the maximum batch size within the parameter range, batchmin is the minimum batch size within the parameter range, and batch is the batch size of the current network model. The coefficients a, b, and c are adjusted according to the complexity of the dataset.
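Formula (7) can be read as the small function below. The learning-rate and batch-size ranges are those of Table 8, and the example coefficients are the CIFAR-10 values given in Section 4.1; the exact rounding behaviour and function name are assumptions of this sketch.

# A sketch of the evaluation budget of formula (7): the number of training epochs T
# assigned to a candidate CNN depends on its learning rate and batch size.
def training_epochs(lr, batch, a, b, c,
                    lr_min=0.001, lr_max=0.03, batch_min=32, batch_max=256):
    return int((lr_max - lr_min) / lr * a
               + batch / (batch_max - batch_min) * b
               + lr * 10000 / batch * c)

# Example: CIFAR-10 coefficients (a, b, c) = (2, 1, 2).
T = training_epochs(lr=0.001, batch=64, a=2, b=1, c=2)
print(T)  # epoch budget used before measuring this candidate's test accuracy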
Figure 3: General flow chart of LACHS algorithm.

Table 6: Pseudo-code of local autonomous competition HS algorithm.

Algorithm 4 Pseudo-code of local competition HS algorithm

1: Define fitness value function fitness(t) = f(x), x = (x1 , x2 , x3 , …, xn ).


2: Define the generation range of harmony and the generation function of harmony.
3: Set the algorithm parameters: harmony library size (HMS), harmony library value probability (HMCR), fine tuning probability (PAR), amplitude
modulation (BW), and maximum creation times (MAXGEN).
4: Initialize the harmony library.
5: Evaluate the fitness value of the harmony library.
6: Take the harmony best with the largest fitness value in the harmony library, and set t = 0, q1 = 0, q2 = 0.
7: while t < MAXGEN − 1 do
8: PAR and BW are updated according to formulas (4) and (5)
9: if random.random() < HMCR then
10: if random.random() < PAR then
11: if f(x)max is not updated then
12: q1 = q1 + 1, q2 = q2 + 1
13: if q1 = = t1 then
14: Take the variable xi with the local optimal solution for fine tuning, and q1 = 0.
15: end if
16: if q2 = = t2 then
17: Take the variable xi with the local worst solution for fine tuning, and q2 = 0.
18: end if
19: if q1 ! = t1 and q2 ! = t2 then
20: Take the variable xi with the global optimal solution for fine tuning.
21: end if
22: else
23: Take the variable xi with the global optimal solution for fine tuning, and q1 , q2 = 0
24: end if
25: else
26: Generate new harmony invx
27: end if
28: else
29: Generate new harmony invx
30: The new harmony invx is compared with the harmony invy with the worst solution in the harmony library, and updated if it is better than it.
31: Update the harmony best with the maximum fitness value in the harmony library again.
32: t=t+1
33: end if
34: end while
3.5. LACHS framework and pseudo-code
The overall flow chart of the LACHS algorithm is shown in Fig. 3. The steps of the LACHS algorithm are basically the same as those of the HS algorithm, with the main difference being the improvisation. The specific process of the LACHS algorithm is as follows:
Step 1: Initialize the relevant variables of the algorithm and the optimization problem.
Step 2: Initialize the harmony memory, and take as the fitness value the accuracy on the test set of the network model trained for the number of epochs T generated according to formula (7).
Step 3: Update the pitch adjustment probability PAR and the step factor BW through the dynamic parameter adjustment strategy, and judge whether fine-tuning is needed.
Step 4: If fine-tuning is needed, the harmony to be fine-tuned is selected by the self-decision search strategy based on the optimal state to generate a new harmony; otherwise, the new harmony is randomly generated from the solution space.
Step 5: According to the local competition update strategy, judge whether to keep the new harmony.
Step 6: Judge whether to terminate: if the current creation times have not reached the maximum number, repeat steps 3–5 until the maximum creation times are reached.
Finally, the detailed pseudo-code of the local autonomous competition harmony search algorithm is shown in Table 6.

4. Experimental Results and Analysis
In this section, the effectiveness of the algorithm is studied through experiments. The experimental datasets, comparison algorithms, and parameter settings are introduced in Sections 4.1–4.3. Sections 4.4–4.7 compare and analyze the experimental results of the different algorithms on different datasets to verify the effectiveness of the LACHS algorithm.

4.1. Baseline datasets and evaluation indicators
To evaluate the performance of the LACHS algorithm, the popular and widely used benchmark datasets Fashion-MNIST (Xiao et al., 2017) and CIFAR10 (Doon et al., 2018) are used as experimental datasets.

The Fashion-MNIST dataset was created to replace the MNIST dataset. It consists of 70 000 pictures with the size of 48 × 48 × 3, which are divided into ten categories. In this paper, both for the optimization of the algorithm and for training the network with the final parameter combination, the Fashion-MNIST dataset is divided into a training set, a validation set, and a test set at a ratio of 5:1:1. The Fashion-MNIST dataset is selected because it contains a lot of noise, which increases the difficulty of recognition, and because it is representative of modern machine learning.

The CIFAR-10 dataset is composed of ordinary daily items, and the task is to classify a group of pictures with the size of 32 × 32 × 3. The CIFAR-10 dataset is composed of 60 000 color pictures and is divided into 10 categories (aircraft, cars, birds, cats, deer, dogs, frogs, horses, boats, and trucks), as shown in Fig. 4, each accounting for one-tenth. In this paper, both for the optimization of the algorithm and for training the network with the final parameter combination, the CIFAR-10 dataset is divided into a training set and a test set at a 5:1 ratio. The CIFAR-10 dataset is chosen because it is composed of color images, which are more difficult to identify; because different types of pictures in the CIFAR-10 dataset are very different, which makes them more difficult for a CNN to identify; and finally because the CIFAR-10 dataset consists of daily products from the real world, which are relatively irregular, increasing the difficulty of recognition while making the dataset more representative in machine learning.

In the experiments, the accuracy of the CNN on the test set after a certain number of training epochs T is taken as the fitness value of the LACHS algorithm. For the Fashion-MNIST dataset, the coefficients a, b, and c in formula (7) are set to 1, 0.5, and 1, respectively. For the CIFAR-10 dataset, the coefficients a, b, and c in formula (7) are set to 2, 1, and 2, respectively.

4.2. Compare the developed methods to the most advanced ones
To show the advantages of the improved HS algorithm in this paper, this paper compares the classification accuracy on the test datasets of network models configured by experience with that of network models whose hyperparameters are automatically optimized by algorithms.

The network models based on experience include: ALL-CNN (Springenberg et al., 2014), Deeply-supervised (Lee et al., 2015), Network in Network (Lin et al., 2013), Maxout (Goodfellow et al., 2013), and VGGNet16. Because these hand-designed CNNs are representative in machine learning, they are suitable for studying whether the improved HS algorithm in this paper can find a better network model than these classical CNNs. Considering the computational cost, apart from VGGNet16, this paper directly quotes the best results from their original papers for comparison.

The network models built by automatic hyperparameter optimization with intelligent algorithms can be divided into two types in the comparison: optimization based on other types of intelligent algorithms, and optimization based on classic HS algorithms of the same type. The steps for combining a convolutional neural network with hyperparameter optimization based on an intelligent optimization algorithm are summarized as follows:
Step 1: Define the search space: determine the hyperparameter ranges to be optimized according to Tables 7 and 8.
Step 2: Initialize the population: a group of candidate solutions is created, and each candidate represents a solution in the search space.
Step 3: Evaluate the fitness value: the CNN determined by a particle of the intelligent optimization algorithm is trained, with the number of training epochs determined by formula (7) and validation on the verification set; the fitness value is the classification accuracy of the trained CNN on the test set (a code sketch of this step is given after this list).
Step 4: Update particle positions: use the swarm intelligence optimization algorithm to update the solutions in the search space.
Step 5: Repeat: repeat steps 3–4 until the maximum number of iterations is met.
Step 6: Test the model: evaluate the performance of the CNN based on the optimal configuration on the test set.
Finally, the flowchart of optimizing CNN hyperparameters based on an intelligent optimization algorithm is shown in Fig. 5.
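Step 3 above can be wrapped as a fitness function roughly as follows. This is a sketch only: build_vgg_like and training_epochs are the illustrative helpers sketched earlier in this paper's rewrite of Sections 2.1 and 3.4, the hp["batch"] key and the data variables are assumptions, and the Keras calls stand in for whatever training pipeline is actually used.

# A sketch of the fitness evaluation in Step 3: decode a candidate into a CNN,
# train it for the epoch budget T given by formula (7), and return its test accuracy.
def evaluate_candidate(hp, data, coeffs):
    (x_train, y_train), (x_test, y_test) = data            # assumed dataset splits
    model = build_vgg_like(hp)                              # candidate CNN (illustrative helper)
    T = training_epochs(hp["lr"], hp["batch"], *coeffs)     # epoch budget from formula (7)
    model.fit(x_train, y_train, batch_size=hp["batch"], epochs=T, verbose=0)
    _, accuracy = model.evaluate(x_test, y_test, verbose=0)
    return accuracy                                         # fitness value of this candidate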
The network models based on other types of intelligent algorithms for optimizing hyperparameters include: CNN based on the random search algorithm (RSCNN), CNN based on Bayesian optimization (BASCNN), CNN based on DE (DECNN), and CNN based on PSO (PSOCNN). The network models based on the more classical HS algorithms of the same type to optimize the
Figure 4: Ten different classifications of CIFAR-10 and Fashion-MNIST dataset.

hyperparameters include: CNN based on the standard HS algorithm (Geem et al., 2001) (HSCNN), CNN based on the IHS algorithm (Mahdavi et al., 2007) (IHSCNN), and CNN based on the GHS algorithm (Omran & Mahdavi, 2008) (GHSCNN). Because these optimization methods based on intelligent algorithms have different characteristics, they are suitable for evaluating the optimization advantages of the improved HS algorithm proposed in this paper. To understand the benefits of the LACHS algorithm in hyperparameter optimization more intuitively, except for RSCNN and BASCNN, the CNNs built based on the other hyperparameter optimization algorithms are optimized from the same initial population and the same parameter range. Because of the characteristics of the algorithms behind RSCNN and BASCNN, there is no need to initialize a population, so these two CNNs are optimized based on the same parameter range only.

Table 7: Parameters of network model structure (the range of i is 1–4).

Hyperparameters                               Range
Convolution layer number                      [3, 4]
Number of layers of convolution block i       [2, 3, 4]
Convolution kernel i size                     [3, 4, 5]
Number of filters 1                           [16, 32, 64, 96]
Number of filters 2                           [48, 64, 96, 128]
Number of filters 3                           [64, 96, 128]
Number of filters 4                           [96, 128]
Activate function 1                           ["relu", "elu"]
Activate function 2                           ["relu", "elu"]
Hidden layer 1                                [60, 100, 125]
Hidden layer 2                                [60, 100, 125]

Table 8: Optimization parameter range of network model.

Hyperparameters   Range
Learning rate     [0.001, 0.003, 0.01, 0.03]
Batch size        [32, 64, 128, 256]
Momentum          [0.9, 0.95, 0.99]

4.3. Algorithm settings
In the experiments, the algorithm in this paper uses VGGNet as the basic network for the experimental research, and the ranges of the network parameters to be optimized are shown in Tables 7 and 8, with a total number of optimized parameters Leng of 20.

Regarding the parameters of the improved HS algorithm, the harmony memory size HMS is set to 10, the maximum creation times Tmax is set to 30, the memory value probability HMCR is set to 0.8, the minimum adjustment probability PARmin is set to 0.1, the maximum adjustment probability PARmax is set to 1, the minimum amplitude modulation BWmin is set to 1, and the maximum amplitude modulation BWmax is set to Leng-1.

The traditional data expansion method (Qi et al., 2022) is adopted to process the datasets, in which the rotation angle range is 10, the width offset is 0.1, the height offset is 0.1, the perspective transformation range is 0.1, the zoom range is 0.1, and horizontal flipping is carried out. The filling mode is "nearest", and the rest of the settings are the defaults.

For the parameter combination found by the algorithm, the number of training epochs is set to 50 for training on the Fashion-MNIST dataset and 100 for training on the CIFAR-10 dataset.
Figure 5: Flow of network model based on intelligent algorithms for automatic optimization of superparameters.

4.4. Experimental results
In the experiments, the optimization process and the final optimized population of the LACHS algorithm on the Fashion-MNIST dataset are shown in Fig. 6.

On the Fashion-MNIST dataset, the hyperparameter combination of the CNN based on LACHS optimization (LACHSCNN) is ['64', '3', '128', '5', '96', '4', '128', '4', '60', '100', 'elu', 'elu', '2', '3', '3', '4', '0.001', '0.99', '64']. The accuracy rate of LACHSCNN after data augmentation and training is 93.34%. The training process and confusion matrix of the CNN model are shown in Fig. 7.

It can be concluded from Fig. 7 that the accuracy rate of LACHSCNN after training is basically stable at about 93%. In the classification of the Fashion-MNIST dataset, the recognition effect for tag 6 is relatively poor, while the recognition effect for the other tag types is good.

In the experiments, the optimization process and the final optimized population of the LACHS algorithm on the CIFAR10 dataset are shown in Fig. 8.

On the CIFAR10 dataset, the hyperparameter combination of LACHSCNN is ['96', '4', '128', '3', '128', '4', '128', '5', '60', '100', 'elu', 'elu', '3', '3', '4', '2', '3', '0.001', '0.95', '32']. The accuracy rate of LACHSCNN after data augmentation and training is 90.25%. The training process and confusion matrix of the CNN model are shown in Fig. 9.

It can be concluded from Fig. 9 that the accuracy rate of LACHSCNN after training is basically stable at about 90%. In the classification of the CIFAR10 dataset, the recognition effect for tag 3 and tag 5 is relatively poor, while the recognition effect for the other tag types is good.

4.5. Compared with the most advanced methods
First of all, as shown in Table 9, as a network architecture of the same type, the accuracy of VGGNet16 is 92.86% on the Fashion-MNIST dataset and 88.74% on the CIFAR10 dataset. In contrast, the accuracy of LACHSCNN, which shares the same basic architecture as VGG, is 93.34% on the Fashion-MNIST dataset and 90.25% on the CIFAR10 dataset. Regarding classification accuracy, LACHSCNN improves by 0.48% and 1.51% on the Fashion-MNIST dataset and the CIFAR10 dataset, respectively. Therefore, compared with artificially designed CNNs of the same type, the
Figure 6: Optimization process and final optimization population of LACHS algorithm on Fashion-MNIST dataset.

Figure 7: Training process and confusion matrix of LACHSCNN in Fashion-MNIST.

Figure 8: Optimization process and final optimization population of LACHS algorithm on CIFAR10 dataset.
Figure 9: Training process and confusion matrix of LACHSCNN in CIFAR10 dataset.

Table 9: Comparison with CNN based on experience.

Method                                              Network model                              Fashion-MNIST   CIFAR10
Manually designed CNN                               VGGNet16                                   92.86           88.74
                                                    Maxout (Goodfellow et al., 2013)           –               88.32
                                                    ALL-CNN (Springenberg et al., 2014)        –               92.00
                                                    Deeply supervised (Lee et al., 2015)       –               90.22
                                                    Network in network (Lin et al., 2013)      –               89.60
The network model constructed by this algorithm     LACHSCNN                                   93.34           90.25

performance of the CNN based on LACHS optimization has more advantages. Compared with other kinds of artificially designed CNNs, LACHSCNN outperforms Maxout (Goodfellow et al., 2013), Deeply-supervised (Lee et al., 2015), and Network in Network (Lin et al., 2013) on the CIFAR10 dataset by 1.93%, 0.03%, and 0.65%, respectively. Although ALL-CNN (Springenberg et al., 2014) performs better on the CIFAR10 dataset, it uses a deeper and more complex network structure: in classification accuracy, ALL-CNN is 1.75% higher than LACHSCNN, but its resulting computational cost is higher, whereas LACHSCNN's computational cost is lower. Therefore, compared with other types of artificially designed CNNs, the CNN based on LACHS optimization has more advantages in terms of comprehensive performance and computational cost.

Secondly, the results of RSCNN, BASCNN, GACNN, PSOCNN, and DECNN are all obtained from the same initial population and hyperparameter range. As shown by the experimental results in Table 10, the accuracy of RSCNN, BASCNN, GACNN, PSOCNN, and DECNN on the Fashion-MNIST dataset reaches 92.94%, 93.09%, 93.09%, 93.05%, and 93.26%, respectively, and their accuracy on the CIFAR10 dataset reaches 83%, 89.36%, 88.81%, 88.81%, and 88.81%, respectively. It can be seen that the network models whose hyperparameters are optimized by other types of evolutionary algorithms can achieve good results on the Fashion-MNIST dataset, but their effect on the more complex CIFAR10 dataset is not ideal. In contrast, on the Fashion-MNIST dataset, LACHSCNN is 0.4%, 0.25%, 0.25%, 0.29%, and 0.08% higher than RSCNN, BASCNN, GACNN, PSOCNN, and DECNN, respectively. Because the data structure of the Fashion-MNIST dataset is relatively simple, there is no real gap in these results. On the more complicated CIFAR10 dataset, the accuracy of LACHSCNN is 6.75%, 0.91%, 1.44%, 1.44%, and 1.44% higher than that of RSCNN, BASCNN, GACNN, PSOCNN, and DECNN, respectively. The experimental results show that the LACHS algorithm has more advantages in hyperparameter optimization than the other evolutionary algorithms.

The results for EvoCNN (Real et al., 2017), CNN-GA (Aszemi & Dominic, 2019), and CNN-DPSO (Guo et al., 2020) are all quoted from previously published experiments. On the Fashion-MNIST dataset, LACHSCNN is 0.62% and 0.43% higher than EvoCNN (Real et al., 2017) and CNN-DPSO (Guo et al., 2020), respectively. On the CIFAR10 dataset, LACHSCNN is 9.63% more accurate than CNN-GA. Although the performance of LACHSCNN is better than that of EvoCNN (Real et al., 2017), CNN-GA (Aszemi & Dominic, 2019), and CNN-DPSO (Guo et al., 2020), because of the different optimization parameter ranges and basic architectures, this does not show that the previous methods are not excellent; it can only demonstrate that the LACHS-based optimization of neural network architectures has a certain leading edge.

HSCNN, IHSCNN, and GHSCNN are all optimized from the same initial population and hyperparameter range. The experimental results in Table 11 show that the accuracy of HSCNN, IHSCNN, and GHSCNN on the Fashion-MNIST dataset is 92.96%, 93.23%, and 93.29%, respectively, and their accuracy on the CIFAR10 dataset reaches 88.81%, 88.81%, and 88.81%, respectively. It can be seen that the CNNs whose hyperparameters are optimized by different types of HS algorithms can achieve good results on the Fashion-MNIST dataset, but their effect on the more complex CIFAR10 dataset is not ideal. In contrast, the
Table 10: Comparison with CNN constructed by intelligent algorithms.

Method                                              Network model                        Fashion-MNIST   CIFAR10
CNN constructed by intelligent algorithm            RSCNN                                92.94           83.00
                                                    BASCNN                               93.09           89.36
                                                    GACNN                                93.09           88.81
                                                    PSOCNN                               93.05           88.81
                                                    DECNN                                93.26           88.81
                                                    EvoCNN (Real et al., 2017)           92.72           –
                                                    CNN-GA (Aszemi & Dominic, 2019)      –               80.62
                                                    CNN-DPSO (Guo et al., 2020)          92.91           –
The network model constructed by this algorithm     LACHSCNN                             93.34           90.25

Table 11: Comparison with network models constructed by classical HS algorithms.

Method                                              Network model   Fashion-MNIST   CIFAR10
Network model based on classical HS algorithms      HSCNN           92.96           88.81
                                                    IHSCNN          93.23           88.81
                                                    GHSCNN          93.29           88.81
The network model constructed by this algorithm     LACHSCNN        93.34           90.25

In contrast, the classification accuracy of LACHSCNN on the Fashion-MNIST dataset is 0.38%, 0.11%, and 0.05% higher than that of HSCNN, IHSCNN, and GHSCNN, respectively. Because the data structure of the Fashion-MNIST dataset is relatively simple, the gap between the experimental results is small. On the more complex CIFAR10 dataset, the accuracy of LACHSCNN is 1.44% higher than that of each of HSCNN, IHSCNN, and GHSCNN. These results show that, compared with the other HS variants, the LACHS algorithm has clear advantages in hyperparameter optimization.

The performance of the intelligent algorithms can be further understood by analyzing their optimization processes and their final optimized populations. The comparison is divided into two parts: intelligent algorithms of other types, and classical HS algorithms of the same type.

The optimization process and final optimized population of the LACHS algorithm and the other intelligent algorithms on the Fashion-MNIST and CIFAR10 datasets are shown in Figs 10 and 11.

On the Fashion-MNIST dataset, all algorithms achieve good results during the optimization process shown in Fig. 10. Although the optimization speed of the LACHS algorithm is slightly slower than that of the DE algorithm, its final result is better. For the GA and PSO algorithms, both the search speed and the search ability are inferior to those of the LACHS algorithm. From the analysis of the final optimized population, compared with the initial population, the last optimized population of every algorithm except the GA algorithm belongs to the elite population. The poor final population of the GA algorithm is related mainly to its elimination strategy and population size: the GA lets the next generation directly replace the previous one, which makes it hard to guarantee the quality of each generation when the population size is not large enough. The difference between the PSO algorithm and the DE and LACHS algorithms is that the PSO algorithm retains no abnormal combination (a combination whose fitness value differs greatly from that of the rest of the population). Keeping some suitable abnormal combinations can effectively prevent the algorithm from being trapped in a local optimum for a long time, which is also why the PSO algorithm performs worse than the DE and LACHS algorithms. However, because the data structure of the Fashion-MNIST dataset is simple and the optimal results of the various intelligent algorithms are not clearly separated, the more complex CIFAR10 dataset is selected for further comparison.
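To make the notion of an "abnormal combination" used above concrete, that is, a combination whose fitness differs greatly from the rest of the harmony memory, the following minimal Python sketch flags such harmonies. The k-sigma rule and the threshold value are illustrative assumptions, not the exact criterion used by the LACHS algorithm.

import numpy as np

def find_abnormal(fitness_values, k=1.5):
    """Return the indices of 'abnormal' combinations, i.e., harmonies whose
    fitness deviates from the population mean by more than k standard
    deviations. The k-sigma rule is an illustrative assumption."""
    fitness = np.asarray(fitness_values, dtype=float)
    mean, std = fitness.mean(), fitness.std()
    if std == 0.0:  # a fully converged memory has no abnormal members
        return []
    return [i for i, f in enumerate(fitness) if abs(f - mean) > k * std]

# Hypothetical harmony memory of validation accuracies (%)
memory_fitness = [93.1, 93.0, 92.9, 93.2, 88.4]
print(find_abnormal(memory_fitness))  # -> [4], the outlying combination
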
As can be seen from Fig. 11, on the more complex CIFAR10 dataset the optimization results of the other algorithms are not ideal: the best results of the GA, DE, and PSO algorithms are still the best harmony already present in the original harmony memory, so these algorithms do not actually improve on the initial population. Although the LACHS algorithm was caught in a local fitting dilemma at first, as the iterations increased it used the autonomous decision-making search strategy based on the optimal state to jump out of the dilemma in time and find better solutions. The analysis of the final optimized population also shows that the optimization effect of the LACHS algorithm is better: after the same number of searches, all combinations in its final population belong to the elite combinations and the population reaches the convergence state, whereas the final populations of the other algorithms do not converge. Therefore, both the search speed and the search ability of the LACHS algorithm are better than those of the other algorithms.

Hence, compared with other types of intelligent optimization algorithms, the LACHS algorithm has more advantages in terms of hyperparameter optimization.
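The judgments used above, namely whether an algorithm actually improves on the best harmony already present in the initial memory and whether the final population has reached a convergence state of elite combinations, can be expressed as two simple diagnostics. The sketch below is only an illustration; the convergence tolerance is an assumed value rather than the paper's exact criterion.

import numpy as np

def improved_over_initial(initial_fitness, final_fitness):
    """True if the best fitness after the search beats the best harmony
    that was already present in the initial harmony memory."""
    return max(final_fitness) > max(initial_fitness)

def has_converged(final_fitness, tol=0.1):
    """Declare convergence when all combinations in the final population lie
    within `tol` (here, accuracy percentage points) of the best one, an
    illustrative proxy for 'the population consists of elite combinations'."""
    final = np.asarray(final_fitness, dtype=float)
    return bool(final.max() - final.min() <= tol)

# Hypothetical accuracies (%) before and after one optimization run
initial = [86.5, 87.2, 88.8, 85.9]
final = [90.1, 90.2, 90.2, 90.1]
print(improved_over_initial(initial, final), has_converged(final))  # True True
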


Figure 10: Optimization process and final optimized population of the LACHS algorithm and the other intelligent algorithms on the Fashion-MNIST dataset.

Figure 11: Optimization process and final optimized population of the LACHS algorithm and the other intelligent algorithms on the CIFAR10 dataset.

Figure 12: Optimization process and final optimized population of the various HS algorithms on the Fashion-MNIST dataset.


Figure 13: Optimization process and final optimized population of the various HS algorithms on the CIFAR10 dataset.

The optimization process and last optimized population of the LACHS algorithm and the classical HS algorithms on the Fashion-MNIST and CIFAR10 datasets are shown in Figs 12 and 13.

On the Fashion-MNIST dataset, Fig. 12 shows that the various HS algorithms all achieve good results during the optimization process. Although the search speed of the LACHS algorithm is a little slower than that of the other HS variants, its final search results are better. From the analysis of the last optimized population, compared with the initial population, the final optimized population of each algorithm belongs to the elite population. However, the HS algorithm and the IHS algorithm do not retain abnormal combinations, which makes it harder for them to jump out of a local optimization dilemma; in terms of search ability, the GHS algorithm and the LACHS algorithm are therefore stronger. Because of the simple data structure of the Fashion-MNIST dataset, the optimal results of the various algorithms cannot be clearly separated, so the more complex CIFAR10 dataset is again selected for experimental comparison.

As can be seen from Fig. 13, the optimization results of all algorithms except the LACHS algorithm are not ideal: their best results are still the best harmony in the original harmony memory, so these algorithms do not play an optimizing role. Although the LACHS algorithm fell into a local fitting dilemma at the beginning, as the iterations increased it used the autonomous decision-making search strategy based on the optimal state to jump out of the dilemma in time and find better solutions. The analysis of the last optimized population shows that the LACHS algorithm and the GHS algorithm have the better optimization effects: after the same number of searches, the combinations in their final populations generally belong to the elite combinations and the populations reach the convergence state, whereas the HS algorithm and the IHS algorithm do not. Therefore, the search speed of the LACHS algorithm and the GHS algorithm is better than that of the HS algorithm and the IHS algorithm. Although the GHS algorithm has a good search speed, its search ability is still not as good as that of the LACHS algorithm, and it fails to jump out of the local fitting dilemma before the end of the search.

Therefore, compared with classical HS algorithms of the same type, the LACHS algorithm has more advantages in terms of hyperparameter optimization.

In short, considering the difficulty of hyperparameter optimization for a given CNN, the contribution of this paper is reasonable.

4.6. Expression recognition case study
To evaluate the performance of LACHS-CNN in a practical application, a case study of expression recognition is conducted in this part. Expression recognition can significantly promote the integration and development of many disciplines, such as graphic image processing, artificial intelligence, human-computer interaction, and psychology. A suitable video emotion database is the foundation of expression recognition research, so this paper uses the SAVEE dataset (Liu et al., 2022) to provide the training and testing data. The SAVEE dataset consists of videos of four actors expressing seven emotions, each video lasting about 3 seconds, and it also standardizes 68 facial key points for each subject. A sample from the SAVEE database is shown in Fig. 14.

In the experiment, one video frame was captured every 50 frames, giving a total of 1957 pictures, each with a size of 48 × 48. The generated image dataset is then divided into a training set and a testing set in the proportion 90% to 10%.
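The data preparation described above can be scripted in a few lines. The sketch below uses OpenCV to keep one frame out of every 50, resize it to 48 × 48 grayscale, and split the resulting images 90%/10%. The directory layout, the .avi extension, the grayscale conversion, and the fixed random seed are assumptions made for illustration rather than details taken from the original experiment.

import glob
import os
import random

import cv2  # OpenCV


def extract_frames(video_dir, out_dir, step=50, size=(48, 48)):
    """Save one resized grayscale frame every `step` frames from each video."""
    os.makedirs(out_dir, exist_ok=True)
    saved = 0
    for path in sorted(glob.glob(os.path.join(video_dir, "*.avi"))):
        cap = cv2.VideoCapture(path)
        idx = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if idx % step == 0:
                gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
                img = cv2.resize(gray, size)
                name = f"{os.path.splitext(os.path.basename(path))[0]}_{idx}.png"
                cv2.imwrite(os.path.join(out_dir, name), img)
                saved += 1
            idx += 1
        cap.release()
    return saved


def split_files(image_dir, train_ratio=0.9, seed=0):
    """Shuffle the generated images and split them into 90% train / 10% test."""
    files = sorted(glob.glob(os.path.join(image_dir, "*.png")))
    random.Random(seed).shuffle(files)
    cut = int(len(files) * train_ratio)
    return files[:cut], files[cut:]
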
In the LACHS algorithm, the coefficients a, b, and C in formula (6) are set to 10, 5, and 10, respectively, and the rest of the configuration is unchanged. The CNN obtained by the LACHS algorithm is then trained for 500 epochs. Because of the high computational cost, only the VGG16 network model trained for the same 500 epochs is used as the control group. The training results are shown in Table 12.

Table 12 shows that the CNN based on LACHS produces higher accuracy than VGG16, which verifies the effectiveness of the LACHS algorithm and shows its potential in solving practical applications.

Figure 14: Seven different expressions of the same person in the SAVEE dataset.

Table 12: Comparison with manually designed CNN.

Method                                            Network model         SAVEE
Manually designed CNN                             VGG16                 95.92
The network model constructed by this algorithm   CNN based on LACHS    97.96

4.7. Further discussion
The experiments in Section 4 demonstrate the superiority of the LACHS algorithm in CNN hyperparameter optimization. In these experiments, VGGNet is used as the basic network for hyperparameter optimization, and the LACHS algorithm obtains better results than the manually designed VGG16 of the same type and other state-of-the-art CNNs. However, the LACHS algorithm also has its limitations. Because it optimizes the hyperparameters of a given basic network, if that basic network is not suitable for the problem at hand, the improvement brought by the algorithm is not apparent. There are two reasons for choosing VGGNet as the basic network architecture in this paper. First, it is a typical CNN in deep learning and therefore a representative subject for hyperparameter optimization with this algorithm. Second, for a comparable effect, its structure is simpler and its calculation cost is lower. However, choosing the appropriate CNN model for the corresponding task is itself a challenging problem; this needs further study and is beyond the scope of this paper. Therefore, the contribution of this paper is reasonable for the hyperparameter optimization problem of a given CNN.

5. Conclusions
In this paper, aiming at the two difficulties of CNN hyperparameter optimization, an improved HS algorithm for efficiently optimizing CNN hyperparameters is proposed. The LACHS algorithm adopts a dynamic parameter adjustment strategy, a self-selection search strategy based on the update state, and a local competition update strategy to handle this complex optimization problem in a large-scale search space. A new evaluation function is designed, which saves computation cost without affecting the search results. Comparative experiments on the Fashion-MNIST and CIFAR10 datasets verify the superiority of the improved HS algorithm.

In future work, the HS algorithm will continue to serve as the search method for CNN hyperparameter optimization. We can further study the influence of the hyperparameters on the fitting behavior of the network model, so as to formulate more reasonable optimization function values and save time and computation cost. Regarding the optimization mechanism, the possibility of combining grid search, random search, local search, and HS is considered; this is expected to yield an improved algorithm that is better suited to hyperparameter optimization.
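For readers who want to experiment with the hybrid direction mentioned above, the following sketch shows one possible shape of such a combination: random search seeds a harmony-memory-style population, and a simple per-parameter perturbation acts as the local-search step. Every function, parameter name, and design choice here is an illustrative assumption and not the algorithm proposed or planned in this paper.

import random

def hybrid_search(evaluate, space, memory_size=10, local_steps=5, seed=0):
    """Illustrative hybrid of random search, a harmony-memory-like population,
    and a simple local refinement. `space` maps hyperparameter names to lists
    of candidate values; `evaluate` returns a score to maximize."""
    rng = random.Random(seed)

    def sample():
        return {name: rng.choice(values) for name, values in space.items()}

    # 1) Random search fills the initial memory with evaluated configurations.
    memory = []
    for _ in range(memory_size):
        config = sample()
        memory.append((config, evaluate(config)))
    best, best_score = max(memory, key=lambda item: item[1])

    # 2) Local refinement: change one hyperparameter at a time, keep improvements.
    for _ in range(local_steps):
        candidate = dict(best)
        name = rng.choice(list(space))
        candidate[name] = rng.choice(space[name])
        score = evaluate(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score

# Toy usage with a fake evaluator; a real evaluator would train and score a CNN.
space = {"learning_rate": [1e-4, 1e-3, 1e-2], "batch_size": [32, 64, 128]}
fake_eval = lambda cfg: -abs(cfg["learning_rate"] - 1e-3) - 0.001 * cfg["batch_size"]
print(hybrid_search(fake_eval, space))
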
Acknowledgments
The authors would like to thank P. N. Suganthan for the useful information about meta-heuristic algorithms and optimization problems on his homepages. The authors also thank Prof. Zhihui Zhan of South China University of Technology. This work is supported by the Fund of Innovative Training Program for College Students of Guangzhou University (approval number: S202111078042), Guangzhou City School Joint Fund Project (2023A03J01009), National Nature Science Foundation of China (Grant Nos 52275097 and 61806058), Natural Science Foundation of Guangdong Province (2018A030310063), and Guangzhou Science and Technology Plan (201804010299).

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflict of interest statement
None declared.

References
Aszemi N. M., & Dominic P. D. D. (2019). Hyperparameter optimization in convolutional neural network using genetic algorithms. International Journal of Advanced Computer Science and Applications, 10(6). https://doi.org/10.14569/IJACSA.2019.0100638.
Awad N., Mallik N., & Hutter F. (2020). Differential evolution for neural architecture search. preprint (arXiv:2012.06400). https://doi.org/10.48550/arXiv.2012.06400.
Bergstra J., Bardenet R., Bengio Y., & Kégl B. (2011). Algorithms for hyper-parameter optimization. In Proceedings of the 24th International Conference on Neural Information Processing Systems (pp. 2546–2554). Curran Associates Inc. https://dl.acm.org/doi/10.5555/2986459.2986743.
Bergstra J., & Bengio Y. (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(1), 281–305. https://doi.org/10.1016/j.chemolab.2011.12.002.
Castelli M., Silva S., Manzoni L., & Vanneschi L. (2014). Geometric selective harmony search. Information Sciences, 279, 468–482. https://doi.org/10.1016/j.ins.2014.04.001.
Chen Y., Liu F., & Pei K. (2021). Cross-modal matching CNN for autonomous driving sensor data monitoring. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 3110–3119). IEEE.
Doon R., Rawat T. K., & Gautam S. (2018). Cifar-10 classification using deep convolutional neural network. In Proceedings of the 2018 IEEE Punecon (pp. 1–5). IEEE. https://doi.org/10.1109/PUNECON.2018.8745428.
Fernandes F. E., & Yen G. G. (2021). Pruning deep convolutional neural networks architectures with evolution strategy. Information Sciences, 552, 29–47.
Feurer M., & Hutter F. (2019). Hyperparameter optimization. In Automated machine learning: Methods, systems, challenges (pp. 3–33). Springer. https://doi.org/10.1007/978-3-030-05318-5.

Geem Z. W., Kim J. H., & Loganathan G. V. (2001). A new heuristic optimization algorithm: Harmony search. Simulation, 76(2), 60–68. https://doi.org/0037-5497(2001)l:260:ANHOAH2.0.TX;2-3.
Goodfellow I., Warde-Farley D., Mirza M., Courville A., & Bengio Y. (2013). Maxout networks. In Proceedings of the International Conference on Machine Learning (pp. 1319–1327). PMLR.
Guo Y., Li J.-Y., & Zhan Z.-H. (2020). Efficient hyperparameter optimization for convolution neural networks in deep learning: A distributed particle swarm optimization approach. Cybernetics and Systems, 52(2), 1–22. https://doi.org/10.1080/01969722.2020.1827797.
He K., Zhang X., Ren S., & Sun J. (2016). Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 770–778). IEEE.
Huang G., Liu Z., Van Der Maaten L., & Weinberger K. Q. (2016). Densely connected convolutional networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 2261–2269). IEEE. https://doi.org/10.1109/CVPR.2017.243.
Huang J., Xue B., Sun Y., Zhang M., & Yen G. G. (2022). Particle swarm optimization for compact neural architecture search for image classification. IEEE Transactions on Evolutionary Computation. https://doi.org/10.1109/TEVC.2022.3217290.
Jian J.-R., Chen Z.-G., Zhan Z.-H., & Zhang J. (2021). Region encoding helps evolutionary computation evolve faster: A new solution encoding scheme in particle swarm for large-scale optimization. IEEE Transactions on Evolutionary Computation, 25(4), 779–793. https://doi.org/10.1109/TEVC.2021.3065659.
Jian J.-R., Zhan Z.-H., & Zhang J. (2020). Large-scale evolutionary optimization: A survey and experimental comparative study. International Journal of Machine Learning and Cybernetics, 11(3), 729–745. https://doi.org/10.1007/s13042-019-01030-4.
Kandasamy K., Neiswanger W., Schneider J., Barnabás P., & Xing E. P. (2018). Neural architecture search with Bayesian optimisation and optimal transport. In NIPS'18: Proceedings of the 32nd International Conference on Neural Information Processing Systems (pp. 2020–2029). https://dl.acm.org/doi/abs/10.5555/3326943.3327130.
Karpathy A. (2016). CS231n convolutional neural networks for visual recognition. Neural Networks, 1(1). https://cs231n.github.io/convolutional-networks.
Khan A., Sohail A., Zahoora U., & Qureshi A. S. (2020). A survey of the recent architectures of deep convolutional neural networks. Artificial Intelligence Review, 53, 5455–5516. https://doi.org/10.1007/s10462-020-09825-6.
Krizhevsky A., Sutskever I., & Hinton G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in neural information processing systems, 25 (pp. 1097–1105). Curran Associates. https://doi.org/10.1145/3065386.
Larochelle H., Erhan D., & Courville A. C. (2007). An empirical evaluation of deep architectures on problems with many factors of variation. In Proceedings of the 24th International Conference on Machine Learning (pp. 473–480). ACM.
Lecun Y., Bengio Y., & Hinton G. (2015). Deep learning. Nature, 521(7553), 436–444. https://doi.org/10.1038/nature14539.
Lee C. Y., Xie S., Gallagher P., Zhang Z., & Tu Z. (2015). Deeply-supervised nets. In Proceedings of the 18th International Conference on Artificial Intelligence and Statistics (pp. 562–570). PMLR.
Li J.-Y., Zhan Z.-H., Wang C., Jin H., & Zhang J. (2020). Boosting data-driven evolutionary algorithm with localized data generation. IEEE Transactions on Evolutionary Computation, 24(5), 923–937. https://doi.org/10.1109/TEVC.2020.2979740.
Li J.-Y., Zhan Z.-H., Xu J., Kwong S., & Zhang J. (2023a). Surrogate-assisted hybrid-model estimation of distribution algorithm for mixed-variable hyperparameters optimization in convolutional neural networks. IEEE Transactions on Neural Networks and Learning Systems, 34, 2338–2352. https://doi.org/10.1109/TNNLS.2021.3106399.
Li J.-Y., Zhan Z.-H., Tan K. C., & Zhang J. (2023b). Dual differential grouping: A more general decomposition method for large-scale optimization. IEEE Transactions on Cybernetics, 53, 3624–3638. https://doi.org/10.1109/TCYB.2022.3158391.
Li J.-Y., Zhan Z.-H., & Zhang J. (2022). Evolutionary computation for expensive optimization: A survey. Machine Intelligence Research, 19(1), 3–23. https://doi.org/10.1007/s11633-022-1317-4.
Lin M., Chen Q., & Yan S. (2013). Network in network. preprint (arXiv:1312.4400). https://doi.org/10.48550/arXiv.1312.4400.
Liu R., Sisman B., Schuller B., Gao G., & Li H. (2022). Accurate emotion strength assessment for seen and unseen speech based on data-driven deep learning. preprint (arXiv:2206.07229). https://doi.org/10.48550/arXiv.2206.07229.
Lu Z., Whalen I., Dhebar Y., Deb K., Goodman E. D., Banzhaf W., & Boddeti V. N. (2020). Multiobjective evolutionary design of deep convolutional neural networks for image classification. IEEE Transactions on Evolutionary Computation, 25(2), 277–291. https://doi.org/10.1109/TEVC.2020.3024708.
Mahdavi M., Fesanghary M., & Damangir E. (2007). An improved harmony search algorithm for solving optimization problems. Applied Mathematics and Computation, 188(2), 1567–1579. https://doi.org/10.1016/j.amc.2006.11.033.
Omran M. G. H., & Mahdavi M. (2008). Global-best harmony search. Applied Mathematics & Computation, 198(2), 643–656. https://doi.org/10.1016/j.amc.2007.09.004.
Pan Q.-K., Suganthan P. N., Tasgetiren M. F., & Liang J. J. (2010). A self-adaptive global best harmony search algorithm for continuous optimization problems. Applied Mathematics and Computation, 216(3), 830–848. https://doi.org/10.1016/j.amc.2010.01.088.
Qi Y., Yang Z., Sun W., Lou M., Lian J., Zhao W., Deng X., & Ma Y. (2022). A comprehensive overview of image enhancement techniques. Archives of Computational Methods in Engineering, 29, 583–607. https://doi.org/10.1007/s11831-021-09587-6.
Raymond C., & Beng O. K. (2007). A comparison between genetic algorithms and evolutionary programming based on cutting stock problem. Engineering Letters, 14(1), 72–77. https://www.engineeringletters.com/issues_v14/issue_1/EL_14_1_14.
Real E., Moore S., Selle A., Saxena S., Suematsu Y. L., Tan J., Le Q., & Kurakin A. (2017). Large-scale evolution of image classifiers. In Proceedings of the International Conference on Machine Learning (pp. 2902–2911). PMLR.
Simonyan K., & Zisserman A. (2014). Very deep convolutional networks for large-scale image recognition. CoRR, preprint (arXiv:1409.1556).
Springenberg J. T., Dosovitskiy A., Brox T., & Riedmiller M. (2014). Striving for simplicity: The all convolutional net. preprint (arXiv:1412.6806). https://doi.org/10.48550/arXiv.1412.6806.
Suganuma M., Kobayashi M., Shirakawa S., & Nagao T. (2020). Evolution of deep convolutional neural networks using cartesian genetic programming. Evolutionary Computation, 28(1), 141–163. https://doi.org/10.1162/EVCO_A_00253.
Sun Y., Xue B., Zhang M., & Yen G. G. (2019). Evolving deep convolutional neural networks for image classification. IEEE Transactions on Evolutionary Computation, 24(2), 394–407. https://doi.org/10.1109/TEVC.2019.2916183.
Szegedy C., Liu W., Jia Y., Sermanet P., Reed S., Anguelov D., Erhan D., Vanhoucke V., & Rabinovich A. (2014). Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 1–9). IEEE. https://doi.org/10.1109/CVPR.2015.7298594.

Tan M., & Le Q. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning (pp. 6105–6114). PMLR.
Turky A. M., Abdullah S., & Sabar N. R. (2014). A hybrid harmony search algorithm for solving dynamic optimisation problems. Procedia Computer Science, 29, 1926–1936. https://doi.org/10.1016/j.procs.2014.05.177.
Wang S., Chen L., Xu L., Fan W., Sun J., & Naoi S. (2016). Deep knowledge training and heterogeneous CNN for handwritten Chinese text recognition. In Proceedings of the 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR) (pp. 84–89). IEEE.
Wang Y. Q., Li J. Y., Chen C. H., Zhang J., & Zhan Z. H. (2022a). Scale adaptive fitness evaluation-based particle swarm optimisation for hyperparameter and architecture optimisation in neural networks and deep learning. CAAI Transactions on Intelligence Technology, 1–14. https://doi.org/10.1049/cit2.12106.
Wang Z.-J., Jian J.-R., Zhan Z.-H., Li Y., Kwong S., & Zhang J. (2022b). Gene targeting differential evolution: A simple and efficient method for large scale optimization. IEEE Transactions on Evolutionary Computation. https://doi.org/10.1109/TEVC.2022.3185665.
Wang B., Xue B., & Zhang M. (2021). Surrogate-assisted particle swarm optimization for evolving variable-length transferable blocks for image classification. IEEE Transactions on Neural Networks and Learning Systems, 33(8), 3727–3740. https://doi.org/10.1109/TNNLS.2021.3054400.
Wang Z.-J., Zhan Z.-H., Kwong S., Jin H., & Zhang J. (2020). Adaptive granularity learning distributed particle swarm optimization for large-scale optimization. IEEE Transactions on Cybernetics, 51(3), 1175–1188. https://doi.org/10.1109/TCYB.2020.2977956.
Wu S.-H., Zhan Z.-H., & Zhang J. (2021). SAFE: Scale-adaptive fitness evaluation method for expensive optimization problems. IEEE Transactions on Evolutionary Computation, 25(3), 478–491. https://doi.org/10.1109/TEVC.2021.3051608.
Xiao H., Rasul K., & Vollgraf R. (2017). Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. preprint (arXiv:1708.07747). https://doi.org/10.48550/arXiv.1708.07747.
Yu Z., Chan W., & Jaitly N. (2017). Very deep convolutional networks for end-to-end speech recognition. In Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 4845–4849). IEEE. https://doi.org/10.1109/ICASSP.2017.7953077.
Zhan Z. H., Shi L., Tan K. C., & Zhang J. (2022a). A survey on evolutionary computation for complex continuous optimization. Artificial Intelligence Review, 55, 59–110. https://doi.org/10.1007/s10462-021-10042-y.
Zhan Z.-H., Li J.-Y., & Zhang J. (2022b). Evolutionary deep learning: A survey. Neurocomputing, 483, 42–58. https://doi.org/10.1016/j.neucom.2022.01.099.
Zhang Q., Zhang M., Chen T., Sun Z., Ma Y., & Yu B. (2018). Recent advances in convolutional neural network acceleration. Neurocomputing, 325, 37–51. https://doi.org/10.1016/j.neucom.2018.09.038.
Zoph B., & Le Q. V. (2016). Neural architecture search with reinforcement learning. preprint (arXiv:1611.01578). https://doi.org/10.48550/arXiv.1611.01578.

Received: March 21, 2023. Revised: May 25, 2023. Accepted: May 30, 2023
© The Author(s) 2023. Published by Oxford University Press on behalf of the Society for Computational Design and Engineering. This is an Open Access article distributed
under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and
reproduction in any medium, provided the original work is properly cited.
