Development of Deep Residual Neural Networks for Gear Pitting Fault Diagnosis Using Bayesian Optimization
Abstract— In recent years, the application of deep neural networks containing directed acyclic graph (DAG) architectures in mechanical fault diagnosis has achieved remarkable results. In order to improve the fault diagnosis ability of the networks, researchers have been working on developing new network architectures and optimizing the training process. However, this approach requires sufficient time and empirical knowledge to try a potentially optimal framework. Furthermore, it is time-consuming and laborious to retune the network architecture and hyperparameter values when faced with different operating conditions or diagnostic tasks. To avoid these drawbacks, this article proposes an automated network architecture search (NAS) method and performs hyperparameter optimization. An adjacency matrix is used to define the architecture search space, Bayesian optimization is used as the architecture search strategy, and the network test error is used for architecture evaluation. Seven types of convolutional layers and pooling layers are used as basic components to build fault diagnosis models. A gear pitting fault experiment, including seven gear pitting types, was established and used to validate the diagnostic model. The experimental results show that the diagnostic results of the network model automatically constructed by the proposed method are better than those of general network models. It can be concluded that the proposed method can indeed replace the manual construction of an effective and practical gear pitting fault diagnosis model.

Index Terms— Bayesian optimization, directed acyclic graph (DAG) network, gear pitting faults, hyperparameter optimization, network architecture search (NAS).

I. INTRODUCTION

… hyperparameter optimization are crucial for the feature representation of fault data and the final diagnosis results [2]. However, the design of neural architectures relies heavily on the researchers' prior knowledge and experience. Moreover, due to the limitation of inherent human knowledge, it is difficult for people to jump out of the original way of thinking and design the optimal model. Therefore, an intuitive idea is to reduce human intervention as much as possible and let an algorithm automatically design the neural network architecture and optimize the hyperparameters. In this way, the revolutionary neural architecture search (NAS) [3], [4], [5], [6], [7] algorithm emerged, and its related research work is rich and varied. Current NAS algorithms can be divided, according to their key components, into search space, search strategy, and evaluation strategy [8].

The search space defines the set of network architectures to be searched, which is crucial to the subsequent NAS. The size of the search space can be reduced by incorporating prior knowledge of the typical properties of the applicable task. However, this introduces inherent human thinking that may be detrimental to the discovery of better new architectures [9]. A traditional neural network architecture (search space), such as the traditional convolutional neural network (CNN) [10], has a chain structure, that is, it can be expressed as a sequence of n layers, and the ith layer network receives …
… neural networks. However, these practices also limit the search diversity of NAS algorithms.

The search strategy details how to explore the search space, which is often exponential or even unbounded. Therefore, there is a tradeoff between exploration and exploitation: on the one hand, we want to quickly find an architecture with good performance; on the other hand, we should avoid getting stuck in a locally optimal architecture region. Currently known search methods include random search, Bayesian optimization [16], the genetic algorithm (GA) [17], reinforcement learning [18], and gradient-based algorithms [19]. Among them, reinforcement learning, genetic algorithms, and gradient-based optimization are the current mainstream approaches. The NAS algorithm based on reinforcement learning regards neural network architecture design as a reinforcement learning problem, and the learning purpose is to obtain an optimal strategy for generating the network architecture [20]. Since the length of the structural parameters of the neural network is not fixed, a learning method that takes a variable-length data chain as input is required. A recurrent neural network is used as a controller to generate chains that describe the architectures of the subnetworks, thereby determining several subnetwork architectures; these networks are trained on the training set, and their accuracy is calculated on the validation set. The idea of using the GA to solve NAS is to encode various network architectures into binary strings and run the GA to obtain the network architecture with the largest fitness function value, which is the optimal solution. Regarding how to encode the architecture of a neural network into a fixed-length binary string, Xie and Yuille [21] divided the network into multiple levels, encoding the topology of each level. The gradient-based algorithm makes the discrete problem of architecture search continuous. In this way, the gradient descent method is used to solve the problem, which can efficiently search the neural network architecture and obtain the weight parameters of the network at the same time. DARTS [22] expresses the network architecture and network unit as a directed acyclic graph (DAG), relaxes the architecture search problem, and transforms it into a continuous variable optimization problem. In this way, the objective function is differentiable and can be solved by gradient descent.

The evaluation strategy refers to the process of evaluating performance. The simplest approach is to sequentially train and validate numerous architectures, but doing so is computationally expensive and limits the number of architectures that can be explored [23]. Therefore, many recent studies have focused on developing methods to reduce the cost of these performance evaluations, such as reducing training time (the number of iterations), training on a subset of training samples, training on low-resolution images, or reducing the number of convolution kernels for certain layers during training. However, these practices may lead to biases in performance estimates while reducing computational costs. Using the characteristics of a unit block to predict the performance of the entire network and weight sharing may resolve this contradiction [24], [25]; that is, using the weights of a trained subnetwork as the initial weights of the subnetwork to be evaluated can effectively improve the training speed, accelerate convergence, and avoid training from scratch. ENAS [26] and DARTS directly let each subnetwork share the same set of weight parameters.

This article aims to address the network development and hyperparameter optimization problems when diagnosing gear pitting faults using deep DAG [27], [28], [29] neural networks. Bayesian optimization is used to determine the network architecture and hyperparameters of the deep DAG neural network. Seven types of convolutional layers and pooling layers are used as basic components to build fault diagnosis models. The DAG network architecture is represented using an adjacency matrix, and Bayesian optimization is used to search for the best DAG architecture and hyperparameter values. This article is arranged as follows. Section I presents related research on DNN architecture search and hyperparameter optimization methods. Section II describes the current problem and proposes a deep DAG neural NAS and hyperparameter optimization method based on a Bayesian optimization strategy. Section III describes the gear pitting test rig, hyperparameter information, and optimization procedures. Section IV compares and analyzes the diagnostic results of the deep DAG neural network constructed by the proposed method. Finally, Section V gives the conclusion.

II. CURRENT PROBLEM AND PROPOSED METHOD

This section summarizes the difficulties in developing a gear pitting fault diagnosis model based on residual neural networks and proposes a method for developing a deep DAG fault diagnosis model with hyperparameter optimization based on Bayesian optimization.

A. Current Problem

The application of DNNs containing DAG architectures in mechanical fault diagnosis has achieved remarkable results. However, the following problems need to be solved when automatically building a fault diagnosis model based on a DAG neural network.

1) Global Search Space: At present, the strategy adopted by most NAS is to search the entire network space, which means that NAS needs to search for an optimal network architecture in a very large search space.

2) Discrete Search Strategy: The search space of early NAS is discrete, which means that the gradient cannot be calculated, and a gradient strategy cannot be used to adjust the network model architecture.

3) Search From Scratch: Each model is trained from scratch, which does not take full advantage of the architectures of existing network models and the parameters that have already been trained.

B. Proposed Method

In this article, a Bayesian optimization strategy is proposed to develop a DAG network for gear pitting fault diagnosis. The proposed method focuses on the DAG NAS and training hyperparameter optimization using Bayesian optimization. The overall framework of the proposed method is shown in Fig. 1, and the overall process is given in the following four steps.
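To make the adjacency-matrix representation concrete, here is a minimal sketch in Python (not the article's actual implementation): a small pool of candidate operations and an illustrative binary adjacency matrix are assumed, and the matrix is decoded into the directed edges of the DAG.

```python
import numpy as np

# Hypothetical pool of candidate operations. The article builds its models
# from seven convolution/pooling types; these labels are illustrative only.
ops = ["input", "conv3x1", "conv5x1", "maxpool", "avgpool", "output"]

# Binary adjacency matrix A: A[i, j] = 1 means the output of node i feeds
# node j. An upper-triangular matrix guarantees the graph is acyclic.
A = np.array([
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
])

# Decode the matrix into the list of directed edges of the DAG.
edges = [(ops[i], ops[j]) for i, j in zip(*np.nonzero(A))]
print(edges)  # [('input', 'conv3x1'), ('input', 'conv5x1'), ...]
```

Because every edge points from a lower-indexed node to a higher-indexed one, any such upper-triangular matrix encodes a valid DAG, which is what makes a flat vector of matrix entries a convenient search space for an optimizer.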
$$x^{l+1} = \sum_{i=1}^{I} W_m \ast x_c^l + b_m \tag{1}$$
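As a numerical illustration of (1), the sketch below sums the 1-D convolution of each of the I input channels with its own kernel and adds a bias; the channel count, signal length, kernel size, and "valid" padding are assumptions made only for this example.

```python
import numpy as np

def conv_layer_output(x_c, W_m, b_m):
    """Sketch of (1): sum over the I input channels of the 1-D convolution
    of channel i with its kernel W_m[i], plus a bias b_m."""
    I = x_c.shape[0]
    out = sum(np.convolve(x_c[i], W_m[i], mode="valid") for i in range(I))
    return out + b_m

x_c = np.random.randn(3, 32)  # I = 3 input channels, length-32 signals
W_m = np.random.randn(3, 5)   # one length-5 kernel per channel
print(conv_layer_output(x_c, W_m, b_m=0.1).shape)  # (28,)
```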
Fig. 6. Example of 1-D Gaussian process with three observations.

D. Architecture Search Strategy

In this article, a Bayesian optimization algorithm is used as the search strategy of the NAS. The parameters A1–A5 are selected within their ranges by means of Bayesian optimization. The Bayesian optimization algorithm [38], [39] realizes the optimization of hyperparameters by continuously adding observation points and fitting the objective function with a Gaussian process. The mathematical expression of the optimal hyperparameters $x_{\text{best}}$ of the objective function is given as follows:

$$x_{\text{best}} = \arg\min_{x \in \chi} f(x) \tag{5}$$

where $f(x)$ is the objective function and $\chi$ is the hyperparameter space.

Fig. 6 shows a 1-D Gaussian process with three observation points. Multiple observation points conform to a joint Gaussian distribution. The mathematical expression is given as follows:

$$\begin{cases} f(x) \sim \mathcal{GP}\big(m(x), k(x, x')\big) \\ m(x) = \mathbb{E}[f(x)] \\ k(x, x') = \mathbb{E}\big[(f(x) - m(x))\big(f(x') - m(x')\big)\big] \end{cases} \tag{6}$$

where $m(x)$ and $k(x, x')$ are the mean and covariance functions of the Gaussian process.

According to the calculation process of the joint Gaussian distribution in (6), it can be inferred that the joint Gaussian distribution containing $n$ observation points can be expressed as follows:

$$\begin{cases} f(x_{1:n}) \sim \mathcal{GP}\big[m(x_{1:n}), \mathbf{K}\big] \\ m(x_{1:n}) = \mathbb{E}[f(x_{1:n})] \\ \mathbf{K} = \begin{bmatrix} k(x_1, x_1) & \cdots & k(x_1, x_n) \\ \vdots & \ddots & \vdots \\ k(x_n, x_1) & \cdots & k(x_n, x_n) \end{bmatrix} \end{cases} \tag{7}$$

where the value of the kernel function $k(x_i, x_j)$ varies over $[0, 1]$ as the distance between points $x_i$ and $x_j$ goes from far to near. The commonly used kernel functions [40] include the squared exponential kernel and the improved squared exponential kernel. The next observation point $x_{n+1}$ and the existing observation points $x_{1:n}$ still follow a joint Gaussian probability distribution, so we can get

$$\begin{cases} \begin{bmatrix} f(x_{1:n}) \\ f(x_{n+1}) \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} m(x_{1:n}) \\ m(x_{n+1}) \end{bmatrix}, \begin{bmatrix} \mathbf{K} & \mathbf{k}_* \\ \mathbf{k}_*^T & k(x_{n+1}, x_{n+1}) \end{bmatrix} \right) \\ \mathbf{k}_* = \big[k(x_{n+1}, x_1), k(x_{n+1}, x_2), \cdots, k(x_{n+1}, x_n)\big] \end{cases} \tag{9}$$

Therefore, the probability distribution of the observation point $x_{n+1}$ over the entire coordinate system can easily be obtained as

$$\begin{cases} P\big(f(x_{n+1}) \mid D_{1:n}, x_{n+1}\big) \sim \mathcal{N}\big(\mu(x_{n+1}), \sigma^2(x_{n+1})\big) \\ \mu(x_{n+1}) = \mathbf{k}_*^T \mathbf{K}^{-1} f(x_{1:n}) \\ \sigma^2(x_{n+1}) = k(x_{n+1}, x_{n+1}) - \mathbf{k}_*^T \mathbf{K}^{-1} \mathbf{k}_* \end{cases} \tag{10}$$

where $D_{1:n}$ denotes the set of observation points $\{x_{1:n}, f(x_{1:n})\}$. The position of the next observation point can be determined from the obtained probability distribution of $x_{n+1}$ at each location, combined with the acquisition function

$$x_{\text{next}} = \arg\max_{x \in \chi} \text{Acq}(x) \tag{11}$$

where the next observation point $x_{\text{next}}$ corresponds to the maximum value of the acquisition function.

The commonly used acquisition functions [41], the lower confidence bound (LCB), upper confidence bound (UCB), probability of improvement (PI), and expected improvement (EI), are given as follows:

$$\text{Acq}(x) = \begin{cases} k\sigma_Q(x) - \mu_Q(x) & \text{(LCB)} \\ k\sigma_Q(x) + \mu_Q(x) & \text{(UCB)} \\ \Phi\big(\big(f(x^+) - \mu_Q(x) - \xi\big)/\sigma_Q(x)\big) & \text{(PI)} \\ \big(f(x^+) - \mu_Q(x) - \xi\big)\Phi(Z) + \sigma_Q(x)\phi(Z) & \text{(EI)} \end{cases} \tag{12}$$

where $x^+$ is the lowest value among the existing observation points, $\Phi(\cdot)$ and $\phi(\cdot)$ are the standard normal distribution and density functions, and $Z = \big(f(x^+) - \mu_Q(x) - \xi\big)/\sigma_Q(x)$.

In order to deal with the uncertainty of the results during DNN training, artificial noise $\varepsilon \sim \mathcal{N}(0, \sigma_{\text{noise}}^2)$ [42] is added to the prior results when using the Bayesian optimization. Then, the new objective function $g(x)$ and the joint probability distribution of $x_{1:n}$ are given as follows:

$$\begin{cases} g(x) = f(x) + \varepsilon \\ g(x_{1:n}) \sim \mathcal{GP}\big(m(x_{1:n}), \mathbf{K} + \sigma_{\text{noise}}^2 \mathbf{I}\big) \end{cases} \tag{13}$$
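To tie (5)–(13) together, the following sketch computes the Gaussian-process posterior of (10) with the noise term of (13) folded into K, and then picks the next observation point by maximizing the EI branch of (12), as in (11). The squared exponential kernel, its length scale, and the observation values are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def sq_exp_kernel(a, b, length=0.15):
    """Squared exponential kernel k(x, x'); its value lies in (0, 1]."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

def gp_posterior(x_obs, f_obs, x_new, noise_var=1e-4):
    """Posterior mean and variance of (10), using K + sigma_noise^2 * I
    from (13) in place of K to absorb training-result uncertainty."""
    K = sq_exp_kernel(x_obs, x_obs) + noise_var * np.eye(len(x_obs))
    k_star = sq_exp_kernel(x_obs, x_new)  # cross-covariances, shape (n, m)
    K_inv = np.linalg.inv(K)
    mu = k_star.T @ K_inv @ f_obs
    # Prior variance k(x, x) = 1 for this unit-amplitude kernel.
    var = 1.0 - np.einsum("ij,ik,kj->j", k_star, K_inv, k_star)
    return mu, np.maximum(var, 1e-12)

def expected_improvement(mu, sigma, f_best, xi=0.01):
    """EI branch of (12) for minimization; f_best plays the role of f(x+)."""
    z = (f_best - mu - xi) / sigma
    return (f_best - mu - xi) * norm.cdf(z) + sigma * norm.pdf(z)

# Three observations of a 1-D objective (illustrative values, cf. Fig. 6).
x_obs = np.array([0.1, 0.4, 0.8])
f_obs = np.array([0.9, 0.3, 0.6])

x_grid = np.linspace(0.0, 1.0, 200)
mu, var = gp_posterior(x_obs, f_obs, x_grid)
ei = expected_improvement(mu, np.sqrt(var), f_obs.min())
x_next = x_grid[np.argmax(ei)]  # (11): maximize the acquisition function
print(f"next observation point: {x_next:.3f}")
```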
Fig. 9. Partial network architecture units display of Bayesian optimization search. (a) s1. (b) s2. (c) s3. (d) s4. (e) s5. (f) s7. (g) s9. (h) s19. (i) s24.
… which is improved compared to Table IV. It can be concluded that training hyperparameter optimization can improve the accuracy of the trained model to a certain extent. Moreover, based on the optimal hyperparameter combination obtained in Table VI, the influence of each single hyperparameter on the diagnostic results is also discussed one by one in this section.

First, the initial learning rate is discussed. Fig. 12(a) shows the process of using the Bayesian optimization to find the best initial learning rate value, trying various learning rates based on the optimal network architecture. It can be seen that the optimal model architecture is very robust to the learning rate value. The results show that it is easier to get a small-error model when the learning rate is around 0.06. Fig. 12(b) shows the effect of batch size on training the model. It is easier to get good training models with batch sizes of around 40, and the general trend in the figure is that smaller batch sizes result in smaller error values. However, the sample batch size directly affects the training time of the model: setting a smaller sample batch size reduces the training speed. Therefore, the batch size should be selected considering the tradeoff between model accuracy and training speed.

Fig. 12(c) shows the impact of the momentum term on model training. It can be seen from the figure that the fluctuation is small in the range of 0.1–0.5, and the stability of model training is poor once the momentum exceeds 0.6. This is also because the Bayesian optimization strategy focuses more on exploiting the momentum term when the model error is small; the fewer attempts in the 0.6–1 range are prone to fluctuations. Fig. 12(d) shows the effect of the regularization coefficient L2 on model training. The curve in the figure is arched, and the search results show that it is easier to train an excellent DAG diagnostic model when L2 is 0.01.

All subgraph curves in Fig. 12 are obtained after 30 verification runs and fitted using a Gaussian process distribution. Considering the uncertainty of deep residual neural network training, the effect of noise error is added in the curve fitting. It can be seen from Fig. 12 that the dispersion of the verification points is large, and the test errors of similar hyperparameter combinations are not similar. It is precisely because the number of validation points is limited and the results of repeatedly training the deep residual network are not fixed that the fitted curves do not reflect the hyperparameter–test error relationship well.
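As a sketch of how the four training hyperparameters could be searched jointly, the snippet below uses scikit-optimize's gp_minimize, a Gaussian-process-based Bayesian optimizer. The article does not name a specific library, and the search bounds and the placeholder objective are assumptions; a real objective would train the DAG model and return its test error.

```python
from skopt import gp_minimize
from skopt.space import Real, Integer

# Hypothetical search ranges for the four training hyperparameters; the
# exact bounds used in the article may differ.
space = [
    Real(1e-3, 0.1, name="learning_rate"),
    Integer(16, 256, name="batch_size"),
    Real(0.1, 0.99, name="momentum"),
    Real(1e-4, 0.1, name="l2_coefficient"),
]

def objective(params):
    lr, bs, mo, l2 = params
    # Placeholder standing in for a full training run of the diagnosis
    # model; it merely mimics the trends reported around Fig. 12.
    return ((lr - 0.06) ** 2 + abs(bs - 40) / 400
            + 0.01 * max(mo - 0.6, 0.0) + (l2 - 0.01) ** 2)

# 30 evaluations, matching the 30 verification runs mentioned above.
result = gp_minimize(objective, space, n_calls=30, random_state=0)
print(result.x, result.fun)
```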
TABLE IV
Results of Structural Parameter Optimization Using the Bayesian Optimization Method

TABLE V
Comparison of the Results of Three Diagnostic Methods

Fig. 10. Deep DAG fault diagnosis model developed by the Bayesian optimization.
TABLE VI
Results of Hyperparameter Optimization Using the Bayesian Optimization Method

TABLE VII
Comparison of the Best Hyperparameter Optimization Results of Different Methods

Fig. 11. Gear pitting fault test confusion matrix. (a) DNN. (b) CNN. (c) Proposed method.
Fig. 12. Effect of a single hyperparameter on test error. (a) Learning rate. (b) Minibatch size. (c) Momentum. (d) L2 regularization.

TABLE VIII
Comparison of Advantages and Shortcomings of Six Types of Optimization Methods
… 0.02 and 0.04, Bs is around 120, Mo is around 0.7, and the L2 coefficient is between 0.004 and 0.008.

Fig. 13. Effect of joint hyperparameters on test error. (a) Joint effect of initial learning rate and momentum term on test error. (b) Joint effect of initial minibatch size and L2 regularization term on test error.

Fig. 14. Hyperparameter optimization process and results using a grid search method.

4) Genetic Algorithm: It selects individuals with excellent performance as parents for genetic cross inheritance. The advantage of this method is that the search for hyperparameter combinations is more detailed, and the disadvantage is that the gene pool depends entirely on the randomly generated first-generation genes. Gene mutation operations can appropriately improve the situation when there are few gene types. In this article, the initial number of genes of each type is set to 10, and five generations of gene mutation and elimination are performed. As shown in Fig. 17, the final remaining genes are Lr = [0.0221, 0.0386, 0.0513, 0.0828], Bs = [26, 58, 64], Mo = [0.2074, 0.2636, 0.4296], and L2 = [5.864E−3, 8.743E−3].
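A minimal sketch of the select-crossover-mutate-eliminate loop just described, shown for a single hyperparameter (Lr): the population of 10 genes and the five generations follow the text, while the crossover scheme, mutation scale, and toy objective are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def test_error(lr):
    # Placeholder objective standing in for training the diagnosis model
    # with learning rate lr and measuring its test error.
    return (lr - 0.05) ** 2 + 0.001 * rng.standard_normal()

# First generation: 10 randomly generated genes for one hyperparameter.
genes = rng.uniform(1e-3, 0.1, size=10)

for generation in range(5):  # five generations, as in the text
    errors = np.array([test_error(g) for g in genes])
    parents = genes[np.argsort(errors)[:5]]       # eliminate the worst half
    pairs = rng.choice(parents, size=(5, 2))      # pick parents for crossover
    children = pairs.mean(axis=1) + 0.005 * rng.standard_normal(5)  # mutate
    genes = np.concatenate([parents, np.clip(children, 1e-3, 0.1)])

print(np.sort(genes)[:4])  # surviving gene values after elimination
```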
Fig. 15. Hyperparameter optimization process and results using a random search method.
5) Gray Wolf Optimization: The optimization idea of the gray wolf algorithm is similar to that of the particle swarm algorithm: both guide the group movement through excellent individuals to find the best hyperparameters. The difference is that the gray wolf algorithm increases the number of commanding individuals from one to three. This makes the iterative optimization process more stable but requires a larger search population. The wolf pack size is set to 10, and five wolf pack location updates are performed. Comparing the search processes in Figs. 16 and 18, it can be found that the search range of GWO is larger, and its convergence speed is not as good as that of the PSO algorithm.

6) Bayesian Optimization: It builds an objective function from prior data to find the best set of hyperparameters. Compared with other algorithms, its biggest advantage is that it considers the error when training the DNN model multiple times with the same hyperparameters.

According to the description of the above points, Table VIII provides a brief comparison of the advantages and shortcomings of the six optimization methods.

V. CONCLUSION

This article addressed the problems of model development and hyperparameter selection when training a gear pitting fault diagnosis model based on deep DAG neural networks. The Bayesian optimization method is used to search for the best deep DAG neural network architecture and to optimize the training hyperparameters such as learning rate (Lr), batch size (Bs),
momentum (Mo), and L2 coefficient (L2). In addition, grid search, random search, PSO, GA, and GWO are used to compare and analyze the influence of the training hyperparameters on the model test error. The advantage of the proposed method is that it can avoid the limitations of human thinking and automatically construct a network structure with stronger processing power. The disadvantage is that the network structure search time is affected by the search strategy, and the results are greatly affected by the definition of the search space. The detailed conclusions are given as follows.

1) The DAG network architecture can be digitized through the binary adjacency matrix transformed from the decimal parameters, and then Bayesian-optimized NAS can be realized.

2) The diagnostic accuracy of the best DAG neural network model searched by the Bayesian optimization reaches 0.9586, which is better than those of the traditional DNN and CNN. Also, after Bayesian optimization of the training hyperparameters, a higher network diagnosis accuracy of 0.9857 is obtained.

3) The reason why the Bayesian optimization algorithm is better than grid search and random search is that it includes result feedback and trial-point inference. The reason why Bayesian optimization is superior to algorithms such as PSO, GA, and GWO is that it considers the errors across multiple DNN training results. Of course, random search may perform better than the proposed method when the number of attempts is small.

4) In this article, experimental data under one working condition are used to validate the proposed method, which has certain limitations. Therefore, future work should pay attention to validation under different working conditions and to adaptability improvement. Moreover, the network search space and search strategy should also be further developed.

REFERENCES

[1] Y. Lei, B. Yang, X. Jiang, F. Jia, N. Li, and A. K. Nandi, "Applications of machine learning to machine fault diagnosis: A review and roadmap," Mech. Syst. Signal Process., vol. 138, Apr. 2020, Art. no. 106587.
[2] T. Yu and H. Zhu, "Hyper-parameter optimization: A review of algorithms and applications," 2020, arXiv:2003.05689.
[3] M. Wistuba, A. Rawat, and T. Pedapati, "A survey on neural architecture search," 2019, arXiv:1905.01392.
[4] C. Liu et al., "Progressive neural architecture search," in Proc. Eur. Conf. Comput. Vis. (ECCV), 2018, pp. 19–34.
[5] C. Ying, A. Klein, E. Christiansen, E. Real, K. Murphy, and F. Hutter, "NAS-bench-101: Towards reproducible neural architecture search," in Proc. 36th Int. Conf. Mach. Learn., 2019, pp. 7105–7114.
[6] L. Li and A. Talwalkar, "Random search and reproducibility for neural architecture search," in Proc. 35th Uncertainty Artif. Intell. Conf., 2020, pp. 367–377.
[7] J. Mellor, J. Turner, A. Storkey, and E. J. Crowley, "Neural architecture search without training," in Proc. 38th Int. Conf. Mach. Learn., 2021, pp. 7588–7598.
[8] T. Elsken, J. H. Metzen, and F. Hutter, "Neural architecture search: A survey," J. Mach. Learn. Res., vol. 20, no. 1, pp. 1997–2017, 2019.
[9] Y. Liu, Y. Sun, B. Xue, M. Zhang, G. G. Yen, and K. C. Tan, "A survey on evolutionary neural architecture search," IEEE Trans. Neural Netw. Learn. Syst., early access, Aug. 6, 2021, doi: 10.1109/TNNLS.2021.3100554.
[10] Y. Sun, B. Xue, M. Zhang, and G. G. Yen, "Completely automated CNN architecture design based on blocks," IEEE Trans. Neural Netw. Learn. Syst., vol. 31, no. 4, pp. 1242–1254, Apr. 2019.
[11] F. Chollet, "Xception: Deep learning with depthwise separable convolutions," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 1251–1258.
[12] T. Elsken, J. H. Metzen, and F. Hutter, "Efficient multi-objective neural architecture search via Lamarckian evolution," 2018, arXiv:1804.09081.
[13] E. Real, A. Aggarwal, Y. Huang, and Q. V. Le, "Regularized evolution for image classifier architecture search," in Proc. AAAI Conf. Artif. Intell., 2019, vol. 33, no. 1, pp. 4780–4789.
[14] H. Cai, L. Zhu, and S. Han, "ProxylessNAS: Direct neural architecture search on target task and hardware," 2018, arXiv:1812.00332.
[15] B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le, "Learning transferable architectures for scalable image recognition," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 8697–8710.
[16] B. Shahriari, K. Swersky, Z. Wang, R. P. Adams, and N. de Freitas, "Taking the human out of the loop: A review of Bayesian optimization," Proc. IEEE, vol. 104, no. 1, pp. 148–175, Jan. 2015.
[17] K. O. Stanley, D. B. D'Ambrosio, and J. Gauci, "A hypercube-based encoding for evolving large-scale neural networks," Artif. Life, vol. 15, no. 2, pp. 185–212, Apr. 2009.
[18] B. Zoph and Q. V. Le, "Neural architecture search with reinforcement learning," 2016, arXiv:1611.01578.
[19] S. Xie, H. Zheng, C. Liu, and L. Lin, "SNAS: Stochastic neural architecture search," 2018, arXiv:1812.09926.
[20] X. Dong, J. Shen, W. Wang, L. Shao, H. Ling, and F. Porikli, "Dynamical hyperparameter optimization via deep reinforcement learning in tracking," IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, no. 5, pp. 1515–1529, May 2021.
[21] L. Xie and A. Yuille, "Genetic CNN," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 1379–1388.
[22] H. Liu, K. Simonyan, and Y. Yang, "DARTS: Differentiable architecture search," 2018, arXiv:1806.09055.
[23] A. Zela, J. Siems, and F. Hutter, "NAS-bench-1shot1: Benchmarking and dissecting one-shot neural architecture search," 2020, arXiv:2001.10422.
[24] X. Chu, B. Zhang, and R. Xu, "FairNAS: Rethinking evaluation fairness of weight sharing neural architecture search," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2021, pp. 12239–12248.
[25] H. Pham, M. Guan, B. Zoph, Q. Le, and J. Dean, "Efficient neural architecture search via parameters sharing," in Proc. 35th Int. Conf. Mach. Learn., 2018, pp. 4095–4104.
[26] X. Li et al., "Improving one-shot NAS by suppressing the posterior fading," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2020, pp. 13836–13845.
[27] V. Thost and J. Chen, "Directed acyclic graph neural networks," 2021, arXiv:2101.07965.
[28] J. Li, X. Li, and D. He, "A directed acyclic graph network combined with CNN and LSTM for remaining useful life prediction," IEEE Access, vol. 7, pp. 75464–75475, 2019.
[29] L. D'Arcy, P. Corcoran, and A. Preece, "Deep Q-learning for directed acyclic graph generation," 2019, arXiv:1906.02280.
[30] Y. Ci, C. Lin, M. Sun, B. Chen, H. Zhang, and W. Ouyang, "Evolving search space for neural architecture search," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2021, pp. 6659–6669.
[31] J. Fang, Y. Sun, Q. Zhang, Y. Li, W. Liu, and X. Wang, "Densely connected search space for more flexible neural architecture search," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 10628–10637.
[32] W. Wen, H. Liu, Y. Chen, H. Li, G. Bender, and P.-J. Kindermans, "Neural predictor for neural architecture search," in Proc. Eur. Conf. Comput. Vis. New York, NY, USA: Springer, 2020, pp. 660–676.
[33] C. White, W. Neiswanger, S. Nolen, and Y. Savani, "A study on encodings for neural architecture search," in Proc. Adv. Neural Inf. Process. Syst., vol. 33, 2020, pp. 20309–20319.
[34] V. T. Rhyne, "Serial binary-to-decimal and decimal-to-binary conversion," IEEE Trans. Comput., vol. C-19, no. 9, pp. 808–812, Sep. 1970.
[35] K. O'Shea and R. Nash, "An introduction to convolutional neural networks," 2015, arXiv:1511.08458.
[36] A. F. Agarap, "Deep learning using rectified linear units (ReLU)," 2018, arXiv:1803.08375.
[37] N. Bjorck, C. P. Gomes, B. Selman, and K. Q. Weinberger, "Understanding batch normalization," in Proc. Adv. Neural Inf. Process. Syst., vol. 31, 2018, pp. 1–12.
[38] W. J. Maddox, M. Balandat, A. G. Wilson, and E. Bakshy, "Bayesian optimization with high-dimensional outputs," in Proc. Adv. Neural Inf. Process. Syst., vol. 34, 2021, pp. 1–14.
[39] R. Astudillo and P. I. Frazier, "Thinking inside the box: A tutorial on grey-box Bayesian optimization," 2022, arXiv:2201.00272.
[40] Y. Pan, X. Zeng, H. Xu, Y. Sun, D. Wang, and J. Wu, "Evaluation of Gaussian process regression kernel functions for improving groundwater prediction," J. Hydrol., vol. 603, Dec. 2021, Art. no. 126960.
[41] H. Wang, B. van Stein, M. Emmerich, and T. Back, "A new acquisition function for Bayesian optimization based on the moment-generating function," in Proc. IEEE Int. Conf. Syst., Man, Cybern. (SMC), Oct. 2017, pp. 507–512.
[42] L. Fröhlich, E. Klenske, J. Vinogradska, C. Daniel, and M. Zeilinger, "Noisy-input entropy search for efficient robust Bayesian optimization," in Proc. Int. Conf. Artif. Intell. Statist., 2020, pp. 2262–2272.

Renxiang Chen received the B.E. and Ph.D. degrees in mechanical engineering from Chongqing University, Chongqing, China, in 2007 and 2012, respectively. His main research interests include mechanical and electrical equipment security service, life prediction and health management, mechanical signal processing, intelligent manufacturing, machinery condition monitoring, and fault diagnosis.

Xianzhen Huang is currently a Full Professor with the School of Mechanical Engineering and Automation, Northeastern University, Shenyang, China. He has published more than 60 articles. His main research interests include mechanical reliability, system reliability, and mechanical dynamics.