Multi-Wavelet Coefficients Fusion in Deep Residual Networks For Fault Diagnosis
0278-0046 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIE.2018.2866050, IEEE
Transactions on Industrial Electronics
IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS
wavelet transform, which yielded higher accuracies than traditional shallow classifiers. Therefore, the feature learning ability of deep learning algorithms is a great benefit for fault diagnosis applications. However, training deep learning algorithms is often not an easy task. For example, traditional deep auto-encoders involve too many weights to be trained, and although convolutional neural networks (CNNs) adopt a weight-sharing strategy to reduce the number of weights, back-propagating errors through multiple layers can result in exploding/vanishing gradient problems, which can be a primary cause of training failures. Additionally, most of the previous deep learning algorithms were not specifically designed for vibration-based fault diagnosis. Hence, to further improve diagnostic performance, it is necessary to explore new deep learning architectures.

Time-frequency analysis can uncover the dynamic properties of non-stationary vibration signals. Various time-frequency analysis methods (e.g., the short-time Fourier transform, the wavelet transform, and empirical mode decomposition) have been used in vibration-based fault diagnosis. However, the short-time Fourier transform only has a fixed resolution in the frequency domain, and empirical mode decomposition lacks a rigorous theoretical foundation. Due to the merit of multi-resolution localization in detecting faults in gearboxes [16], the wavelet transform [17] was used in this study to generate time-frequency representations of vibration signals as the input of the deep learning method. To be specific, the discrete wavelet packet transform (DWPT) was adopted, which can generate a series of matrices of wavelet packet coefficients.

However, there is still no general consensus as to which wavelet can offer optimal performance for fault diagnosis. Furthermore, the optimal wavelet basis function can vary depending on the fault detection and diagnosis application [12], [13], [18]-[20]. For example, Ding and He [12] employed a deep learning model for fault diagnosis of spindle bearings, which took a matrix containing wavelet packet energies as input. Specifically, the authors used a Daubechies 8 wavelet for DWPT. Wang et al. [13] used a Morlet wavelet to generate time-frequency representations from the vibration signals and further applied a CNN to fault diagnosis, which took the time-frequency representations as input. Kang et al. [18] used DWPT with a Daubechies 20 wavelet to decompose acoustic emission signals into a series of wavelet packet nodes and extracted features from the nodes, such as relative wavelet packet node energies, for bearing fault diagnosis.

To address the aforementioned issue, wavelet selection [21], [22] and multi-wavelet coefficients fusion are two promising solutions. Here, the main objective of wavelet selection is to find an optimal wavelet for fault diagnosis. For example, Vakharia et al. [21] used a criterion named the "maximum energy to Shannon entropy ratio" to specify an optimal wavelet for feature extraction. However, there is still a likelihood that an optimal wavelet specified by wavelet selection methods may not be effective for generating time-frequency representations from which a discriminative set of features can be learned for planetary gearbox fault diagnosis under non-stationary operating conditions. This motivated us to learn a good set of features from time-frequency representations obtained by diverse wavelets and to verify the effectiveness of the method that fuses multiple wavelet transforms into deep learning algorithms. That is, this paper aims to learn a good set of features from diverse wavelets. Furthermore, learning diverse features is vital for increasing the performance of deep learning methods [23], and the fusion of multi-wavelet coefficients into deep learning algorithms can enable the learning of diverse features. In this way, this study avoids the wavelet selection problem.

A deep residual network (DRN) [24], [25] is a state-of-the-art deep learning method. DRNs can automatically learn discriminative features from the input data. The difference between a DRN and the classical CNN is that the DRN uses identity skip-connections (ISCs) in its deep architecture to make the trainable parameters easier to optimize. The DRN integrates many "tricks" for better training of deep neural networks, such as momentum, batch normalization (BN), L2 regularization, and variance-scaling weight initialization. These tricks make it reliable and applicable to experimental data with different properties. Therefore, the DRN has the potential to learn a good set of features from the input data and to correctly identify the health status of an object machine.

In this study, two multi-wavelet coefficients fusion methods within the DRN architecture were developed for fault diagnosis: multi-wavelet coefficients fusion in a DRN by concatenation (MWCF-DRN-C) and multi-wavelet coefficients fusion in a DRN by maximization (MWCF-DRN-M). The MWCF-DRN-C method concatenates a series of matrices of wavelet packet coefficients and takes the single concatenated matrix as input, whereas the MWCF-DRN-M method re-designs the architecture of the DRN for the sake of multi-wavelet coefficients fusion. Both methods can adaptively adjust the contribution of wavelet packet coefficients to fault diagnosis, with the goal of improving diagnostic performance. Likewise, these methods can learn better features for fault diagnosis than the state-of-the-art deep learning methods (i.e., the CNN and the DRN) that take a matrix of wavelet packet coefficients obtained from a single wavelet basis function.

The remainder of this paper is organized as follows. Section II describes a simulation system for collecting vibration signals under variable operating conditions, used for planetary gearbox fault diagnosis. Section III delineates the inclusion of domain knowledge into the deep models (MWCF-DRN-C and MWCF-DRN-M) and defines the input of the methods using multi-wavelet coefficients fusion. In Section IV, performance comparisons are conducted to verify the effectiveness of the developed methods for fault diagnosis, and the limitations are discussed. Section V gives conclusions.

II. FAULT DESCRIPTION OF PLANETARY GEARBOXES

To verify the effectiveness of the developed methods, fault diagnosis of a planetary gearbox was considered. A drivetrain dynamics simulator was used to simulate the faults. The simulator was mainly composed of a 3-phase 3 HP motor, a 2-stage planetary gearbox with a 192:7 gear ratio (including 4
planet gears in the 1st stage and 3 planet gears in the 2nd stage), a 2-stage parallel gearbox (with a 29:100 gear ratio in the 1st stage and a 5:2 gear ratio in the 2nd stage), and a programmable heavy-duty magnetic brake (with a maximum torque of 65 lb·ft), as shown in Fig. 1. More information about the simulator can be found in [26]. Vibration signals in the vertical direction were collected at 25.6 kHz using an accelerometer, which was mounted at the input side of the planetary gearbox.

Since it is generally unknown whether a wavelet function provides optimal diagnostic performance, multi-wavelet coefficients fusion is considered in this study. Accordingly, this section mainly discusses the essential idea behind the two developed methods, MWCF-DRN-C and MWCF-DRN-M, by presenting the theoretical background of a DRN and the design of the DRN architecture.

A. Input Data Configuration
As a classical multi-resolution analysis algorithm, DWPT
[17] enables a signal to be decomposed into two sets of wavelet
coefficients, i.e., the approximation coefficients at a
low-frequency band and the detail coefficients at a relatively
high-frequency band. As indicated in Fig. 2, the decomposition
is then repeated recursively not only on the approximation
coefficients but also on the detail coefficients, so that the
information on various frequency bands can be revealed.
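As a concrete illustration, the recursive decomposition described above can be sketched in a few lines of NumPy. The sketch uses the 2-tap Haar (DB1) analysis filters for brevity (the experiments in this paper use longer Daubechies wavelets, which would also require boundary handling), and the function name `dwpt` is ours, not from the paper:

```python
import numpy as np

SQRT2 = np.sqrt(2.0)
LO = np.array([1.0, 1.0]) / SQRT2   # Haar (DB1) low-pass analysis filter
HI = np.array([1.0, -1.0]) / SQRT2  # Haar (DB1) high-pass analysis filter

def dwpt(signal, depth):
    """Recursive wavelet packet decomposition: at every level, split each
    node into approximation (low-pass) and detail (high-pass) coefficients,
    so a decomposition of depth d yields 2**d terminal nodes."""
    nodes = [np.asarray(signal, dtype=float)]
    for _ in range(depth):
        children = []
        for x in nodes:
            approx = LO[0] * x[0::2] + LO[1] * x[1::2]  # filter + downsample by 2
            detail = HI[0] * x[0::2] + HI[1] * x[1::2]
            children.extend([approx, detail])
        nodes = children
    return nodes

# One observation of length 4,096 decomposed at depth 6, as in Section III.A:
x = np.random.default_rng(0).standard_normal(4096)
matrix = np.stack(dwpt(x, depth=6))  # 2**6 = 64 nodes of 4096/2**6 = 64 coefficients
print(matrix.shape)                  # (64, 64)
```

Stacking the 64 terminal nodes row-wise gives exactly the p × q = 64 × 64 matrix of wavelet packet coefficients described in Section III.A; repeating the decomposition with N_w different wavelets and stacking the resulting matrices yields the 64 × 64 × N_w input.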
signals. In this study, Daubechies wavelets were used as an example in the experiment since they were widely used in vibration-based fault diagnosis [3], [12], [19]. However, it is notable that the developed methods are applicable to other wavelets, such as the Symlet, Coiflet, Morlet, and so forth.

As depicted in Fig. 3, the wavelet packet coefficients at different frequency bands obtained using a certain wavelet can be formed into a 2-dimensional (2D) matrix; then, the 2D matrices derived from different wavelets can be stacked together as a 3-dimensional (3D) matrix. Likewise, since the depth of DWPT was 6 (i.e., depth = 6) and the length of an observation was 4,096 in the experiment, the dimension of a 3D matrix of wavelet packet coefficients in this study was p × q × N_w = (4096 / 2^depth) × (2^depth) × N_w = 64 × 64 × N_w, where N_w is the number of selected wavelets.

Fig. 3. A series of 2D matrices of wavelet packet coefficients obtained using various Daubechies wavelets.

B. Background Theory of a DRN

A DRN can be interpreted as a model that is a stack of various components, including a convolutional layer, a series of residual building units (RBUs), a BN, a rectified linear unit activation function (ReLU), a global average pooling (GAP), and a fully connected output layer [24], [25]. As shown in Fig. 4(a), an RBU can be composed of two BNs, two ReLUs, two convolutional layers, and one ISC. A brief architecture of a DRN is shown in Fig. 4(b).

Fig. 4. (a) An RBU, and (b) a brief architecture of a DRN, in which "Conv 3 × 3" refers to a convolutional layer with a convolutional kernel size of 3 × 3.

The convolutional layer is used to learn features, in which each convolutional kernel behaves as a trainable feature extractor. Compared with the matrix multiplications in traditional fully connected layers, the use of convolutions in convolutional layers reduces the number of weights and the computational complexity. The convolution is expressed by:

O_C(i, j) = Σ_u Σ_v Σ_c I_C(i − u, j − v, c) · K(u, v, c) + b   (5)

where I_C is the input feature map of a convolutional layer; K is a convolutional kernel; b is a bias; O_C is a channel of the output feature map; i, j, and c are the indexes of the row, column, and channel of the feature map, respectively; and u and v are the indexes of the row and column of the convolutional kernel, respectively. Since a convolutional layer can have more than one convolutional kernel, more than one channel of the output feature map can be obtained. In this study, convolutional kernels of size 3 × 3 were used because they not only have a higher computational efficiency than larger kernels, but can also be large enough to detect local maxima [27].

In each training iteration, a mini-batch of observations is randomly selected and fed into the DRN. However, the distributions of the learned features in the mini-batches often change continuously over the training iterations, which is known as the internal covariate shift problem [28]. In such a case, the weights and biases have to be continuously updated to adapt to the changed distributions. As a result, the training of deep networks can be challenging. BN [28] is a technique used to address this problem and is expressed by:

μ = (1 / N_batch) Σ_{s=1}^{N_batch} x_s   (6)

σ² = (1 / N_batch) Σ_{s=1}^{N_batch} (x_s − μ)²   (7)

x̂_s = (x_s − μ) / √(σ² + ε)   (8)

y_s = γ·x̂_s + β   (9)

where x_s is a feature of the s-th observation in a mini-batch, N_batch is the mini-batch size, ε is a constant close to zero, and y_s is the output feature of BN. The input features are normalized to have a mean of 0 and a standard deviation of 1 in (6)-(8), so that the input features are enforced to have similar distributions; then, γ and β are trained to scale and shift the normalized features to desirable distributions. The optimization of γ and β is achieved using a gradient descent algorithm, which is expressed by:

γ ← γ − (η / N_batch) Σ_s Σ_k (∂E_s / ∂Path_r_k) · (∂Path_r_k / ∂γ)   (10)

β ← β − (η / N_batch) Σ_s Σ_k (∂E_s / ∂Path_b_k) · (∂Path_b_k / ∂β)   (11)

where η is the learning rate and E_s is the error of the s-th observation. Path_r and Path_b are two collections of differentiable paths that connect γ and β, respectively, with the error at the output layer.

The ReLU is used to achieve nonlinear transformations by enforcing the negative features to be zero. It is expressed by:

O_R(i, j, c) = max{I_R(i, j, c), 0}   (12)

where I_R and O_R are the input and output feature maps of the ReLU, respectively. The derivative of the ReLU is expressed by:

∂O_R(i, j, c) / ∂I_R(i, j, c) = 1 if I_R(i, j, c) > 0, and 0 if I_R(i, j, c) < 0   (13)

Its derivative is either 1 or 0, which can reduce the risk of gradient vanishing and exploding compared with the sigmoid and tanh activation functions.

The ISCs are the key component that makes a DRN easier to train than the traditional CNNs. In the training process of traditional CNNs without ISCs, the gradients of the error with
respect to the weights (and biases) need to be back-propagated layer by layer. For example, the gradients at the l-th layer depend on the weights at the (l+1)-th layer. If the weights at the (l+1)-th layer are not optimal, the gradients at the l-th layer cannot be optimal either. As a result, it is difficult to effectively train the weights in a CNN with multiple layers. ISCs solve this problem by directly connecting some convolutional layers to deeper layers, so that it can be easy for the gradients to be back-propagated through a deep network. In other words, the gradients can be back-propagated into the layers more easily than in the traditional CNNs, so that the weights and biases can be updated effectively. It has been shown that a DRN with tens or hundreds of layers can be easily trained and yield higher accuracies than the CNNs without ISCs [25].

A GAP was applied before the final fully connected output layer, which is expressed by:

O_G(c) = average_{i,j} I_G(i, j, c)   (14)

where I_G and O_G are the input and output feature maps of the GAP, respectively. The GAP enables the shift variant problem to be addressed by calculating a global feature from each channel of the input feature map. In this study, the shift variant problem means that the fault-related impulses can exist at different locations in the observations, and the GAP can ensure that the DRN learns features which are invariant to the locations. The output feature maps of the GAP are fed to the fully connected output layer to produce the classification results.

The training process of a DRN follows the same principle as that of general neural networks. The training data are propagated into a DRN and processed while passing through a series of convolutional layers, BNs, and ReLUs, followed by a GAP and a fully connected output layer. More specifically, at the fully connected output layer, a softmax function is used to estimate the probability of an observation belonging to each class [29], which is expressed by:

y_n = e^{x_n} / Σ_{z=1}^{N_class} e^{x_z}, for n = 1, …, N_class   (15)

where x_n is the feature at the n-th neuron of the output layer; y_n is the output, which can be seen as the estimated probability of the observation belonging to the n-th class; and N_class is the total number of classes. Then, the cross-entropy error, which measures the distance between the true label t and the output y, can be calculated by:

E(y, t) = − Σ_{n=1}^{N_class} t_n · ln(y_n)   (16)

where t_n is the true probability of the observation belonging to the n-th class. Note that the partial derivative of the cross-entropy error with respect to the neurons at the fully connected output layer can be expressed by:

∂E / ∂x_n = y_n − t_n   (17)

Then, the error is back-propagated through the network to update the weights and biases, which is expressed by:

w ← w − (η / N_batch) Σ_s Σ_n Σ_k (∂E_s / ∂x_n) · (∂x_n / ∂Net_w_{n,k}) · (∂Net_w_{n,k} / ∂w)   (18)

b ← b − (η / N_batch) Σ_s Σ_n Σ_k (∂E_s / ∂x_n) · (∂x_n / ∂Net_b_{n,k}) · (∂Net_b_{n,k} / ∂b)   (19)

where w is a weight, b is a bias, and Net_w and Net_b are two collections of differentiable paths that connect the weight and bias, respectively, with the neurons at the fully connected output layer. The training procedures can be repeated a certain number of times so that the parameters can be optimized. In summary, the parameters that need to be optimized during training include γ and β in the BNs and the weights and biases in the convolutional layers and the fully connected output layer.

C. Design of the Fundamental Architecture for DRNs

Deep learning models' architectures, including the depth (i.e., the number of nonlinear transformation layers) and the width (i.e., the numbers of kernels in the convolutional layers), are key factors that influence the models' performance, such as the test accuracy and computation time. Zoph and Le [30] employed reinforcement learning for network architecture search, which was computationally expensive and used 800 graphics processing units to train the deep models with different hyperparameters. Suganuma et al. [31] investigated a method using genetic programming to design deep networks. Despite such studies, neural network architecture optimization is a long-standing issue in the field; that is, there is still no general consensus as to how deep or wide a network should be.

Fig. 5. A typical architecture of a DRN, in which m indicates the number of convolutional kernels, and "/2" means that the convolutional kernels are moved with a stride of 2.

The fundamental architecture of the DRN used in this study is pictorially illustrated in Fig. 5. Likewise, the essential idea behind this architecture is described as follows. First, the architecture has 19 convolutional layers and 1 fully connected layer in depth. Note that it is important to contain a sufficient number of nonlinear transformation layers to ensure that the input data can be converted into discriminative features. In previous studies conducted for vibration- and current-based fault diagnosis using deep learning, no more than 10 nonlinear transformation layers were used [6]-[15]. Considering the increased level of nonlinearity of the acquired data, the DRN contains more nonlinear transformation layers, where a nonlinear transformation layer stands for a convolutional layer with a nonlinear activation function (i.e., ReLU) in this study. As mentioned above, DRNs with tens or hundreds of layers can be easily trained due to the use of ISCs [25], so that the depth of the DRN architecture is in a reasonable range.

Then, the first convolutional layer (i.e., the layer closest to the input layer) and three convolutional layers in the RBUs, which have a stride of 2, are used to reduce the size of the feature maps. In Fig. 5, m indicates the number of
convolutional kernels, which is increased to 2m and 4m in deeper layers because a few basic features can be integrated into many different high-level features; m is set to 4 in this study. To further alleviate over-fitting, dropout [32] with a ratio of 50% is applied to the GAP. In other words, half of the neurons in the GAP layer were randomly selected and set to zero in each training iteration, which can be interpreted as a process of adding noise to the network, in order to prevent the DRN from memorizing too much non-discriminative information and to ensure a high generalizability.

D. Multi-wavelet Coefficients Fusion in a DRN

In this subsection, the motivations for developing the multi-wavelet coefficients fusion methods are described, and the architectures of the two developed methods (MWCF-DRN-C and MWCF-DRN-M) are presented.

1) Motivations of Multi-wavelet Coefficients Fusion

Fig. 6. (a) Schematic diagram of a rolling element bearing with a crack on its inner race, in which the arrows denote rotating and moving directions, (b) an illustration of impulses due to the crack, (c) schematic diagram of a gear with a broken tooth, and (d) an illustration of the waveform generated by the faulty gear.

The faults on bearings and/or gears often produce relatively large amplitudes in the waveform of vibration signals. For example, for a bearing with a crack on its inner race (see Fig. 6(a)), there will be a sudden change of contact force every time a rolling ball passes over the crack, and the sudden change of contact force will create an impulse in the waveform, as indicated in Fig. 6(b). For a bearing with a crack on its outer race, the rolling balls will strike the crack and lead to impulses as well. Similarly, a broken tooth on a gear (see Fig. 6(c)) can lead to a large amplitude every time the broken tooth meshes with a tooth of another gear, as indicated in Fig. 6(d).

Conventional signal processing-based fault diagnosis methods often rely on the detection of fault-related waveforms. For example, for a bearing rotating at a constant speed, the fault-related impulses will be generated periodically; if the time interval between the impulses matches the ball passing frequency of the inner race, it is possible to determine whether the bearing has a fault on its inner race. However, for large rotating machines with multi-stage gearboxes, the vibration signals are often composed of multiple components because of vibrations excited by the meshing of multi-stage gear transmissions, rotations of shafts and bearings, or environmental noise. For a rotating machine operating at varying rotating speeds, the frequencies of these vibration components can be non-stationary. Moreover, when a fault is at its early stage, the fault-related information in the waveforms is not easily detected. As a result, the fault-related information can be overwhelmed by the other components, which makes fault diagnosis a challenging task.

To deal with the non-stationary vibration signals, DWPT is employed to decompose the vibration signals into multiple sub-band signals. However, it is generally unknown which sub-band signal contains the most intrinsic information about the system's health conditions (i.e., the normal and several faulty conditions). Likewise, because the informative sub-band signals can vary due to changes in the operating conditions (e.g., rotating speeds), this study combines all the wavelet coefficients at each terminal node and uses them as input for the deep learning methods.

In general, different wavelets may be optimal for diagnosing different types of faults under different operating conditions, so that it is unlikely for a certain wavelet to be the most effective for diagnosing all types of faults in consideration (e.g., bearing inner race faults, outer race faults, ball faults, gear surface pits, and gear root cracks). Therefore, the fusion of multiple wavelets can improve the performance of a fault diagnosis task involving the classification of multiple fault types.

2) MWCF-DRN-C

The developed MWCF-DRN-C method is based on a well-known fact in deep learning; that is, learning diverse features is critical for increasing performance [23]. The multi-wavelet coefficients fusion concept can be considered a promising way to introduce diversity into a DRN. To enable multi-wavelet coefficients fusion, one of the simplest methods is to concatenate all the 2D matrices of wavelet packet coefficients and propagate them into the DRN.

As illustrated in Fig. 7(a), a special design in MWCF-DRN-C is the use of a concatenation layer to combine multiple wavelet packet coefficients by forming a p × q × N_w matrix, where N_w is the number of wavelets in consideration and p and q are the dimensions of a 2D matrix of wavelet packet coefficients (see Section III.A). Note that the concatenation layer does not have any parameter to be trained. Then, with the use of the concatenation layer, the first convolutional layer has more trainable weights that can be used for multi-wavelet coefficients fusion. To be specific, each convolutional kernel in the first convolutional layer of MWCF-DRN-C has 3 × 3 × N_w weights, while a convolutional kernel in the first convolutional layer of the DRN without multi-wavelet coefficients fusion only has 3 × 3 weights, where 3 × 3 indicates that the length and width of a convolutional kernel are both 3. This difference is caused by the nature of convolutional layers, i.e., the number of channels of a convolutional kernel has to be the same as the number of channels of the input feature map. After a supervised training process, the trainable weights and biases of the MWCF-DRN-C can be optimized to learn a discriminative set of features for accurate fault diagnosis.

3) MWCF-DRN-M

The development of MWCF-DRN-M is closely related to the working principle of wavelet analysis in fault diagnosis. For fault detection of rotating machinery, wavelet analysis often works as a method to discover fault-related waveforms by
generating very positive or negative wavelet packet coefficients. DRNs [24], [25]. The learning rate was initialized to 0.1 and
In other words, compared with the wavelet packet coefficients reduced to 0.01 at the 40th epoch and 0.001 at the 80th epoch.
which are close to zero, very positive or negative wavelet The training was terminated at the 100th epoch, so that the
packet coefficients are more likely to represent fault-related trainable parameters could be updated in large steps at the
waveforms if the wavelet is effective in detecting the beginning and slightly fine-tuned at the end of the training
fault-related waveforms. However, for large rotating process. The coefficient of momentum [5] was set to 0.9, which
machineries with multi-stage gear transmissions, it is often is used to accelerate the training process by making use of the
unavoidable that some unimportant wavelet packet coefficients update in the previous iteration. The mini-batch size was set to
can have large absolute values as well, which is mainly caused 128, which indicated that 128 observations were randomly
by the other vibration components mentioned in section III.D.1.
selected and fed into the deep architecture in each iteration; the
Aiming at the above issues, an individual convolutional layer
training process can be accelerated compared with feeding one
with trainable parameters is applied to each 2D matrix of
observation in each iteration. The weight decay coefficient of
wavelet packet coefficients with the goal of highlighting the
fault-related wavelet packet coefficients, i.e., transforming the L2 regularization was set to 0.0001, which was the same with
important wavelet packet coefficients to be large features. Then, motivated by the fact that it is generally unknown which wavelet can be the most effective in detecting the fault-related waveforms, the developed MWCF-DRN-M method uses a maximization layer [33]-[35] to fuse the information from multiple wavelets (i.e., the output features of the individual convolutional layers). To be specific, the element-wise maximum values are taken as the output of the maximization layer.

The architecture of the developed MWCF-DRN-M is illustrated in Fig. 7(b). The individual convolutional layers (which are applied to the 2D matrices of wavelet packet coefficients) and the maximization layer are the special designs that differentiate the MWCF-DRN-M from the original DRN. The working principle of these special designs is as follows. Although the maximization layer itself is parameterless, the individual convolutional layers make it part of a trainable process, which allows the values of the features to be adjusted before the element-wise maximum feature selection is performed. In this way, the developed MWCF-DRN-M can automatically learn which features to select so as to yield high diagnostic accuracy. This alternative method facilitates the inclusion of physics-based knowledge into the DRN.

IV. EXPERIMENTAL RESULTS

The two developed methods, i.e., MWCF-DRN-C and MWCF-DRN-M, were implemented using TensorFlow 1.0.1, a machine learning library open-sourced by Google. Experimental comparisons were made with the classical CNN and DRN to verify the efficacy of the developed methods.

A. Hyperparameters Setup

The hyperparameters were set based on the setups in the generic DRN in [24].

B. Performance Comparisons

In this section, the state-of-the-art deep learning algorithms without multi-wavelet coefficients fusion (i.e., a CNN and a DRN taking a matrix of wavelet coefficients obtained using a certain Daubechies wavelet) were used for performance comparisons. A 10-fold cross-validation [5] was conducted to evaluate the methods; that is, the dataset was randomly divided into 10 subsets. In each test, one subset was used as the testing data, and the other nine subsets were put together to form the training data. The tests were repeated 10 times, so that each subset had a chance to be the testing data. As a result, 10 accuracies were obtained for each method, and their average value (i.e., the average accuracy) was used as the metric to evaluate the method. Experimental results of the CNN, DRN, MWCF-DRN-C, and MWCF-DRN-M are given in Tables III and IV. The overall average accuracies are given in Table V and discussed below.

1) Performance comparison with CNN and DRN

As mentioned above, both the CNN and the DRN took a matrix of wavelet coefficients obtained using a certain Daubechies wavelet, from DB1 to DB30. To ensure a fair and reliable comparison, the same hyperparameters mentioned above were adopted. As shown in Table III, the DRN outperformed the CNN no matter which DB was used. The overall average test accuracy of the DRN with different DBs was 91.45%, which was 2.89% higher than that of the CNN.

For the MWCF-DRN-C and MWCF-DRN-M methods, multiple sets of matrices of wavelet packet coefficients were obtained using Nw randomly selected Daubechies wavelets, from DB1 to DB30, where Nw = 2, 6, 10, 14, 18, 22, 26, and 30 were considered in this study (see Table IV). The reason that the developed methods did not consider a full factorial design was to reduce the computational burden. The same 10-fold cross-validation was employed in the performance evaluation of the developed methods.
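The 10-fold cross-validation protocol described above can be sketched as follows; the function names and the placeholder `evaluate` callback are illustrative assumptions rather than the authors' code:

```python
import numpy as np

def ten_fold_indices(n_samples, n_folds=10, seed=0):
    """Randomly split the sample indices into n_folds disjoint subsets."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), n_folds)

def cross_validate(evaluate, n_samples, n_folds=10):
    """Let each subset serve once as the testing data while the other
    subsets form the training data, collecting one accuracy per fold.
    `evaluate(train_idx, test_idx)` stands in for training a model and
    returning its test accuracy."""
    folds = ten_fold_indices(n_samples, n_folds)
    accuracies = [
        evaluate(
            np.concatenate([folds[j] for j in range(n_folds) if j != k]),
            folds[k],
        )
        for k in range(n_folds)
    ]
    return np.mean(accuracies), np.std(accuracies)  # reported as mean +/- std
```

Repeating such a run per method and averaging the 10 fold accuracies yields the mean +/- standard deviation entries reported in Tables III and IV.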
Fig. 7. Architectures of (a) MWCF-DRN-C and (b) MWCF-DRN-M, in which "2D matrix i" refers to a 2-dimensional matrix of wavelet packet coefficients obtained by using an i-tap Daubechies wavelet (abbreviated as DBi in this study), and N is the number of Daubechies wavelets under consideration. Both the MWCF-DRN-C and the MWCF-DRN-M have the same RBUs as the DRN in Fig. 5.
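The fusion performed by the maximization layer in Fig. 7(b) amounts to an element-wise maximum over the feature maps produced by the individual convolutional branches. A minimal NumPy sketch (the branch count and array shapes are hypothetical, not the authors' TensorFlow implementation):

```python
import numpy as np

# Hypothetical feature maps: one per wavelet branch, produced by the
# individual convolutional layers; all branches share the same shape
# (height, width, channels).
rng = np.random.default_rng(0)
branch_features = [rng.standard_normal((8, 8, 4)) for _ in range(3)]

def maximization_layer(feature_maps):
    """Fuse the branches by taking element-wise maxima; the layer itself
    has no trainable parameters."""
    return np.maximum.reduce(feature_maps)

fused = maximization_layer(branch_features)
assert fused.shape == (8, 8, 4)
```

Because each branch's convolution can rescale its features before this maximum is taken, the network can learn which branch "wins" at each position, even though the maximization itself is parameterless.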
TABLE III
ACCURACIES OF THE CNN AND DRN USED FOR FAULT DIAGNOSIS OF THE PLANETARY GEARBOX (UNIT: %)

| Daubechies wavelet | CNN training accuracy | DRN training accuracy | CNN test accuracy | DRN test accuracy |
| DB1  | 86.74 ± 2.34 | 94.98 ± 0.54 | 83.44 ± 3.01 | 88.41 ± 0.95 |
| DB2  | 88.54 ± 1.03 | 96.31 ± 0.52 | 86.05 ± 1.50 | 90.57 ± 0.64 |
| DB3  | 90.86 ± 2.00 | 96.78 ± 0.54 | 88.13 ± 1.83 | 91.21 ± 1.11 |
| DB4  | 91.44 ± 1.57 | 96.92 ± 0.80 | 88.51 ± 1.50 | 91.26 ± 1.69 |
| DB5  | 91.70 ± 0.91 | 96.93 ± 1.01 | 89.01 ± 1.05 | 92.18 ± 0.96 |
| DB6  | 91.77 ± 1.60 | 97.15 ± 0.63 | 89.04 ± 1.72 | 91.76 ± 0.85 |
| DB7  | 91.88 ± 0.62 | 97.03 ± 0.80 | 89.17 ± 0.99 | 91.92 ± 0.97 |
| DB8  | 92.27 ± 1.44 | 97.32 ± 0.29 | 89.39 ± 1.78 | 92.93 ± 0.79 |
| DB9  | 92.28 ± 1.64 | 97.28 ± 0.49 | 89.45 ± 1.54 | 91.85 ± 0.84 |
| DB10 | 92.19 ± 0.58 | 96.78 ± 0.85 | 89.24 ± 1.16 | 91.77 ± 1.33 |
| DB11 | 91.83 ± 1.90 | 97.07 ± 0.90 | 88.43 ± 2.02 | 92.19 ± 1.31 |
| DB12 | 91.86 ± 0.84 | 97.01 ± 0.71 | 89.14 ± 1.06 | 92.07 ± 1.29 |
| DB13 | 92.13 ± 1.61 | 96.88 ± 0.88 | 89.68 ± 1.47 | 91.87 ± 1.40 |
| DB14 | 92.48 ± 1.33 | 97.05 ± 0.33 | 89.71 ± 1.66 | 92.02 ± 0.82 |
| DB15 | 92.18 ± 1.09 | 96.71 ± 0.73 | 89.48 ± 1.52 | 91.94 ± 1.43 |
| DB16 | 92.54 ± 1.38 | 97.09 ± 0.71 | 89.56 ± 1.96 | 91.97 ± 1.21 |
| DB17 | 91.82 ± 1.53 | 97.04 ± 0.83 | 88.81 ± 1.58 | 92.03 ± 0.76 |
| DB18 | 91.38 ± 0.92 | 96.69 ± 0.98 | 88.39 ± 1.39 | 92.18 ± 0.98 |
| DB19 | 90.65 ± 2.81 | 96.38 ± 0.94 | 87.06 ± 3.21 | 91.37 ± 1.22 |
| DB20 | 91.62 ± 0.72 | 96.66 ± 0.77 | 88.53 ± 1.06 | 91.40 ± 0.87 |
| DB21 | 91.66 ± 1.22 | 96.16 ± 1.11 | 88.77 ± 1.53 | 91.20 ± 1.42 |
| DB22 | 92.13 ± 0.49 | 96.56 ± 1.04 | 89.33 ± 1.05 | 91.50 ± 0.95 |
| DB23 | 92.18 ± 1.31 | 96.22 ± 0.84 | 89.58 ± 1.42 | 91.62 ± 1.04 |
| DB24 | 92.14 ± 0.59 | 95.78 ± 0.89 | 89.18 ± 0.95 | 90.56 ± 1.12 |
| DB25 | 91.01 ± 1.60 | 96.69 ± 0.68 | 88.03 ± 1.65 | 91.57 ± 0.75 |
| DB26 | 91.88 ± 0.90 | 96.11 ± 0.79 | 89.12 ± 0.92 | 91.18 ± 0.97 |
| DB27 | 90.94 ± 2.05 | 95.96 ± 0.85 | 88.43 ± 2.16 | 90.79 ± 1.44 |
| DB28 | 91.35 ± 0.42 | 96.36 ± 0.72 | 88.49 ± 0.92 | 91.05 ± 1.16 |
| DB29 | 90.04 ± 4.05 | 96.31 ± 0.96 | 86.64 ± 3.69 | 90.81 ± 0.98 |
| DB30 | 91.63 ± 0.60 | 95.42 ± 0.72 | 88.98 ± 1.08 | 90.26 ± 1.29 |
TABLE IV
ACCURACIES OF THE DEVELOPED MWCF-DRN-C AND MWCF-DRN-M METHODS WITH RANDOMLY SELECTED DB WAVELETS (UNIT: %)

| Number of randomly selected Daubechies wavelets (Nw) | MWCF-DRN-C training accuracy | MWCF-DRN-M training accuracy | MWCF-DRN-C test accuracy | MWCF-DRN-M test accuracy |
| 2 | 97.05 ± 0.69 | 97.00 ± 0.66 | 91.56 ± 1.49 | 92.41 ± 0.83 |

From Fig. 11(d), it can be observed that the composite fault "CFB" was also basically separable from the other health conditions, which means that the MWCF-DRN-C and MWCF-DRN-M are able to distinguish the composite fault from the other health states.
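The random selection of Nw Daubechies wavelets underlying Table IV can be sketched as follows; the helper name, the seeding, and the use of Python's `random` module are illustrative assumptions, not the authors' implementation:

```python
import random

def select_wavelets(n_w, seed=None):
    """Randomly draw n_w distinct Daubechies wavelets from DB1-DB30,
    returned in order of tap count."""
    rng = random.Random(seed)
    chosen = rng.sample([f"DB{i}" for i in range(1, 31)], n_w)
    return sorted(chosen, key=lambda name: int(name[2:]))

# The branch counts considered in this study (one draw per experiment)
for n_w in (2, 6, 10, 14, 18, 22, 26, 30):
    assert len(select_wavelets(n_w, seed=42)) == n_w
```

Each selected wavelet then contributes one 2D matrix of wavelet packet coefficients, i.e., one input branch of the MWCF-DRN-C/MWCF-DRN-M architectures in Fig. 7.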
[8] Y. Lei, F. Jia, J. Lin, S. Xing, and S. Ding, "An Intelligent Fault Diagnosis Method Using Unsupervised Feature Learning Towards Mechanical Big Data," IEEE Trans. Ind. Electron., vol. 63, no. 5, pp. 3137–3147, 2016.
[9] W. Sun, R. Zhao, R. Yan, S. Shao, and X. Chen, "Convolutional Discriminative Feature Learning for Induction Motor Fault Diagnosis," IEEE Trans. Ind. Informat., vol. 13, no. 3, pp. 1350–1359, 2017.
[10] L. Jing, T. Wang, M. Zhao, and P. Wang, "An Adaptive Multi-Sensor Data Fusion Method Based on Deep Convolutional Neural Networks for Fault Diagnosis of Planetary Gearbox," Sensors, vol. 17, E414, 2017.
[11] F. Wang, H. Jiang, H. Shao, W. Duan, and S. Wu, "An Adaptive Deep Convolutional Neural Network for Rolling Bearing Fault Diagnosis," Meas. Sci. Technol., vol. 28, 095005, 2017.
[12] X. Ding and Q. He, "Energy-Fluctuated Multiscale Feature Learning with Deep ConvNet for Intelligent Spindle Bearing Fault Diagnosis," IEEE Trans. Instrum. Meas., vol. 66, no. 8, pp. 1926–1935, 2017.
[13] P. Wang, Ananya, R. Yan, and R. X. Gao, "Virtualization and Deep Recognition for System Fault Classification," J. Manuf. Syst., vol. 44, pp. 310–316, 2017.
[14] M. Xia, T. Li, L. Xu, L. Liu, and C. Silva, "Fault Diagnosis for Rotating Machinery Using Multiple Sensors and Convolutional Neural Networks," IEEE/ASME Trans. Mechatron., vol. 23, pp. 101–110, 2017.
[15] W. Zhang, C. Li, G. Peng, Y. Chen, and Z. Zhang, "A Deep Convolutional Neural Network with New Training Methods for Bearing Fault Diagnosis under Noisy Environment and Different Working Load," Mech. Syst. Signal Process., vol. 100, no. 1, pp. 439–453, 2018.
[16] R. Yan, R. Gao, and X. Chen, "Wavelets for Fault Diagnosis of Rotary Machines: A Review with Applications," Signal Process., vol. 96, pp. 1–15, 2014.
[17] R. Gao and R. Yan, Wavelets. Springer, Boston, MA, USA, 2011.
[18] M. Kang, J. Kim, J. Kim, A. Tan, E. Kim, and B. Choi, "Reliable Fault Diagnosis for Low-Speed Bearings Using Individually Trained Support Vector Machines with Kernel Discriminative Feature Analysis," IEEE Trans. Power Electron., vol. 30, pp. 2786–2797, 2015.
[19] Y. Wang, G. Xu, L. Liang, and K. Jiang, "Detection of Weak Transient Signals Based on Wavelet Packet Transform and Manifold Learning for Rolling Element Bearing Fault Diagnosis," Mech. Syst. Signal Process., vol. 54–55, pp. 259–276, 2015.
[20] Y. Qin, "A New Family of Model-Based Impulsive Wavelets and Their Sparse Representation for Rolling Bearing Fault Diagnosis," IEEE Trans. Ind. Electron., vol. 65, no. 3, pp. 2716–2726, 2018.
[21] V. Vakharia, V. Gupta, and P. Kankar, "Efficient Fault Diagnosis of Ball Bearing using ReliefF and Random Forest Classifier," J. Braz. Soc. Mech. Sci. & Eng., vol. 39, no. 8, pp. 2969–2982, 2017.
[22] D. Vautrin, X. Artusi, M. Lucas, and D. Farina, "A Novel Criterion of Wavelet Packet Best Basis Selection for Signal Classification With Application to Brain–Computer Interfaces," IEEE Trans. Biomed. Eng., vol. 56, no. 11, pp. 2734–2738, 2009.
[23] Y. Chen, X. Jin, J. Feng, and S. Yan, "Training Group Orthogonal Neural Networks with Privileged Information," in Proc. International Joint Conference on Artificial Intelligence, 19-25 August 2017, Australia.
[24] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," in Proc. 2016 IEEE Conference on Computer Vision and Pattern Recognition, 26 June-1 July 2016, Las Vegas, NV, USA, pp. 770–778.
[25] K. He, X. Zhang, S. Ren, and J. Sun, "Identity Mappings in Deep Residual Networks," in Proc. 14th European Conference on Computer Vision, 8-16 October 2016, Amsterdam, Netherlands, pp. 630–645.
[26] Drivetrain diagnostics simulator, http://spectraquest.com/drivetrains/details/dds/
[27] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the Inception Architecture for Computer Vision," in Proc. 29th IEEE Conf. Comput. Vis. Pattern Recognit., Las Vegas, NV, USA, Jun. 26-Jul. 1, 2016, pp. 2818–2826.
[28] S. Ioffe and C. Szegedy, "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift," in Proc. 32nd International Conference on Machine Learning, 7-9 July 2015, Lille, France, pp. 448–456.
[29] P. Zhou and J. Austin, "Learning Criteria for Training Neural Network Classifiers," Neural Comput. Appl., vol. 7, no. 4, pp. 334–342, 1998.
[30] B. Zoph and Q. V. Le, "Neural Architecture Search with Reinforcement Learning," in Proc. International Conference on Learning Representations, 24-26 April 2017, Toulon, France.
[31] M. Suganuma, S. Shirakawa, and T. Nagao, "A Genetic Programming Approach to Designing Convolutional Neural Network Architectures," in Proc. the Genetic and Evolutionary Computation Conference, 15-19 July 2017, Berlin, Germany, pp. 497–504.
[32] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A Simple Way to Prevent Neural Networks from Overfitting," J. Mach. Learn. Res., vol. 15, pp. 1929–1958, 2014.
[33] I. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio, "Maxout Networks," in Proc. 30th International Conference on Machine Learning, 16-21 June 2013, Atlanta, GA, USA, pp. 1319–1327.
[34] Y. Huang, X. Sun, M. Lu, and M. Xu, "Channel-Max, Channel-Drop and Stochastic Max-Pooling," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshop, 11-12 June 2015, Boston, MA, USA, pp. 9–17.
[35] Z. Liao and G. Carneiro, "A Deep Convolutional Neural Network Module that Promotes Competition of Multiple-Size Filters," Pattern Recognit., vol. 71, pp. 94–105, 2017.
[36] L. J. P. van der Maaten and G. E. Hinton, "Visualizing Data Using t-SNE," J. Mach. Learn. Res., vol. 9, pp. 2579–2605, 2008.

Minghang Zhao was born in Shandong, China, in June 1991. He received his B.E. degree in mechanical engineering from the College of Mechanical Engineering, Chongqing University, Chongqing, China, in June 2013. He is currently working toward a Ph.D. degree under the supervision of Prof. Baoping Tang in the State Key Laboratory of Mechanical Transmission, Chongqing University, Chongqing, China. He was previously a visiting research scholar in the Center for Advanced Life Cycle Engineering (CALCE), University of Maryland, College Park, MD, USA, from 2016 to 2017. His research interests include data-driven fault diagnosis, prognostics, and health management of mechanical and electrical systems.

Myeongsu Kang (M'17) received the B.E. and M.S. degrees in computer engineering and information technology and the Ph.D. degree in electrical, electronics, and computer engineering from the University of Ulsan, Ulsan, South Korea, in 2008, 2010, and 2015, respectively. He is currently a Research Scientist with the Center for Advanced Life Cycle Engineering (CALCE), University of Maryland, College Park, MD, USA. His current research interests include data-driven anomaly detection, diagnostics, and prognostics of complex systems, such as automotive, railway transportation, and avionics, for which failure would be catastrophic. He has expertise in analytics, machine learning, system modeling, and statistics for prognostics and health management.

Baoping Tang received his M.Sc. degree in 1996 and Ph.D. degree in 2003, both from the College of Mechanical Engineering, Chongqing University, Chongqing, China. He is currently a professor and Ph.D. supervisor in the College of Mechanical Engineering, Chongqing University, Chongqing, China. His main research interests include wireless sensor networks, mechanical and electrical equipment security service and life prediction, and measurement technology and instruments. Dr. Tang won a National Scientific and Technological Progress 2nd Prize of China in 2004 and a National Invention 2nd Prize of China in 2015. More than 150 papers have been published in his research career.

Michael Pecht (S'78-M'83-SM'90-F'92) received the B.S. degree in acoustics, the M.S. degrees in electrical engineering and engineering mechanics, and the Ph.D. degree in engineering mechanics from the University of Wisconsin at Madison, Madison, WI, USA, in 1982. He is the Founder of the Center for Advanced Life Cycle Engineering (CALCE), University of Maryland, College Park, MD, USA, where he is also a Chair Professor. Dr. Pecht is a Professional Engineer and an American Society of Mechanical Engineers (ASME) Fellow. He has received the IEEE Undergraduate Teaching Award and the International Microelectronics Assembly and Packaging Society (IMAPS) William D. Ashman Memorial Achievement Award for his contribution in electronics reliability analysis.