
Knowledge-Based Systems 261 (2023) 110212

Contents lists available at ScienceDirect

Knowledge-Based Systems
journal homepage: www.elsevier.com/locate/knosys

Cross-attention-based multi-sensing signals fusion for penetration state monitoring during laser welding of aluminum alloy

Longchao Cao a,b, Jingchang Li c, Libin Zhang a,b, Shuyang Luo c, Menglei Li c, Xufeng Huang c,∗

a School of Mechanical Engineering & Automation, Wuhan Textile University, Wuhan 430074, PR China
b Hubei Key Laboratory of Digital Textile Equipment, Wuhan Textile University, Wuhan 430200, PR China
c School of Aerospace Engineering, Huazhong University of Science & Technology, Wuhan 430074, PR China

Article info

Article history:
Received 3 October 2022
Received in revised form 29 November 2022
Accepted 15 December 2022
Available online 19 December 2022

Keywords:
Penetration state monitoring
Laser welding
Cross-attention
Multi-sensing signals fusion
Deep learning

Abstract

A precision multi-sensor monitoring strategy is required to meet the challenges posed by increasingly complex products and manufacturing processes during laser welding. In this work, an acoustic sensor and a photoelectric sensor were adopted to collect the signals during the laser welding of aluminum alloy. The dataset was divided into three categories according to the morphologies of the top and back sides. The cross-attention fusion neural network (CAFNet) was proposed to interactively capture photoelectric and acoustic information for effective quality classification without prior time–frequency analysis and feature learning. Its effectiveness and superiority were compared with five types of deep learning (DL) based methods. The proposed CAFNet achieved a mean testing accuracy of 99.73% with a standard deviation of 0.37%, outperforming the compared models. At the same time, the proposed CAFNet achieved the highest average testing accuracy of 94.34% when utilizing limited and imbalanced data, which suggests that the proposed method is more robust than the other methods. This approach is a new paradigm in the monitoring of laser welding and can be exploited to provide feedback in a closed-loop quality control system.

© 2022 Elsevier B.V. All rights reserved.

1. Introduction

Owing to the advantages of low heat input, a small heat-affected zone, and high welding efficiency, laser welding has extended applications in the welding of aluminum alloy [1]. However, it suffers from several weld-induced imperfections, such as poor weld appearance and cracks, due to the special physical properties of aluminum alloy: high reflectivity, a large thermal expansion coefficient, and low liquid viscosity [2,3]. In particular, the high reflectivity of the aluminum alloy decreases the absorption of laser energy, resulting in incomplete penetration of the weld bead. In addition, the surface temperature and roughness, which have a significant influence on the absorptivity, change dynamically during laser welding [4,5]. Therefore, the penetration state may fluctuate along the same seam, which is intolerable in engineering practice. An intelligent method is urgently needed to identify and control the penetration state of weld seams in industrial fields.

With the continuous development of intelligent manufacturing, deep learning (DL)-based monitoring is emerging, which provides a promising manner to detect the welding states in situ [6,7]. To obtain suitable information on welding states and develop reliable DL-based monitoring methods, several signals (including the optical [8–13], acoustic [14–16], and thermal [17–19] signals) dynamically generated during laser–material interaction are monitored by various sensors, which contain important information related to weld morphology and penetration status. High-speed cameras, industrial cameras, and photodiode sensors are the most frequently used sensors for optical signals. It was found that plasma plume formation and weld pool oscillation were significant signal sources during the laser welding process. Therefore, any instability in these phenomena would consequently affect the weld states, hence it is important to monitor the dynamic behaviors of the plasma plume and weld pool [20]. To monitor the behavior of vapor plumes during high-speed laser welding (higher than 10 m/min), Xue et al. [21] applied a high-speed camera to obtain images of vapor plumes and analyzed the vapor plume characteristics with respect to the welding quality. In this context, the random forest with the Bagging method was adopted to classify the three ejection regimes of the vapor plumes.

∗ Correspondence to: School of Aerospace Engineering, Huazhong University of Science & Technology, 1037 Luoyu Road, Wuhan 430074, PR China.
E-mail addresses: longchao2017@gmail.com, clc@wtu.edu.cn (L. Cao), jingchangli@hust.edu.cn (J. Li), zhanglibin0701@126.com (L. Zhang), luoshuyang@hust.edu.cn (S. Luo), m202071363@hust.edu.cn (M. Li), huangxufeng@hust.edu.cn (X. Huang).

https://doi.org/10.1016/j.knosys.2022.110212
0950-7051/© 2022 Elsevier B.V. All rights reserved.

Cai et al. [22] used a high-speed camera to monitor the laser welding process, and an image adaptive fusion method was proposed to reduce the interference of the metal vapor plume. A two-dimensional convolutional neural network (2D-CNN) model was constructed to recognize the penetration state, and the accuracy was improved to 99.86%. Liu et al. [23] proposed a hybrid-driven method for vision-based recognition of penetration status in laser welding, which uses a label semantic attention mechanism to guide the 2D-CNN for accurate recognition of four penetration statuses: lack of penetration (LP), partial penetration (PP), full penetration (FP), and excessive penetration (EP). Although the high-speed camera is often used to monitor the welding process intuitively, the photodiode sensor with a higher sampling frequency is less expensive. At the same time, the amount of collected one-dimensional data is smaller, which is convenient for storage and real-time processing, making it more suitable for real-time monitoring and feedback control in industrial practice. Besides, the microphone has attracted attention for acquiring the acoustic signal for welding quality assessment. Chianese et al. [24] built a photodiode-based monitoring system to capture the signals during remote laser welding of a copper-to-steel thin foil of 300 µm. The part-to-part gap variations and EP conditions were diagnosed according to energy intensity and scatter level. Similarly, Lee et al. [25] developed a DL identification framework to predict three penetration states (i.e., LP, PP, and FP) in laser overlap welding, which used cost-effective photodiode signals and a one-dimensional convolutional neural network (1D-CNN). Luo et al. [26] applied a microphone array to localize and track the acoustic signals during welding, and the welding defects were recognized by source localization. Furthermore, some researchers focus on monitoring the penetration status using thermal signals. Yusof et al. [16] found it feasible to monitor the penetration status and explored the correlation between the laser energy and penetration depth. The thermal signal is usually captured by a pyrometer or an infrared camera. Xiao et al. [27] utilized a coaxial infrared pyrometer to capture the temperature of the molten pool. They concluded that there was a linear relationship between the thermal signal and the penetration depth.

The above studies monitor the laser welding process through a single sensor or signal, and the effectiveness is acceptable; however, a single sensor is unable to obtain comprehensive information during laser welding, so the detection accuracy may be insufficient. Recent studies have shown that a welding monitoring system utilizing multiple sensors with data fusion schemes can yield improved identification results [20,28–33]. Zhang et al. [28–30] designed a multiple-sensor system (including a high-speed camera, a photodiode sensor, a spectrometer, etc.) to get detailed information on the laser welding process. After the analyses of the multiple sensors, different classification models (e.g., stacked sparse autoencoder and deep belief network) were employed to discriminate defects from the extracted features and achieved satisfactory classification performance. Shevchik et al. [31] adopted laser back-reflection (LBR) and acoustic emission (AE) sensors to monitor the laser welding process of aluminum alloy, while the X-ray radiographic imaging method was applied to observe the process zone and provide the ground truth of welding quality. By fusing the LBR and AE features, a temporal 2D-CNN model was built to classify five statuses (i.e., conduction welding, stable keyhole, unstable keyhole, blowout, and pores) with an average accuracy of 94.6%. Ma et al. [32] applied multi-sensing signals, including a high-speed camera signal and a coherent light signal, to recognize and detect porosity defects in the laser welding of aluminum alloy. Keyhole 3D morphological characteristics were built and the keyhole depth signal was captured. A 2D-CNN was developed to detect porosity with an accuracy of 95.67%. Li et al. [33] introduced a multi-feature back propagation neural network (BPNN) to predict the weld penetration state (i.e., LP, PP, and FP) with an accuracy of 97%, using the extracted features of the plasma plume and spectrum as input.

However, these methods rely strongly on hand-crafted preprocessing or time–frequency analysis, which demands costly specialized knowledge. Additionally, the feature extraction of different sensors is performed individually rather than utilizing information from both sensors jointly, which will lead to the loss of some critical high-level features from both sensors and further induce performance deterioration of diagnostic models. In recent years, the self-attention mechanism of the Transformer model [34] has been proven to be an effective fusion scheme in different domains, such as point cloud learning [35], autonomous vehicles [36], and prognostics and health management [37]. Nevertheless, no related works have adopted the self-attention mechanism to fuse the time-domain signals of the photodiode and microphone sensors for quality monitoring in laser welding. As simple sensors with high sampling frequency, low cost, and easy integration into automated monitoring and feedback control systems, microphone and photodiode sensors are attractive for monitoring the optical and acoustic signals generated from plasma plumes and molten pools. Therefore, this study aims to develop a photoacoustic-based diagnosis approach to provide accurate identification of the welding penetration states. This approach is based on a one-dimensional CNN (1D-CNN) and a cross-attention mechanism, which can interactively learn the photoacoustic information in the time domain to improve classification accuracy. The main contributions of this article are summarized below:

(1) A multi-sensing system consisting of a microphone and a photodiode sensor is established, and welding experiments are conducted to construct a sizable dataset;
(2) An innovative cross-attention fusion neural network (CAFNet) is proposed, which can interactively extract the photoacoustic features;
(3) The CAFNet performs well under imbalanced data or a limited training ratio and can work directly on raw temporal signals without any time–frequency analysis or pre-denoising methods.

The structure of this paper is as follows: Section 2 introduces the experimental setup and the signals monitoring platform, and the data acquired during laser welding are preliminarily analyzed. In Section 3, the acquisition and processing of datasets and the proposed CAFNet are introduced. In Section 4, the performance of the CNN model for penetration state monitoring of laser welding is analyzed and compared with other network models, followed by the conclusion in Section 5.

2. Laser welding equipment and monitoring system

2.1. Laser welding equipment and materials

Fig. 1 demonstrates schematic diagrams of the laser welding and monitoring system. A fiber laser IPG YLS-30000 with a laser wavelength of 1070 nm was employed in the experiments. The maximum output laser power is 30 kW. The spot diameter of the focused laser beam is 0.5 mm. The laser beam was transmitted to the welding head through an optical fiber. The welding head was mounted on a 6-axis KUKA robot. Argon was utilized as the shielding gas with a flow rate of 1.5 m³/h. 2A12 aluminum alloy workpieces with a size of 100 mm × 50 mm × 3.8 mm were utilized.

The chemical composition of the material is shown in Table 1. Before welding, the surface of the workpieces was polished by a polisher and cleaned with acetone to remove the oxide film that results in metallurgical porosity interference during the experiments [38].

Fig. 1. Schematic diagrams of laser welding and monitoring system: (a) the laser welding experimental platform, (b) the photodiode and microphone sensor-based
signals acquisition system.

Table 1
The chemical concentration of 2A12 aluminum alloy (weight in %).
Element Cu Mg Mn Fe Si Zn Ti Al
Content (%) 3.92 1.08 0.62 ≤ 0.5 ≤ 0.5 ≤ 0.3 ≤ 0.15 Bal.

2.2. Generation mechanism of light and acoustic signals

Understanding the generation mechanism of light and acoustic signals during laser welding can widen the options for monitoring the laser welding process on a real-time basis. During laser welding, the acoustic signal sources include structure-borne and air-borne acoustic emission signals. The structure-borne acoustic signal is caused by cracks, porosity, phase transformation, and the back-reflected laser, while the air-borne acoustic signal is induced by plasma plume formation, weld pool and keyhole oscillation, as well as the gas jet pulsation in the weld zone. The structure-borne acoustic signal is collected by a piezoelectric sensor in direct contact with the surface of a workpiece. Meanwhile, a microphone sensor is used to acquire the air-borne acoustic signal, which does not need to be in contact with the surface of the workpiece. Therefore, it is convenient to utilize the microphone sensor to monitor the high-temperature laser welding process. As the laser beam interacts with the materials, visible and near-infrared (NIR) radiation is emitted from the laser interaction zone, such as the metal vapor (plasma or plume), the molten pool, and the base metal, due to high temperature.

2.3. Photodiode and microphone based signals acquisition system

Fig. 1(b) demonstrates the built photodiode and microphone sensor-based signals acquisition system. This system consists of a photodiode sensor, a microphone sensor, a DC power supply, data acquisition cards (DAQ), a personal computer, and a software system. A C10439-11 photodiode made by Hamamatsu is applied. It is an InGaAs-based photodiode with a circular photoactive area of 3 mm in diameter. The available measurement wavelength range is 0.5 µm to 1.7 µm and the peak sensitivity wavelength is 1550 nm. Since processing parameters vary greatly between experiments, a neutral density filter with 40% light intensity reduction is placed above the photoactive area to prevent the probe optical signal intensity from exceeding the capacity of the photodiode. The acoustic signal is acquired by a CCP free-field standard microphone manufactured by GRAS. The model is 46AE 1/2'' with a response frequency of 3.15∼51.2 kHz. The sensitivity of the microphone is 50 mV/Pa with a sensor sensitivity range from 3.15 Hz to 20 kHz (±2 dB). A data acquisition card NI 9221 with a maximum sampling rate of 800 kHz is adopted to acquire the photodiode signals, while the data acquisition card NI 9218 was applied to capture the acoustic signals. The data acquisition cards were connected to the personal computer. The acquisition parameters and schemes can be set by the NI-DAQmx driver and LabVIEW software. The sensors were fixed on a magnetic holder installed on the worktable. The distance between the sensors and the weldment is about 300 mm at an angle of approximately 45° from the workpiece surface.

In this work, a synchronous sampling method was developed to avoid unsynchronized signal capture due to the different traveling speeds of light and sound. Once the laser beam is turned on, a trigger pulse is produced to start the data acquisition. Therefore, the start points of the signals of laser–material interaction are captured by the microphone and photodiode with a time difference due to the difference in traveling speeds. As shown in Fig. 2, there is a short period without valid data at the beginning of the acoustic signal. When inputting data into machine learning models, this short period without valid data is removed. Then the acoustic and photodiode signals can be synchronized. In multi-sensor fusion, each sensor has its own sampling frequency based on its physics and capability. However, for sensors with different sampling frequencies, it is feasible to resample the signal from the low sampling frequency to the high sampling frequency [39,40]. In this paper, the acoustic signals are resampled to the sampling frequency of the photodiode signals using a polyphase antialiasing filter. In this context, the acoustic signals have the same sampling points as the photodiode signals.
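For illustration, the trimming and resampling steps can be sketched with SciPy's polyphase resampler; the sampling rates and the length of the invalid leading segment below are illustrative assumptions, not values reported here.

    # A minimal sketch of the synchronization and resampling steps described
    # above. FS_* and invalid_time are assumed example values.
    from fractions import Fraction

    import numpy as np
    from scipy.signal import resample_poly

    FS_PHOTODIODE = 250_000   # assumed photodiode sampling rate (Hz)
    FS_MICROPHONE = 50_000    # assumed microphone sampling rate (Hz)

    def synchronize(acoustic, photodiode, invalid_time):
        """Trim the invalid leading period of the acoustic channel and
        resample it to the photodiode rate so both channels share the
        same time base."""
        # Remove the short period without valid data at the start of the
        # acoustic signal (sound arrives later than light at the sensors).
        acoustic = acoustic[int(invalid_time * FS_MICROPHONE):]

        # Polyphase resampling from the low to the high sampling frequency;
        # resample_poly applies an antialiasing FIR filter internally.
        ratio = Fraction(FS_PHOTODIODE, FS_MICROPHONE)
        acoustic_rs = resample_poly(acoustic, up=ratio.numerator,
                                    down=ratio.denominator)

        # Truncate both channels to a common length.
        n = min(len(acoustic_rs), len(photodiode))
        return acoustic_rs[:n], photodiode[:n]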
2.4. Processing parameters of the experiment

Fig. 2. The diagram of a short period without valid data.

Table 2
The laser welding processing parameters.
No.   Laser power (W)   Welding speed (mm/s)   Energy density (J/mm³)   Linear energy (J/mm)
1 2800 30 118.90 93.33
2 3500 70 63.69 50.00
3 3500 40 111.46 87.50
4 4000 60 84.93 66.67
5 4000 50 101.91 80.00
6 3200 50 81.53 64.00
7 3200 35 116.47 91.43
8 3500 50 89.17 70.00
9 3500 45 99.08 77.78
10 4000 70 72.79 57.14
11 3200 45 90.59 71.11
12 3200 40 101.91 80.00
13 3500 60 74.31 58.33
14 3500 60 74.31 58.33
15 4000 80 63.69 50.00

To acquire enough acoustic and light signals, fifteen combinations of processing parameters were intentionally selected to produce three penetration states after numerous trials and errors. The length of every weld seam is 100 mm and the defocusing is zero. The processing parameters are shown in Table 2. The laser welding experiments were performed at room temperature.

2.5. Preliminary analysis of the welding bead and collected data

In some situations, like welding a closed cavity or pipe, it is not convenient to distinguish the penetration state according to the bottom view. Exploring valuable information through the top morphology or the captured light and acoustic signals to judge the penetration state is effective and promising. The three typical penetration states, which are incomplete penetration (IP), normal full penetration (NFP), and excessive penetration (EP), are judged by the morphologies of the top and back sides of the weld seam. Morphologies of both sides were taken for each seam. As shown in Fig. 3, excessive penetration is caused by high linear heat input, which consequently causes a significant level of top surface concavity and root sagging under the combined actions of recoil pressure, surface tension, and hydrodynamic pressure. In contrast, reinforcement and incomplete penetration occur because of a lack of heat input, which may lead to serious deformation and deteriorated mechanical performance [41].

After laser welding, the surface morphologies of each weld bead were measured by a stereoscope microscope and a laser scanning confocal microscope. Fig. 4 demonstrates the bead morphologies of the three penetration states measured by the stereoscope microscope and laser scanning confocal microscope. The penetration states can be distinguished by the top and bottom views. The top surface height measured by the laser scanning confocal microscope can be used to further confirm the penetration states. Especially, when underfill occurs, an obvious groove on the top surface can be observed, and excessive penetration is then formed. The depth of the groove is about 1450 µm, as shown in Fig. 4(a). When incomplete penetration is formed, the reinforcement height is about 700 µm, as shown in Fig. 4(b). When the penetration is normal, the reinforcement height is about 500 µm, as shown in Fig. 4(c). Therefore, the penetration status can be judged by the groove or reinforcement size.

After observing the top and bottom surfaces, the cross-section of each weld bead was cut from each workpiece by an electric cutting machine (STDX600). As illustrated in Fig. 5(a), to reduce the uncertainty and random errors, three cross-sections were sampled from each weld bead with a certain interval. A total of 45 metallographic samples were prepared. The penetration and reinforcement heights were measured on an optical microscope together with the measurement software CSM1 after the samples were polished with suitable abrasive paper and diamond paste and etched. As illustrated in Fig. 5(b)–(d), the penetration states can be distinguished from the cross-section. When the penetration depth is smaller than the thickness of the workpiece, it is incomplete penetration, as shown in Fig. 5(b). When the workpiece is penetrated with obvious underfill on the top surface and serious sagging on the bottom surface, it is excessive penetration, as shown in Fig. 5(d), due to the surface tension and recoil pressure. When the penetration depth is slightly larger than the thickness of the workpiece with no underfill on the top surface, it is normal full penetration, as shown in Fig. 5(c).

Fig. 3. The details of the welding bead and collected data for three typical penetration states. (a) the top and bottom views; (b) photodiode signals; (c) microphone
signals.

Fig. 4. The bead appearances measured by stereoscope microscope and laser scanning confocal microscope: (a) EP, (b) IP, and (c) NFP.


Fig. 5. Diagrams of the penetration conditions: (a) the diagram of the weld bead and the sampling positions (the red arrows indicate the observation direction), (b) IP, (c) NFP, and (d) EP.

The captured acoustic and light data are labeled by the penetration states. The raw photodiode and microphone signals of the three penetration states are different. However, it is difficult to distinguish the states intuitively from such a large amount of data. Data processing and a deep learning model are needed to classify the penetration states.

3. Proposed method

In this section, the preprocessing of the photodiode and acoustic signals is first introduced. Then, the overall architecture and building details of the proposed cross-attention fusion neural network (CAFNet) are described.

3.1. Data preprocessing

State-of-the-art (SOTA) research [32,42] adopted time–frequency analysis methods, including the wavelet packet transform (WPT) and continuous wavelet transform (CWT), to convert raw signals into images for the classification of welding defects. Nonetheless, converting raw signals from the time domain to the frequency domain may result in the loss of partial information. In addition, since time–frequency analysis methods need redundant parameters (i.e., scale and translation parameters, the number of wavelet coefficients, etc.) and computing processes, these methods are time-consuming and demand practical knowledge. In this section, a sliding window with L2 normalization is used for one-dimensional (1-D) data preprocessing, as shown in Fig. 6.

To speed up the training process and eliminate the effects of dimensional differences, a robust L2 normalization technique is used for data preprocessing, which normalizes the raw signal by its energy (measured by the Euclidean norm). The distribution of the raw signal is not changed by L2 normalization, which means the information is lossless and suitable for further model training and testing. As shown in Fig. 6, the distribution of the normalized signal is the same as that of the raw signal, and only the amplitude (e.g., upper point a and lower point b) is reduced by a factor of ∥x∥₂. The L2 normalization can be formulated as

    x_normalization = x / ∥x∥₂ = x / √(xᵀx)    (1)

where ∥·∥₂ is the Euclidean norm (i.e., 2-norm), x is the raw signal, and x_normalization is the normalized signal.

To prepare a suitable dataset for the deep learning-based (DL-based) models, this paper uses a sliding window for data preprocessing, that is, slicing the data samples with overlap. However, it is necessary to find a suitable length of the sliding window (i.e., the length of the sample) and shift size to better classify the different penetration states. For instance, a too-large sliding window will increase the size of the input layer in DL-based models, which will increase the computational cost of the neural network, while a too-small sliding window will not cover enough features of the signal, which will lead to the generation of confused samples and further deteriorate the classification performance. In this research, each segment has 1024 points with a shifting size of 256, which makes it easy to implement normalization and training algorithms for the DL-based models. Accordingly, a welding dataset consisting of three penetration states is prepared: excessive penetration (EP), normal full penetration (NFP), and incomplete penetration (IP). The EP, NFP, and IP classes consist of 922, 796, and 608 samples, respectively, and the entire dataset contains 2326 segments in the time domain.
for one-dimensional (1-D) data preprocessing, as shown in Fig. 6.
3.2. Cross-attention fusion neural network (CAFNet)

The proposed CAFNet aims at utilizing the self-attention mechanism of the Transformer model [34] to interactively capture photoelectric and acoustic information for effective quality classification without prior preprocessing and feature learning.

Fig. 6. Data preprocessing using a sliding window with L2 normalization.

Fig. 7 shows the overall architecture of the proposed CAFNet, which consists of two-branch one-dimensional convolutional neural networks (1D-CNNs) and one cross-attention (CA) block. The 1D-CNN takes the 1-D signal as input and extracts photoelectric or acoustic features through four convolutional (Conv) blocks. Each Conv block consists of one 3 × 1 Conv layer with L filters, one batch normalization (BN) layer, one leaky rectified linear unit (LReLU), and one max pooling layer with a 2 × 1 kernel size. Herein, the BN layer is adopted to alleviate internal covariate shift and accelerate model convergence, while the LReLU with a negative slope of 0.001 is used to activate the normalized feature maps from the BN layer.
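A minimal sketch of one such Conv block in PyTorch (the framework used in Section 4); the channel counts and padding are illustrative assumptions, while the kernel sizes and the negative slope follow the description above:

    import torch.nn as nn

    def conv_block(in_channels, out_channels):
        """3x1 convolution with L filters, batch normalization, leaky ReLU
        activation, and 2x1 max pooling, for 1-D signals."""
        return nn.Sequential(
            nn.Conv1d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm1d(out_channels),           # alleviates covariate shift
            nn.LeakyReLU(negative_slope=0.001),     # negative slope of 0.001
            nn.MaxPool1d(kernel_size=2),            # halves the sequence length
        )

    # Each branch stacks four such blocks; e.g., for a (batch, 1, 1024) input:
    # branch = nn.Sequential(conv_block(1, 16), conv_block(16, 32),
    #                        conv_block(32, 64), conv_block(64, 64))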
Different from basic feature-level fusion models, a CA block is used for the fusion of intermediate representations between two sensors instead of fusing features at the end of the two branches directly. In particular, multi-sensor fusion is performed with a CA block in each branch after the first two Conv blocks. In this way, the feature representations of different sensors are learned jointly, which can not only obtain the commonality of photodiode and microphone data but also retain the difference between photoelectric and acoustic features. For given acoustic features A_f and photodiode features P_f, the CA block first uses embedding blocks with linear projections to obtain A_q, A_k, A_v and P_q, P_k, P_v. Here, q, k, v represent query, key, and value, respectively. These tensors can be formulated as

    A_q = A_f ⊙ W_a^q,  A_k = A_f ⊙ W_a^k,  A_v = A_f ⊙ W_a^v
    P_q = P_f ⊙ W_p^q,  P_k = P_f ⊙ W_p^k,  P_v = P_f ⊙ W_p^v    (2)

where W represents the learnable linear weights and ⊙ represents element-wise multiplication (i.e., dot-product). After obtaining A_q, A_k, A_v and P_q, P_k, P_v, the attention matrices CA_1 and CA_2 are calculated by the cross self-attention function, and this process is given as follows:

    CA_1 = softmax(P_q ⊗ A_kᵀ / √(D_Ak)) ⊙ A_v
    CA_2 = softmax(A_q ⊗ P_kᵀ / √(D_Pk)) ⊙ P_v    (3)

where D_Ak represents the dimension of A_k, D_Pk represents the dimension of P_k, ⊗ represents matrix multiplication, and the softmax function is applied to obtain normalized weights in the interval [0, 1]. Taking CA_1 as an example, the softmax function is adopted to calculate the relative importance of acoustic features by aggregating the scores corresponding to photoelectric features for each attribute of the acoustic features. As a result, we obtain an attention vector that can be used to highlight the more relevant attributes of the acoustic features.

Then, a non-linear transformation block is used to calculate the output attentive features A_out and P_out, which consists of two linear layers, one layer normalization (i.e., LayerNorm or LN), and one non-linear activation function. This process can be defined as:

    A_out = f(CA_1) + CA_1
    P_out = f(CA_2) + CA_2    (4)

where f(·) represents the non-linear transformation function, and A_out and P_out have the same shape as the corresponding input features. Specifically, we used the Gaussian error linear unit (GELU) [43] as the activation function rather than LReLU, following the SOTA Transformer-based models (e.g., GPT-3 [44] and BERT [45]). Classifiers using a GELU can be more robust to noised inputs, which is suitable for the classification of signals without any denoising preprocess. The GELU function can be defined by the following formula:

    GELU(x) = (x/2) · [1 + erf(x/√2)]    (5)

where x is the input and erf(·) is the Gauss error function, given by erf(x) = (2/√π) ∫₀ˣ e^(−t²) dt.

Finally, the attentive features A_out and P_out are added to the output of the second Conv block in the corresponding branch. The summed features of the two branches pass through two further Conv blocks to generate the final features; then, the final features of the two branches are concatenated and fed into the final layer (i.e., a fully connected layer with a softmax function) for classification.
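A minimal sketch of the CA block in PyTorch is given below. It assumes branch features shaped (batch, tokens, dim), interprets the ⊙ in Eq. (3) as the usual attention-weighted aggregation of the value tensors (standard scaled dot-product attention), and makes assumptions about the hidden sizes and the exact ordering inside the non-linear transformation f(·):

    import math

    import torch
    import torch.nn as nn

    class CrossAttentionBlock(nn.Module):
        def __init__(self, dim):
            super().__init__()
            # Embedding blocks with linear projections, Eq. (2).
            self.a_qkv = nn.Linear(dim, 3 * dim)
            self.p_qkv = nn.Linear(dim, 3 * dim)
            # Non-linear transformation f(.): two linear layers, LayerNorm,
            # and a GELU activation, used in Eq. (4); the ordering here is
            # one plausible arrangement, not specified by the paper.
            self.f = nn.Sequential(
                nn.LayerNorm(dim),
                nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim),
            )

        def forward(self, a_f, p_f):
            a_q, a_k, a_v = self.a_qkv(a_f).chunk(3, dim=-1)
            p_q, p_k, p_v = self.p_qkv(p_f).chunk(3, dim=-1)

            # Eq. (3): each branch queries the other branch's keys, and the
            # softmax-normalized scores re-weight that branch's values.
            ca1 = torch.softmax(
                p_q @ a_k.transpose(-2, -1) / math.sqrt(a_k.size(-1)),
                dim=-1) @ a_v
            ca2 = torch.softmax(
                a_q @ p_k.transpose(-2, -1) / math.sqrt(p_k.size(-1)),
                dim=-1) @ p_v

            # Eq. (4): non-linear transformation with a residual connection.
            return self.f(ca1) + ca1, self.f(ca2) + ca2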
4. Results and discussion

To ensure the fairness and consistency of the experiments, all the competing models in our experiments were run under the same environment: an Intel Core i7-9700K@3.60 GHz CPU, an Nvidia RTX 2080 GPU, Ubuntu 22.04, Python 3.9.12, CUDA 11.3.1, and PyTorch 1.11.0.

4.1. Compared methods

To demonstrate the effectiveness and superiority of the proposed CAFNet, different DL-based methods are adopted to analyze the same dataset for comparison. The main settings and parameters of the compared methods are described as follows:

(1) 1D-CNN with acoustic signal (1D-CNN-A):
First, the one-branch 1D-CNN is implemented, which follows a similar architecture to one of the branches in CAFNet, as shown in Fig. 8. Specifically, the pre-processed data from the acoustic signals are used to train and test this model.

Fig. 7. Network architecture of the proposed CAFNet.

(2) 1D-CNN with photoelectric signal (1D-CNN-P):
To compare the effect of different sensors, the 1D-CNN-P is implemented as shown in Fig. 9, which takes the pre-processed data from the photodiode signals as input and has the same architecture as 1D-CNN-A.
(3) ResCNN with acoustic signal (ResCNN-A):
Considering the signal-to-image conversion method and the improved residual block, the CNN with residual learning (ResCNN) is implemented, which achieves SOTA results in mechanical fault diagnosis [46–48]. Particularly, the acoustic signals are converted to gray images with 32 × 32 × 1 pixels and used to train and test the model. More details about the ResCNN and the signal-to-image conversion method can be found in Ref. [39].
(4) ResCNN with photoelectric signal (ResCNN-P):
Similarly, the ResCNN-P is implemented to analyze the effect of the different sensors, which has an identical skeleton to ResCNN-A.
(5) 1D-CNN with acoustic and photoelectric signals (1D-CNN-AP):
To show the effect of the cross-attention scheme, the 1D-CNN-AP is implemented, where the CA block is removed from the proposed CAFNet, as shown in Fig. 10. The 1D-CNN-AP focuses on fusing features at the end of the two branches directly.
(6) ResCNN with acoustic and photoelectric signals (ResCNN-AP):
Actually, the original ResCNN in Ref. [39] is based on multisensory fusion and CNN, which uses principal component analysis to convert multisensory signals into red-green-blue images. However, this research only considers two sensors (i.e., microphone and photodiode sensors). To further examine the fusion ability of the proposed CAFNet, the ResCNN-AP is implemented by fusing features at the end of the two branches (i.e., ResCNN-A and ResCNN-P) directly, the same as 1D-CNN-AP.

Due to the difference in model complexity, different optimizers with different initial learning rates are used to optimize the cross-entropy loss function of the compared methods. According to grid search tuning, the stochastic gradient descent (SGD) optimizer with an initial learning rate of 3 × 10−2 is adopted to optimize the loss function of 1D-CNN-A, 1D-CNN-P, 1D-CNN-AP, and the proposed CAFNet, while the adaptive moment estimation (Adam) optimizer with an initial learning rate of 3 × 10−3 is adopted to optimize the loss function of ResCNN-A, ResCNN-P, and ResCNN-AP. In addition, the cosine annealing scheduler is utilized to decay the learning rate for each batch, as follows:

    η_t = η_min + (η_initial − η_min)[1 + cos(π · T_cur/T_max)]/2    (6)

Fig. 8. Network architecture of 1D-CNN-A.

Fig. 9. Network architecture of 1D-CNN-P.

Fig. 10. Network architecture of 1D-CNN-AP.

where η_t is the current learning rate, η_min is the minimum learning rate, η_initial is the initial learning rate, T_cur is the number of current epochs, and T_max is the number of epochs in each restart. In this paper, the training phase stopped after 200 epochs, while T_max and η_min are fixed as 10 and 3 × 10−5, respectively.
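In PyTorch, this schedule corresponds to the built-in cosine annealing with warm restarts (Eq. (6) restarts every T_max epochs); a minimal sketch with the hyperparameters stated above, where the model and data loader are placeholders:

    import torch

    optimizer = torch.optim.SGD(model.parameters(), lr=3e-2)  # e.g., CAFNet
    scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
        optimizer, T_0=10, eta_min=3e-5)  # T_max = 10 epochs per restart

    for epoch in range(200):                 # training stops after 200 epochs
        for i, (acoustic, photo, labels) in enumerate(train_loader):
            optimizer.zero_grad()
            loss = torch.nn.functional.cross_entropy(
                model(acoustic, photo), labels)
            loss.backward()
            optimizer.step()
            # A fractional epoch argument decays the learning rate per batch.
            scheduler.step(epoch + i / len(train_loader))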
4.2. Classification performance

Unless otherwise stated, all experiments were performed 10 times (i.e., with different random seeds) to avoid contingency during the test, and the average values were regarded as the final classification results for analysis. Also, the dataset was randomly split into training, validation, and testing subsets. The ratios of the subsets were defined as 80%, 10%, and 10%, which are commonly used. The comparison bars and classification performance of the different methods are shown in Fig. 11 and Table 3, respectively. Some interesting observations are listed as follows:

(1) Among the single-sensor-based methods (i.e., 1D-CNN-A, 1D-CNN-P, ResCNN-A, ResCNN-P), the model using the acoustic signal produces better testing accuracy than that using the photoelectric signal under the same network architecture, which confirms that using information from a sole sensor might result in low classification accuracy. On the other hand, the ResCNN-based method produces a more satisfying classification performance with an average testing accuracy of more than 90%. However, the signal-to-image conversion method and residual learning in ResCNN increase the space and time complexity of the model. As can be seen from Table 3, the space complexity (measured by the number of model parameters) of the ResCNN-based method is about 2.83 times more than that of the 1D-CNN-based method, while the time complexity (measured by the test time per sample) is about 1.84 times longer than that of the 1D-CNN-based method.

(2) Compared with the multi-sensor-based methods (except for 1D-CNN-AP), the single-sensor-based methods (i.e., 1D-CNN-A, 1D-CNN-P, ResCNN-A, ResCNN-P) obtain inferior average testing accuracy, because they only utilize sole information from the microphone or photodiode sensor and fail to obtain the common information from different sensors for better classification. Interestingly, ResCNN-A yields an improvement of 4.59% in terms of mean testing accuracy compared with the 1D-CNN-AP, which demonstrates that a complex and stronger feature extractor can still obtain superior classification performance even using information from only one of the sensors. However, its space and time complexity are larger.

(3) As can be seen from Fig. 11, compared with directly performing feature-level fusion at the end of the model (i.e., 1D-CNN-AP and ResCNN-AP), introducing the cross-attention mechanism into the model improves the classification performance. Among the three multi-sensor-based methods, the best performance is achieved by the proposed CAFNet, with 99.73% ± 0.37% accuracy, while the other methods only achieve accuracies of 93.39% ± 1.96% and 99.13% ± 0.67%, respectively. Although the ResCNN-AP achieves a satisfactory mean value and standard deviation in terms of testing accuracy, the space complexity of the ResCNN-AP is about 1.89 times more than that of the proposed CAFNet, while the time complexity is about 1.40 times longer than that of the proposed CAFNet. The results show that the proposed CAFNet can make a tradeoff between high accuracy and relatively low computational cost. Accordingly, the proposed method using cross-attention fusion can effectively and efficiently handle information from multiple sensors.

Fig. 11. The comparison results of different methods.

Table 3
Classification performance of different methods.

Group                 Method       Testing accuracy (Mean ± Std %)   Testing time per sample (ms)   Model params (M)
Single-sensor-based   1D-CNN-A     85.36 ± 2.06                      0.85                           0.06
                      1D-CNN-P     81.97 ± 3.92                      0.87                           0.06
                      ResCNN-A     97.98 ± 1.51                      1.55                           0.17
                      ResCNN-P     91.97 ± 1.75                      1.60                           0.17
Multi-sensor-based    1D-CNN-AP    93.39 ± 1.96                      1.45                           0.12
                      ResCNN-AP    99.13 ± 0.67                      3.14                           0.34
                      CAFNet       99.73 ± 0.37                      2.25                           0.18

Fig. 12 shows the multi-class confusion matrices obtained from the compared methods on one of the ten trials. The actual labels are given on the y-axis and the predicted labels are shown on the x-axis. The diagonal terms of the confusion matrix denote the correctly classified quality types in terms of category percentages, while the off-diagonal terms denote false classifications. Notably, the phenomenon of data imbalance exists in our dataset (i.e., the sample sizes of the three types are different, and the sample size of IP is 44 and 42 less than EP and NFP, respectively), which is common in real industry and significantly deteriorates model performance. As shown in Fig. 12(c), the 1D-CNN-P achieves the worst multi-class results, where the NFP has the highest misclassification rate (24.49% of IP and 16.13% of EP are misclassified as NFP). The corresponding confusion matrix of the proposed CAFNet is given in Fig. 12(a), which demonstrates that the EP has the lowest testing accuracy of 98.92% (1.08% of EP is misclassified as NFP), while the NFP and IP have a testing accuracy of 100%. For fine-grained classification under data imbalance, the proposed CAFNet is significantly better than the compared methods.
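Such a row-normalized (percentage) confusion matrix can be reproduced, for instance, with scikit-learn; y_true and y_pred below are placeholders for the actual and predicted labels of one trial:

    import numpy as np
    from sklearn.metrics import confusion_matrix

    labels = ["EP", "NFP", "IP"]
    cm = confusion_matrix(y_true, y_pred, labels=labels)

    # Normalize each row by the actual class size so the diagonal gives the
    # per-class accuracy in percent (off-diagonals: misclassification rates).
    cm_percent = 100.0 * cm / cm.sum(axis=1, keepdims=True)
    print(np.round(cm_percent, 2))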
To further show the feature learning of the proposed CAFNet, t-distributed stochastic neighbor embedding (t-SNE) [49] is adopted to visualize the features of the different quality states in 2-D space. The goal of the t-SNE algorithm is to perform dimensionality reduction on high-dimensional data while preserving the similarities between data points in the low-dimensional space. Fig. 13 shows the t-SNE representations (where dimensions 1 and 2 of the features are shown on the x-axis and y-axis, respectively) on the testing subset obtained at different structures of the proposed CAFNet, namely, Input, Before Cross-attention Block, After Cross-attention Block, and Before Fully connected with Softmax. As shown in Fig. 13, the input data points from the different states are highly overlapped and unable to be distinguished. After two Conv blocks (Before Cross-attention Block), some data points are able to be distinguished but not clearly separated. Although some classes are still not clearly separated after the Cross-attention Block, the clustering of the data points is much better and the data points from IP are almost split off. It can be concluded that cross-attention can strengthen the feature representations of different sensors. Before the fully connected layer with Softmax, the data points are completely split and divided into different groups. Hence, the features extracted by the developed CAFNet can be utilized to conveniently classify the labels of the different quality states.
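A minimal sketch of this visualization with scikit-learn's t-SNE implementation; `features` and `states` below are placeholders for the intermediate activations collected at one stage of CAFNet (shape: n_samples × feature_dim) and their class labels:

    import matplotlib.pyplot as plt
    from sklearn.manifold import TSNE

    embedded = TSNE(n_components=2, random_state=0).fit_transform(features)

    for state in ("EP", "NFP", "IP"):
        mask = states == state   # states: NumPy array of label strings
        plt.scatter(embedded[mask, 0], embedded[mask, 1], s=8, label=state)
    plt.xlabel("Dimension 1")
    plt.ylabel("Dimension 2")
    plt.legend()
    plt.show()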
4.3. Limited training samples

In practice, limited training samples hinder the development of DL-based quality monitoring methods. Therefore, six experiments using different training ratios were performed to further examine the effectiveness and robustness of the proposed CAFNet method. Table 4 presents the testing accuracy of the different methods with limited training ratios. The bold values in Table 4 represent not only the proposed method but also the best results. From the classification results shown in Table 4, the proposed CAFNet can accurately classify welding penetration states with a limited training ratio, with an average testing accuracy of 94.34%, outperforming all other methods. Particularly, the average testing accuracies of 1D-CNN-A and 1D-CNN-P using a raw single signal are 73.98% and 73.21%, respectively, inferior to that of the proposed CAFNet. The performance of 1D-CNN-AP based on feature-level fusion is still inferior to the proposed CAFNet, with an average testing accuracy of only 86.14%. Surprisingly, the ResCNN-A provides satisfactory classification performance with an average testing accuracy of 89.84%, which demonstrates that ResCNN using only one microphone sensor is still competitive with the other multi-sensor-based methods (i.e., ResCNN-AP and 1D-CNN-AP). Moreover, the proposed CAFNet achieves satisfactory performance of 86.70% ± 0.46% even under a 10% training ratio, outperforming all other methods.

Fig. 12. Multi-class confusion matrices of different methods on one trial (random seed: 56351).

Fig. 13. t-SNE-based features visualization of the proposed CAFNet.

Table 4
Testing accuracy (Mean ± Std %) with limited training ratios of different methods.
Method          10%             20%             30%             40%             50%             Average (Mean %)

1D-CNN-A 67.45 ± 1.04 71.55 ± 1.68 74.28 ± 0.72 77.54 ± 0.93 79.09 ± 1.37 73.98
1D-CNN-P 66.96 ± 1.74 70.92 ± 1.10 74.10 ± 1.06 76.31 ± 1.17 77.76 ± 1.77 73.21
ResCNN-A 79.67 ± 5.97 88.39 ± 1.90 91.50 ± 1.31 94.58 ± 1.24 95.05 ± 1.17 89.84
ResCNN-P 46.27 ± 5.70 58.05 ± 7.23 68.45 ± 8.09 80.80 ± 2.90 86.71 ± 1.55 68.06
1D-CNN-AP 78.77 ± 1.12 84.28 ± 0.64 87.74 ± 1.48 89.62 ± 1.41 90.27 ± 1.70 86.14
ResCNN-AP 76.62 ± 3.19 83.52 ± 2.48 88.88 ± 1.86 93.73 ± 1.15 94.47 ± 1.40 87.44
CAFNet 86.70 ± 0.46 91.67 ± 0.57 96.89 ± 0.41 97.94 ± 0.36 98.49 ± 0.39 94.34

The bold values represent not only the performance of the proposed method but also the best results.

The visualization of the comparative results at different training ratios is depicted in Fig. 14.

Fig. 14. Visualization of comparative results in different training ratios.

As shown in Fig. 14, the overall performance improves for all tested methods with an increase in the training ratio, which means the DL-based methods are sensitive to the number of training samples and will provide unsatisfactory classification performance with a limited training ratio. In a word, the proposed CAFNet achieves good performance in the six experiments and outperforms all other methods. Therefore, the effectiveness and robustness of the proposed CAFNet method under different limited training ratios are validated in this section.

5. Conclusion and future work

This article presents a novel penetration state monitoring method based on multi-sensing signals and CAFNet for laser welding. A dataset with three penetration states measured by photodiode and microphone sensors is created and used to validate the effectiveness of the proposed method. The conclusions can be summarized as follows:

(1) Laser welding equipment and a signals monitoring system including photodiode and microphone sensors were built, and a series of experiments was conducted to collect enough data. Three typical penetration states (i.e., IP, NFP, and EP) are judged by qualitative and quantitative criteria, including the surface morphologies of the weld bead measured by a stereoscope microscope and a laser scanning confocal microscope, and the cross-section of the weld bead measured by an optical microscope with the measurement software CSM1;

(2) Compared with the SOTA research on welding monitoring, the proposed method does not need redundant parameters and computing processes for data preprocessing. A robust 1-D data preprocessing approach, based on the sliding window and L2 normalization, was developed to eliminate the effects of dimensional differences and prepare a suitable dataset;

(3) An improved CA block was designed and combined with the 1D-CNN to construct the CAFNet, which can significantly increase the accuracy of the CAFNet at the expense of a small amount of computational cost. With its better interactive ability of feature extraction and fusion, the proposed CAFNet improves the final identification performance in welding quality monitoring;

(4) The proposed CAFNet was validated for classifying the welding penetration states on a laboratory dataset under data imbalance. For the full dataset, the proposed CAFNet achieves a mean value of 99.73% and a standard deviation of 0.37% in terms of testing accuracy, which outperforms the other compared models. The results showed that the proposed CAFNet takes only 2.25 ms to classify one sample with 1024 data points, which further demonstrates that the proposed method can make a tradeoff between high accuracy and relatively low computational cost. Additionally, when utilizing a dataset with limited training samples, the proposed CAFNet achieved the highest average testing accuracy of 94.34%, which shows that the proposed method has stronger robustness than the other methods.

Although the experimental results demonstrate and validate the effectiveness of CAFNet when applied to multi-sensor, unbalanced, and limited labeled samples for welding quality monitoring, there are still some limitations to be explored in future work: (1) For complex monitoring systems, data come from various sensors and are mostly non-structured, multi-modal, and heterogeneous, which makes the model much more complex. Further research is required to leverage heterogeneous information in CAFNet while not reducing the training efficiency; (2) In engineering scenarios, the welding robot usually works with optimal processing parameters, and incomplete or excessive penetration states are rare. Thus, it is hard to obtain sufficient welding-quality data from engineering scenarios directly to support the training of CAFNet. Further study on the application of few-shot and zero-shot learning in data-scarce industry practice will be carried out; (3) This paper only considers three typical penetration states; in the future, more comparative experiments with state-of-the-art welding monitoring of other defects will be investigated.

CRediT authorship contribution statement

Longchao Cao: Data Curation, Funding acquisition, Conceptualization, Writing – original draft. Jingchang Li: Writing – review & editing, Validation. Libin Zhang: Investigation, Funding acquisition. Shuyang Luo: Visualization. Menglei Li: Formal analysis. Xufeng Huang: Supervision, Methodology, Software, Writing – review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (NSFC) under Grant No. 52105446, the Knowledge Innovation Program of Wuhan-Shuguang Project, PR China under Grant No. 2022010801020252, and the NSFC under Grant No. 62204178. The authors would like to thank Prof. Ping Jiang and Dr. Shaoning Geng for providing the laser welding equipment.

References

[1] N. Kashaev, V. Ventzke, G. Cam, Prospects of laser beam welding and friction stir welding processes for aluminum airframe structural applications, J. Manuf. Process. 36 (2018) 571–600.
[2] R.S. Xiao, X.Y. Zhang, Problems and issues in laser beam welding of aluminum-lithium alloys, J. Manuf. Process. 16 (2014) 166–175.
[3] G. Cam, G. Ipekoglu, Recent developments in joining of aluminum alloys, Int. J. Adv. Manuf. Technol. 91 (2017) 1851–1866.
[4] F. Dausinger, P. Berger, H. Hügel, Laser welding of aluminum alloys: Problems, approaches for improvement and applications, in: International Congress on Applications of Lasers & Electro-Optics, Arizona, USA, 2002, 287255.
[5] Z. Malekshahi Beiranvand, F. Malek Ghaini, H. Naffakh Moosavy, M. Sheikhi, M.J. Torkamany, M. Moradi, The relation between magnesium evaporation and laser absorption and weld penetration in pulsed laser welding of aluminum alloys: Experimental and numerical investigations, Opt. Laser Technol. 128 (2020).
[6] W. Cai, J. Wang, P. Jiang, L. Cao, G. Mi, Q. Zhou, Application of sensing techniques and artificial intelligence-based methods to laser welding real-time monitoring: A critical review of recent literature, J. Manuf. Syst. 57 (2020) 1–18.
[7] M.F.M. Yusof, M. Ishak, M.F. Ghazali, Acoustic methods in real-time welding process monitoring: Application and future potential advancement, J. Manuf. Sci. Eng. 15 (2021) 8490–8507.
[8] X. Gao, D. You, S. Katayama, The high frequency characteristics of laser reflection and visible light during solid state disk laser welding, Laser Phys. Lett. 12 (2015).
[9] P. De Bono, C. Allen, G. D'Angelo, A. Cisi, Investigation of optical sensor approaches for real-time monitoring during fibre laser welding, J. Laser Appl. 29 (2017).
[10] H.-H. Chu, Z.-Y. Wang, A vision-based system for post-welding quality measurement and defect detection, Int. J. Adv. Manuf. Technol. 86 (2016) 3007–3014.
[11] Z. Zhang, B. Li, W. Zhang, R. Lu, S. Wada, Y. Zhang, Real-time penetration state monitoring using convolutional neural network for laser welding of tailor rolled blanks, J. Manuf. Syst. 54 (2020) 348–360.
[12] L. Yang, J. Fan, B. Huo, E. Li, Y. Liu, A nondestructive automatic defect detection method with pixelwise segmentation, Knowl.-Based Syst. 242 (2022).
[13] X. Dong, C.J. Taylor, T.F. Cootes, Automatic aerospace weld inspection using unsupervised local deep feature learning, Knowl.-Based Syst. 221 (2021).
[14] W. Huang, R. Kovacevic, A neural network and multiple regression method for the characterization of the depth of weld penetration in laser welding based on acoustic signatures, J. Intell. Manuf. 22 (2009) 131–143.
[15] S. Lee, S. Ahn, C. Park, Analysis of acoustic emission signals during laser spot welding of SS304 stainless steel, J. Mater. Eng. Perform. 23 (2013) 700–707.
[16] M.F.M. Yusof, M. Ishak, M.F. Ghazali, Feasibility of using acoustic method in monitoring the penetration status during the pulse mode laser welding process, IOP Conf. Ser.: Mater. Sci. Eng. 238 (2017).
[17] H. Köhler, C. Thomy, F. Vollertsen, Contact-less temperature measurement and control with applications to laser cladding, Weld. World 60 (2015) 1–9.
[18] N. Chandrasekhar, M. Vasudevan, A.K. Bhaduri, T. Jayakumar, Intelligent modeling for estimating weld bead width and depth of penetration from infra-red thermal images of the weld pool, J. Intell. Manuf. 26 (2013) 59–71.
[19] Z. Chen, X. Gao, Detection of weld pool width using infrared imaging during high-power fiber laser welding of type 304 austenitic stainless steel, Int. J. Adv. Manuf. Technol. 74 (2014) 1247–1254.
[20] L. Kong, X. Peng, Y. Chen, P. Wang, M. Xu, Multi-sensor measurement and data fusion technology for manufacturing process monitoring: A literature review, Int. J. Extreme Manuf. 2 (2020).
[21] B. Xue, B. Chang, D. Du, Monitoring of high-speed laser welding process based on vapor plume, Opt. Laser Technol. 147 (2022).
[22] W. Cai, P. Jiang, L. Shu, S. Geng, Q. Zhou, Real-time laser keyhole welding penetration state monitoring based on adaptive fusion images using convolutional neural networks, J. Intell. Manuf. (2021).
[23] T. Liu, J. Bao, H. Zheng, J. Wang, C. Yang, J. Gu, Learning semantic-specific visual representation for laser welding penetration status recognition, Sci. China Technol. Sci. 65 (2022) 347–360.
[24] G. Chianese, P. Franciosa, J. Nolte, D. Ceglarek, S. Patalano, Characterization of photodiodes for detection of variations in part-to-part gap and weld penetration depth during remote laser welding of copper-to-steel battery tab connectors, J. Manuf. Sci. Eng. 144 (2022).
[25] K. Lee, S. Kang, M. Kang, S. Yi, C. Kim, Estimation of Al/Cu laser weld penetration in photodiode signals using deep neural network classification, J. Laser Appl. 33 (2021) 042009.
[26] Z. Luo, W. Liu, Z. Wang, S. Ao, Monitoring of laser welding using source localization and tracking processing by microphone array, Int. J. Adv. Manuf. Technol. 86 (2015) 21–28.
[27] X. Xiao, X. Liu, M. Cheng, L. Song, Towards monitoring laser welding process via a coaxial pyrometer, J. Mater. Process. Technol. 277 (2020).
[28] Y. Zhang, D. You, X. Gao, C. Wang, Y. Li, P.P. Gao, Real-time monitoring of high-power disk laser welding statuses based on deep learning framework, J. Intell. Manuf. 31 (2020) 799–814.
[29] Y. Zhang, D. You, X. Gao, N. Zhang, P.P. Gao, Welding defects detection based on deep learning with multiple optical sensors during disk laser welding of thick plates, J. Manuf. Syst. 51 (2019) 87–94.
[30] Y. Zhang, D. You, X. Gao, S. Katayama, Online monitoring of welding status based on a DBN model during laser welding, Engineering 5 (2019) 671–678.
[31] S. Shevchik, T. Le-Quang, B. Meylan, F.V. Farahani, M.P. Olbinado, A. Rack, G. Masinelli, C. Leinenbach, K. Wasmer, Supervised deep learning for real-time quality monitoring of laser welding with X-ray radiographic guidance, Sci. Rep. 10 (2020) 3389.
[32] D. Ma, P. Jiang, L. Shu, S. Geng, Multi-sensing signals diagnosis and CNN-based detection of porosity defect during Al alloys laser welding, J. Manuf. Syst. 62 (2022) 334–346.
[33] J. Li, Y. Zhang, W. Liu, B. Li, X. Yin, C. Chen, Prediction of penetration based on plasma plume and spectrum characteristics in laser welding, J. Manuf. Process. 75 (2022) 593–604.
[34] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need, in: Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 2017, pp. 6000–6010.
[35] G. Wang, Q. Zhai, H. Liu, Cross self-attention network for 3D point cloud, Knowl.-Based Syst. 247 (2022) 108769.
[36] A. Prakash, K. Chitta, A. Geiger, Multi-modal fusion transformer for end-to-end autonomous driving, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 2021, pp. 7077–7087.
[37] D. Xiao, C. Qin, J. Ge, P. Xia, Y. Huang, C. Liu, Self-attention-based adaptive remaining useful life prediction for IGBT with Monte Carlo dropout, Knowl.-Based Syst. 239 (2022) 107902.
[38] L. Cao, Q. Zhou, H. Liu, J. Li, S. Wang, Mechanism investigation of the influence of the magnetic field on the molten pool behavior during laser welding of aluminum alloy, Int. J. Heat Mass Transfer 162 (2020).
[39] T. Xie, X. Huang, S.-K. Choi, Intelligent mechanical fault diagnosis using multi-sensor fusion and convolution neural network, IEEE Trans. Ind. Inform. 18 (2021) 3213–3223.
[40] T. Hu, T. Tang, R. Lin, M. Chen, S. Han, J. Wu, A simple data augmentation algorithm and a self-adaptive convolutional architecture for few-shot fault diagnosis under different working conditions, Measurement 156 (2020) 107539.
[41] P. Norman, J. Karlsson, A. Kaplan, Monitoring undercut, blowouts and root sagging during laser beam welding, in: WLT Conference on Lasers in Manufacturing, Munich, Germany, 2009.
[42] R. Miao, Z. Shan, Q. Zhou, Y. Wu, L. Ge, J. Zhang, H. Hu, Real-time defect identification of narrow overlap welds and application based on convolutional neural networks, J. Manuf. Syst. 62 (2022) 800–810.
[43] D. Hendrycks, K. Gimpel, Gaussian error linear units (GELUs), 2016, arXiv preprint arXiv:1606.08415.
[44] T. Brown, B. Mann, N. Ryder, M. Subbiah, J.D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, D. Amodei, Language models are few-shot learners, in: Advances in Neural Information Processing Systems, Virtual, 2020, pp. 1877–1901.


[45] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, in: Proceedings of the 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Minneapolis, USA, 2019, pp. 4171–4186.
[46] B. Zhao, X. Zhang, H. Li, Z. Yang, Intelligent fault diagnosis of rolling bearings based on normalized CNN considering data imbalance and variable working conditions, Knowl.-Based Syst. (2020) 105971.
[47] S. Luo, X. Huang, Y. Wang, R. Luo, Q. Zhou, Transfer learning based on improved stacked autoencoder for bearing fault diagnosis, Knowl.-Based Syst. 256 (2022) 109846.
[48] X. Li, H. Jiang, R. Wang, M. Niu, Rolling bearing fault diagnosis using optimal ensemble deep transfer network, Knowl.-Based Syst. 213 (2021) 106695.
[49] L. Van der Maaten, G. Hinton, Visualizing data using t-SNE, J. Mach. Learn. Res. 9 (2008).
