Improved Pitch Shifting Data Augmentation For Ship-Radiated Noise Classification
Applied Acoustics
journal homepage: www.elsevier.com/locate/apacoust
Technical note
Article history: Received 1 February 2023; Received in revised form 24 May 2023; Accepted 28 May 2023; Available online 10 June 2023.

Keywords: Ship-radiated noise classification; Data augmentation; Time stretching; Pitch shifting; Machine learning; Feature extraction

Abstract

The limited amount of ship-radiated noise data causes machine learning models to be prone to overfitting in training, and data augmentation methods could improve model generalization performance. The frequency stability of harmonic line spectra in ship-radiated noise leads to a lack of sample diversity generated by the pitch shifting method, which is overcome by the proposed improved method. Nine classification algorithms combining three time-frequency features and three classifiers are implemented and evaluated on the DeepShip and ShipsEar datasets. The average accuracy increased by 1.67% on DeepShip and 2.25% on ShipsEar using the improved pitch shifting and time stretching augmentation methods. The constant-Q transform convolutional neural network (CQT-CNN) performs best among these nine algorithms. Its accuracy improved from 68.33% to 74.08% on the DeepShip, and its F1 score improved from 57.92% to 61.45% on the ShipsEar. Data augmentation improves classification performance for each class of ships in different ways, suggesting that augmentation specific to the class and state of the ship would improve classification performance further.

https://doi.org/10.1016/j.apacoust.2023.109468
0003-682X/© 2023 Elsevier Ltd. All rights reserved.
X. Yuanchao, C. Zhiming and K. Xiaopeng Applied Acoustics 211 (2023) 109468
Shifting (PS) were applied to environmental sound classification [30,31]. The effects of four data augmentation methods on environmental sound classification, including TS, PS, dynamic range compression, and adding background noise, were discussed [32]. With the development of deep learning, some data augmentation methods based on generative adversarial networks (GAN) have gained attention in recent years. The robustness of CNN for speech recognition could be improved after generating the Mel spectrum for data augmentation by GAN and conditional GAN [33]. Based on GAN, audio could be generated directly to improve the speech emotion recognition performance of convolutional recurrent neural networks (CRNN) [34]. Underwater acoustic channel modelling and transfer learning were used to augment underwater acoustic data [35].

Data augmentation methods for ship-radiated noise classification have also been studied. A data augmentation method generating spectrograms with a conditional GAN improves underwater acoustic target recognition performance [36]. The PS and TS methods and the temporal and frequency masking of the Mel spectrum were applied to generate data to improve the underwater target recognition performance of CRNN [37]. PS and TS methods transform audio data directly in the temporal domain, which is convenient and widely used; therefore, this paper focuses on the effect of these two data augmentation methods on ship-radiated noise. The literature [23], while proposing the DeepShip dataset, investigates a variety of time-frequency features, including the Mel spectrogram and constant-Q transform (CQT) features, and several classifiers, including CNN, SVM, and random forest (RF) classifiers. Inspired by this, three time-frequency features and three commonly used classifiers are selected, and nine classification algorithms are constructed to investigate the effect of data augmentation methods on ship-radiated noise classification performance. Data augmentation is carried out for the Mel spectrogram and CRNN in the literature [37]. Unlike them, this work focuses on the effectiveness of the TS and PS methods, which undertake transformations in the temporal domain and could therefore be applied to general features and classifiers. Preliminary experiments show that the PS method does not improve classification performance. Based on the audio characteristics of ship-radiated noise, we propose an improved PS method (IPS), which enriches sample diversity and significantly improves classification performance. Moreover, inspired by literature [32], the effects of different data augmentation methods on the classification performance of various types of ships are analyzed. The experimental results show that an augmentation method considering ship categories and states would further improve classification performance.

2. Method

The DeepShip and ShipsEar datasets are employed. The flow of the classification algorithm combined with data augmentation is given in Fig. 1. The raw noise records and the augmented data undergo feature extraction, slicing, and normalization and are fed into the classification model.

(1) Feature extraction. The original noise recordings and the audio signals transformed by data augmentation are first downsampled to 8192 Hz, and then the time-frequency features are extracted as follows.

STFT: The window size of the fast Fourier transform (FFT) is 8192. The hop length, the number of points between adjacent STFT columns, is 4096. After obtaining the time-frequency spectrogram, the first 2048 frequency bins are retained, and every four adjacent bins are averaged to obtain 512 bins. The frequency range is 0 to 2048 Hz with a frequency resolution of 4 Hz.

MEL: The window size of FFT is 8192. The hop length is 4096. The number of Mel filters is 512. The frequency range is 0 to 2048 Hz. The frequency axis is linear up to 1 kHz and logarithmic above 1 kHz, each region containing 256 frequency bins (using the default parameters of the "librosa" module in Python).

CQT: The hop length is 4096. The frequency range is 8 to 2048 Hz with 512 bins. The number of bins per octave is 64.

(2) Slicing and normalization. The hop length used for each time-frequency feature is 4096, corresponding to a duration of 0.5 s. Every eight frames are sliced into one segment, and 512 frequencies are taken for each frame to obtain a sample of size 8×512. The feature input of the classifier is obtained by taking the logarithm of the spectrum value for each sample and then scaling it linearly to the interval from 0 to 1. Because the ship-radiated noise characteristics are primarily low-frequency, the highest frequency of the time-frequency features is set to 2048 Hz.

(3) Classification models. The feature sample size is 8×512, a dimension too high for traditional classification models. Therefore, for the traditional classification models SVM and
Fig. 2. The original and IPS-transformed spectrograms of ship #1 of the Cargo class in the DeepShip. (a) CQT spectrogram of the original audio. (b) CQT spectrogram of the audio after IPS transformation with φ = π/2. (c) STFT spectrogram of the audio after IPS transformation with φ = π. (d) MEL spectrogram of the audio after IPS transformation with φ = 3π/2. The audio recording lasts about 7.6 min and has 915 frames. Each adjacent eight frames are used as one input sample for the classifier, with a sample size of 8×512. The line spectral frequencies of the original audio remain almost unchanged across sample slices. After IPS transformation, the frequency variation of the line spectrum across the samples is richer, and the original harmonic relationship is maintained.
into four practical classes based on vessel size and one background
noise class (see sec. 3 of ref. [22] for more details).
The r×k stratified cross-validation (r×k CV) is adopted. The dataset is first divided into c sets, S_1, …, S_c, according to ship categories. Each category set S_i is further randomly divided into k sets, S_i1, …, S_ik, according to recordings. Then we get k sets D_j = ∪_{i=1}^{c} S_ij, j = 1, 2, …, k. Perform k-fold CV: each set takes turns as the test set, with the rest as the training set. Randomly and independently perform r repetitions of the k-fold CV to get r×k score estimations. The 2×5 CV and 5×10 CV are used in experiments of DeepShip
and ShipsEar, respectively. As shown in Fig. 1, the augmented
and original data are jointly used to train the classification model,
and only the original data are used for testing. The accuracy and F1
score are adopted as performance metrics.
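The splitting scheme above can be sketched as follows. This is an illustrative implementation, not the authors' code; recordings are represented by the index of their category label.

```python
import numpy as np

def rk_stratified_cv(categories, k, r, seed=0):
    """Yield r*k (train, test) index splits: recordings are grouped by
    ship category into S_1..S_c, each S_i is randomly split into k folds
    S_i1..S_ik, and D_j merges fold j of every category."""
    rng = np.random.default_rng(seed)
    categories = np.asarray(categories)
    for _ in range(r):                      # r independent repetitions
        folds = [[] for _ in range(k)]
        for c in np.unique(categories):
            idx = rng.permutation(np.flatnonzero(categories == c))
            for j, part in enumerate(np.array_split(idx, k)):
                folds[j].extend(part.tolist())    # S_ij joins D_j
        for j in range(k):                  # k-fold CV over D_1..D_k
            test = folds[j]
            train = [i for f in folds[:j] + folds[j + 1:] for i in f]
            yield train, test
```

Splitting by category first keeps every class represented in each fold, which matters for the small and unbalanced ShipsEar dataset.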
The SVM and RF are fitted on the training set and then evaluated on the test set. Training of the CNN is stopped when its accuracy on the training set drops while the accuracy of the previous epoch exceeds 0.98.

The performance scores of the nine classification algorithms are reported in the results, and the differences in algorithm performance before and after data augmentation are compared via paired t-tests.

Fig. 3. Comparison of algorithm accuracy before and after data augmentation. The scatters in (a), (b), (c), and (d) are the average accuracies obtained from the 2×5 CV on DeepShip. The scatters in (e), (f), (g), and (h) are the average accuracies obtained from the 5×10 CV on ShipsEar. A scatter above the diagonal indicates an increase in algorithm accuracy after data augmentation, while the opposite indicates a decrease. The shape and colour of the scatter indicate the results of the paired t-test (p-value less than 0.05). Blue dots indicate a significant increase in the accuracy of the algorithm after data augmentation, red triangles indicate a significant decrease in the accuracy, and green squares indicate no significant difference in the accuracy before and after augmentation. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

3. Results and discussion

3.1. DeepShip

The accuracy of each classification algorithm with different data augmentation methods on the DeepShip is shown in Table 1. The "none" in the table indicates that no data augmentation is used, and IPS + TS indicates that both IPS and TS are used.

Regarding the average accuracy over the nine classification algorithms, instead of improving the classification performance, PS reduces the average accuracy by 0.58 compared to 69.3 without data augmentation (all "%" signs are omitted below). In contrast, the IPS method proposed in this paper improves the accuracy by 1.18, and TS improves the average classification accuracy by 0.49. The average accuracy reaches an optimal 70.98 when IPS and TS are used together, an improvement of 1.67 compared to no data augmentation.

Regarding classification features (the accuracy of the three models is averaged for each feature), the PS method reduces the accuracy for each feature. IPS + TS is the best, and IPS is second. The MEL features are always optimal for all augmentation methods, indicating that the MEL features have the best discriminability. The CQT feature has the most significant performance improvement after data augmentation among the three features, but its original accuracy is the lowest. It could be said that data augmentation compensates for the inferiority of CQT compared to the other two features.

Regarding classification models (the accuracy of the three features is averaged for each model), the PS method reduces the accuracy, except for slightly improving the performance of the CNN. IPS + TS is optimal for all three classification models. IPS outperforms TS for CNN and SVM but not for RF. The performance of CNN is always the best for all kinds of data augmentation methods, and it increases the most with IPS + TS.

Regarding each classification algorithm, all the data augmentation methods except PS improve the classification performance. The CQT-CNN method has the largest increase: its accuracy rises from the original 68.33 to the highest 74.08 with IPS + TS augmentation, which exceeds all other algorithms.

Fig. 3 compares the accuracy of each classification algorithm before and after the four data augmentation methods, where the scatters with different shapes or colours show the results of the paired t-test (p-value < 0.05). Fig. 3(a) shows that the accuracy of MEL-RF and STFT-RF is significantly reduced after the PS data augmentation; there is no significant difference in the accuracy of the other algorithms. Fig. 3(b) shows that after the TS augmentation, the accuracy of STFT-CNN is significantly reduced, five algorithms have significantly higher accuracy, and the other three show no significant difference. Fig. 3(c) shows that after IPS data augmentation, the accuracy of STFT-RF is significantly reduced, three algorithms have significantly higher accuracy, and the other five show no significant difference. Although there are more "wins" in Fig. 3(b) than in Fig. 3(c), the paired t-test between TS and IPS (not given in the figure) indicates that IPS gets four "wins" and one "loss". The improvement of CQT by IPS is significantly greater than that by TS, and IPS also improves MEL-CNN significantly more than TS. Moreover, regarding the average of the nine algorithms in Table 1, IPS is 0.69 higher than TS. Fig. 3(d) shows that the accuracy of all algorithms is improved after
Table 1
Accuracy (%) of each algorithm with different data augmentations on the DeepShip dataset.
Table 2
F1 score (%) of each algorithm with different data augmentations on the ShipsEar dataset.
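The significance marking used in Fig. 3 rests on a paired t-test over an algorithm's repeated CV scores. A minimal sketch with SciPy, assuming the p < 0.05 threshold stated in the figure caption (the function name is mine):

```python
import numpy as np
from scipy import stats

def compare_augmentation(scores_before, scores_after, alpha=0.05):
    # Paired t-test over one algorithm's r*k cross-validation scores,
    # before vs. after augmentation, as used for the Fig. 3 markers.
    t, p = stats.ttest_rel(scores_after, scores_before)
    if p >= alpha:
        return "no significant difference"
    return "significant increase" if t > 0 else "significant decrease"
```

Feeding the ten 2×5 CV accuracies of one algorithm without augmentation and with IPS, for example, yields one of the three marker types of Fig. 3(c).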
the IPS + TS data augmentation, with seven of them improving significantly.

3.2. ShipsEar

The F1 score of each classification algorithm with different data augmentation methods on the ShipsEar is shown in Table 2. Note that the ShipsEar dataset is small and unbalanced, so the 5×10 CV and F1 scores are employed in the experiments. The accuracy is also calculated. A comparison of the accuracy of each classification algorithm before and after data augmentation is shown in Fig. 3. The results on ShipsEar are similar to those on DeepShip: the proposed IPS data augmentation method outperforms PS on both the DeepShip and ShipsEar datasets.

Regarding the average score over the nine algorithms, the proposed IPS method improves the F1 by 0.35 compared with the PS. The average F1 reaches an optimal 57.00 when IPS and TS are used together, an improvement of 3.54 compared to no data augmentation. Besides, the average accuracy increases by 2.25 when IPS and TS are used.

Regarding classification features, the IPS outperforms the PS for STFT and MEL. The MEL features are always optimal for all augmentation methods, which aligns with the results on DeepShip.

Regarding classification models, the IPS outperforms PS for CNN and SVM but not for RF. The performance of CNN is always the best for all kinds of data augmentation methods, and it increases the most with the IPS.

Regarding each classification algorithm, the F1 of the CQT-CNN method increases from the original 57.92 to the highest 60.26 with IPS + TS augmentation, which exceeds all other algorithms.

Fig. 3(e) shows that the accuracy of MEL-SVM is significantly reduced after the PS data augmentation. Fig. 3(g) shows that after IPS data augmentation, five algorithms have significantly higher accuracy, and the other four show no significant difference. Fig. 3(h) shows that the accuracy of all algorithms is improved after the IPS + TS data augmentation, with six improving significantly.

…by 344, which further reveals that data augmentation improves the distinction between these two types of ships by CQT-CNN. On the other hand, the number of Tug samples misclassified as Passenger ship samples decreased by 856. However, the number of Passenger ship samples misclassified as Tug samples increased by 312, which suggests that the improved performance for Tug classification shown in Fig. 4 comes at the sacrifice of classification for Passenger. The displacement of Cargo and Tanker is generally large, their sailing speed is slow, their sailing state is stable, and the line spectrum of their radiated noise is stronger and frequency-stable. Therefore, IPS and TS transformations are reasonable as data augmentation methods for Cargo and Tanker but not for Tug and Passenger. Further classification performance improvements could be achieved by utilizing data augmentation that considers the ship categories and the sailing state in the recording accordingly. This idea will be explored further in future work.

3.4. Discussion

The improvement of IPS on ShipsEar is not as significant as on the DeepShip. The reason is that the average duration of recordings
is 277.8 s in DeepShip, which is longer than that in ShipsEar (68.7 s). On the one hand, the proposed IPS increases the pitch diversity of the samples in a long-duration recording. On the other hand, the pitch of the samples varies continuously when IPS is adopted, so the samples are distributed more evenly in feature space than with PS. When the recording duration is short, the sample diversity introduced by IPS is similar to PS. Nonetheless, the IPS is still recommended because it provides a more stable gain than PS, as shown in Fig. 3(e and g), especially when the classifier is CNN.

The results indicate that the TS improves the classification performance. The TS is implemented based on a phase vocoder. This simplified implementation is intended primarily for data augmentation purposes: it does not attempt to handle transients and is likely to produce many audible artefacts. These artefacts are suspected to increase the robustness of the machine learning algorithms. In other words, the TS augments the dataset by adding noise to the ship radiation signals, a popular data augmentation method in speech recognition tasks. The IPS and TS improve sample diversity in different ways, which is the theoretical basis for using IPS and TS simultaneously. Future research will further study how the TS promotes the classification algorithms and whether it can be improved as the PS was.

4. Conclusion

This paper proposes an improved PS data augmentation method for ship-radiated noise classification. The IPS promotes the classification algorithms' performance on the DeepShip and ShipsEar datasets. The average performance score improves significantly when the IPS and TS are used together. The CQT-CNN has the best classification performance with IPS + TS augmentation among the nine classification algorithms.

Data augmentation improves the classification of various categories of ships differently, suggesting that algorithm performance would be further improved by applying class-conditional or state-conditional data augmentation.

The IPS and TS improve sample diversity in different ways. Future research will further study how the TS promotes the classification algorithms and whether it can be improved as the PS was.

CRediT authorship contribution statement

Xu Yuanchao: Conceptualization, Methodology, Software, Validation, Formal analysis, Visualization, Writing – original draft. Cai Zhiming: Supervision, Writing – review & editing. Kong Xiaopeng: Data curation, Writing – review & editing.

Data availability

The authors do not have permission to share data.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] Zhang L, Wu D, Han X, Zhu Z. Feature Extraction of Underwater Target Signal Using Mel Frequency Cepstrum Coefficients Based on Acoustic Vector Sensor. Journal of Sensors 2016;2016:1–11. https://doi.org/10.1155/2016/7864213.
[2] Azimi-Sadjadi MR, Yao D, Huang Q, Dobeck GJ. Underwater target classification using wavelet packets and neural networks. IEEE Trans Neural Netw 2000;11(3):784–94. https://doi.org/10.1109/72.846748.
[3] Chen J, Han B, Ma X, Zhang J. Underwater Target Recognition Based on Multi-Decision LOFAR Spectrum Enhancement: A Deep-Learning Approach. Future Internet 2021;13(10). https://doi.org/10.3390/fi13100265.
[4] Chung KW, Sutin A, Sedunov A, Bruno M. DEMON Acoustic Ship Signature Measurements in an Urban Harbor. Advances in Acoustics and Vibration 2011;2011:1–13. https://doi.org/10.1155/2011/952798.
[5] Esmaiel H, Xie D, Qasem ZAH, Sun H, Qi J, Wang J. Multi-Stage Feature Extraction and Classification for Ship-Radiated Noise. Sensors 2021;22(1):112. https://doi.org/10.3390/s22010112.
[6] Li H, Cheng Y, Dai W, Li Z. A method based on wavelet packets-fractal and SVM for underwater acoustic signals recognition. In: 2014 12th International Conference on Signal Processing (ICSP), Hangzhou, China; 2014. p. 2169–73. https://doi.org/10.1109/ICOSP.2014.7015379.
[7] Yang H, Gan A, Chen H, Pan Y, Tang J, Li J. Underwater acoustic target recognition using SVM ensemble via weighted sample and feature selection. In: 2016 13th International Bhurban Conference on Applied Sciences and Technology (IBCAST); 2016. p. 522–7. https://doi.org/10.1109/IBCAST.2016.7429928.
[8] Zhang Q, Da L, Zhang Y, Hu Y. Integrated neural networks based on feature fusion for underwater target recognition. Appl Acoust 2021;182:108261. https://doi.org/10.1016/j.apacoust.2021.108261.
[9] Honghui Y, Junhao L, Meiping S. Underwater acoustic target multi-attribute correlation perception method based on deep learning. Appl Acoust 2022;190:108644. https://doi.org/10.1016/j.apacoust.2022.108644.
[10] Ke X, Yuan F, Cheng E. Integrated optimization of underwater acoustic ship-radiated noise recognition based on two-dimensional feature fusion. Appl Acoust 2020;159:107057. https://doi.org/10.1016/j.apacoust.2019.107057.
[11] Song G, Guo X, Wang W, Ren Q, Li J, Ma L. A machine learning-based underwater noise classification method. Appl Acoust 2021;184:108333. https://doi.org/10.1016/j.apacoust.2021.108333.
[12] Li Y, Jiao S, Geng B. A comparative study of four multi-scale entropies combined with grey relational degree in classification of ship-radiated noise. Appl Acoust 2021;176:107865. https://doi.org/10.1016/j.apacoust.2020.107865.
[13] Li Y, Jiao S, Geng B, Zhou Y. Research on feature extraction of ship-radiated noise based on multi-scale reverse dispersion entropy. Appl Acoust 2021;173:107737. https://doi.org/10.1016/j.apacoust.2020.107737.
[14] Li Y, Jiang X, Tang B, Ning F, Lou Y. Feature extraction methods of ship-radiated noise: From single feature of multi-scale dispersion Lempel-Ziv complexity to mixed double features. Appl Acoust 2022;199:109032. https://doi.org/10.1016/j.apacoust.2022.109032.
[15] Li Y, Tang B, Yi Y. A novel complexity-based mode feature representation for feature extraction of ship-radiated noise using VMD and slope entropy. Appl Acoust 2022;196:108899. https://doi.org/10.1016/j.apacoust.2022.108899.
[16] Li Y, Tang B, Jiao S. SO-slope entropy coupled with SVMD: A novel adaptive feature extraction method for ship-radiated noise. Ocean Eng 2023;280:114677. https://doi.org/10.1016/j.oceaneng.2023.114677.
[17] Domingos LCF, Santos PE, Skelton PSM, Brinkworth RSA, Sammut K. A Survey of Underwater Acoustic Data Classification Methods Using Deep Learning for Shoreline Surveillance. Sensors 2022;22(6):2181. https://doi.org/10.3390/s22062181.
[18] Hu G, Wang K, Peng Y, Qiu M, Shi J, Liu L. Deep Learning Methods for Underwater Target Feature Extraction and Recognition. Comput Intell Neurosci 2018;2018:1–10. https://doi.org/10.1155/2018/1214301.
[19] Shen S, Yang H, Li J, Xu G, Sheng M. Auditory Inspired Convolutional Neural Networks for Ship Type Classification with Raw Hydrophone Data. Entropy 2018;20(12):990. https://doi.org/10.3390/e20120990.
[20] Yang H, Li J, Shen S, Xu G. A Deep Convolutional Neural Network Inspired by Auditory Perception for Underwater Acoustic Target Recognition. Sensors 2019;19(5):1104. https://doi.org/10.3390/s19051104.
[21] Li J, Yang H. The underwater acoustic target timbre perception and recognition based on the auditory inspired deep convolutional neural network. Appl Acoust 2021;182:108210. https://doi.org/10.1016/j.apacoust.2021.108210.
[22] Santos-Domínguez D, Torres-Guijarro S, Cardenal-López A, Pena-Gimenez A. ShipsEar: An underwater vessel noise database. Appl Acoust 2016;113:64–9. https://doi.org/10.1016/j.apacoust.2016.06.008.
[23] Irfan M, Jiangbin Z, Ali S, Iqbal M, Masood Z, Hamid U. DeepShip: An underwater acoustic benchmark dataset and a separable convolution based autoencoder for classification. Expert Syst Appl 2021;183:115270. https://doi.org/10.1016/j.eswa.2021.115270.
[24] Lecun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proc IEEE 1998;86(11):2278–324. https://doi.org/10.1109/5.726791.
[25] Krizhevsky A, Sutskever I, Hinton GE. ImageNet Classification with Deep Convolutional Neural Networks. Commun ACM 2017;60(6):84–90. https://doi.org/10.1145/3065386.
[26] Jaitly N, Hinton GE. Vocal Tract Length Perturbation (VTLP) improves speech recognition. In: International Conference on Machine Learning (ICML); 2013.
[27] Cui X, Goel V, Kingsbury B. Data Augmentation for Deep Neural Network Acoustic Modeling. IEEE/ACM Trans Audio Speech Lang Process 2015;23(9):1469–77. https://doi.org/10.1109/TASLP.2015.2438544.
[28] Han Y, Lee K. Acoustic scene classification using convolutional neural network and multiple-width frequency-delta data augmentation. arXiv:1607.02383; 2016.
[29] McFee B, Humphrey EJ, Bello JP. A software framework for musical data augmentation. In: Proceedings of the 16th International Society for Music Information Retrieval Conference (ISMIR); 2015. p. 248–54.
[30] Piczak KJ. Environmental sound classification with convolutional neural networks. In: 2015 IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP), Boston, MA, USA; 2015. p. 1–6. https://doi.org/10.1109/MLSP.2015.7324337.
[31] Mushtaq Z, Su S-F. Environmental sound classification using a regularized deep convolutional neural network with data augmentation. Appl Acoust 2020;167:107389. https://doi.org/10.1016/j.apacoust.2020.107389.
[32] Salamon J, Bello JP. Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification. IEEE Signal Process Lett 2017;24(3):279–83. https://doi.org/10.1109/LSP.2017.2657381.
[33] Qian Y, Hu H, Tan T. Data augmentation using generative adversarial networks for robust speech recognition. Speech Comm 2019;114:1–9. https://doi.org/10.1016/j.specom.2019.08.006.
[34] Pham NT, Dang DNM, Nguyen SD. Hybrid Data Augmentation and Deep Attention-based Dilated Convolutional-Recurrent Neural Networks for Speech Emotion Recognition. arXiv:2109.09026; 2021.
[35] Li D, Liu F, Shen T, Chen L, Zhao D. Data augmentation method for underwater acoustic target recognition based on underwater acoustic channel modeling and transfer learning. Appl Acoust 2023;208:109344. https://doi.org/10.1016/j.apacoust.2023.109344.
[36] Luo X, Zhang M, Liu T, Huang M, Xu X. An Underwater Acoustic Target Recognition Method Based on Spectrograms with Different Resolutions. JMSE 2021;9(11):1246. https://doi.org/10.3390/jmse9111246.
[37] Liu F, Shen T, Luo Z, Zhao D, Guo S. Underwater target recognition using convolutional recurrent neural networks with 3-D Mel-spectrogram and data augmentation. Appl Acoust 2021;178:107989. https://doi.org/10.1016/j.apacoust.2021.107989.