

Underwater Acoustic Target Classification Based on Dense Convolutional Neural Network

Van-Sang Doan, Thien Huynh-The, Member, IEEE, and Dong-Seong Kim, Senior Member, IEEE

Manuscript received June 29, 2020; revised August 25, 2020; accepted October 3, 2020. This work was supported in part by the National Research Foundation (NRF), South Korea, through the International Cooperation Program under Grant 2019K2A9A1A09081533 and in part by NRF, South Korea, through the Priority Research Centers Program funded by the Ministry of Education, Science and Technology under Grant 2018R1A6A1A03024003. (Corresponding author: Dong-Seong Kim.) The authors are with the ICT Convergence Research Center, Kumoh National Institute of Technology, Gumi 39177, South Korea, and also with the Department of IT Convergence, Kumoh National Institute of Technology, Gumi 39177, South Korea (e-mail: doansang.g1@gmail.com; thienht@kumoh.ac.kr; dskim@kumoh.ac.kr). Digital Object Identifier 10.1109/LGRS.2020.3029584

Abstract—In oceanic remote sensing operations, underwater acoustic target recognition is a difficult and extremely important task for sonar systems, especially under complex sound wave propagation conditions. The expensive training of recognition models for big data analysis is typically an obstacle for most traditional machine learning (ML) algorithms, whereas the convolutional neural network (CNN), a type of deep neural network, can automatically extract features for accurate classification. In this study, we propose an approach that uses a dense CNN model for underwater target recognition. The network architecture is designed to reuse all former feature maps in order to optimize classification rates under various impaired conditions while keeping the computational cost low. In addition, instead of using time–frequency spectrogram images, the proposed scheme directly uses the original audio signal in the time domain as the network input. Based on experimental results evaluated on a real-world passive sonar data set, our classification model achieves an overall accuracy of 98.85% at 0-dB signal-to-noise ratio (SNR) and outperforms traditional ML techniques as well as other state-of-the-art CNN models.

Index Terms—Convolutional neural network (CNN), network architecture, sonar system, underwater target recognition.
I. INTRODUCTION

In the underwater environment, vessel propellers, marine animals (mammals), and other ambient entities emit acoustic signals with various properties, which become useful signatures for recognition. Accordingly, passive sonar systems listen to these sounds to identify and classify surrounding objects for navigation, surveillance, or other important purposes [1]. To date, the classification process is mostly carried out by sonar operators who can reidentify a sound of interest whenever it is emitted by a ship's propeller [2]. However, this process can fail under the impaired conditions of low signal-to-noise ratio (SNR) and multipath sound propagation. Moreover, although such expertise is imperative and crucial, training human operators is an expensive and time-consuming procedure.
For many years, underwater acoustic (UA) signal classification has been developed through expert processing of handcrafted characteristics such as temporal and spectral properties; however, the quality of these features heavily affects the final performance of the sonar system. To handle this issue, many feature extraction techniques have been studied for capturing propeller acoustic characteristics, which can be categorized into three groups: time-domain [3], frequency-domain [4], and combined time–frequency-domain methods [5]. Inspired by human auditory perception, time–frequency analysis techniques, which are more suitable for nonstationary signals, are widely used for UA signal classification. For example, Zeng and Wang [6] studied the Bark-wavelet analysis and the Hilbert–Huang transform for the frequency decomposition of UA signals and the signal reconstruction from instantaneous frequency and amplitude, respectively. In the last decade, several machine learning (ML) classification algorithms have been widely utilized to improve accuracy. Zhang et al. [7] investigated 19 conventional classifiers, including decision tree (DT), support vector machine (SVM), and k-nearest neighbor (KNN), for modeling up to 16 UA targets using the gammatone frequency cepstrum coefficient (GFCC) and further delivered an accuracy comparison with the mel-frequency cepstrum coefficient (MFCC), autoregression (AR), and zero-crossing (ZC) features. Despite performing better than the other feature descriptors, GFCC is considerably more time-consuming. To improve the accuracy of UA target classification on a small-scale data set, Ke et al. [8] studied a supervised feature-separation algorithm to fine-tune the deep features extracted by a 1-D convolutional autoencoder–decoder model. Although it achieves a satisfying recognition rate, this approach reveals some drawbacks, for instance, the high complexity of the discrete Fourier transform (DFT) used for domain transformation and its sensitive performance under additive noise conditions. Recently, deep learning (DL), which has achieved outstanding performance in wide-ranging research areas from computer vision to bioinformatics [9], has attracted attention for UA signal classification. For example, Wang et al. [10] deployed a deep neural network (DNN) to learn the fusion feature of GFCC and modified empirical mode decomposition (MEMD), in which the network has a Gaussian mixture model (GMM) layer that improves the recognition rate by reducing redundant features. In another work [11], a multimodal DL method for ship recognition based on ship-radiated sound was investigated, in which deep features simultaneously extracted from auditory and visual modalities are fused at an intermediate level to make sonar systems more accurate and reliable. Nevertheless, this approach incurs an expensive computational cost besides the prerequisite of multimodal synchronization [12].

In this work, we propose an efficient approach for UA signal classification, wherein a dense convolutional neural network (CNN) is designed to automatically learn representative features without expert knowledge of feature engineering or domain transformation. Concretely, the network architecture with its skip-connection technique allows reusing all former feature maps extracted at multiscale representations, which protects the network from the gradient vanishing problem caused by stacking many sequential convolution and activation layers in a deep network. Besides, the CNN-based classifier satisfies the constraint of low complexity with an adaptive structure that reduces the number of learnable parameters. Through a performance evaluation on our real-world data set recorded by a passive sonar system, the proposed CNN-based classifier reaches a classification rate of up to 98.85% at 0-dB SNR and outperforms other existing ML and CNN models.
II. PROPOSED METHOD

For UA signal classification, this research studies a deep CNN with a dense architecture, namely the underwater acoustic target classification DenseNet (UATC-DenseNet), to recognize 12 classes of UA signals. The data set of UA signals is acquired by a passive sonar that serves as an underwater surveillance system, whose implementation framework and components are shown in Fig. 1. The surveillance system consists of two parts, underwater equipment and ground equipment, which communicate with each other over a fiber cable. The underwater components include three hydrophones for receiving the UA signal and converting it to electrical form, one preprocessing unit for converting the signal from analog to digital, and one fiber transmitter for conversion from electrical to optical transmission. On the ground, a fiber receiver reconverts the data from optical to local area network (LAN) transmission. Subsequently, the signal data are stored in a multistore device equipped with three hard disk drives (HDDs). The multistore has a switch that allows the surveillance system to read and analyze either the real-time signal or the stored data set.

Fig. 1. Overall passive sonar system for underwater sound surveillance.

Inspired by the network fundamentally introduced for image classification [13], UATC-DenseNet adopts a compact architecture, as presented in Fig. 2(a), that is compatible with 1-D time-series input data such as an acoustic signal. As a data preprocessing step, the continuous acoustic signal in the time domain is partitioned into multiple frames with a size of 1 × 4096 samples. It is worth noting that data normalization is optional to avoid the bias caused by different configurations of the sonar system. At the beginning of the network, the input layer is followed by a batch normalization layer to facilitate the optimization procedure during the training stage. This layer normalizes its input x(i) by the mean μ and variance σ² computed over a minibatch and each input channel:

    x̂(i) = (x(i) − μ) / √(σ² + ε)    (1)

where ε = 10⁻⁵ is a constant factor that ensures numerical stability and avoids division by zero.

Fig. 2. Architecture of UATC-DenseNet for UA target classification. (a) Full network model and (b) components in a conv-block.

With regard to the network architecture, UATC-DenseNet is organized by stacking several convolutional blocks, denoted conv-blocks, where each conv-block consists of convolutional, max-pooling, and activation layers, as described in Fig. 2(b). In the detailed configuration, the convolutional layer is assigned 32 1-D kernels of size 1 × 7 to generate 32 feature maps correspondingly. The convolution operation is formulated as follows with the input x̂ and the convolution coefficients c:

    y(i) = conv(x̂, c) = Σ_i x̂(i)c(i) + b    (2)

where b is the scalar bias. Spatial pooling is then adopted to downsample the output feature map y by suppressing weak feature responses, wherein a max-pooling layer is configured with a pool size of 1 × 3. The output y_pool of the max-pooling layer can be written as follows:

    y_pool(i) = max{y(i − 1), y(i), y(i + 1)}.    (3)

By specifying a stride of (1, 2), the horizontal dimension of y_pool is halved to reduce the computational volume of the later layers.

The max-pooling layer is followed by an activation layer, which typically plays a crucial role in CNNs. The rectified linear unit (ReLU) function is commonly adopted in many well-known CNN architectures due to its fast convergence. However, it suffers from information vanishing for inputs less than zero, whereas the exponential linear unit (eLU) function can overcome this problem and train the network more effectively. The eLU layer performs the identity operation on positive inputs and an exponential nonlinearity on negative inputs, which is generally given as follows:

    h(x) = x for x ≥ 0, and h(x) = α(e^x − 1) for x < 0.    (4)

Note that the input of the activation function is the feature map y_pool obtained after pooling.
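As a concrete illustration of this front end, the sketch below assembles the input batch normalization of (1) and one conv-block of (2)-(4): 32 one-dimensional kernels of size 7, max-pooling with pool size 3 and stride 2, and an eLU activation. It assumes PyTorch; the class name, padding choice, and layer options are our own illustration, not the authors' MATLAB implementation.

    import torch
    import torch.nn as nn

    class ConvBlock(nn.Module):
        """One conv-block: Conv1d (32 kernels of size 7) -> MaxPool1d (pool 3, stride 2) -> eLU."""
        def __init__(self, in_channels, out_channels=32, kernel_size=7):
            super().__init__()
            self.conv = nn.Conv1d(in_channels, out_channels, kernel_size,
                                  padding=kernel_size // 2)               # Eq. (2)
            self.pool = nn.MaxPool1d(kernel_size=3, stride=2, padding=1)  # Eq. (3), halves the length
            self.act = nn.ELU()                                           # Eq. (4)

        def forward(self, x):
            return self.act(self.pool(self.conv(x)))

    # Input stage: batch normalization of Eq. (1) applied to 1 x 4096 frames
    input_bn = nn.BatchNorm1d(num_features=1, eps=1e-5)

    frames = torch.randn(64, 1, 4096)                        # a minibatch of 64 single-channel frames
    features = ConvBlock(in_channels=1)(input_bn(frames))    # -> shape (64, 32, 2048)

With this configuration, each conv-block halves the temporal length of its input while producing 32 feature maps, which matches the downsampling behavior described above.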

Regarding the overall architecture, UATC-DenseNet has a backbone flow and several skip-connections. The backbone is stacked from three conv-blocks for deep feature extraction; meanwhile, the skip-connections are designed to facilitate the gradient flow as the network goes deeper. Compared with some conventional CNNs, the skip-connections allow optimizing the utilization of the feature maps extracted in the former conv-blocks and also protect the network against the vanishing gradient problem. Generally, three widely used skip-connection mechanisms are the addition operation, sidewise concatenation, and depth concatenation. The addition operation is intensively studied in the well-known ResNet [14], wherein different feature maps with the same volume are combined via elementwise addition. In another mechanism, the sidewise concatenation operation joins several feature maps along either the horizontal or the vertical dimension. Unlike the sidewise concatenation, which expands the spatial dimension of the output, the depth concatenation merges the input feature maps along the depth dimension. In terms of computational complexity, the depth concatenation is more appropriate than the sidewise concatenation for deployment in this study. Since the skip-connections are structured at different feature scales, max-pooling layers are therefore assembled to rescale the spatial dimension of the former feature maps to properly fit the volume size of the backbone flow, where their output is denoted by y_maxpool. Let y_block denote the output of each conv-block; hence, the concatenation can be expressed as follows:

    z_concat = concat{y_maxpool, y_block}.    (5)

According to this mechanism, at the last concatenation, the output involves the features of the backbone (i.e., the last conv-block) and the normalized information of the former conv-blocks.

At the end of the network architecture, UATC-DenseNet is finalized with an output block including an average pooling layer with a pool size of 1 × 8 and a stride of (1, 8), an eLU activation layer, a fully connected layer, a softmax layer, and an output layer (also known as the classification layer). The average pooling layer is configured to compound all highly discriminative features instead of only picking up the maximum values. Besides, the fully connected layer is specified with 12 neurons corresponding to the 12 target classes. The fully connected layer is followed by the softmax layer and the classification layer, where the softmax function produces decimal probabilities over the classes of UA targets. Assume that the output feature vector of the fully connected layer is denoted by r(x); the output of the softmax function can then be written as follows:

    p_i(x) = softmax{r_i(x)} = exp(r_i(x)) / Σ_j exp(r_j(x)).    (6)

Finally, UATC-DenseNet predicts the target of an incoming UA signal x as the class having the highest probability

    Predicted_UAT(x) = arg max_i {p_i(x)}.    (7)
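To make the dense flow of (5)-(7) concrete, the following PyTorch sketch stacks three such conv-blocks, rescales the earlier feature maps with max-pooling, merges them by depth concatenation, and applies the output block (average pooling with pool size 8 and stride 8, eLU, a 12-neuron fully connected layer, and softmax). The channel counts and pooling factors are illustrative; this is a sketch of the described architecture under our assumptions, not the authors' exact implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def conv_block(in_ch, out_ch=32, k=7):
        """Conv1d -> MaxPool1d(3, stride 2) -> eLU, as in Eqs. (2)-(4)."""
        return nn.Sequential(nn.Conv1d(in_ch, out_ch, k, padding=k // 2),
                             nn.MaxPool1d(3, stride=2, padding=1), nn.ELU())

    class DenseUAClassifier(nn.Module):
        """Sketch of a dense CNN: three conv-blocks whose outputs are depth-concatenated."""
        def __init__(self, n_classes=12):
            super().__init__()
            self.block1 = conv_block(1)       # 4096 -> 2048 samples
            self.block2 = conv_block(32)      # 2048 -> 1024
            self.block3 = conv_block(32)      # 1024 -> 512
            self.head = nn.Sequential(nn.AvgPool1d(8, stride=8), nn.ELU(),
                                      nn.Flatten(), nn.Linear(96 * 64, n_classes))

        def forward(self, x):
            y1 = self.block1(x)
            y2 = self.block2(y1)
            y3 = self.block3(y2)
            # Eq. (5): rescale former feature maps and merge along the depth dimension
            z = torch.cat([F.max_pool1d(y1, 4, 4), F.max_pool1d(y2, 2, 2), y3], dim=1)
            return self.head(z)               # logits r(x) of Eq. (6)

    model = DenseUAClassifier()
    logits = model(torch.randn(1, 1, 4096))
    probs = torch.softmax(logits, dim=1)          # Eq. (6)
    prediction = torch.argmax(probs, dim=1)       # Eq. (7)

During training, the softmax of (6) would normally be folded into a cross-entropy loss rather than applied explicitly.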
III. DATA SET DESCRIPTION

For the performance experiments, a data set involving 11 UA signals (corresponding to 11 targets for recognition) and one noisy blank signal (i.e., ambient sound without target detection) is recorded by a passive sonar system at a sampling rate of 22 050 Hz. All signals are labeled by a sonar expert with many years of experience in target detection and reidentification using acoustic-based sonar systems. To challenge the classification models, the signals are corrupted with additive synthetic noise at SNRs (defined as the ratio of the average power of the received signal to the power of the noise) spreading over the range of [−20, 10] dB with a step size of 2 dB. As a data manipulation step, each signal is consecutively segmented into 1000 observation frames, where each frame has 4096 amplitude samples. Over the 12 UA signals measured at the different SNRs, a total of 192 000 signal frames is accumulated. We then randomly divide the data set into 70% for training and 30% for testing.
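The described preprocessing, framing each recording into 1 × 4096-sample segments and corrupting it with additive noise at a target SNR, can be sketched as follows in NumPy; the function names and the use of white Gaussian noise are our assumptions about how the synthetic noise was generated.

    import numpy as np

    def segment(recording, frame_len=4096, n_frames=1000):
        """Cut a recording into consecutive, non-overlapping 4096-sample frames."""
        usable = recording[:frame_len * n_frames]
        return usable.reshape(n_frames, frame_len)

    def add_noise(frame, snr_db):
        """Add white Gaussian noise so the frame reaches the target SNR in dB."""
        signal_power = np.mean(frame ** 2)
        noise_power = signal_power / (10.0 ** (snr_db / 10.0))
        noise = np.random.randn(frame.size) * np.sqrt(noise_power)
        return frame + noise

    # Example: one placeholder recording framed and corrupted at -20, -18, ..., +10 dB
    recording = np.random.randn(4096 * 1000)
    frames = segment(recording)
    noisy_frames = [add_noise(frames[0], snr) for snr in range(-20, 12, 2)]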
IV. EXPERIMENTAL RESULTS AND DISCUSSION

This section reports the experimental results, evaluated on the 12-class UA data set, to demonstrate the efficiency of UATC-DenseNet under low-SNR conditions and to verify the robustness of the model under various parameter configurations, including the kernel size and the number of conv-blocks. Furthermore, a comparison between the proposed CNN and other existing DL models for the task of UA target classification is given accordingly. Regarding the learning of the classification model, UATC-DenseNet is trained for 40 epochs using the stochastic gradient descent optimizer with a minibatch size of 64 and an initial learning rate of 0.001. To prevent the network from overfitting, the dropout technique with a probability of 20% and L2 regularization (weight decay) with a factor of 0.0001 are adopted in the training process.
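This training recipe maps onto the hedged PyTorch sketch below; the stand-in model and random minibatch are placeholders for illustration only, since the letter's experiments were run in MATLAB.

    import torch
    import torch.nn as nn

    # Stand-in classifier ending in 12 logits; the dense architecture itself is
    # sketched in the earlier listings.
    model = nn.Sequential(nn.Conv1d(1, 32, 7, padding=3), nn.ELU(),
                          nn.AdaptiveAvgPool1d(1), nn.Flatten(),
                          nn.Dropout(p=0.2),                 # 20% dropout against overfitting
                          nn.Linear(32, 12))

    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.001,
                                weight_decay=1e-4)           # L2 regularization factor 0.0001

    frames = torch.randn(64, 1, 4096)                        # one minibatch of 64 frames
    labels = torch.randint(0, 12, (64,))
    for epoch in range(40):                                  # 40 training epochs
        optimizer.zero_grad()
        loss = criterion(model(frames), labels)
        loss.backward()
        optimizer.step()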
In the first experiment, the effect of the kernel size on the overall accuracy is comprehensively analyzed, in which the size of the 1-D kernels configured in the convolutional layers varies over the set {3, 5, 7, 9, 11, 13, 15}. Based on the numerical results plotted in Fig. 3(a), the larger the kernel size is, the more accurately the model classifies. This is expected because a larger kernel size corresponds to a bigger receptive field, so more meaningful local representative features enable the network to recognize UA targets more distinctly. For example, with the 1 × 15 kernel, the model achieves the greatest classification rate of 99.25% at 0-dB SNR, which improves on the model with the 1 × 3 kernel by 2.48%. Compared with the 1 × 7 kernel configured in Table I, however, the additional gain yielded by the 1 × 15 kernel is insignificant.

In the second experiment, the performance of the model is evaluated with various numbers of conv-blocks to study the impact of the network depth on the performance of the sonar system. Regarding the evaluation results shown in Fig. 3(b), UATC-DenseNet clearly enhances the classification performance with more conv-blocks configured in a deeper network. By adopting more conv-blocks throughout the backbone of the CNN, the UA signals can be analyzed more distinctively, wherein more temporal correlations are captured at multiple feature representations and further combined via the skip-connection mechanism deployed in UATC-DenseNet. If the network is expanded from two to three conv-blocks, the accuracy is remarkably improved by approximately 1.90% on average. With more than three conv-blocks, the model gains only a trivial accuracy improvement of 0.24%–0.59% on average, but the network complexity, including the number of trainable parameters and the number of convolution operations, grows rapidly and slows the system down. Reasonably, UATC-DenseNet is therefore organized with three conv-blocks, wherein the convolutional layer in each block is configured with 1 × 7 kernels to achieve a good tradeoff between accuracy and computational cost for real-time system implementation.
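The kernel-size sweep of the first experiment can be outlined as below: for each kernel size in {3, ..., 15} a model variant is built and its learnable-parameter count inspected before training, which also illustrates the accuracy-versus-complexity tradeoff discussed above. This is a skeleton under our PyTorch assumptions; each variant would then be trained and evaluated as in the preceding sketch.

    import torch.nn as nn

    def conv_block(in_ch, k):
        return nn.Sequential(nn.Conv1d(in_ch, 32, k, padding=k // 2),
                             nn.MaxPool1d(3, stride=2, padding=1), nn.ELU())

    for k in (3, 5, 7, 9, 11, 13, 15):
        variant = nn.Sequential(conv_block(1, k), conv_block(32, k), conv_block(32, k))
        n_params = sum(p.numel() for p in variant.parameters())
        print(f"kernel 1x{k}: {n_params} learnable parameters in the conv-blocks")
        # ... train and evaluate each variant as in the preceding sketch ...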

Fig. 3. Classification accuracy comparison with different hyperparameter configurations: (a) different kernel sizes, (b) different numbers of conv-blocks, (c) different layer configurations, and (d) different types of input features.

TABLE I
DETAILED DESCRIPTION OF THE UATC-DENSENET ARCHITECTURE

Fig. 4. Accuracy of 12-target classification in detail.

We further analyze the network architecture by substituting the concatenation and max-pooling layers with addition and 1 × 1 convolution layers, respectively, where the substituted convolution layers are configured with a stride of (1, 2). In Fig. 3(c), we show the classification accuracy of the two network variants, denoted Concat + Pool and Add + Conv; the Concat + Pool model classifies more precisely than Add + Conv, with an average accuracy gain of 0.7%. Compared with Add + Conv, Concat + Pool is more favorable for accumulating informative features at multiscale resolutions, but it has a higher computational complexity because more feature maps are stacked throughout the network for processing. Statistically, Concat + Pool is heavier than Add + Conv by around 79k learnable parameters (including weights and biases) and 0.003 giga multiply–accumulate operations (GMACs).

In Fig. 3(d), we investigate the performance of UATC-DenseNet with different input formats, namely the raw waveform and the short-time Fourier transform (STFT) feature (using a window size of 128 without overlapping). The numerical results indicate that UATC-DenseNet classifies raw waveforms more accurately than STFT features, with an average accuracy gain of around 2.1%. It is also worth noting that the STFT-based approach is computationally more expensive in terms of feature extraction than the raw-waveform-based one.

The detailed classification results for the 11 UA targets and the one noisy blank signal are reported in Fig. 4, wherein the nondetected (blank) class impaired by additive Gaussian noise is easily recognized by UATC-DenseNet. For the remaining targets, the accuracy rates differ depending on the UA emission characteristics of the observed targets. Furthermore, the accuracy generally rises with increasing SNR, where the improvement is considerable at low SNRs (from −20 to 0 dB) and insignificant for SNRs of +2 dB and above.

In the last experiment, the proposed network is compared with some popular ML methods, including KNN with five neighbors, DT with ten compact classification trees, SVM with a Gaussian kernel function and an average margin of 0.02, and random forest (RF), where the comparison results are plotted in Fig. 5(a); for these baselines, we perform tenfold cross validation with 14 features extracted by the MFCC technique, and the proposed network notably outperforms all of them. Furthermore, UATC-DenseNet is also compared with three other well-known networks, namely the CNN-extreme learning machine (ELM) [15], ResNet18 [14], and SqueezeNet [16], in terms of classification accuracy, where the numerical results are plotted in Fig. 5(b). These networks are modified to accept acoustic signal inputs and, like UATC-DenseNet, are trained from scratch with randomly initialized parameters.
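A hedged sketch of one such ML baseline, 14 MFCC features with a five-neighbor KNN evaluated by tenfold cross validation, is given below assuming librosa and scikit-learn; averaging the MFCCs over time and the placeholder data are our own simplifications, and the other baselines (DT, SVM, RF) would be swapped in analogously.

    import numpy as np
    import librosa
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    def mfcc_features(frames, sr=22050, n_mfcc=14):
        """Represent each 4096-sample frame by its 14 time-averaged MFCCs."""
        return np.stack([librosa.feature.mfcc(y=f, sr=sr, n_mfcc=n_mfcc).mean(axis=1)
                         for f in frames])

    frames = np.random.randn(200, 4096).astype(np.float32)    # placeholder frames
    labels = np.random.randint(0, 12, 200)                     # placeholder labels

    features = mfcc_features(frames)
    knn = KNeighborsClassifier(n_neighbors=5)                  # KNN with five neighbors
    scores = cross_val_score(knn, features, labels, cv=10)     # tenfold cross validation
    print(scores.mean())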

Regarding the accuracy, in the impaired condition of low SNR (≤ 0 dB), CNN-ELM, which is plainly configured with two convolutional layers and three fully connected layers, reports the worst classification rate. Despite being adapted for 1-D signal processing, SqueezeNet shows a pleasing accuracy with the fire modules cascaded in its architecture. With multiple residual blocks, wherein identity information is maintained throughout the network architecture, ResNet performs target classification more accurately than SqueezeNet. By competently exploiting dense skip-connections, UATC-DenseNet optimizes the utilization of the representational features accumulated in multiple layers to achieve a superior accuracy, which is greater than the others by approximately 0.9%–4.6% at 0-dB SNR. Regarding the computational complexity as one of the most important factors in real system implementation, we report the average prediction time (measured by the stopwatch timer in MATLAB) with its standard deviation, the number of learnable parameters, and the number of GMACs of UATC-DenseNet and the other networks in Table II. This experiment is done on a hardware platform with a Core i5-9400 CPU, 16 GB of RAM, and a single GeForce RTX 2070 GPU. With 124k parameters and 0.03 GMACs, UATC-DenseNet predicts acoustic targets more slowly than SqueezeNet by 0.2 ms due to its multi-input concatenation-based structure, but its speed is more stable with a smaller standard deviation. ResNet and CNN-ELM, which contain very large numbers of parameters, report the worst processing speeds with prediction times of 3.9 and 4.7 ms, respectively, due to the addition operation performed in the residual blocks and the extremely expensive computation of three fully connected layers.
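The two complexity figures reported in Table II, the number of learnable parameters and the average prediction time, can be estimated for any candidate network along the lines of the sketch below; the authors timed their models with MATLAB's stopwatch timer, so this PyTorch timing loop is only a substitute for illustration.

    import time
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Conv1d(1, 32, 7, padding=3), nn.ELU(),
                          nn.AdaptiveAvgPool1d(1), nn.Flatten(),
                          nn.Linear(32, 12))    # stand-in for a trained classifier
    model.eval()

    # Number of learnable parameters (weights and biases)
    n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

    # Average single-frame prediction time over repeated runs
    frame = torch.randn(1, 1, 4096)
    timings = []
    with torch.no_grad():
        for _ in range(100):
            start = time.perf_counter()
            model(frame)
            timings.append(time.perf_counter() - start)
    print(n_params, 1000 * sum(timings) / len(timings), "ms")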
Fig. 5. Accuracy comparison between UATC-DenseNet and (a) other ML and (b) existing DL methods.

TABLE II
METHOD COMPARISON OF COMPUTATIONAL COMPLEXITY

V. CONCLUSION

Our study has demonstrated the evaluation of UATC-DenseNet with different hyperparameter configurations for UA target classification under various SNRs and has compared it with other existing models. Based on an intensive performance evaluation, UATC-DenseNet classifies acoustic targets more accurately with a deeper network architecture and a larger configured kernel size. In the performance comparison, UATC-DenseNet remarkably outperforms four ML techniques and other existing CNNs in the task of 12-target classification. With such performance in terms of classification accuracy and prediction time, UATC-DenseNet has promising potential for application in practical underwater surveillance systems to accurately classify acoustic targets. In the future, we will exploit several advanced time–frequency analysis techniques to further improve the accuracy of the DL-based classifier.

REFERENCES
[1] K. Nikolai, Sonar Systems. Rijeka, Croatia: InTech, 2011.
[2] Y. Feng, R. Tao, and Y. Wang, "Modeling and characteristic analysis of underwater acoustic signal of the accelerating propeller," Sci. China Inf. Sci., vol. 55, no. 2, pp. 270–280, Jun. 2011.
[3] B. C. Pinheiro, U. F. Moreno, J. T. B. de Sousa, and O. C. Rodríguez, "Kernel-function-based models for acoustic localization of underwater vehicles," IEEE J. Ocean. Eng., vol. 42, no. 3, pp. 603–618, Jul. 2017.
[4] M. Rahmati and D. Pompili, "UNISeC: Inspection, separation, and classification of underwater acoustic noise point sources," IEEE J. Ocean. Eng., vol. 43, no. 3, pp. 777–791, Jul. 2018.
[5] G. Sharma, K. Umapathy, and S. Krishnan, "Trends in audio signal feature extraction methods," Appl. Acoust., vol. 158, Jan. 2020, Art. no. 107020.
[6] X.-Y. Zeng and S.-G. Wang, "Bark-wavelet analysis and Hilbert–Huang transform for underwater target recognition," Def. Technol., vol. 9, no. 2, pp. 115–120, Jun. 2013.
[7] W. Zhang, Y. Wu, D. Wang, Y. Wang, Y. Wang, and L. Zhang, "Underwater target feature extraction and classification based on gammatone filter and machine learning," in Proc. Int. Conf. Wavelet Anal. Pattern Recognit. (ICWAPR), Chengdu, China, Jul. 2018, pp. 42–47.
[8] X. Ke, F. Yuan, and E. Cheng, "Underwater acoustic target recognition based on supervised feature-separation algorithm," Sensors, vol. 18, no. 12, p. 4318, Dec. 2018.
[9] M. E. Paoletti, J. M. Haut, J. Plaza, and A. Plaza, "Deep learning classifiers for hyperspectral imaging: A review," ISPRS J. Photogramm. Remote Sens., vol. 158, pp. 279–317, Dec. 2019.
[10] X. Wang, A. Liu, Y. Zhang, and F. Xue, "Underwater acoustic target recognition: A combination of multi-dimensional fusion features and modified deep neural network," Remote Sens., vol. 11, no. 16, p. 1888, Aug. 2019.
[11] F. Yuan, X. Ke, and E. Cheng, "Joint representation and recognition for ship-radiated noise based on multimodal deep learning," J. Mar. Sci. Eng., vol. 7, no. 11, p. 380, Oct. 2019.
[12] X. Cao, X. Zhang, R. Togneri, and Y. Yu, "Underwater target classification at greater depths using deep neural network with joint multiple-domain feature," IET Radar, Sonar Navigat., vol. 13, no. 3, pp. 484–491, Mar. 2019.
[13] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, "Densely connected convolutional networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Honolulu, HI, USA, Jul. 2017, pp. 2261–2269.
[14] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Las Vegas, NV, USA, Jun. 2016, pp. 770–778.
[15] G. Hu, K. Wang, Y. Peng, M. Qiu, J. Shi, and L. Liu, "Deep learning methods for underwater target feature extraction and recognition," Comput. Intell. Neurosci., vol. 2018, pp. 1–10, Mar. 2018.
[16] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer, "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size," 2016, arXiv:1602.07360. [Online]. Available: http://arxiv.org/abs/1602.07360
