
Applied Intelligence (2021) 51:980–990

https://doi.org/10.1007/s10489-020-01845-7

Hybrid nonlinear convolution filters for image recognition


Xiuling Zhang1 · Kailun Wei1 · Xuenan Kang1 · Jinxiang Li1

Published online: 11 September 2020


© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract
Typical convolutional filters extract features only linearly. Although activation functions and pooling operations introduce
nonlinearity into the feature extraction layer, they provide only point-wise nonlinearity. In this paper,
a Gaussian convolution for extracting nonlinear features is proposed, and a hybrid nonlinear convolution filter consisting
of baseline convolution, Gaussian convolution and other nonlinear convolutions is designed. It efficiently fuses
linear and nonlinear features while preserving the advantages of the traditional linear convolution filter in
feature extraction. Extensive experiments on the benchmark datasets MNIST, CIFAR10, and CIFAR100 show that the hybrid
nonlinear convolutional neural network converges faster and achieves higher image recognition accuracy than the traditional
baseline convolutional neural network.

Keywords Gaussian convolution · Point-wise nonlinearity · Nonlinear features · Image recognition

Xiuling Zhang
zxlysu@ysu.edu.cn

1 Key Laboratory of Industrial Computer Control Engineering of Hebei Province, Yanshan University, Qinhuangdao 066004, China

1 Introduction

In the field of computer vision and pattern recognition [1], feature extraction is an important data processing step, and the quality of the extracted features directly affects the performance of subsequent work. The convolutional neural network (CNN) is currently the most widely used feature extraction method [2]. Because a CNN combines powerful representation learning with translation-invariant classification of its input, it has achieved great success in computer vision and pattern recognition, including object recognition [3], object detection [4], image enhancement [5], object tracking [6] and semantic segmentation [7]. Since AlexNet [8] won the 2012 ImageNet competition by a wide margin, research on deep learning has surged.

In recent years, various effective network structures have been proposed, and these models perform well in both computer vision and pattern recognition. Although convolution has an overwhelming advantage in extracting the characteristics of image data, it still has limitations [9]. In recent years, neuroscience research has shown that most cells in the striate cortex (the part of the visual cortex involved in processing visual information) can be divided into simple, complex and ultra-complex cells with specific response characteristics [10]. However, a typical convolutional layer cannot express the complex response properties found in the striate cortex, because it is a linear system that performs an affine transformation [11]. Although activation functions and pooling operations introduce nonlinearity into the feature extraction layer, they provide only point-wise nonlinearity [12]. Existing research shows that high-order nonlinear feature maps give linear classifiers better discriminative power [13, 14]. Zoumpourlis et al. proposed Volterra-based convolution [15] to extend the convolution technique to a nonlinear form; however, the number of parameters of the nonlinear terms increases exponentially as the order of the polynomial increases, greatly increasing the computational complexity. Therefore, research in the field of nonlinear convolution still faces great challenges.

The popular approach to improving the performance of CNNs is to develop new and more efficient network structures and to train them on large-scale training data [16]; however, this approach has encountered a bottleneck. It is therefore worth focusing on improving the performance of the convolutional network by changing the working mode of the convolution filter. In this paper, inspired by the Support Vector Machine (SVM) [17], the Gaussian kernel function is introduced into the convolution operation, and
X. Zhang et al. 981

a Gaussian convolution filter for feature extraction is proposed. In order to approximate the complex behavior of the human visual system [18], a hybrid nonlinear convolution (HN convolution) filter composed of traditional linear convolution, Gaussian convolution and other nonlinear convolutions is designed. Since the proposed method introduces new hyperparameters, a random search algorithm is designed to explore the optimal hyperparameters.

A typical convolutional filter extracts features only linearly. Compared with existing methods, the HN convolution filter introduces nonlinearity into the convolution operation at the cost of a small number of additional parameters, and improves the capacity and expressive ability of the model, which benefits the generalization ability of the model.

In this paper, we aim to design a convolution filter that can effectively extract image features. The novelty and contributions of the proposed method are summarized as follows:

(1) A Gaussian convolution for extracting nonlinear features is proposed; it can be trained using gradient optimization algorithms, e.g., SGD [19] and Adam [20].
(2) A hybrid nonlinear convolution (HN convolution) filter consisting of baseline convolution, Gaussian convolution and other nonlinear convolutions is designed. It incorporates a biologically plausible and efficient nonlinear convolution scheme into the functionality of CNNs.
(3) A random search algorithm for exploring the optimal hyperparameters is designed, and its effectiveness is verified on the benchmark datasets.

This paper is organized as follows. Section 2 outlines related work. Section 3 describes the methodology adopted in this paper. Section 4 presents the experimental results and their analysis; the validity of the proposed algorithms is verified through experiments on the MNIST, CIFAR10 and CIFAR100 benchmark datasets. Section 5 gives the conclusion and outlook.

2 Related work

While studying the locally sensitive and direction-selective neurons in the cat's cerebral cortex, Hubel and Wiesel [21] found that their unique network structure can effectively reduce the complexity of feedback neural networks. Inspired by this, the convolutional neural network (CNN) structure was proposed. With the rapid development of computer hardware and the advancement of computer vision tasks, some complex and effective network structures have been proposed. One example is VGGNet [22], a 16- to 19-layer convolutional neural network constructed by repeatedly stacking 3×3 convolution filters and 2×2 max pooling layers. Chollet [23] built on the Inception block, which makes more efficient use of computing resources and extracts more features for the same amount of computation. He et al. [24] proposed ResNets, whose shortcut connections run parallel to the normal convolutional layers and effectively ease the optimization difficulty that arises when the number of model parameters is increased. Huang et al. [25] proposed DenseNet, in which each layer takes the outputs of all previous layers as its input. Hu et al. [26] proposed the SE block, which explicitly models the interdependence between feature channels. Inspired by the circular structure and the attention mechanism, Yang et al. [27] proposed CliqueNet, in which the feature maps output by the convolutions can be reused and the refined feature maps attend to more important information.

In addition to the methods described above, researchers have improved the way convolution filters work. Yang et al. [28] proposed CondConv, which learns specific convolution kernel parameters for each sample, so that the network balances accuracy improvement against inference time. Ding [29] proposed the asymmetric convolution block (ACB) as a building block of CNNs, which uses one-dimensional asymmetric convolution kernels to enhance the square convolution filter. Lavin [30] converts convolution operations to matrix multiplications, improving the speed of convolution. To reduce the number of model parameters, group convolution [31] and depthwise separable convolution [32] were proposed. However, little effort has been devoted to exploring new computational models that extend the convolution technique to nonlinear forms by taking advantage of research results from neuroscience.

Typical convolution refers to the inner product of a convolution filter (a filter matrix) with the data from successive data windows in the image. In essence, it extracts features from different frequency bands of the image. The convolution filter is the most basic unit of the convolution operation. It is a set of neurons with fixed weights, usually a c × n × m three-dimensional matrix (which can also be regarded as c two-dimensional n × m matrices), where c is the number of channels of the convolution filter, equal to the number of channels of the input features, and n and m define the receptive field of the neurons. Each n × m matrix stores the coefficients that process the data in the receptive field. Each convolution filter extracts a specific feature (such as object outlines or color shades). Multiple convolution filters stacked together form a convolutional layer, and the new features extracted from the original data by the convolutional layer are called a feature map.

A convolution filter uses a set of weights to extract features in the same way when "scanning" each part of the

data, resulting in a feature. When the data window slides, the input changes, but the weights of the convolution filter stay fixed. Keeping the weights unchanged in this way is the weight sharing mechanism of the CNN, and it greatly reduces the number of parameters of the convolutional layer.

For a convolution filter with an n × m receptive field, each time the data window slides, the data collected is a patch $I \in \mathbb{R}^{n \times m}$ with h elements ($h = n \times m$), reshaped as a vector $X \in \mathbb{R}^{h}$:

\[ X = [x_1, x_2, x_3, \ldots, x_h]^T \tag{1} \]

The input-output function of a typical convolutional filter is:

\[ y(x) = \sum_{i=1}^{h} w_i x_i + b \tag{2} \]

where $w_i$ are the weights of the convolution filter, the number of which is determined by the size of the convolution filter, and b is the bias.

Volterra convolution is a nonlinear convolution proposed at ICCV 2017 [15]. It applies Volterra kernel theory to the processing of data in the receptive field and is a nonlinear operation. Unlike traditional convolution, Volterra convolution can capture the relationships between different elements in the receptive field. The input-output function of a second-order Volterra convolutional filter is

\[ y(x) = \sum_{i=1}^{h} \sum_{j=1}^{h} w_2^{i,j} x_i x_j + \sum_{i=1}^{h} w_1^{i} x_i + b \tag{3} \]

where $w_1^{i}$ are the weights of the Volterra convolutional filter's first-order terms, $w_2^{i,j}$ are the weights of the second-order terms, and b is the bias. Similarly, it can be extended to higher-order Volterra convolutions; the number of parameters of the r-order Volterra convolution is:

\[ n_v = \frac{(n + r)!}{n!\, r!} \tag{4} \]

Later, the Lp-norm convolution and the polynomial convolution were proposed. The Lp-norm convolutional neural network achieves high accuracy in identifying images with noise interference. The input-output functions of the L1-norm and L2-norm convolutional filters are:

\[ y_m(x) = \| X - W \|_1 \tag{5} \]

\[ y_e(x) = \| X - W \|_2 \tag{6} \]

The input-output function of the polynomial convolutional filter is

\[ y(x) = (X^T W + c_p)^{d_p} = \sum_{j=0}^{d_p} c_p^{\,d_p - j} (X^T W)^j \tag{7} \]

where $d_p$ extends the feature space to $d_p$ dimensions and $c_p$ balances the nonlinear orders.

3 Hybrid nonlinear convolution filter

3.1 Gaussian convolution

The existing nonlinear convolutions were introduced in Section 2, but they are not widely used in image recognition tasks because of their considerable limitations. Equation (4) shows that the number of parameters of the Volterra convolution increases exponentially with increasing order. Although second-order Volterra convolutions are commonly used, their number of parameters and computational complexity in deep networks are much higher than those of the baseline convolution. Experiments have shown that Lp-norm convolution performs well in image recognition tasks with noise interference, but its accuracy is slightly lower than that of the baseline convolution in conventional image recognition tasks.

The Gaussian kernel function is mainly applied to image filtering and feature transformation of data, where it works well. In this paper, the Gaussian kernel function is applied to the feature extraction of images, and Gaussian convolution is proposed. Compared with high-order Volterra convolutions, Gaussian convolution has fewer parameters and lower computational complexity. The Gaussian kernel function contains the two-norm operation, which means that Gaussian convolution has the same ability to identify images with noise interference as the Lp-norm convolution, and Gaussian convolution does not have hyperparameters such as $d_p$ in polynomial convolution. The input-output function of the Gaussian convolutional filter is:

\[ y(x) = \exp\left(-\gamma \| X - W \|_2^2\right) \tag{8} \]

where γ is the bandwidth and W is the center vector of the Gaussian kernel. Like the baseline convolution, Gaussian convolution also has the weight sharing mechanism, and its model parameters γ and W can be trained by the gradient descent algorithm. The gradients are given in (9) and (10).

\[ \frac{\partial y}{\partial W} = 2\gamma (X - W)\, y \tag{9} \]

\[ \frac{\partial y}{\partial \gamma} = -y\, \| X - W \|_2^2 \tag{10} \]

3.2 Structure of the hybrid nonlinear convolution filter

Each visual neuron of a higher animal is connected randomly with $10^2$ to $10^4$ other neurons, thus forming a visual neural network [33]. Therefore, the biological neural network has a high degree of complexity and randomness [34]. Biological research shows that higher animal visual cells can be divided into three types according to their complexity: simple, complex and super-complex, and have specific

response characteristics. Traditional baseline convolutions can only perform linear operations on the input and cannot simulate how complex visual cells work. Therefore, the HN convolution filter consisting of baseline convolution, Gaussian convolution and other nonlinear convolutions (if necessary) is designed. Figure 1 shows an HN convolution filter consisting of several different convolutions; each color represents a convolution type, and the convolution type of each channel in the convolution filter is determined by the roulette algorithm.

Fig. 1 Structure of HN convolution filter

The HN convolution filter works in the same way as the traditional baseline convolution filter. However, it needs to be initialized once before feature extraction in order to determine the type of convolution of each channel within the convolutional layer. Taking an HN convolution filter consisting of Gaussian convolution and baseline convolution as an example, the initialization algorithm is shown in Algorithm 1.

The convolution type of each channel in the convolutional layer is determined by the structure matrix J. In Algorithm 1, $j_i = 1$ means that the i-th channel in the convolutional layer uses baseline convolution for feature extraction, and $j_i = 2$ means that the i-th channel in the convolutional layer uses Gaussian convolution. The algorithm can also be extended to initialize an HN convolution consisting of more convolution types.

For an HN convolution filter consisting of n convolution types, when extracting features from the input, the input is first divided into n parts according to the structure matrix J, then feature extraction is performed on each of the n parts using the convolution of the specified type, and finally the results of all the feature extractions are accumulated to obtain the final feature map. The feature extraction process is shown in Fig. 2, and the feature extraction algorithm is detailed in Algorithm 2.

The HN convolution filter improves the model capacity and fuses linear and nonlinear features, effectively increasing feature diversity. In addition, the HN convolution filter completely follows the traditional CNN workflow; it is not limited by the network structure and can be inserted into any network structure. As the number of convolution types increases, the initialization time and computational complexity of the HN convolution increase greatly. Therefore, in the experiments, the HN convolution used is composed of Gaussian convolution and baseline convolution.

3.3 Hyperparameter optimization

Hyperparameter selection is generally divided into two methods: manual selection and automatic selection. Manual selection means adjusting and choosing the parameters by hand, based on an understanding of the model and an analysis of the results. Automatic selection means designing an automatic parameter adjustment tool that makes multiple attempts with the different hyperparameters provided until a satisfactory result is obtained, e.g., grid search [35] or random search [36].
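As a rough illustration of automatic selection by random search, the sketch below draws candidate probability vectors for the convolution types, scores each one, and keeps the best. It is not the paper's Algorithm 3: the function names, the sampling scheme (uniform over the probability simplex), the default trial count and the toy scoring function are all assumptions made for illustration. In a real run, `evaluate` would train a network with the sampled probabilities and return its validation accuracy.

```python
import random


def sample_probabilities(n_types, rng):
    """Draw one probability vector (one entry per convolution type)."""
    # Normalised exponential draws are uniform on the probability simplex.
    draws = [rng.expovariate(1.0) for _ in range(n_types)]
    total = sum(draws)
    return [d / total for d in draws]


def random_search(evaluate, n_types=2, k=50, seed=0):
    """Try k random probability vectors and keep the best-scoring one."""
    rng = random.Random(seed)
    best_alpha, best_score = None, float("-inf")
    for _ in range(k):
        alpha = sample_probabilities(n_types, rng)
        score = evaluate(alpha)  # stand-in for "train, then validate"
        if score > best_score:
            best_alpha, best_score = alpha, score
    return best_alpha, best_score


def toy_score(alpha):
    """Hypothetical objective with an arbitrary peak; not a real accuracy."""
    return -abs(alpha[0] - 0.3)


best_alpha, best_score = random_search(toy_score, n_types=2, k=50, seed=0)
```

Because every trial is independent, the loop can be parallelised or stopped early without changing the logic; the fixed seed only makes the sketch reproducible.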

Fig. 2 The workflow of HN convolution filter

Random search uses a distribution function to generate random numbers and then trains with the parameters generated from them. In practice, the suitable parameters often lie in a small part of the complete distribution. Grid search does not guarantee that the appropriate hyperparameters are found directly, whereas random search greatly increases the probability of finding suitable parameters. Therefore, the optimal value of the hyperparameter α is found through a random search strategy. The algorithm is shown in Algorithm 3.

4 Experiments

To verify the effectiveness of the proposed method, we tested it on three benchmark datasets (MNIST, CIFAR10, and CIFAR100). The HN convolution filters used in the experiments are composed of Gaussian convolution and baseline convolution; the probability of Gaussian convolution is 0.75 and the probability of baseline convolution is 0.25 (Section 4.2 explains how the hyperparameters α are selected). All experiments are run on an Intel i7-8700K CPU and an NVIDIA GTX1080Ti GPU using the PyTorch framework. The neural network models are trained with stochastic gradient descent (SGD); each model is trained for 250 epochs with a dynamic learning rate, as shown in Table 1.

In the following, a network model name with the HN suffix indicates that the model uses the HN convolution filter, and a name without the suffix indicates that it uses the traditional baseline convolution filter.

Table 1 Dynamic learning rate

Epoch         Learning rate
[0, 120)      0.1
[120, 190)    0.01
[190, 250)    0.001
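The schedule in Table 1 is a simple step function of the epoch index. A minimal sketch, assuming 0-based epoch numbering as the half-open intervals suggest (the function name is chosen here for illustration and does not come from the paper):

```python
def learning_rate(epoch):
    """Return the Table 1 learning rate for a 0-based epoch index."""
    if not 0 <= epoch < 250:
        raise ValueError("Table 1 only covers epochs 0 to 249")
    if epoch < 120:
        return 0.1
    if epoch < 190:
        return 0.01
    return 0.001


# One value per training epoch, e.g. for plotting or logging.
schedule = [learning_rate(epoch) for epoch in range(250)]
```

In PyTorch, the same shape is what `torch.optim.lr_scheduler.MultiStepLR` produces with `milestones=[120, 190]` and `gamma=0.1` from an initial rate of 0.1; the paper does not state which mechanism was actually used.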

4.1 Gaussian convolution performance verification

In order to verify the performance of the Gaussian convolution, we conducted comparative experiments between Gaussian convolution and Volterra convolution. We select seven classic network structures, namely LeNet, AlexNet, GoogLeNet, ResNet, DenseNet, SEResNet and CliqueNet, as the baselines. The dataset used for testing is CIFAR10. The experimental results are shown in Table 2: compared with the network models using Volterra convolution, the recognition accuracy [%] of the network models using Gaussian convolution is improved by 0.59, 1.06, 0.22, 0.68, 0.38, 0.45 and 0.28, respectively.

Table 2 The performance of Gaussian convolution and Volterra convolution on CIFAR10

Architecture    Volterra convolution [%]    Gaussian convolution [%]
LeNet           66.71                       67.30
AlexNet         79.01                       80.07
GoogLeNet       93.88                       94.10
ResNet18        92.03                       92.71
DenseNet22      91.66                       92.04
SEResNet18      92.20                       92.65
CliqueNet       94.12                       94.40

4.2 Hyperparameter optimization experiment

In this article, we use LeNet to explore the optimal value of $\alpha_i$ on the MNIST dataset; the number of tests is k = 50 and the number of convolution types is n = 2. The search results are shown in Fig. 3. The experiments show that the test accuracy is highest when $\alpha_1 = 0.25$ and $\alpha_2 = 0.75$.

Fig. 3 The random search results of HN convolution

In order to further verify the effectiveness of the optimal value of α obtained using Algorithm 3, we selected three different sets of hyperparameters to conduct experiments on seven network structures. It can be seen from Table 3 that when $\alpha_1 = 0.25$ and $\alpha_2 = 0.75$, the recognition accuracy of the network is higher than with the other two sets of hyperparameters.

Table 3 The verification accuracy [%] of each model on different hyperparameters

Architecture    α1=0.25, α2=0.75    α1=0.5, α2=0.5    α1=0.75, α2=0.25
LeNet           67.89               67.47             66.89
AlexNet         80.41               80.09             79.66
GoogLeNet       94.63               94.44             94.11
ResNet18        92.91               92.80             92.20
DenseNet22      92.47               92.44             91.89
SEResNet18      92.78               92.61             92.49
CliqueNet       94.43               94.40             94.17

4.3 HN-CNN structural performance verification

The biological neural network has a high degree of complexity and randomness. The HN convolution proposed in this paper aims to simulate the working mode of higher animal visual neural networks. Therefore, it is more suitable to use the roulette algorithm to construct the HN convolution according to the probability of each convolution type.

We also constructed an HN convolution with a regular structure (referred to as RS-HN convolution), in which the channels occupied by each convolution type are fixed. For instance, for an RS-HN convolution composed of Gaussian convolution and baseline convolution with baseline convolution probability $\alpha_1 = 0.25$ and Gaussian convolution probability $\alpha_2 = 0.75$, the first 25% of the channels use baseline convolution and the rest use Gaussian convolution, as shown in Fig. 4. In order to verify the effectiveness of the method proposed in this paper, we conducted extensive experiments on seven network structures using HN convolutions and RS-HN convolutions.

Fig. 4 Comparison between RS-HN convolution and HN convolution
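The two ways of assigning convolution types to channels, and the accumulation step described in Section 3.2, can be sketched in plain Python. This is an illustrative reading of the text rather than a reproduction of Algorithms 1 and 2: the function names are hypothetical, the structure matrix J is represented as a flat list of type labels (1 = baseline, 2 = Gaussian), and pooling all Gaussian channels into a single Gaussian term is an assumption about how each "part" of the input is processed.

```python
import math
import random


def roulette_structure(num_channels, alpha, rng):
    """Draw one convolution type per channel with probabilities alpha."""
    types = list(range(1, len(alpha) + 1))  # 1 = baseline, 2 = Gaussian
    return rng.choices(types, weights=alpha, k=num_channels)


def regular_structure(num_channels, alpha):
    """RS-HN layout: contiguous blocks of channels, one block per type."""
    structure = []
    for conv_type, share in enumerate(alpha, start=1):
        structure.extend([conv_type] * round(share * num_channels))
    # Rounding may miss the channel count; trim or pad with the last type.
    structure = structure[:num_channels]
    structure.extend([len(alpha)] * (num_channels - len(structure)))
    return structure


def hn_response(patch, weights, structure, gamma=1.0, bias=0.0):
    """Response of a two-type HN filter on one multi-channel patch.

    patch and weights are lists of per-channel value lists. Channels
    labelled 1 contribute a linear inner product; channels labelled 2 are
    pooled into one Gaussian term exp(-gamma * ||X - W||^2) (assumed).
    """
    linear = 0.0
    squared_distance = 0.0
    has_gaussian = False
    for x, w, conv_type in zip(patch, weights, structure):
        if conv_type == 1:
            linear += sum(wi * xi for wi, xi in zip(w, x))
        else:
            squared_distance += sum((xi - wi) ** 2 for wi, xi in zip(w, x))
            has_gaussian = True
    gaussian = math.exp(-gamma * squared_distance) if has_gaussian else 0.0
    # The partial results of all types are accumulated into one output.
    return linear + gaussian + bias


fixed = regular_structure(8, [0.25, 0.75])
drawn = roulette_structure(8, [0.25, 0.75], random.Random(0))
value = hn_response([[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], [3.0, 4.0]], [1, 2])
```

With eight channels and probabilities 0.25 and 0.75, `fixed` places the two baseline channels first, while `drawn` scatters the types at random, so the share of each type only matches the probabilities on average.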

The dataset used for the experiment is CIFAR10, and the experimental results are shown in Table 4. It can be seen that in most cases HN convolution performs better than RS-HN convolution, so it is reasonable to use the roulette algorithm to construct the HN convolution according to the probability of each convolution type.

Table 4 HN convolution vs. RS-HN convolution (accuracy in %)

                α1=0.25, α2=0.75        α1=0.5, α2=0.5          α1=0.75, α2=0.25
Architecture    RS-HN-CNN    HN-CNN     RS-HN-CNN    HN-CNN     RS-HN-CNN    HN-CNN
LeNet           67.11        67.89      67.31        67.47      66.61        66.89
AlexNet         79.98        80.41      79.51        80.09      79.45        79.66
GoogLeNet       94.60        94.63      94.10        94.44      94.00        94.11
ResNet18        92.79        92.91      92.73        92.60      91.86        92.00
DenseNet22      92.46        92.47      92.08        92.44      91.89        91.85
SEResNet18      92.40        92.79      92.33        92.61      92.25        92.49
CliqueNet       94.32        94.43      94.11        94.40      94.09        94.22

4.4 Test of convergence effect

In order to verify the convergence speed of the proposed method, we replace the convolutional layers in LeNet-5 with HN convolution and with Gaussian convolution, and use the MNIST dataset for training and testing. It can be seen from Fig. 5 that HN convolution and Gaussian convolution converge faster than the baseline convolution, and that HN convolution converges best. Comparing Fig. 5a and b shows that the computational complexity of HN convolution and Gaussian convolution is higher than that of the baseline convolution, but their good convergence makes up for this defect.

Fig. 5 Influence of HN convolution and Gaussian convolution on convergence rate (panels a and b)

Due to the small sample size and low complexity of the MNIST dataset, many existing networks already achieve very high recognition accuracy (close to 100%) on it. Therefore, in this section, the MNIST dataset is not used to test the recognition accuracy of the HN convolution.

4.5 Recognition results of different models

In this section, the performance of the HN convolution filter is tested on the CIFAR10 and CIFAR100 datasets using a variety of convolutional networks with different structures. The networks used for testing are LeNet, AlexNet, VGG16, GoogLeNet, ResNet18, ResNet34, DenseNet22, DenseNet28, SEResNet18, SEResNet34 and CliqueNet. With the exception of VGG16 and GoogLeNet, all convolutional layers in the networks are replaced with HN convolutional layers. In VGG16, only the first six convolutional layers are replaced with HN convolutional layers. In GoogLeNet, the 3×3 and 5×5 convolutional layers are replaced with HN convolutional layers, while the 1×1 convolutional layers used for dimensionality reduction still use baseline convolution. Table 5 shows the number of model parameters of the above networks. Since the convolution type of each channel in the HN convolution is determined by probability, the number of parameters is not a fixed value but lies within a range. It can be seen from Table 5 that the use of HN convolution adds only a small number of model parameters.

Table 5 The number of model parameters

Architecture    CNN      HN-CNN
VGG16           34.0M    34.0∼37.7M
GoogLeNet       6.2M     6.2∼6.8M
ResNet18        11.2M    11.2∼12.5M
ResNet34        21.3M    21.3∼23.6M
DenseNet22      0.3M     0.3∼0.4M
DenseNet28      0.5M     0.5∼0.6M
SEResNet18      11.4M    11.4∼12.5M
SEResNet34      21.6M    21.6∼23.7M
CliqueNet       1.0M     1.0∼1.1M

When testing the performance of the HN convolution on CIFAR10, the test results are recorded every two epochs. Figure 6 shows the classification accuracy of each model at different stages of training, and Table 6 shows the highest classification accuracy of each model.

Fig. 6 The accuracy of each model on the CIFAR10 (panels a to f)

Table 6 The verification accuracy of each model on CIFAR10

Architecture    CNN [%]    HN-CNN [%]
LeNet           65.41      67.89
AlexNet         78.33      80.41
VGG16           91.03      91.95
GoogLeNet       93.79      94.63
ResNet18        91.84      92.91
ResNet34        92.10      93.16
DenseNet22      91.57      92.47
DenseNet28      92.33      93.19
SEResNet18      92.21      92.78
SEResNet34      92.46      93.19
CliqueNet       94.07      94.43

As can be seen from Table 6, compared with the original models, the classification accuracy [%] of LeNet-HN, AlexNet-HN, VGG16-HN, GoogLeNet-HN, ResNet18-HN, ResNet34-HN, DenseNet22-HN, DenseNet28-HN, SEResNet18-HN, SEResNet34-HN and CliqueNet-HN increased by 2.48, 2.08, 0.82, 0.84, 1.07, 1.06, 0.9, 0.86, 0.57, 0.73 and 0.36, respectively.

To further test the performance of the HN convolution filter, we experimented with various models on the CIFAR100 dataset, which consists of one hundred classes. The experimental results are shown in Fig. 7. Table 7 shows the TOP1 accuracy and TOP5 accuracy of the models.

Fig. 7 The accuracy of each model on the CIFAR100 (panels a to c)

Table 7 The TOP1 and TOP5 accuracy of each model on CIFAR100

                TOP1                     TOP5
Architecture    CNN [%]    HN-CNN [%]    CNN [%]    HN-CNN [%]
VGG16           71.23      72.05         89.62      90.79
GoogLeNet       75.82      77.78         92.48      93.64
ResNet18        73.28      74.61         91.02      92.87
ResNet34        74.01      75.92         91.34      93.31
DenseNet22      68.20      71.10         89.44      91.40
DenseNet28      70.11      72.12         90.07      92.19
SEResNet18      75.65      76.32         92.61      93.47
SEResNet34      76.46      76.81         93.01      93.76
CliqueNet       72.68      72.99         91.91      92.83

As can be seen from Table 7, compared with the original models, the TOP1 accuracy [%] of VGG16-HN, GoogLeNet-HN, ResNet18-HN, ResNet34-HN, DenseNet22-HN, DenseNet28-HN, SEResNet18-HN, SEResNet34-HN and CliqueNet-HN increased by 0.82, 1.96, 1.33, 1.91, 2.9, 2.01, 0.67, 0.35 and 0.31, respectively; the TOP5 accuracy [%] increased by 1.17, 1.16, 1.85, 1.97, 1.96, 2.12, 0.86, 0.75 and 0.92, respectively. In addition, comparing the experimental results shows that the shallower DenseNet-HN and ResNet-HN models achieve higher recognition accuracy than the deeper DenseNet and ResNet baselines, which indicates that HN convolution also contributes to making networks lightweight.

From the experiments with multiple networks on different datasets, it can be concluded that the HN convolution filter performs better in image recognition problems than the traditional baseline convolution filter.

5 Conclusion

In this paper, we applied the Gaussian kernel function to the feature extraction of images and designed a Gaussian convolution to extract nonlinear features. The HN convolution filter consisting of baseline convolution, Gaussian convolution and other nonlinear convolutions (if necessary) is proposed, which can extract the linear and nonlinear features of the input information at the
X. Zhang et al. 989

same time and achieves effective feature fusion, which improves feature diversity. In addition, for the HN convolution proposed in this paper, a random search algorithm is designed to explore the optimal hyperparameters. Through extensive experiments on the three benchmark datasets MNIST, CIFAR10 and CIFAR100, the validity of the HN convolutional neural network is verified. It increases the convergence speed and recognition accuracy of the model at the cost of a small increase in model parameters and computational complexity. We believe that the proposed method can contribute to the development of image recognition technology.

In future work, we will further optimize the feature extraction and feature fusion process of HN-CNN to improve the running speed. In addition, the performance of HN-CNN on some network structures is not very satisfactory, which also needs to be improved in the future.

Acknowledgements This work is supported by the Natural Science Foundation of Hebei Province (E2015203354), Science and Technology Research Key Project of High School of Hebei Province (ZD2016100) and Basic Research Special Breeding Project Supported by Yanshan University (16LGY015). We also thank MNIST and CIFAR for their open-source datasets.

References

1. LeCun Y, Boser B, Denker J, Henderson D (1989) Backpropagation applied to handwritten zip code recognition. Neural Comput, pp 541–551
2. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE, pp 2278–2324
3. DiCarlo J, Zoccolan D, Rust N (2012) How does the brain solve visual object recognition. Neuron, pp 415–434
4. Ren S, He K, Girshick R, Sun J (2017) Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell, pp 1137–1149
5. Dong C, Loy C, He K, Tang X (2014) Learning a deep convolutional network for image super-resolution. In: Proc Eur Conf Comput Vis, pp 184–199
6. Wang N, Yeung D (2013) Learning a deep compact image representation for visual tracking. In: Proc Adv Neural Inf Process Syst, pp 809–817
7. Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: IEEE conference on
10. Hubel D, Wiesel T (1968) Receptive fields and functional architecture of monkey striate cortex. J Physiol 195(1):215–243
11. Rapela J, Mendel J, Grzywacz N (2006) Estimating nonlinear receptive fields from natural images. J Vis, 441–474
12. Lin T-Y, Chowdhury AR, Maji S (2015) Bilinear CNN models for fine-grained visual recognition. In: Proceedings of the IEEE international conference on computer vision, pp 1449–1457
13. Niell C, Stryker M (2008) Highly selective receptive fields in mouse visual cortex. J Neurosci 28(30):7520–7536
14. Cui Y, Zhou F, Wang J, Liu X, Lin Y, Belongie S (2017) Kernel pooling for convolutional neural networks. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 3049–3058
15. Zoumpourlis G, Doumanoglou A, Vretos N, Daras P (2017) Non-linear convolution filters for CNN-based learning. IEEE International Conference on Computer Vision, pp 4771–4779
16. Yuan C, Wu Y, Qin X (2019) An effective image classification method for shallow densely connected convolution networks through squeezing and splitting techniques. Appl Intell, 3570–3586
17. Guyon I, Weston J, Barnhill S (2002) Gene selection for cancer classification using support vector machines. Mach Learn, pp 389–422
18. Szulborski R, Palmer L (1990) The two-dimensional spatial structure of nonlinear subunits in the receptive fields of complex cells. Vis Res 30(2):249–254
19. Qian N (1999) On the momentum term in gradient descent learning algorithms. Neural Networks: The Official Journal of the International Neural Network Society 12(1):145–151
20. Kingma D, Ba J (2015) Adam: a method for stochastic optimization. International conference on learning representations, pp 1–13
21. Hubel D, Wiesel T (1998) Early exploration of the visual cortex. Neuron 20(3):401–412
22. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition, pp 1–14, arXiv:1409.1556
23. Chollet F (2017) Xception: Deep learning with depthwise separable convolutions, pp 1–8, arXiv:1610.02357
24. He K, Zhang X, Ren S, Sun J (2016) Identity mappings in deep residual networks. In: European conference on computer vision, Springer, pp 630–645
25. Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: IEEE conference on computer vision and pattern recognition, pp 2261–2269
26. Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: IEEE conference on computer vision and pattern recognition, pp 7132–7141
27. Yang Y, Zhong Z, Shen T, Lin Z (2018) Convolutional neural networks with alternately updated clique, pp 1–10, arXiv:1802.10419
28. Yang B, Bender G, Ngiam J (2019) CondConv: Conditionally parameterized convolutions for efficient inference, pp 1–12,
computer vision and pattern recognition, pp 3431–3440 arXiv:1904.04971
8. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet 29. Ding X, Guo Y, Ding G, Han J (2019) ACNet: Strengthening the
classification with Deep Convolutional Neural Networks. kernel skeletons for powerful CNN via asymmetric convolution
In: Conference on Neural Information Processing Systems, blocks, pp 1–10, arXiv:1908.03930
pp 1097–1105 30. Lavin A, Gray S (2015) Fast algorithms for convolutional neural
9. Russakovsky O, Deng J, Su H, Krause J (2015) Imagenet large networks, pp 1–9, arXiv:1509.09308
scale visual recognition challenge. Int J Comput Vis 115(3):211– 31. Zhang T, Qi G, Xiao B, Wang J (2017) Interleaved group convo-
252 lutions for deep neural networks, pp 1–11, arXiv:1707.02725

32. Howard AG, Zhu M, Chen B (2017) Mobilenets: Efficient convolutional neural networks for mobile vision applications, pp 1–9, arXiv:1704.04861
33. Hui J, Shi L, Zhang P, He S (2016) The neural mechanism of visual consciousness in human brain. Progress Biochem Biophys 43(04):297–307
34. Li Q, Geng H (2008) Research progress of cognitive neuroscience of visual awareness. Progress in Natural Science (11):1211–1219
35. Wen B, Dong W, Xie W (2018) Parameter optimization method for random forest based on improved grid search algorithm. Computer Engineering and Applications 50(10):154–157
36. Bergstra J, Bengio Y (2012) Random search for hyper-parameter optimization. J Mach Learn Res 13:281–305

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Xiuling Zhang received her B.E. degree from the Northeast Heavy Machinery Institute in 1990 and an M.E. degree from the Northeast Heavy Machinery Institute in 1995. She obtained her Ph.D. degree from Yanshan University, China, in 2002. At present, she is a professor of the College of Electrical Engineering, Yanshan University, China. Her research interests include the modeling, control and pattern recognition of complex systems based on artificial intelligence.

Kailun Wei received the B.E. degree in automation from Hebei University of Science and Technology. His current research interests include machine learning and pattern recognition.

Xuenan Kang received the B.E. degree in Electrical Engineering and Automation from the University of Jiamusi, China. His current research interests include reinforcement learning and path planning.

Jinxiang Li received the B.E. degree in automation from North China University of Water Resources and Electric. His current research interests include machine learning and pattern recognition.
