A Sparsity-Based Stochastic Pooling Mechanism for Deep Convolutional Neural Networks
Z. Song et al., Neural Networks 105 (2018) 340–345
journal homepage: www.elsevier.com/locate/neunet
Article history: Received 21 March 2017; Received in revised form 2 May 2018; Accepted 23 May 2018; Available online 15 June 2018.

Keywords: Deep learning; Pooling mechanism; Degree of sparsity; Representative feature value; Recognition accuracy.

Abstract
A novel sparsity-based stochastic pooling which integrates the advantages of max-pooling, average-pooling and stochastic pooling is introduced. The proposed pooling is designed to balance the advantages and disadvantages of max-pooling and average-pooling by using the degree of sparsity of activations and a control function to obtain an optimized representative feature value ranging from the average value to the maximum value of a pooling region. This optimized representative feature value is employed for the probability weight assignment of activations under a normal distribution. The proposed pooling also adopts weighted random sampling with a reservoir for the sampling process to preserve the advantages of stochastic pooling. The proposed pooling is evaluated on several standard datasets in a deep learning framework and compared with various classic pooling methods. Experimental results show that it performs well in improving recognition accuracy. The influence of changes to the feature parameter on recognition accuracy is also investigated.

© 2018 Elsevier Ltd. All rights reserved.
https://doi.org/10.1016/j.neunet.2018.05.015
these two pooling methods in specific cases, which could promote the generalization ability of pooling.

To this aim, Yu, Wang, Chen, and Wei (2014) proposed a mixed pooling method that consists of randomly choosing between max-pooling and average-pooling to generate the output. The mechanism is realized by adding together the maximum and average values, each multiplied by its own coefficient: one coefficient is randomly either 0 or 1, and the other takes the opposite value (0 and 1 being the opposite values). This mechanism improves the overall performance of pooling, but it fails to reflect the advantages of both pooling methods at the same time, because only max-pooling or average-pooling can be adopted in each pooling operation. Lee, Gallagher, and Tu (2015) improved this mixed pooling by replacing the random binary coefficient with a real number ranging from 0 to 1, namely the mixing proportion; the weights of the maximum and average values are then assigned by this real number. With this mixing proportion, the features of both max-pooling and average-pooling can be reflected in each pooling operation, although the randomness of the sampling process is sacrificed. Later on, stochastic pooling emerged, which gives probability weights to the elements of a feature map according to their numerical values and randomly takes a sample in accordance with these probability weights. Zeiler and Fergus (2013) proposed a classical stochastic pooling method that randomly picks activations in the pooling region on the basis of their activities. It has the advantages of being hyper-parameter free and of combining with other regularization approaches such as dropout and data augmentation, and it presents smaller training and testing errors than max-pooling and average-pooling. Meanwhile, it has also been reported that pooling performance can be improved using Dropout (Iosifidis, Tefas, & Pitas, 2015). However, the performance of classic Dropout depends heavily on experience in selecting the positions for random deletion, which makes it an experience-dependent method and limits its generalization ability (Cao, Li, & Zhang, 2015; Srivastava et al., 2014). Wu and Gu (2015) pointed out that the random sampling process of stochastic pooling over activations obeys a multinomial distribution, the same as that of max-pooling dropout, but that for specific retaining probabilities max-pooling dropout can perform better than stochastic pooling. This reveals that max-pooling dropout and stochastic pooling each have their own advantages with respect to sampling. Therefore, a novel pooling mechanism designed to integrate the advantages of max-pooling, average-pooling and stochastic pooling would be expected not only to improve the diversity of pooling results, by balancing the highlighting of foreground textures against the preservation of background information, but also to improve recognition accuracy.

In this research, a novel sparsity-based stochastic pooling is proposed that integrates the advantages of max-pooling, average-pooling and stochastic pooling: it balances highlighting the foreground against preserving background information while improving the randomness of sampling. The pooling mechanism is built on an optimized representative feature value, which can automatically lean towards the behaviour of max-pooling or of average-pooling in specific applications or databases, thereby promoting the generalization ability of pooling, since it is defined through the degree of sparsity and a special control function that generates a value ranging from the average value to the maximum value of a pooling region. The probability weights of the activations are assigned according to the distance between this feature value and each activation under a normal distribution, which evaluates the contributions of all activations in the pooling region. A method of weighted random sampling (WRS) is employed for the sampling operation to further improve the performance of pooling by increasing the randomness of sampling. The proposed pooling is evaluated in terms of recognition accuracy on several classic datasets, and its experimental test error is compared with those of other classic pooling methods. The influence of changes to the feature parameter on recognition accuracy is also discussed.

2. Pooling mechanism

2.1. Optimized representative feature value

A feature value, such as the maximum or average value, is always employed for the pooling region as a benchmark for the weight assignment and probability distribution of the activations in that region. From this point of view, the weight of the maximum activation can be defined as 1 in max-pooling, whereas in average-pooling every activation has the same weight. In rank-based stochastic pooling, no such feature value exists: activations are arranged in descending order and given probability weights by exponential ranking (Michalewicz, 1994). This method can improve pooling performance by avoiding the mistake of offering equal or highly imbalanced importance to each region, since image features are highly spatially non-stationary (Shi, Ye, & Wu, 2016). Meanwhile, the authors also mentioned that rank-based stochastic pooling inevitably degenerates into max-pooling, with a loss of background information, whenever the maximum activation is much greater than the sum of the others (the probabilities of the others are then dominated by that of the maximum activation).

To remedy the above-mentioned disadvantages and improve the pooling algorithm, an optimized representative feature value R is proposed to replace these common feature values (the maximum and average values) and to seek a reasonable balance between max-pooling and average-pooling, highlighting foreground texture details while preserving enough background information. The feature value R is defined by Eq. (1) (and shown in Fig. 1):

\frac{R - Avg}{Max - Avg} = F_p(\alpha) \qquad (1)

where Max and Avg are the maximum and average values of the activations in the pooling region, respectively, and F_p(\alpha) is the control function for optimizing this feature value, as shown in Eq. (2):

F_p(\alpha) =
\begin{cases}
2^{p-1}\alpha^{p}, & 0 \le \alpha \le \frac{1}{2} \\
1 - 2^{p-1}(1-\alpha)^{p}, & \frac{1}{2} \le \alpha \le 1 .
\end{cases} \qquad (2)

Here, p is a positive integer used as a feature parameter that sets the curved shape of the function F_p(\alpha), and \alpha is the degree of sparsity of the convolved features in a pooling region, since much research has shown that the performance of pooling methods is strongly affected by the sparsity of the pooling region (Boureau, Ponce et al., 2010). For example, taking the maximum value works better than the average value in a sparse region. Thus, a representative feature value designed on the basis of the sparsity of activations in a pooling region is more reasonable.

There are three main advantages of using Eq. (2) to define R. First, if p = +∞ (its value is set to 100 in practice, which is large enough to meet the computing requirement), the value of R tends to either the maximum or the average value of the activations in the pooling region (Fig. 1), so the pooling degenerates into max-pooling or average-pooling and thereby contains the features and functions of these two classic pooling methods; if p = 1, the value of R is linearly distributed between the maximum and average values, which simplifies it for high computational efficiency, as shown in Fig. 1.
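As a concrete illustration of Eqs. (1)–(2), the following minimal NumPy sketch computes F_p(α) and the resulting representative value R for one pooling region. The function names are illustrative (not from the authors' code), the degree of sparsity α is assumed to be supplied by the entropy-based measure of Section 2.2, and the example region is chosen only so that its average (5) and maximum (9) match the values quoted later for the Fig. 2 example.

```python
import numpy as np

def control_function(alpha: float, p: int) -> float:
    """Control function F_p(alpha) of Eq. (2): maps the degree of
    sparsity alpha in [0, 1] to a mixing factor in [0, 1]."""
    if alpha <= 0.5:
        return (2 ** (p - 1)) * alpha ** p
    return 1.0 - (2 ** (p - 1)) * (1.0 - alpha) ** p

def representative_value(pool_region: np.ndarray, alpha: float, p: int = 3) -> float:
    """Optimized representative feature value R from Eq. (1):
    R = Avg + F_p(alpha) * (Max - Avg), i.e. a value between the
    average and the maximum of the pooling region."""
    avg = pool_region.mean()
    mx = pool_region.max()
    return avg + control_function(alpha, p) * (mx - avg)

# Example: a 3x3 pooling region with mean 5 and maximum 9; the value of
# alpha here is a placeholder, standing in for the sparsity measure of Section 2.2.
region = np.array([[1.0, 2.0, 8.0],
                   [5.0, 9.0, 3.0],
                   [4.0, 6.0, 7.0]])
R = representative_value(region, alpha=0.7, p=3)
print(R)  # lies between region.mean() (= 5.0) and region.max() (= 9.0)
```

For large p the mixing factor F_p(α) saturates to 0 or 1, so R snaps to the average or the maximum and the behaviour degenerates into average-pooling or max-pooling, matching the limiting cases described above.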
2.2. The degree of sparsity

The criterion for measuring the degree of sparsity of a convolved feature pooling region can be described using entropy theory (Li, Fan, & Liu, 2015): the smaller the entropy, the sparser the feature. Here we also employ entropy to describe the degree of sparsity. Meanwhile, the entropy of a pooling region (a matrix) can be represented by its singular values, which are obtained by singular value decomposition (SVD). The entropy of a matrix is smaller if the matrix is more singular (Gu, Xiong, & Li, 2015), which means that more of the information about the matrix is contained in fewer singular values. For instance, the singular values λ1, λ2, ..., λn (λ1 > λ2 > ··· > λn) can be obtained by SVD from a non-negative matrix whose elements are the activations a_i (i = 1, 2, ..., n), which in this study is the convolved feature pooling region. Thus, the relationship between the singular values and the degree of sparsity can be established with the aid of the entropy.

w(a_i) = \frac{1}{\sqrt{2\pi \sum_{j=1}^{n}\bigl(a_j - \sum_{j=1}^{n} a_j / n\bigr)^{2}}} \times \exp\left\{-\frac{(a_i - R)^{2}}{2\Bigl[\sum_{j=1}^{n}\bigl(a_j - \sum_{j=1}^{n} a_j / n\bigr)^{2}\Bigr]}\right\} \qquad (7)

where R has been set to be the mathematical expectation of the normal distribution function. Each activation a_i in the convolved feature pooling region then has a corresponding probability weight w(a_i). In the previously mentioned example of Fig. 2, R is calculated as 6.898, and the average and maximum values of the activations are 5 (equal to a_4) and 9 (equal to a_7), respectively. The assignment of probability weights to the activations is shown in Fig. 3.

It should be noticed that although the area covered by the normal distribution function is 1 (100%), in a real case the sum of all the probability weights is not 1, owing to the discrete or repeated distribution and the limited number of activations. Thus, a method of weighted random sampling (WRS) with a reservoir (Efraimidis & Spirakis, 2006) is employed in this study for sampling the activations without weight normalization. WRS with a reservoir is a method of sampling from a data stream without needing to know the size of the stream in advance. Consequently, w(a_i) is only defined
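To make the weight assignment and sampling step concrete, the sketch below pairs Eq. (7)-style normal-density weights (mean R, spread taken from the activations, as reconstructed above) with weighted random sampling using a reservoir of size one in the key form of Efraimidis and Spirakis (2006), where each item receives the key u^(1/w) and the largest key is kept. This is a hedged illustration under those assumptions, not the authors' implementation, and the value R = 6.898 is simply the one quoted for the Fig. 2 example; the activations themselves are hypothetical.

```python
import numpy as np

def normal_weights(acts: np.ndarray, R: float) -> np.ndarray:
    """Eq. (7)-style probability weights: a normal density centred on the
    representative value R, with the spread taken from the activations."""
    spread = np.sum((acts - acts.mean()) ** 2)   # sum of squared deviations, as in Eq. (7)
    spread = max(spread, 1e-12)                  # guard against a constant pooling region
    return np.exp(-(acts - R) ** 2 / (2.0 * spread)) / np.sqrt(2.0 * np.pi * spread)

def wrs_reservoir_pick(acts: np.ndarray, weights: np.ndarray,
                       rng: np.random.Generator) -> float:
    """Weighted random sampling with a reservoir of size 1
    (Efraimidis & Spirakis, 2006): key_i = u_i ** (1 / w_i) with u_i ~ U(0, 1);
    the activation with the largest key is kept, no weight normalization needed."""
    weights = np.maximum(weights, np.finfo(float).tiny)  # avoid division by zero
    keys = rng.random(len(acts)) ** (1.0 / weights)
    return float(acts[np.argmax(keys)])

# Example pooling step for a flattened 3x3 region (hypothetical activations):
rng = np.random.default_rng(0)
acts = np.array([1.0, 2.0, 8.0, 5.0, 9.0, 3.0, 4.0, 6.0, 7.0])
R = 6.898                        # representative value quoted for the Fig. 2 example
w = normal_weights(acts, R)      # the weights need not sum to 1
pooled = wrs_reservoir_pick(acts, w, rng)
print(pooled)
```

Because the keys u^(1/w) depend only on the relative sizes of the weights, the fact that the discrete weights do not sum to 1 causes no difficulty, which is exactly the property that motivates using WRS with a reservoir here.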
3. Experiments
Fig. 4. CT image (a) recognition and extraction with average-pooling (b), max-pooling (c) and this proposed pooling (d).
Appendix A. Supplementary data

Supplementary material related to this article can be found online at https://dx.doi.org/10.1016/j.neunet.2018.05.015.

References

Ahmed, S. E. (1995). A pooling methodology for coefficient of variation. Sankhyā, 57(1), 57–75.
Boureau, Y., Bach, F., LeCun, Y., & Ponce, J. (2010). Learning mid-level features for recognition. Computer Vision and Pattern Recognition, 26(2), 2559–2566.
Boureau, Y. L., Ponce, J., & LeCun, Y. (2010). A theoretical analysis of feature pooling in visual recognition. International Conference on Machine Learning, 32(4), 111–118.
Cao, B., Li, J., & Zhang, B. (2015). Regularizing neural networks with adaptive local drop. In 2015 international joint conference on neural networks (pp. 1–5). IEEE.
Dan, K. (1996). A singularly valuable decomposition: The SVD of a matrix. College Mathematics Journal, 27(1), 2–23.
Efraimidis, P. S., & Spirakis, P. G. (2006). Weighted random sampling with a reservoir. Information Processing Letters, 97(5), 181–185.
Gu, R., Xiong, W., & Li, X. (2015). Does the singular value decomposition entropy have predictive power for stock market? Evidence from the Shenzhen stock market. Physica A: Statistical Mechanics and its Applications, 439, 103–113.
Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786), 504.
Iosifidis, A., Tefas, A., & Pitas, I. (2015). DropELM: Fast neural network regularization with Dropout and DropConnect. Neurocomputing, 162, 57–66.
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., et al. (2014). Caffe: Convolutional architecture for fast feature embedding. Eprint arXiv, pp. 675–678.
Krizhevsky, A. (2012). Learning multiple layers of features from tiny images. Tech report, pp. 1–60.
LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., et al. (1989). Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4), 541–551.
LeCun, Y., Boser, B., Denker, J. S., Howard, R. E., Habbard, W., Jackel, L. D., et al. (1990). Handwritten digit recognition with a back-propagation network. Advances in Neural Information Processing Systems, 396–404.
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.
Lee, C. Y., Gallagher, P. W., & Tu, Z. (2015). Generalizing pooling functions in convolutional neural networks: Mixed, gated, and tree. Computer Science, 464–472.
Li, Z., Fan, Y., & Liu, W. (2015). The effect of whitening transformation on pooling operations in convolutional autoencoders. EURASIP Journal on Advances in Signal Processing, 37, 1–11.
Lipovetsky, S. (2009). PCA and SVD with nonnegative loadings. Pattern Recognition, 42(1), 68–76.
Michalewicz, Z. (1994). Genetic algorithms + data structures = evolution programs. Computational Statistics & Data Analysis, 24(3), 372–373.
Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., & Ng, A. Y. (2011). Reading digits in natural images with unsupervised feature learning. In NIPS workshop on deep learning & unsupervised feature learning (pp. 1–9).
Shi, Z., Ye, Y., & Wu, Y. (2016). Rank-based pooling for deep convolutional neural networks. Neural Networks, 83, 21.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1), 1929–1958.
Sun, M., Song, Z., Jiang, X., Pan, J., & Pang, Y. (2017). Learning pooling for convolutional neural network. Neurocomputing, 224, 96–104.
Wang, L., Gao, C., Liu, J., & Meng, D. (2017). A novel learning-based frame pooling method for event detection. Signal Processing, 140, 45–52.
Wu, H., & Gu, X. (2015). Towards dropout training for convolutional neural networks. Neural Networks, 71(C), 1–10.
Xie, L., Tian, Q., Wang, M., & Zhang, B. (2014). Spatial pooling of heterogeneous features for image classification. IEEE Transactions on Image Processing, 23(5), 1994–2008.
Yang, J., Yu, K., Gong, Y., & Huang, T. (2009). Linear spatial pyramid matching using sparse coding for image classification. In IEEE computer society conference on computer vision and pattern recognition (pp. 1794–1801).
Yu, D., Wang, H., Chen, P., & Wei, Z. (2014). Mixed pooling for convolutional neural networks. In 9th international conference on rough sets and knowledge technology, Vol. 8818 (pp. 364–375).
Zeiler, M. D., & Fergus, R. (2013). Stochastic pooling for regularization of deep convolutional neural networks. Computer Science, 1–9.
Zhang, T. (2004). Solving large scale linear prediction problems using stochastic gradient descent algorithms, pp. 919–926.