Auto CNN
Abstract: Designing an optimum Convolutional Neural Network (CNN) is a complex task due to the large array of possible architectures, and it requires experience and in-depth knowledge of deep learning. This paper proposes Autonomous Convolutional Neural Networks (AutoCNN), a novel non-heuristic data-driven method to determine the CNN architecture for various classification problems. Novel convolution stage growing and filter pruning strategies are proposed to sequentially evolve and optimize the network architecture based on the input data distribution. Further, an early stopping criterion is introduced in AutoCNN to prevent over-training and to minimize performance loss. AutoCNN was evaluated using the MNIST, MNIST-rot-back-image, Fashion-MNIST and ADHD200 datasets. The results indicate that AutoCNN not only achieves improved classification performance over the existing evolutionary learning-based architecture determination methodologies, but also provides state-of-the-art classification accuracy on the MNIST-rot-back-image dataset (2.07% improvement) and on the ADHD200 dataset (4.36% improvement). The ablation studies show that AutoCNN performance is reliable and highly robust to noise, that the results are highly generalizable across various datasets, and that the proposed data-driven strategies can also be used to improve existing CNN architectures.

Index Terms: Evolving Intelligent Systems, Convolution Neural Networks, deep learning, evolutionary computing

I. INTRODUCTION

in nature. Another molecular biology and genetics inspired approach in [6] explored a random sampling based approach with high throughput. The biologically inspired approaches proved that automated determination of network architecture is feasible. Bayesian optimization strategies have also been used for automatic determination of network architectures and their hyper-parameters [7], [8]; however, their performance was unable to match that of the hand-crafted networks.

Recently, deep network architecture determination studies have primarily focused on meta-heuristic based approaches as they closely mimic the popular stochastic selection criteria. The meta-heuristic architecture determination methodologies have outperformed the hand-designed architectures on multiple classification tasks. Genetic algorithm-based meta-heuristic approaches to identify CNN architectures have achieved improvement in automatic determination of network architecture [9], [10], [11], but have failed to beat the hand-crafted network based classifiers.

With the advances in computational infrastructure and the size of the databases, reinforcement learning-based approaches have gained popularity for their ability to utilize the action-reward mechanism to better explore the search space. Neural Architecture Search (NAS) [12] is one of the popular reinforcement learning methods that proposed a common learning
[16]. Although such search based reinforcement learning algorithms have been relatively successful in classification problems with large datasets, automatic architecture determination has been a challenging task and needs to be validated over a wide range of datasets. Specifically, for classification problems with small datasets, the search based architecture learning strategies are prone to overfitting due to the large search space and limited sampling points. Therefore, there is a need to develop automatic architecture determination methodologies that are reliable and scalable across various classification problems with different feature distributions, numbers of classes and small sample sizes.

Hence, data-driven approaches have gained prominence for the determination of optimal network architecture. They have been widely used to optimize Multi-Layer Perceptron (MLP) [17], [18] and Recurrent Neural Network (RNN) [19] architectures for data stream problems. They propose demand based network evolution strategies to iteratively evolve deep neural networks. Unlike the search based methodologies, the data-driven approaches are scalable and can be adopted to address problems with both small and large datasets. Also, as the network architecture evolution is need based in data-driven approaches, they are computationally inexpensive when compared to the NAS based reinforcement learning methodologies. Motivated by the findings of the above studies, we propose an Autonomous Convolutional Neural Network (AutoCNN) to generate high-performance architectures using a data-driven non-heuristic approach.

Direct application of data-driven architecture learning approaches to CNNs is challenging because a CNN is composed of a complex combination of different layers with varied functionality, wherein the sequential order of arrangement of the layers is critical and necessary for network performance. Therefore, in this paper, a novel data-driven architecture learning method, hereby called the Autonomous CNN (AutoCNN), is proposed to obtain high-performance CNN architectures. AutoCNN is an end-to-end image classification approach whose network architecture is determined using a data-driven evolving strategy. The evolving strategy of the AutoCNN is governed by a novel feature separability score, which is reliant on the distribution and complexity of the training data. Further, a filter pruning strategy has been introduced in AutoCNN to eliminate redundant information and to reduce the computational complexity of the CNN architecture. Finally, an early stopping strategy is proposed in AutoCNN to prevent overtraining and achieve the highest classification performance.

The significant contributions of this paper are:
• AutoCNN is a novel data-driven architecture generation methodology. The convolution layer growing, filter pruning and early stopping strategies of the AutoCNN are developed to automatically evolve the network and determine the best CNN architecture.
• A novel Feature Separability Score (FSS) is proposed to measure the intra-network feature separability.
• The newly developed method has been tested on four publicly available datasets: MNIST, MNIST-rot-back-image, Fashion-MNIST and ADHD200. AutoCNN achieves state-of-the-art classification performance on the MNIST-rot-back-image and ADHD200 datasets, demonstrating the reliability and scalability of AutoCNN architectures.
• Compared to existing methods in literature, the classification performance of AutoCNN is least affected by noise in the input data.
• AutoCNN adopted as an optimization tool for existing CNN architectures provides a 10.96% accuracy gain over the baseline model.

The rest of this paper is structured as follows: Section II outlines the problem formulation; Section III discusses the learning policy of AutoCNN; Section IV elaborates our experiments; concluding remarks are drawn in the last section.

II. PROBLEM FORMULATION

CNNs have shown remarkable dominance in machine learning tasks, such as image classification and handwriting recognition. However, the design of CNN architectures for particular tasks is extremely complex, as evident from the existing efforts made by researchers such as AlexNet [20], ResNet [21], LeNet [22], VGGNet [23] and GoogLeNet [24]. Furthermore, as the architecture varies greatly depending on the problem, the CNN architecture needs to be fine-tuned for each problem. Redesign of the CNN architecture is a time intensive task and requires expert knowledge. Automatic architecture determination reduces the design time and proposes high-performance CNN architectures [12]. However, the existing autonomous architecture determination methodologies are highly reliant on search based strategies and are prone to overfitting due to the large search space and limited sampling points in datasets with a large feature space and a small number of samples. To address these problems, in this paper, we propose a data-driven learning strategy called Autonomous Convolution Neural Network (AutoCNN) to determine the optimal CNN architecture based on the distribution and complexity of the data. Mathematically, the architecture determination problem can be defined as: given a batch of image data B, the AutoCNN learning policy should be able to automatically determine a CNN model F(·) capable of associating the input sample X to its corresponding class label Y.

III. AUTOCNN: AUTONOMOUS CONVOLUTION NEURAL NETWORK

Autonomous Convolutional Neural Networks (AutoCNN) is a deep convolutional neural network with a data-driven architecture evolving strategy. The structural evolution of the AutoCNN is illustrated in Figure 1, whereas the learning policy is presented in Algorithm 1. From the algorithm, it can be observed that the learning process consists of three stages. Training stage I, a spatial feature separability optimization stage, consists of the structural evolution strategies (convolution layer growing and filter pruning methodologies). Training stage II aims to optimize the classifier/decision layers (fully connected layers), whereas training stage III optimizes AutoCNN in an end-to-end manner. Optimizing the CNN layers and the fully connected layer parameters separately is designed to reduce the risk of overfitting.
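The control flow of the three training stages, as laid out in Algorithm 1, can be sketched in Python as below. This is only an illustrative skeleton under stated assumptions: the `Hooks` container, every callback name in it, the event log and the `max_epochs` safety cap are hypothetical and are not part of the paper. The growth and stopping tests (Equations 3 and 7) and the parameter updates are left as user-supplied callbacks.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hooks:
    """Hypothetical callbacks standing in for the paper's tests and updates."""
    growth_test: Callable[[], bool]          # stands in for Equation 3
    prune_filters: Callable[[], List[int]]   # stands in for Algorithm 2
    add_conv_stage: Callable[[], None]       # grows the network by one stage
    stop_test: Callable[[], bool]            # stands in for Equation 7
    update_conv: Callable[[], None]          # one update of the CNN layers
    update_fc: Callable[[], None]            # one update of the classifier
    update_all: Callable[[], None]           # one end-to-end update


def train_autocnn(hooks: Hooks, max_epochs: int = 10_000) -> List[str]:
    """Run the three stages in order and return a log of structural events."""
    events: List[str] = []

    # Stage I: evolve the convolutional stages (growing and pruning).
    for _ in range(max_epochs):
        stop = False
        if hooks.growth_test():
            similar = hooks.prune_filters()
            if not similar:
                # Nothing redundant was found, so add a new stage.
                hooks.add_conv_stage()
                events.append("grow")
            else:
                events.append("prune")
                stop = hooks.stop_test()
        hooks.update_conv()
        if stop:
            break
    events.append("stage1_done")

    # Stage II: train only the fully connected (decision) layers.
    for _ in range(max_epochs):
        stop = hooks.stop_test()
        hooks.update_fc()
        if stop:
            break
    events.append("stage2_done")

    # Stage III: fine-tune the whole network end to end.
    for _ in range(max_epochs):
        stop = hooks.stop_test()
        hooks.update_all()
        if stop:
            break
    events.append("stage3_done")
    return events
```

As in Algorithm 1, the stopping test in stage I is consulted only after the growth test has fired and pruning has found similar filters, and each stage performs its parameter update before leaving the loop.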
Algorithm 1 AutoCNN learning policy
Input: Training data
Require: 0 < α < 1 and 0 < β < 1
Initialize: add a convolutional stage
Ensure: STATE ∈ {Training, StopTraining}
========= Training Stage I =========
while STATE is Training do
    if Equation 3 is satisfied then
        Execute: Filter pruning via Algorithm 2
        if similarFilterList is empty then
            Create: a new convolutional stage
        else
            if Equation 7 is satisfied then
                STATE ← StopTraining
            end if
        end if
    end if
    Update: CNN layers parameters
end while
========= Training Stage II =========
STATE ← Training
while STATE is Training do
    if Equation 7 is satisfied then
        STATE ← StopTraining
    end if
    Update: fully connected layer parameters
end while
========= Training Stage III =========
STATE ← Training
while STATE is Training do
    if Equation 7 is satisfied then
        STATE ← StopTraining
    end if
    Update: End-to-end network parameters
end while
Return: Network structure, Predicted labels

[Figure 1, panel (a): State t0: AutoCNN initialization, State t1: Convolutional layer growing, State t2: Filter pruning (labels: Convolutional stage, Linearization layer, Pruned Filters). Panel (b): A convolutional stage (Conv. Layer, Batch Norm., ReLU, Max Pooling).]
Fig. 1. Schematic diagram of AutoCNN illustrating its evolution strategies.

Figure 1 illustrates AutoCNN network evolution, where a network structure can be automatically constructed for a given problem. It illustrates the convolutional layer growing and filter pruning mechanisms used to optimize the spatial features of the AutoCNN network. The AutoCNN is initiated with a primitive CNN network consisting of one convolution stage and one fully connected layer. The AutoCNN adopts an iterative growing approach, wherein the architecture complexity is iteratively modulated over the training epochs based on the network learning performance. Novel data-driven convolutional layer growing and filter pruning criteria are proposed to automatically evolve the primitive AutoCNN network and achieve the best classification performance. Further, to prevent overfitting, a self-stopping criterion is proposed to achieve early stopping. A detailed explanation of the evolving strategies is provided below.

A. AutoCNN Initialization

The AutoCNN is initialized with a single convolution stage (1 convolution layer, 1 batch normalization layer, ReLU non-linear activation function and 1 pooling layer), illustrated in Figure 1 (b), with K filters, followed by a fully-connected layer with C nodes, where C represents the number of classes in the input dataset. Each convolution stage converts the low level features into high level features, while the fully-connected layers estimate the class conditional probabilities from the extracted features to classify the input image [25]. Further, to protect AutoCNN from overfitting, a dropout mechanism is adopted in the learning process.

B. Convolutional Layer Growing

AutoCNN optimizes the number of convolutional stages of the CNN using a demand-based sequential strategy over the training epochs. The evolution of the convolutional stages in AutoCNN is governed by the Feature Separability Score (FSS). It measures the similarity of the average spatial feature F across the different classes and is calculated as:

$$FSS = \frac{2}{C^{2} - C} \sum_{i=1}^{C} \sum_{j=1}^{C} F_i \cdot F_j \quad \forall j < i \qquad (1)$$

where C represents the total number of classes and $F_i \in \mathbb{R}^{1 \times D}$ denotes the average spatial features from the last convolutional stage calculated over all the samples of the ith class. The term $2/(C^{2} - C)$ is included to get the average value of all similarity coefficients ($\sum \sum F_i \cdot F_j$). FSS ∈ [0, 1] represents the correlation between the high order spatial features extracted from the convolutional stages from the samples of the ith and jth classes. A higher FSS value denotes that the spatial features extracted by the convolutional stages are strongly correlated and poorly separable, whereas a low FSS denotes that the spatial features are loosely correlated and highly discriminative.
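A minimal NumPy sketch of Equation 1 is given below for illustration. The function name and its arguments are hypothetical. The sketch also L2-normalizes each class-mean vector before taking dot products, which is an assumption made here so that the score stays in [0, 1] for non-negative (post-ReLU) features; the text above does not state how the features are scaled.

```python
import numpy as np


def feature_separability_score(features, labels, normalize=True):
    """Average pairwise dot product of class-mean feature vectors (Equation 1).

    features: (N, D) array of spatial features from the last convolutional stage.
    labels:   (N,) array of class labels.
    normalize: assumption of this sketch; L2-normalize each class mean first.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    num_classes = len(classes)
    if num_classes < 2:
        raise ValueError("FSS needs at least two classes")

    # F_i: mean feature vector of class i, stacked into a (C, D) matrix.
    class_means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    if normalize:
        norms = np.linalg.norm(class_means, axis=1, keepdims=True)
        class_means = class_means / np.maximum(norms, 1e-12)

    # All pairwise dot products F_i . F_j, keeping only the pairs with j < i.
    gram = class_means @ class_means.T
    lower = np.tril_indices(num_classes, k=-1)
    return 2.0 / (num_classes ** 2 - num_classes) * gram[lower].sum()
```

With orthogonal class means the score is 0 (fully separable under this measure), and with identical class means it is 1.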
Although the FSS provides an accurate measure of the separability of the extracted features, it is susceptible to local oscillations due to the large parameter size and iterative mini-batch training in CNNs. Moving average scores have been successfully used in time series based studies to effectively dampen the variations due to local oscillations [26] while preserving the information in the data. Hence, the moving average scores of the FSS are calculated as:

$$\mu^{n}_{FSS} = \mu^{n-1}_{FSS} + \frac{FSS - \mu^{n-1}_{FSS}}{n} \qquad (2)$$

where FSS is the spatial feature similarity score of the nth epoch. Equation 2 gives the moving average FSS ($\mu^{n}_{FSS}$) for the nth epoch, which is used to estimate the performance of the network.

For a randomly initialized network, $\mu^{n}_{FSS}$ reduces with the number of epochs and achieves a steady state. Further improvement in performance can be achieved by deriving higher-order features using additional convolutional stages. Therefore, the steady-state response of $\mu^{n}_{FSS}$ over W epochs indicates saturation in the learning of the current network and is utilized as the AutoCNN convolutional stage growing criterion, given as:

$$0 < \left[ \frac{\mu^{n}_{FSS} - \mu^{(n-W)}_{FSS}}{\mu^{max}_{FSS} - \mu^{min}_{FSS}} \right] < \beta \qquad (3)$$

where $\mu^{max}_{FSS}$ and $\mu^{min}_{FSS}$ represent the maximum and minimum observed $\mu^{n}_{FSS}$ values for the given architecture complexity and 0 < β < 1 represents the growth sensitivity factor. A larger β enforces an aggressive growing strategy, while a β value closer to zero promotes a conservative growing strategy.

Although the iterative addition of convolutional stages improves the separability of the features, multiple pooling layers in deep networks lead to diminishing feature size and loss of information [3]. To prevent the loss of discriminatory information, AutoCNN uses predictive analysis to determine the resultant reduction in the number of features. It arbitrates the feasibility of the addition of the pooling layer. For an input of D dimensions, the number of features M of an AutoCNN with V convolutional stages can be calculated as:

$$M = \prod_{n=1}^{D} \left( \frac{d_n}{P_n S_n} \right)^{V+1} \qquad (4)$$

where $d_n$, $P_n$ and $S_n$ represent the size of the input, the number of padding pixels and the stride in the nth dimension, respectively. From the observation of prominent CNN architectures, such as AlexNet [20], ResNet [21], VGGNet [23] and GoogLeNet [24], it is found that the number of extracted CNN features ranges from 1000 to 4000. In this study, we consider 4000 as the minimum number of features that allows AutoCNN to include a max pooling layer whenever a convolutional stage is added. Equation 4 is used to estimate the number of spatial features of the AutoCNN after the addition of the new convolution stage. If the addition of the new convolution stage reduces the number of spatial features to M ≤ 4000, AutoCNN uses a convolutional stage devoid of the pooling layer to prevent the deterioration of information due to the diminishing spatial feature dimension.

Algorithm 2 Filter pruning algorithm
Input: Current network structure
Set: similarFilterList as an empty array
for all pairs of vectorized feature maps do
    Calculate: Pearson correlation coefficient ρ(Zp, Zq)
end for
Calculate: µK and σK
Identify: similar filters via Equation 5
Store: the index of similar filter to similarFilterList
if similarFilterList is not empty then
    for all filters in similarFilterList do
        Prune: filters having low contribution via Equation 6
    end for
end if
Return: Network structure, similarFilterList

C. Filter Pruning

Although the addition of convolution stages using the AutoCNN convolution layer growing strategy improves the learning potential of the network, it also increases the redundancy in the network, leading to loss of performance. Pruning of redundant filters not only improves the classification performance but also reduces the space and computational complexity of the network and results in faster execution times [27]. Moreover, a priori determination of the number of filters for the new convolution stage of the CNN is not feasible due to the complex feature space relationship. Therefore, the addition of K randomly initialized filters in the newly added Vth convolution stage of the AutoCNN does not lead to significant improvement in the classification performance. To address these limitations, a data-driven filter pruning strategy is proposed in AutoCNN to increase network learning potential without loss in classification performance.

Firstly, a significantly large K is chosen to build redundancy into the learnable parameters. Further, the filter parameter distribution is monitored to identify filters with similar/identical parameter distributions. Identical filters applied to common inputs are responsive to similar spatial patterns and produce similar information for the next layer. Therefore, it is imperative that redundant information is eliminated and only relevant discriminative information is retained in deep learning models [28].

In this paper, we propose a Pearson's product-moment correlation-based strategy to identify the convolution filters with redundant information. Highly similar convolution filters (Kp, Kq) with redundant information are determined using the linear correlation between their vectorized filter maps Z. Two filters (Kp, Kq) are considered similar if their vectorized feature maps (Zp, Zq) possess a strong positive linear relationship, and are said to be dissimilar if they have a negative or no linear relationship. Therefore, the pairs of filters (Kp, Kq) with high similarity are identified using:

$$0 < \rho(Z_p, Z_q) \geq \mu_K + 4\sigma_K, \quad p \neq q \qquad (5)$$

where ρ(Zp, Zq) represents the Pearson's correlation of the vectorized feature maps of the pth and qth filters of a convolutional stage, and µK and σK are the mean and standard deviation of all the positively correlated filters of a given convolutional stage, respectively. Equation 5 is inspired by the k-sigma rule
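For illustration, the moving average of Equation 2, the growth test of Equation 3 and the similarity test of Equation 5 can be sketched with NumPy as follows. The function names and the `k_sigma` argument are hypothetical. Equation 3 is read here with the absolute change in the numerator, which is an assumption of this sketch: a moving average that decreases over the epochs would make the signed difference negative. Feature maps that are constant would give undefined correlations and are not handled.

```python
import numpy as np


def update_moving_average(prev_mean, fss, n):
    """Equation 2: running mean of the FSS after the n-th epoch (n starts at 1)."""
    return prev_mean + (fss - prev_mean) / n


def growth_criterion(mu_history, window, beta):
    """Equation 3 on the per-epoch moving averages of the current architecture.

    Assumption of this sketch: the change over `window` epochs is taken as an
    absolute value before it is compared against beta.
    """
    if len(mu_history) <= window:
        return False
    spread = max(mu_history) - min(mu_history)
    if spread == 0:
        return False
    change = abs(mu_history[-1] - mu_history[-1 - window]) / spread
    return 0 < change < beta


def find_similar_filters(feature_maps, k_sigma=4.0):
    """Equation 5: pairs (p, q), p < q, whose positive Pearson correlation
    reaches mean + k_sigma * std of all positive pairwise correlations.

    feature_maps: (K, N) array with one vectorized feature map Z_p per row.
    """
    z = np.asarray(feature_maps, dtype=float)
    rho = np.corrcoef(z)                              # (K, K) Pearson matrix
    p_idx, q_idx = np.triu_indices(z.shape[0], k=1)   # each pair once, p != q
    pair_rho = rho[p_idx, q_idx]
    positive = pair_rho[pair_rho > 0]
    if positive.size == 0:
        return []
    threshold = positive.mean() + k_sigma * positive.std()
    keep = (pair_rho > 0) & (pair_rho >= threshold)
    return [(int(p), int(q)) for p, q in zip(p_idx[keep], q_idx[keep])]
```

Note that with only a handful of filters no pair can exceed the mean by four standard deviations, so the default threshold only becomes active when a stage has many positively correlated filter pairs. This is consistent with choosing a large K.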
[Figure 2 panels, each plotted against the number of epochs: accuracy (%) for training and testing, FSS, number of convolutional layers, and number of filters in each convolutional layer.]
Fig. 2. Performance metrics and the network evolution of AutoCNN. The AutoCNN evolving mechanism is demonstrated here: it successfully introduces several CNN layers and removes redundant filters. The effectiveness of the FSS in measuring model performance is also demonstrated: the FSS decreases as the accuracy increases.
MODEL       DEPTH        PARAMS. (M)    ACC. (%)
AutoCNN     19 ± 1.73    1.44 ± 0.26    94.42 ± 0.07
EvoCNN      N/A          6.68           94.53
ResNet18*   18           11             94.90
VGG16*      16           26             93.50

*Hand designed CNN architecture.

3) Fashion-MNIST: The performance of the AutoCNN on the Fashion-MNIST dataset in comparison to other methodologies in the literature, viz., EvoCNN, ResNet18 and VGG16, is presented in Table III. A meta-heuristic genetic-based CNN, EvoCNN [38], achieved a classification accuracy of 94.53%, while the ResNet18 and VGG16 hand-designed architectures reported classification accuracies of 94.90% and 93.50% respectively. The classification performance of the ResNet18 CNN classifier represents the state-of-the-art accuracy on the Fashion-MNIST dataset [36]. The residual connections of the ResNet architecture are vital to prevent loss of information during the learning process in datasets like Fashion-MNIST, which are characterized by low resolution images and a large range of feature distribution [28]. AutoCNN achieved a classification performance of 94.42% on the Fashion-MNIST dataset, which is comparable to the state-of-the-art methodologies in literature. However, the data-driven evolution strategy ensures that the AutoCNN network achieves comparable classification performance with the smallest number of network parameters (21% reduction in network size), reducing the memory and space complexity.

4) ADHD200: The ADHD200 dataset is characterized by small sample size, large feature dimension, low intra-class separability and high inter-class variability, which presents a challenging problem for CNN based classifiers [31]. Among the CNN based approaches to classify ADHD using rs-fMRI

MODEL       DEPTH         ACC. (%)
AutoCNN     11 ± 2.64     74.72 ± 2.72
Model I     9             75.45 ± 4.26
Model II    2 + 4*        81.32 ± 5.66

*Evolved by AutoCNN. Model I: AutoCNN generated network, re-trained without network evolution. Model II: AutoCNN initialized with DTM, re-trained with network evolution.

TABLE VI
AUXILIARY STUDY RESULTS ON MNIST-ROT-BACK-IMAGE DATASET.

MODEL       DEPTH           ACC. (%)
AutoCNN     17.33 ± 4.72    86.78 ± 0.66
Model I     18              87.76 ± 0.38
Model III   9 + 11*         88.75 ± 0.23

*Evolved by AutoCNN. Model I: AutoCNN generated network, re-trained without network evolution. Model III: AutoCNN initialized with AutoCNN generated network on MNIST dataset, re-trained with network evolution.

An ablation study simulating real-world use case scenarios of the proposed AutoCNN was conducted. First, the CNN architectures for the ADHD200 and MNIST-rot-back-image datasets were generated using AutoCNN. The generated networks are re-initialized using random weights. The re-initialized AutoCNN networks are then retrained (without network evolution) using the individual datasets to verify the network performance. These re-trained AutoCNN models are hereby referred to as Model I. From the first and second rows of Table V and Table VI, it can be seen that the classification performance of Model I is comparable with the baseline AutoCNN network on all three datasets. Further, statistical analysis reveals that the reduction in classification accuracy is statistically not significant (p = 0.74) and the transfer of
TABLE VII
NETWORK STRUCTURE PERFORMANCE ON MNIST-ROT-BACK-IMAGE.

                             Remp(f)
Fold                         AutoCNN    ResNet18*    VGG11*
1                            1.79       3.77         3.19
2                            1.76       3.36         4.88
3                            2.01       3.28         3.69
4                            1.52       3.46         4.39
5                            1.86       3.52         4.21
6                            1.96       3.47         4.44
7                            1.52       3.55         4.35
8                            1.79       3.29         3.80
9                            1.58       3.26         4.50
10                           1.85       3.51         3.58
variance                     0.027      0.022        0.240
max(Remp(f))                 2.01       3.77         4.88
max(Remp(f)) + variance      2.04       3.79         5.12

*Hand-designed CNN architecture.

in Table I and at the same time confirms the effectiveness of the AutoCNN learning policy in constructing a CNN structure for any given problem.

E. Related Works

Automated CNN architecture determination has been widely explored in literature. Previous methodologies like FCAE [11], IPPSO [37] and EvoCNN [38] have proposed Evolutionary Computation (EC) for the automatic design of the CNN architecture. These EC methodologies for architecture determination are inspired by biological evolution cycles, wherein the network parameters are updated using biological evolution inspired approaches and the intermediate solutions are evaluated using quantitative measures. The above methods are heuristic in nature, and determination of the optimal solution requires iterative exploration of the solution space over a large number of trials. For deep networks with multiple layers, such heuristic approaches are not feasible. Bayesian optimization strategies have also been used for automatic determination of network architectures and their hyper-parameters [7], [8]; however, their performance was unable to match that of the hand-crafted networks. With the advent of deep learning, various reinforcement learning based approaches have been proposed [12], [14]. The reinforcement learning based approaches are modelled as a closed loop controller and a child network, wherein the primary task of the controller is to determine the best child network for a specific problem. As reported in [12], such approaches are computationally expensive, requiring 800 GPUs trained over thousands of GPU days to determine a single network architecture. To address these shortcomings in the previous studies, a novel data-driven approach to CNN architecture determination is proposed in AutoCNN. Further, the proposed AutoCNN is an inexpensive approach (requiring 2 GeForce GTX 1080 GPUs trained over 8 to 40 hours) and enables scaling from small datasets to large datasets. AutoCNN proposes a novel Feature Separability Score (FSS) that measures the intra-network feature separability, to measure the network capacity and determine appropriate evolution strategies. The FSS and the training error ($E_{Tr}$) are utilized to determine the convolution layer growing, filter pruning and automatic stopping strategies. The results reported in Tables I and II and in [11] show that the implementation of convolutional stage growing strategies helps AutoCNN achieve better classification performance on the MNIST dataset in comparison to the heuristic network determination methods in literature.

Moreover, the previous methods in literature use a pre-determined number of epochs for training and hence are prone to overfitting. AutoCNN proposes a novel automatic early stopping criterion to prevent overfitting and minimize computation cost. It is worth mentioning that the previous methodologies report only the classification accuracy on the dataset, which does not reflect the effect of noise on the network performance or the generalization power of a network. Therefore, in this paper, AutoCNN is evaluated for performance reproducibility (Model I), for its ability to optimize existing CNN networks (Model II) and for its ability to adapt to the introduction of noise in the dataset (Model III). The results of the above auxiliary studies indicate that the AutoCNN performance is scalable, adaptable and robust to noise, and that the results are highly generalizable. Therefore, AutoCNN is a reliable tool for automatic CNN network architecture determination for any given dataset.

V. CONCLUSIONS

The proposed AutoCNN methodology makes it possible to derive the CNN network architecture with a data-driven approach. It utilizes the convolution layer growing, filter pruning and early stopping criteria to modulate the architecture using an iterative strategy. A novel Feature Separability Score (FSS) is proposed to measure the spatial feature separability of the network. The performance of AutoCNN is tested using the MNIST, MNIST-rot-back-image, Fashion-MNIST and ADHD200 datasets to ensure reliability and scalability. The results indicate that AutoCNN outperforms all the evolutionary learning-based architecture determination methodologies in two of the four datasets. It also achieves state-of-the-art classification performance of 86.78% and 74.72% on the MNIST-rot-back-image and ADHD200 datasets, respectively. In this paper, even though the proposed AutoCNN methodology has been shown to optimize convolution neural network architectures, the data-driven architecture determination strategy proposed here can be adopted for the optimization and/or identification of any deep neural network architecture.

REFERENCES

[1] P. Sermanet and Y. LeCun, “Traffic sign recognition with multi-scale convolutional networks,” in IJCNN, 2011, pp. 2809–2813.
[2] C. Pelletier, G. I. Webb, and F. Petitjean, “Temporal convolutional neural network for the classification of satellite image time series,” Remote Sensing, vol. 11, no. 5, p. 523, 2019.
[3] G. Huang, Y. Sun, Z. Liu, D. Sedra, and K. Q. Weinberger, “Deep networks with stochastic depth,” in European conference on computer vision. Springer, 2016, pp. 646–661.
[4] J. D. Schaffer, D. Whitley, and L. J. Eshelman, “Combinations of genetic algorithms and neural networks: A survey of the state of the art,” in [Proceedings] COGANN-92: International Workshop on Combinations of Genetic Algorithms and Neural Networks. IEEE, 1992, pp. 1–37.
[5] K. O. Stanley and R. Miikkulainen, “Evolving neural networks through augmenting topologies,” Evolutionary computation, vol. 10, no. 2, pp. 99–127, 2002.
[6] N. Pinto, D. Doukhan, J. J. DiCarlo, and D. D. Cox, “A high-throughput screening approach to discovering good forms of biologically inspired visual representation,” PLoS Comput Biol, vol. 5, no. 11, p. e1000579, 2009.
[7] B. Shahriari, K. Swersky, Z. Wang, R. P. Adams, and N. De Freitas, “Taking the human out of the loop: A review of bayesian optimization,” Proceedings of the IEEE, vol. 104, no. 1, pp. 148–175, 2015.
[8] J. Snoek, H. Larochelle, and R. P. Adams, “Practical bayesian optimization of machine learning algorithms,” Advances in neural information processing systems, vol. 25, pp. 2951–2959, 2012.
[9] B. Wang, Y. Sun, B. Xue, and M. Zhang, “Evolving deep convolutional neural networks by variable-length particle swarm optimization for image classification,” in 2018 IEEE Congress on Evolutionary Computation (CEC). IEEE, 2018, pp. 1–8.
[10] J. Liang, E. Meyerson, B. Hodjat, D. Fink, K. Mutch, and R. Miikkulainen, “Evolutionary neural automl for deep learning,” in Proceedings of the Genetic and Evolutionary Computation Conference, 2019, pp. 401–409.
[11] Y. Sun, B. Xue, M. Zhang, and G. G. Yen, “A particle swarm optimization-based flexible convolutional autoencoder for image classification,” IEEE Transactions on Neural Networks and Learning Systems, vol. 30, no. 8, pp. 2295–2309, 2019.
[12] B. Zoph and Q. V. Le, “Neural architecture search with reinforcement learning,” arXiv preprint arXiv:1611.01578, 2016.
[13] H. Pham, M. Y. Guan, B. Zoph, Q. V. Le, and J. Dean, “Efficient neural architecture search via parameter sharing,” arXiv preprint arXiv:1802.03268, 2018.
[14] C. Liu, B. Zoph, M. Neumann, J. Shlens, W. Hua, L.-J. Li, L. Fei-Fei, A. Yuille, J. Huang, and K. Murphy, “Progressive neural architecture search,” in Proceedings of the European conference on computer vision (ECCV), 2018, pp. 19–34.
[15] H. Cai, J. Yang, W. Zhang, S. Han, and Y. Yu, “Path-level network transformation for efficient architecture search,” arXiv preprint arXiv:1806.02639, 2018.
[31] A. M. S. Aradhya, V. Subbaraju, S. Sundaram, and N. Sundararajan, “Regularized spatial filtering method (r-sfm) for detection of attention deficit hyperactivity disorder (ADHD) from resting-state functional magnetic resonance imaging (rs-fmri),” in 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), July 2018, pp. 5541–5544.
[32] A. M. S. Aradhya, A. Joglekar, S. Suresh, and M. Pratama, “Deep transformation method for discriminant analysis of multi-channel resting state fmri,” in Thirty-Third AAAI Conference on Artificial Intelligence, 2019.
[33] P. Bellec, C. Chu, F. Chouinard-Decorte, Y. Benhajali, D. S. Margulies, and R. C. Craddock, “The neuro bureau adhd-200 preprocessed repository,” NeuroImage, vol. 144, pp. 275–286, 2017, data Sharing Part II. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S105381191630283X
[34] F. E. Fernandes and G. G. Yen, “Automatic searching and pruning of deep neural networks for medical imaging diagnostic,” IEEE Transactions on Neural Networks and Learning Systems, pp. 1–11, 2020.
[35] G. Qian and L. Zhang, “A simple feedforward convolutional conceptor neural network for classification,” Applied Soft Computing, vol. 70, pp. 1034–1041, 2018.
[36] F. Assunçao, N. Lourenço, P. Machado, and B. Ribeiro, “Denser: deep evolutionary network structured representation,” Genetic Programming and Evolvable Machines, vol. 20, no. 1, pp. 5–35, 2019.
[37] B. Wang, Y. Sun, B. Xue, and M. Zhang, “Evolving deep convolutional neural networks by variable-length particle swarm optimization for image classification,” CoRR, vol. abs/1803.06492, 2018. [Online]. Available: http://arxiv.org/abs/1803.06492
[38] Y. Sun, B. Xue, and M. Zhang, “Evolving deep convolutional neural networks for image classification,” CoRR, vol. abs/1710.10741, 2017. [Online]. Available: http://arxiv.org/abs/1710.10741
[39] L. Zou, J. Zheng, C. Miao, M. J. Mckeown, and Z. J. Wang, “3d cnn based automatic diagnosis of attention deficit hyperactivity disorder using functional and structural mri,” IEEE Access, vol. 5, pp. 23 626–
[16] H. Liu, K. Simonyan, and Y. Yang, “Darts: Differentiable architecture 23 636, 2017.
search,” arXiv preprint arXiv:1806.09055, 2018. [40] V. Vapnik, The nature of statistical learning theory. Springer science
& business media, 2013.
[17] M. Pratama, C. Za’in, A. Ashfahani, Y. S. Ong, and W. Ding, “Au-
[41] R. F. de Mello, “On the shattering coefficient of supervised learning
tomatic construction of multi-layer perceptron network from streaming
algorithms,” arXiv preprint arXiv:1911.05461, 2019.
examples,” in Proceedings of the 28th ACM International CIKM, 2019.
[18] G. bin Huang, P. Saratchandran, and N. Sundararajan, “A generalized
growing and pruning rbf (ggap-rbf) neural network for function approx-
imation,” IEEE TRANSACTIONS ON NEURAL NETWORKS, vol. 16,
pp. 57–67, 2005.
[19] M. Das, M. Pratama, S. Savitri, and Z. Jie, “Muse-rnn: A multilayer
self-evolving recurrent neural network for data stream classification,” in
19th IEEE International Conference on Data Mining, 08 2019.
[20] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification
with deep convolutional neural networks,” Communications of the ACM,
vol. 60, no. 6, pp. 84–90, 2017.
[21] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image
recognition,” in 2016 IEEE Conference on Computer Vision and Pattern
Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, 2016,
pp. 770–778. [Online]. Available: https://doi.org/10.1109/CVPR.2016.90
[22] Y. LeCun et al., “Lenet-5, convolutional neural networks,” URL:
http://yann. lecun. com/exdb/lenet, vol. 20, no. 5, p. 14, 2015.
[23] K. Simonyan and A. Zisserman, “Very deep convolutional networks for
large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
[24] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan,
V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,”
in Proceedings of the IEEE conference on computer vision and pattern
recognition, 2015, pp. 1–9.
[25] Y. LeCun, K. Kavukcuoglu, and C. Farabet, “Convolutional networks
and applications in vision,” in Proceedings of 2010 IEEE International
Symposium on Circuits and Systems. IEEE, 2010, pp. 253–256.
[26] G. Glendrange and S. Tveiten, “Testing the performance of simple
moving average with the extension of short selling,” Master’s thesis,
Universitetet i Agder; University of Agder, 2016.
[27] H. Li, A. Kadav, I. Durdanovic, H. Samet, and H. P. Graf, “Pruning
filters for efficient convnets,” arXiv preprint arXiv:1608.08710, 2016.
[28] J. Zhang, T. Liu, and D. Tao, “An information-theoretic view for deep
learning,” arXiv preprint arXiv:1804.09060, 2018.
[29] Y. LeCun and C. Cortes, “MNIST handwritten digit database,” 2010.
[Online]. Available: http://yann.lecun.com/exdb/mnist/
[30] H. Xiao, K. Rasul, and R. Vollgraf, “Fashion-mnist: a novel image
dataset for benchmarking machine learning algorithms,” arXiv preprint
arXiv:1708.07747, 2017.