
Conference link: https://www.ismsitconf.org/?go=ismsit2020

Conference proceedings link: https://ieeexplore.ieee.org/xpl/conhome/9254214/proceeding?isnumber=9254218&sortType=vol-only-seq

An Experimental Study on Hyper Parameters for Training Deep Convolutional Networks

Hakan Temiz
Control and Automation Technologies
Artvin Coruh University
Artvin, Turkey
htemiz@artvin.edu.tr

Abstract— When training deep networks, it is crucial to obtain a network that offers optimum performance by trying different values of many hyper parameters and combinations of these values. In theory, the optimal set of values, which ensures the maximum performance of the network, can be found by giving these parameters numerous different values. However, it is not feasible to try all combinations of values. A priori information about how much each hyper parameter and its values contribute to the performance of the network narrows the search space and enables researchers to obtain the network with optimum performance easily and quickly. In this study, such a priori information is investigated to guide the search for the most important hyper parameters and the ideal values that ensure optimum performance of a typical convolutional neural network in single image super resolution. For this purpose, the importance levels of the 5 most commonly used hyper parameters in training, and their optimum values, were investigated. By giving each hyper parameter two different values that are widely used or known to give good results in previous works in the literature, 32 different training procedures were performed in total. The results showed that the learning rate has the most important effect on the performance of the network, followed by normalization, and then the size of the input image given to the model during training. It was also found that the batch size and stride parameter values do not make a significant change in the performance of the network. The results of this study could help researchers determine the training parameters and their values in order to obtain optimum network performance efficiently and rapidly.

Keywords—deep learning, convolutional neural network, hyper parameter, training

I. INTRODUCTION

A rigorous effort is being made to design deep neural networks with higher performance than other methods and techniques. In order to increase the performance of deep convolutional networks, different architectural elements such as shortcut connections [1][2], dense connections [3][4], residual learning [5][6][7][8], dense iterative blocks and double upsampling layers [9], and so on, have been proposed and used successfully. The architecture, however, is not the only factor that affects the success of a network. The hyper parameters used in training also directly affect the performance. Researchers have therefore not only focused on architectures to improve performance, but also examined the hyper parameters used in training the network. For example, some took a high learning rate [3][7][8][10], whereas others took a small one [11][12][13]. Other approaches include gradient clipping [7], different input image sizes, different batch sizes, different sampling frequencies of patches from the training images (stride value), etc.

In the training of deep networks, it is necessary to attain a network that offers optimum performance by trying different values of many hyper parameters and combinations of these values. Theoretically, by trying numerous different values for these parameters, the optimal set of values that yields the network with maximum performance can be found. However, it is not feasible to test numerous different values of many training parameters one by one and measure the performance of the network each time. For this reason, it is important to start the training procedure with the optimal set of hyper parameters and the ideal values that contribute the most to the performance of the network. Commencing training with these ideal values narrows the solution space, and thus facilitates and speeds up finding the network with optimum performance.

In this study, the most important parameters that make the performance of a typical deep convolutional network superior in obtaining super resolution from a single image are experimentally investigated. In this context, the effects of 5 commonly used hyper parameters (learning rate, normalization, input (patch) size, batch size, and stride) on the performance of a deep network were observed experimentally. Two different values were taken for each of these five parameters. In total, 32 different training runs were carried out with the EDSR [14] deep network, the winner of the NTIRE 2017 [15] competition. The performances of these 32 models were benchmarked with eight different image quality assessment measures, and also checked in terms of the human visual system. The results provide very important information about which parameters and values should be chosen in the training of deep networks. It was shown that the learning rate has the greatest effect on the performance of the network. The second most important parameter is the normalization method of the input image, and the third most important parameter is the size of the training image patch. When the learning rate is 1E-4, the input image is normalized to the range [0,1], and the input size is 48x48 pixels, the performance of the network reaches its peak. On the other hand, when the learning rate is 1E-3, the input image is normalized to the range [-1,1], and the input size is 24x24 pixels, the network reaches its lowest performance. It was also observed from the experiments that changing the values of the batch size and stride hyper parameters does not make a significant difference in the performance of the network.

The experiments provide very important information regarding the choice of parameter values in the training of convolutional neural networks. It has been demonstrated that the small learning rate offers more successful results than the large one, normalization to the range [0,1] better than normalization to [-1,1], and, similarly, the large input image better than the small one. Choosing the hyper parameter value set in the training of deep networks in line with this information will enable researchers to obtain the optimum network easily and quickly.
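As a concrete illustration of this experimental design, the 32 training configurations are simply the Cartesian product of the two candidate values of each of the five hyper parameters. A minimal sketch (the variable names are ours, not from the paper's code):

```python
from itertools import product

# Two candidate values per hyper parameter, as in Table I.
grid = {
    "learning_rate": [1e-3, 1e-4],
    "normalization": ["minmax_0_1", "minmax_-1_1"],
    "patch_size":    [24, 48],
    "batch_size":    [16, 32],
    "stride":        [13, 23],
}

# Every combination of the five binary choices: 2**5 = 32 training runs.
configs = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(configs))  # 32
```

Each element of `configs` would parameterize one training run; the study trains one EDSR model per configuration.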

978-1-7281-9090-7/20/$31.00 ©2020 IEEE

Authorized licensed use limited to: ULAKBIM-UASL - Artvin Coruh Universitesi. Downloaded on June 08,2023 at 11:20:47 UTC from IEEE Xplore. Restrictions apply.
II. MATERIAL AND METHODS

A. Training and Test

With this study, we aimed at discovering the optimal value set of training hyper parameters that leads EDSR to achieve the best performance in single image super resolution for scale 2. For this purpose, we took two different values for each of the following 5 training parameters: learning rate, normalization, input (patch) size, batch size, and stride. As a result of the experiment, we found the optimal parameter set that we aimed to find.

The binary values of each hyper parameter are summarized in Table I. We obtained a total of 32 different EDSR models by training with all possible combinations of these 5 training parameters.

TABLE I. TRAINING PARAMETERS WITH TWO ALTERNATIVE VALUES FOR EACH.

Hyper Parameter      Values
Learning rate        1E-03            1E-04
Normalization        Min-Max [0, 1]   Min-Max [-1, 1]
Input patch size     24               48
Number of batches    16               32
Stride               13               23

B. Results

We measured the performance of each trained network with the following eight IQMs: PSNR, SSIM [16], ERGAS, PAMSE [17], SAM [18], SCC, UQI [19], and VIF [20]. We present the results from each test set in Table III. The table is sorted by the PSNR scores obtained from the DIV2K test set so that the most successful value is at the top. According to this ordering, of the top 8 scores of each IQM, we highlighted in light-green only those that take place in the first eight lines. Similarly, of the last 8 scores of each IQM, the ones that take place in the last eight lines were highlighted in light-orange.

12 out of 16 EDSR models in the first half of the table were trained with learning rate 1E-04, whereas 12 out of 16 in the latter half were trained with learning rate 1E-03. The top 7 models have learning rate 1E-04, whereas the last 6 models have 1E-03. For the normalization and input size parameters, the separation is less sharp: 11 out of 16 models in the first half of the table were trained with min-max [0,1] normalization and input size 48, whereas 11 out of 16 networks in the latter half have min-max [-1,1] normalization and input size 24. The top 6 and top 4 models were trained with min-max [0,1] normalization and input size 48, respectively. The worst 8 models have min-max [-1,1] normalization. On the other hand, there is no noticeable difference between training with either value of the batch size and stride hyper parameters. In other words, neither of the batch size values (16 or 32) nor of the stride values (13 or 23) contributes to the performance better than the other.

92 of the total 96 best scores measured by all IQMs lie in the top four rows, revealing the optimal combination of hyper parameter values. That is, EDSR achieves the best performance when the learning rate is 1E-04, the normalization is min-max [0,1], and the input size is 48. The worst combination of hyper parameter values is obtained when the learning rate is 1E-03 and the normalization is min-max [-1,1]. We summarize the contributions offered by the hyper parameter values in Table II. The learning rate 1E-04 results in better performance than 1E-03. Normalization with min-max [0,1] yields better performance than min-max [-1,1]. The input (patch) size 48x48 yields better performance than 24x24.

TABLE II. THE CONTRIBUTION OF HYPER PARAMETER VALUES TO THE PERFORMANCE OF EDSR.

Hyper Parameter      Positive         Negative
Learning rate        1E-04            1E-03
Normalization        Min-Max [0, 1]   Min-Max [-1, 1]
Input patch size     48               24
Number of batches    No noticeable difference between 16 and 32
Stride               No noticeable difference between 13 and 23

We provide Fig. 1, clearly showing the contributions of the hyper parameter values to the performance of EDSR in each test set, in terms of each IQM. Each IQM's graph confirms that EDSR exhibits superb performance when the learning rate is 1E-04, the normalization method is min-max [0,1], and the input size is 48, whereas it obtains the worst performance when the learning rate is 1E-03 and the normalization is min-max [-1,1].

We also provide Figs. 2 to 4 for a qualitative comparison of the performance of the 32 EDSR models on '0880.png' from the DIV2K test set, 'z_bird_GT.bmp' from SET5, and 'comic.bmp' from SET14, respectively. In each figure, we present the same region extracted from the output image produced by each EDSR model from the same input image. We also show, on the left, the ground-truth image and its patch of the same region for reference. The images in each figure are arranged in the same order as in Table III, from left to right first and then top to bottom. The letters L, N, I, B, and S are the abbreviations of learning rate, normalization, input size, batch size, and stride, respectively.

From a careful visual inspection of the images in each figure, it is easily seen that they have very similar success rankings as in Table III. The changes in color saturation can easily be seen in all figures; this is even more obvious in Fig. 2. Very few images have the same color saturation as the ground-truth. The other images look either yellowish, bluish, or reddish. The resolution obviously decreases going from top to bottom. The images in the last rows of all figures hold the poorest resolution, whereas the images in the top rows preserve the highest. In all figures, some images have a number of corrupted pixel regions. This is easily seen, especially in the pupil region, in Figs. 2 and 3. In Fig. 3, the same images with degraded pixel regions in the pupil region also have similar degradations in the dark regions above the eye. A significant number of images in Fig. 4 also have similar color corruptions. By carefully examining such images in all figures, we see that the models producing images with corrupted pixel regions were trained with some combination of the following hyper parameter values: learning rate 1E-03, min-max [-1,1] normalization, and input size 24. In other words, training EDSR with these hyper parameter values negatively affects the performance. The opposite is also true: training with any combination of the alternate values of these three hyper parameters positively affects the performance. As a result, from the quantitative evaluation of the result images with all IQMs, we reached the same conclusion presented in Table II. With this experiment, we finally found the optimal hyper parameter set ensuring the best performance.
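The two normalization schemes compared above, and the PSNR measure used to rank the models, can be sketched as follows. This is an illustrative implementation under the standard definitions, not the authors' actual code; the function names are ours:

```python
import numpy as np

def normalize(img, mode="[0,1]"):
    """Min-max scale an 8-bit image to [0, 1] or, if mode is "[-1,1]", to [-1, 1]."""
    x = img.astype(np.float64) / 255.0          # min-max [0, 1]
    return x if mode == "[0,1]" else 2.0 * x - 1.0  # min-max [-1, 1]

def psnr(reference, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-size uint8 images."""
    diff = reference.astype(np.float64) - distorted.astype(np.float64)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0.0 else 10.0 * np.log10(peak ** 2 / mse)
```

For example, `psnr(img, img)` is infinite for identical images, and a maximally wrong prediction (all-black vs. all-white) scores 0 dB, which bounds the range of the PSNR columns in Table III from below.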

TABLE III. Performance of EDSR in DIV2K, SET5 and SET14 test sets according to binary values of 5 hyper parameters. The table is sorted by PSNR score in the DIV2K test set in descending order. Of the top 8 and last 8 scores of each IQM, only those in the first 8 and last 8 lines are highlighted in light-green and light-orange, respectively. The best performance is achieved when the learning rate is 1E-04, normalization is min-max [0,1] and input size is 48. The worst performance is achieved when the learning rate is 1E-03 and normalization is min-max [-1,1].

Columns: No, L. Rate, Min-Max, Input Size, Batch Size, Stride, then for each IQM (PSNR, SSIM, ERGAS, PAMSE, SAM, SCC, UQI, VIF) the scores on DIV2K, SET5 and SET14, in that order.

 1  1E-04  [0, 1]   48  32  13  30.827 32.820 27.401  0.912 0.943 0.880  10673.235 3456.317 16381.651  13.139 4.101 44.484  0.079 0.056 0.094  0.480 0.577 0.489  0.982 0.991 0.984  0.583 0.692 0.553
 2  1E-04  [0, 1]   48  16  23  30.471 33.352 27.928  0.911 0.941 0.877  10362.745 3458.802 13951.992  15.254 4.587 57.198  0.080 0.049 0.090  0.476 0.574 0.491  0.980 0.988 0.984  0.581 0.692 0.560
 3  1E-04  [0, 1]   48  16  13  30.359 34.002 27.475  0.910 0.944 0.880  12893.456 3284.327 19181.921  16.346 2.440 43.848  0.082 0.048 0.093  0.477 0.579 0.486  0.978 0.991 0.984  0.582 0.697 0.553
 4  1E-04  [0, 1]   48  32  23  29.909 32.911 26.611  0.909 0.940 0.871  13785.366 3530.490 14340.833  21.647 4.718 64.005  0.086 0.053 0.105  0.473 0.565 0.467  0.983 0.990 0.983  0.575 0.681 0.536
 5  1E-04  [0, 1]   24  16  13  28.850 29.946 26.945  0.892 0.919 0.862  18010.408 5839.066 10694.298  28.432 22.535 48.351  0.078 0.070 0.091  0.430 0.509 0.449  0.968 0.975 0.978  0.535 0.621 0.515
 6  1E-04  [0, 1]   24  32  13  28.798 29.596 26.302  0.898 0.928 0.852  11204.811 4941.039 9395.317  26.896 23.939 138.124  0.085 0.074 0.100  0.435 0.525 0.448  0.972 0.984 0.970  0.552 0.644 0.516
 7  1E-04  [-1, 1]  48  16  23  28.369 27.965 22.802  0.895 0.919 0.853  11717.751 6214.693 27016.871  637.643 120.811 501.895  0.102 0.066 0.143  0.469 0.545 0.469  0.964 0.978 0.958  0.574 0.661 0.536
 8  1E-03  [0, 1]   24  16  23  27.985 29.624 25.245  0.867 0.903 0.847  28411.524 8071.384 16271.284  37.060 21.243 79.848  0.092 0.079 0.113  0.388 0.447 0.408  0.945 0.954 0.965  0.530 0.627 0.502
 9  1E-03  [0, 1]   48  32  23  27.938 28.357 24.796  0.884 0.921 0.855  29651.707 5187.078 25015.928  39.547 27.515 90.017  0.111 0.102 0.125  0.419 0.512 0.433  0.966 0.980 0.975  0.525 0.635 0.501
10  1E-04  [-1, 1]  48  32  13  27.904 26.919 21.908  0.893 0.903 0.825  9979.532 7806.247 18650.970  724.044 343.109 491.400  0.111 0.094 0.156  0.470 0.544 0.419  0.965 0.969 0.954  0.573 0.661 0.470
11  1E-04  [-1, 1]  24  32  23  27.853 26.428 23.921  0.882 0.891 0.835  16882.306 9175.610 12862.843  297.894 153.810 398.117  0.093 0.087 0.116  0.415 0.483 0.445  0.960 0.953 0.952  0.557 0.633 0.500
12  1E-04  [-1, 1]  48  32  23  27.705 27.837 23.081  0.893 0.923 0.852  12973.284 5954.958 22994.894  660.860 103.825 488.698  0.110 0.063 0.134  0.463 0.548 0.467  0.963 0.981 0.959  0.568 0.665 0.534
13  1E-04  [-1, 1]  48  16  13  27.569 27.456 21.854  0.893 0.911 0.840  13708.906 7154.669 21816.881  733.920 138.891 688.560  0.115 0.073 0.153  0.473 0.505 0.443  0.965 0.970 0.953  0.574 0.647 0.527
14  1E-03  [0, 1]   48  16  23  27.538 29.522 24.562  0.878 0.919 0.843  29585.008 5564.913 15373.778  43.444 12.103 393.712  0.115 0.087 0.134  0.403 0.493 0.431  0.963 0.980 0.966  0.513 0.625 0.503
15  1E-04  [0, 1]   24  16  23  27.484 28.207 25.791  0.888 0.917 0.865  16073.775 6601.157 15575.913  54.065 49.014 99.296  0.083 0.076 0.107  0.402 0.495 0.454  0.966 0.974 0.979  0.537 0.621 0.523
16  1E-03  [0, 1]   48  32  13  27.478 28.951 24.959  0.877 0.920 0.856  70641.461 8010.628 36216.975  43.332 11.990 106.059  0.121 0.091 0.120  0.410 0.507 0.425  0.962 0.978 0.973  0.520 0.635 0.502
17  1E-03  [0, 1]   24  16  13  27.469 29.630 24.792  0.864 0.928 0.839  28926.595 6738.190 20186.931  49.025 9.263 100.597  0.104 0.079 0.123  0.342 0.523 0.365  0.953 0.984 0.968  0.514 0.647 0.487
18  1E-04  [-1, 1]  24  16  13  27.255 25.921 22.320  0.888 0.881 0.839  16668.899 13423.693 32006.659  160.676 138.158 506.382  0.085 0.089 0.147  0.449 0.480 0.428  0.954 0.917 0.956  0.564 0.626 0.503
19  1E-03  [0, 1]   24  32  13  27.136 28.773 25.125  0.868 0.904 0.846  27846.146 8782.024 17722.832  46.879 18.067 103.554  0.110 0.085 0.107  0.390 0.467 0.398  0.953 0.959 0.960  0.519 0.621 0.502
20  1E-03  [0, 1]   48  16  13  26.696 28.556 24.029  0.870 0.919 0.851  77937.167 12939.768 40001.462  75.069 17.459 165.338  0.133 0.088 0.141  0.409 0.509 0.425  0.956 0.975 0.972  0.510 0.634 0.496
21  1E-04  [-1, 1]  24  32  13  26.607 26.326 22.887  0.884 0.907 0.830  32537.658 7748.202 12920.010  555.460 131.769 389.933  0.123 0.088 0.131  0.437 0.506 0.392  0.958 0.973 0.959  0.541 0.623 0.489
22  1E-04  [0, 1]   24  32  23  26.431 29.097 25.492  0.899 0.919 0.866  11530.729 5454.081 14939.411  84.905 16.923 84.789  0.081 0.077 0.097  0.441 0.505 0.452  0.972 0.974 0.979  0.548 0.632 0.519
23  1E-03  [-1, 1]  48  32  23  26.404 26.161 21.822  0.877 0.896 0.832  38252.223 8104.111 35737.951  628.334 275.108 521.082  0.127 0.105 0.158  0.424 0.515 0.408  0.957 0.967 0.950  0.532 0.634 0.493
24  1E-03  [0, 1]   24  32  23  26.378 27.466 23.852  0.865 0.911 0.847  113531.620 8698.852 47569.925  82.820 21.258 148.253  0.140 0.107 0.135  0.386 0.474 0.404  0.953 0.973 0.969  0.504 0.612 0.492
25  1E-03  [-1, 1]  48  32  13  26.333 25.957 21.064  0.867 0.897 0.822  41068.831 8363.713 27817.301  596.830 155.642 676.272  0.125 0.101 0.167  0.401 0.490 0.411  0.948 0.970 0.943  0.524 0.611 0.490
26  1E-04  [-1, 1]  24  16  23  26.302 25.248 21.702  0.884 0.907 0.845  22839.337 7214.422 19154.919  536.285 145.996 551.409  0.118 0.095 0.153  0.421 0.505 0.447  0.962 0.972 0.955  0.552 0.637 0.522
27  1E-03  [-1, 1]  24  16  23  26.257 25.917 22.442  0.856 0.903 0.822  31874.004 7655.907 20712.652  128.027 117.134 471.192  0.099 0.094 0.141  0.366 0.460 0.398  0.938 0.971 0.939  0.521 0.609 0.492
28  1E-03  [-1, 1]  24  32  13  25.760 25.121 22.422  0.827 0.841 0.799  37571.295 15448.575 19122.274  104.433 143.441 520.381  0.100 0.104 0.144  0.293 0.349 0.332  0.933 0.904 0.933  0.488 0.564 0.469
29  1E-03  [-1, 1]  48  16  23  25.228 25.023 20.174  0.865 0.898 0.818  24081.693 10368.597 36141.650  352.006 132.913 650.823  0.131 0.113 0.188  0.393 0.486 0.411  0.949 0.969 0.942  0.516 0.600 0.478
30  1E-03  [-1, 1]  48  16  13  25.127 24.288 19.770  0.868 0.894 0.803  42774.609 10018.578 50581.731  724.310 146.779 467.035  0.142 0.125 0.184  0.418 0.481 0.356  0.954 0.967 0.936  0.527 0.595 0.452
31  1E-03  [-1, 1]  24  16  13  24.960 24.068 21.016  0.849 0.883 0.811  29097.866 9794.925 20383.777  635.742 292.600 573.417  0.133 0.134 0.155  0.363 0.491 0.390  0.936 0.962 0.927  0.511 0.602 0.489
32  1E-03  [-1, 1]  24  32  23  24.250 25.463 22.220  0.861 0.891 0.831  84502.902 10931.658 20401.836  311.825 124.070 458.510  0.139 0.101 0.136  0.355 0.434 0.408  0.950 0.967 0.954  0.502 0.594 0.478
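The 'input size' and 'stride' hyper parameters govern how training patches are sampled from each training image. The sampling can be sketched as follows; this is an illustrative implementation only, and EDSR's actual data pipeline may differ:

```python
import numpy as np

def extract_patches(img, patch_size=48, stride=13):
    """Sample square patches from a 2-D image, stepping by `stride` pixels.

    A smaller stride yields more (overlapping) patches per image; the
    patch size is the 'input size' fed to the network during training.
    """
    h, w = img.shape[:2]
    patches = []
    for y in range(0, h - patch_size + 1, stride):
        for x in range(0, w - patch_size + 1, stride):
            patches.append(img[y:y + patch_size, x:x + patch_size])
    return patches

img = np.zeros((100, 100), dtype=np.uint8)
patches = extract_patches(img, patch_size=48, stride=13)
print(len(patches))  # 5 positions per axis -> 25 patches
```

With patch size 48 and stride 13 on a 100x100 image, the valid top-left offsets per axis are 0, 13, 26, 39 and 52, giving 25 patches in total.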

[Figure 1: eight panels, one per IQM, each plotting the scores on DIV2K, SET5 and SET14 across all combinations of stride (13/23), batch size (16/32) and input size (24/48), with separate series for the four learning-rate/normalization combinations: 1E-04 with [0,1], 1E-04 with [-1,1], 1E-03 with [0,1], and 1E-03 with [-1,1].]

Fig. 1. Change in the performance of EDSR according to binary values of 5 hyper parameters. The sub-figures (a), (b), (c), (d), (e), (f), (g) and (h) show PSNR, SSIM, ERGAS, PAMSE, SAM, SCC, UQI and VIF scores, respectively. EDSR exhibits superb performance when the learning rate is 1E-04, normalization is min-max [0,1] and input size is 48, whereas it exhibits the worst performance when the learning rate is 1E-03 and normalization is min-max [-1,1].

[Figure 2: an 8x4 grid showing, for each of the 32 EDSR models, the same cropped region of its super-resolved output, with the original image and its ground-truth reference patch on the left. Each crop is labeled with the model's hyper parameter combination, e.g. "L:1E-04 N:0,1 I:48 B:32 S:13".]

Fig. 2. Qualitative comparison of the performance of the 32 EDSR models on '0880.png' in our DIV2K test set. The letters L, N, I, B, and S are abbreviations of learning rate, normalization, input size, batch size and stride, respectively. The images are in the same order as in Table III, from left to right first and then from top to bottom.

[Figure 3: an 8x4 grid showing, for each of the 32 EDSR models, the same cropped region of its super-resolved output, with the original image and its ground-truth reference patch on the left. Each crop is labeled with the model's hyper parameter combination.]

Fig. 3. Qualitative comparison of the performance of the 32 EDSR models on 'z_bird_GT.bmp' in the SET5 test set. The letters L, N, I, B, and S are abbreviations of learning rate, normalization, input size, batch size and stride, respectively. The images are in the same order as in Table III, from left to right first and then from top to bottom.

[Figure 4: an 8x4 grid showing, for each of the 32 EDSR models, the same cropped region of its super-resolved output, with the original image and its ground-truth reference patch on the left. Each crop is labeled with the model's hyper parameter combination.]

Fig. 4. Qualitative comparison of the performance of the 32 EDSR models on 'comic.bmp' in the SET14 test set. The letters L, N, I, B, and S are abbreviations of learning rate, normalization, input size, batch size and stride, respectively. The images are in the same order as in Table III, from left to right first and then from top to bottom.

III. CONCLUSION

In this study, we experimentally investigated the most important hyper parameters enabling a typical deep convolutional network to perform best in obtaining super resolution from a single image. For this purpose, the effects of 5 commonly used hyper parameters (learning rate, normalization, input (patch) size, batch size, and stride), with two different values for each, on the performance of a deep network were observed experimentally. In total, 32 different training runs were carried out with the EDSR model. The performance of the models was benchmarked with eight different image quality assessment measures, and also checked in terms of the human visual system. As a result, very important information that will guide other researchers in the training of deep networks has been obtained. The experiments show that the learning rate has the greatest effect on the performance of the network. The second most important parameter is the normalization method, and the third most important parameter is the size of the training image patch. When the learning rate is 1E-4, the input image is normalized to the range [0,1], and the input size is 48x48 pixels, the network performs best. On the other hand, when the learning rate is 1E-3, the input image is normalized to the range [-1,1], and the input size is 24x24 pixels, the network performs worst. In addition, changing the values of the batch size and stride hyper parameters does not make a significant difference in the performance of the network.

The study guides researchers in choosing the optimal set of hyper parameters and their ideal values in the training of convolutional neural networks, in order to find the best performing network. The experiments prove that the small learning rate offers more successful results than the large one, normalization to the range [0,1] better than normalization to [-1,1], and the large input image patch better than the small one. Choosing the hyper parameter value set in the training of deep networks in line with this information will provide the optimum network quickly and easily.

REFERENCES

[1] S. Li, R. Fan, G. Lei, G. Yue, and C. Hou, "A two-channel convolutional neural network for image super-resolution," Neurocomputing, vol. 9, no. 275, pp. 267–277, 2018.
[2] J. Yamanaka, S. Kuwashima, and T. Kurita, "Fast and Accurate Image Super Resolution by Deep CNN with Skip Connection and Network in Network," Neural Information Processing, pp. 217–225, Jul. 2017.
[3] G. Huang, Z. Liu, K. Q. Weinberger, and L. Van Der Maaten, "Densely Connected Convolutional Networks," in CVPR, 2017.
[4] Y. Zhang, Y. Tian, Y. Kong, B. Zhong, and Y. Fu, "Residual Dense Network for Image Super-Resolution," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[5] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2016.
[6] J. Kim, J. K. Lee, and K. M. Lee, "Deeply-Recursive Convolutional Network for Image Super-Resolution," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 1637–1645.
[7] J. Kim, J. K. Lee, and K. M. Lee, "Accurate Image Super-Resolution Using Very Deep Convolutional Networks," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 1646–1654.
[8] Y. Tai, J. Yang, and X. Liu, "Image Super-Resolution via Deep Recursive Residual Network," in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 2790–2798.
[9] H. Temiz and H. S. Bilge, "Super Resolution of B-mode Ultrasound Images with Deep Learning," IEEE Access, p. 1, 2020.
[10] X.-J. Mao, C. Shen, and Y.-B. Yang, "Image Restoration Using Very Deep Convolutional Encoder-Decoder Networks with Symmetric Skip Connections," Advances in Neural Information Processing Systems, pp. 2802–2810, Mar. 2016.
[11] C. Dong, C. C. Loy, K. He, and X. Tang, "Image Super-Resolution Using Deep Convolutional Networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 2, pp. 295–307, 2016.
[12] W.-S. Lai, J.-B. Huang, N. Ahuja, and M.-H. Yang, "Deep Laplacian Pyramid Networks for Fast and Accurate Super-Resolution," in IEEE Conference on Computer Vision and Pattern Recognition, 2017, vol. 2 (3), p. 5.
[13] C. Dong, C. C. Loy, and X. Tang, "Accelerating the Super-Resolution Convolutional Neural Network," in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 9906 LNCS, 2016, pp. 391–407.
[14] B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee, "Enhanced Deep Residual Networks for Single Image Super-Resolution," in 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2017, pp. 1132–1140.
[15] R. Timofte et al., "NTIRE 2017 Challenge on Single Image Super-Resolution: Methods and Results," IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, vol. 2017-July, pp. 1110–1121, 2017.
[16] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image Quality Assessment: From Error Visibility to Structural Similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, Apr. 2004.
[17] W. Xue, X. Mou, L. Zhang, and X. Feng, "Perceptual fidelity aware mean squared error," in Proceedings of the IEEE International Conference on Computer Vision, 2013, pp. 705–712.
[18] R. H. Yuhas, A. F. H. Goetz, and J. W. Boardman, "Discrimination among semi-arid landscape endmembers using the spectral angle mapper (SAM) algorithm," 1992.
[19] Z. Wang and A. C. Bovik, "A universal image quality index," IEEE Signal Processing Letters, vol. 9, no. 3, pp. 81–84, 2002.
[20] H. R. Sheikh and A. C. Bovik, "A visual information fidelity approach to video quality assessment," in The First International Workshop on Video Processing and Quality Metrics for Consumer Electronics, 2005, pp. 23–25.

