FPGA Implementation of Approximate Softmax Function for Efficient CNN Inference
$\sigma(x)_i = \frac{e^{x_i}}{\sum_{k=1}^{j} e^{x_k}}$  (1)

where $x$ is a $j$-dimensional input vector and $\sigma(x)_i$ is the predicted output probability of its $i$-th element. Thus, for a classification task with $j$ classes, the softmax function can be used to calculate the relative confidence score of the classifier for each class. This can be understood from the standpoint of a neural network [1] with a number of hidden processing layers and an output classification layer, such as that depicted in Fig. 1.
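As a point of reference for the approximations that follow, a minimal full-precision sketch of eq. (1) in Python (the logits are illustrative, not taken from the CNN of Figure 2):

```python
import numpy as np

def softmax(x):
    """Full-precision SoftMax of eq. (1): sigma(x)_i = exp(x_i) / sum_k exp(x_k)."""
    e = np.exp(x - np.max(x))   # max-subtraction is a standard stabilization trick,
    return e / np.sum(e)        # not part of eq. (1) itself; it cancels in the ratio

logits = np.array([2.0, 1.0, 0.1])   # hypothetical classifier outputs for j = 3 classes
print(softmax(logits))               # confidence scores, non-negative and summing to 1
```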
Figure 3: Confusion matrix for the CIFAR-10 dataset with full precision SoftMax function
Figure 3 shows the confusion matrix obtained when the custom CNN shown in Figure 2 is used to process the CIFAR-10 dataset with a full-precision SoftMax implementation. The overall classification error is 9.54% on the validation set and 2.62% on the training set.
Figure 4: Histogram of input values to the SoftMax function for CIFAR-10 dataset
Figure 4 illustrates the input statistics collected for the CIFAR-10 dataset when processed using the custom CNN architecture shown in Figure 2. The statistics clearly show that the bulk of the input values lie within a narrow range around 0, i.e. [-10, 10]. It is therefore not appropriate to design a hardware circuit for a generic, wider range of inputs, because that leads to a very complex circuit. Since the exponent function is the main complicated operation in eq. (1), we keep our discussion focused on this operation only. It can be clearly seen in Figure 5 that if the whole input range of [-20, 40] is considered, the full-precision exponent function has a very wide output range, i.e. [0, 2.5 × 10^17]. This requires enormous logic resources to implement in hardware. Moreover, the input operands are real numbers with fractional parts, which necessitates the use of floating- or fixed-point number formats and adds further to the circuit complexity.
Figure 5: Exponent function plot for the range of input values to SoftMax function
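A quick illustrative check of this dynamic range (plain Python/NumPy, outside the hardware flow):

```python
import numpy as np
# Over the observed input range [-20, 40], the exponent spans roughly
# [2.1e-9, 2.35e17]: about 26 decimal orders of magnitude for hardware to cover.
print(np.exp(-20.0), np.exp(40.0))
```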
As mentioned earlier, the main arithmetic units in the SoftMax function are the exponential and division functions. The hardware design entry of complex functions such as SoftMax, and their integration within a larger deep neural network, is too complex to be handled using a conventional hardware description language (HDL) approach. Thus, we have employed the high-level synthesis tool supplied by MathWorks in Simulink, i.e. the HDL Coder toolbox. HDL Coder generates the HDL code (Verilog or VHDL) automatically, which can then be incorporated as a hardware accelerator in a larger system employing both software (CPU) and hardware accelerators, an approach called hardware-software co-design. One big advantage of using Simulink for such designs is that the environment can easily be set up for simulation using real images (datasets), so the functionality can be tested before finally being incorporated into hardware. The whole hardware-software co-design system has been implemented on an FPGA SoC, the Zedboard, which contains both processor and programmable-logic sections. The main deep neural network application runs on the processor, while the SoftMax accelerator is implemented in the FPGA logic fabric, accessible to the software through a standard bus interface, i.e. AXI interconnect.
Figure 7: The Hardware-Software Co-Design implemented on Zedboard FPGA for deep neural network
processing on live video stream
IV. RESULTS AND DISCUSSION
As mentioned in the previous section, various approximations can be applied to the SoftMax operation to lower its computational demand while keeping the result accuracy high. In this section, the effect of these different approximation methods is analyzed when applied to a real-world scenario.
Approximating SoftMax Function through Integer-Only Operations
The first approximation technique applied to the whole CNN inference framework on the standard CIFAR-10 dataset is the conversion of the input operands to integer-only format by dropping the fractional part without rounding, as given by,
$x_{\text{int}} = \lfloor x \rfloor$  (2)
Although rounding might seem a better method than truncation, the overall accuracy of the CNN classifier did not register any significant drop with truncation, and very similar error rates of 9.56% and 2.89% were observed on the validation set and the training set respectively. There were, however, negligible errors in the calculation of the confidence scores on the validation set, as depicted in the histogram of errors shown in Figure 8. It can be noticed that almost all the confidence scores had zero error, with only a tiny percentage (0.21%) showing errors as small as 0.01. Thus, it can be safely concluded that integer-only operation does not affect the result accuracy significantly, while reducing the computational load from floating-point to integer operations.
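A minimal sketch of this truncation step and the resulting confidence-score error, under the assumption of illustrative NumPy logits rather than the actual CNN outputs:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / np.sum(e)

def softmax_int(x):
    """Eq. (2): drop the fractional part of each operand, then apply SoftMax."""
    return softmax(np.floor(x))

logits = np.array([3.7, 1.2, -0.5, 2.9])              # illustrative SoftMax inputs
err = np.abs(softmax(logits) - softmax_int(logits))   # per-class confidence-score error
print(err.max())
```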
Figure 8: Histogram of errors in confidence score of the validation set introduced due
to integer-only operation
Approximating SoftMax Function through Limiting the Operand Range
The second approximation considered in this work is limiting the range of the input operands. The first step in this regard is limiting the operands to non-negative integers only, i.e. [0: ∞]. The range is then systematically reduced to [0: 31], [0: 15] and [0: 7], corresponding to binary representations of 5, 4 and 3 bits respectively. These successive approximations, applied on top of integer-only operands, lead to increasingly larger errors in both classification accuracy and confidence scores. The errors are reported in Figures 9 to 12 and Table 1. It can be observed that the result fidelity in both the classification accuracy and the confidence scores is largely preserved for integer ranges down to [0: 15], and takes a significant hit below that. If the range is limited to the very narrow interval [0: 7], the classification error increases to 22.5% while the confidence scores can have errors of up to 3.9%. The corresponding histograms of error show the same trend: in Figure 12, it can be seen that drastically reducing the input operand range to [0: 7] leads to a higher occurrence of non-zero errors. From this data, it can be concluded that the integer-only range [0: 15] can be used safely, with an acceptable level of error introduced by the approximate input operand representation. This yields significant savings in computational resources, since only 4 bits are required for number representation, compared to the at least 32 bits required by the original single-precision floating-point representation.
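Range limiting stacks a clamp on top of the truncation of eq. (2); a sketch of the 4-bit case (illustrative code, not the HDL design):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / np.sum(e)

def softmax_range_limited(x, n_bits=4):
    """Integer-only SoftMax with operands clamped to [0 : 2**n_bits - 1]."""
    hi = 2**n_bits - 1                    # 15 for the 4-bit range favoured above
    xq = np.clip(np.floor(x), 0, hi)      # truncate, then clamp to the n-bit range
    return softmax(xq)

logits = np.array([3.7, 1.2, -0.5, 2.9])  # illustrative inputs
print(softmax_range_limited(logits))
```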
Figure 9: Histogram of errors in confidence score of the validation set introduced due to integer-only
operation limited to the range [0: ∞]
Figure 10: Histogram of errors in confidence score of the validation set introduced due to integer-only
operation limited to the range [0: 31]
Figure 11: Histogram of errors in confidence score of the validation set introduced due to integer-only
operation limited to the range [0: 15]
Figure 12: Histogram of errors in confidence score of the validation set introduced due to integer-only
operation limited to the range [0: 7]
Approximating SoftMax Function through Series Implementation
One of the most common techniques used in the literature to approximate the exponent function is truncation of its series expansion. Specifically, the Maclaurin series expansion of the exponent function is given as follows,
$e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}$  (3)
$e^x \approx 1 + x$  (4)
$e^x \approx 1 + x + \frac{x^2}{2}$  (5)
$e^x \approx 1 + x + \frac{x^2}{2} + \frac{x^3}{6}$  (6)
$e^x \approx 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24}$  (7)
The infinite series for the exponent function given by eq. (3) can be approximated by its first 2, 3, 4 or 5 terms, as in equations (4), (5), (6) and (7) respectively. The results of using these approximations, together with the earlier approximations, i.e. integer-only operation with the range limited to [0: 15], are reported in Figures 13 to 16 and Table 1. It can be noticed that although the two-term approximation does not degrade the classification accuracy, it introduces a 10% error in the confidence scores. The three-term approximation gives a 5% error, the four-term approximation 2.85%, and the five-term approximation 1.8%. With each additional term, the complexity of the operation grows. We conclude that using three or four terms is sufficient, since the error is within an acceptable range; five terms give a very low error, but the additional complexity over four terms is not justified. To further reduce the hardware complexity associated with the division operation, it is suggested to replace the coefficients in eq. (6) with their nearest power-of-two values, giving,
$e^x \approx 1 + x + \frac{x^2}{2} + \frac{x^3}{8}$  (8)
As seen from the data in Table 1, this approximation does not affect the detection accuracy while only
marginally affecting the confidence scores.
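The appeal of eq. (8) is that, with the operands already limited to small non-negative integers, both remaining divisions become right shifts. A sketch contrasting eq. (6) and eq. (8) (illustrative Python, not the generated HDL):

```python
import numpy as np

def exp4(x):
    """Eq. (6): four-term Maclaurin approximation with exact coefficients."""
    return 1 + x + x**2 / 2 + x**3 / 6

def exp4_pow2(x):
    """Eq. (8): coefficients rounded to the nearest power of two, so the
    divisions reduce to right shifts (>> 1 and >> 3) in hardware."""
    x = np.asarray(x, dtype=np.int64)     # operands are already small integers
    return 1 + x + (x**2 >> 1) + (x**3 >> 3)

x = np.arange(16)                # the [0: 15] operand range
print(np.exp(x[:4]))             # reference exponent values
print(exp4(x[:4]))               # eq. (6)
print(exp4_pow2(x[:4]))          # eq. (8), shift-only arithmetic
```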
Figure 13: Histogram of errors in confidence score of the validation set introduced due to integer-only
operation limited to the range [0: 15] with 2 term approximation of exponent function
Figure 14: Histogram of errors in confidence score of the validation set introduced due to integer-only
operation limited to the range [0: 15] with 3 term approximation of exponent function
Figure 15: Histogram of errors in confidence score of the validation set introduced due to integer-only
operation limited to the range [0: 15] with 4 term approximation of exponent function
Figure 16: Histogram of errors in confidence score of the validation set introduced due to integer-only
operation limited to the range [0: 15] with 5 term approximation of exponent function
Table 1. Comparison of different approximation techniques by error on the CIFAR-10 dataset
SN. | Method | Validation Classification Error | Validation Confidence Score Error
1 | Full Precision | 9.54 % | 5 × 10^-7 %
2 | Integer-Only | 9.56 % | 0.21 %
3 | Integer-Only [0: ∞] | 9.56 % | 0.24 %
4 | Integer-Only [0: 31] | 9.56 % | 0.24 %
5 | Integer-Only [0: 15] | 9.73 % | 0.31 %
6 | Integer-Only [0: 7] | 22.5 % | 3.9 %
7 | Integer-Only [0: 15], 2 series terms | 9.73 % | 10.33 %
8 | Integer-Only [0: 15], 3 series terms | 9.73 % | 5.1 %
Figure 17: High-Level circuit diagram for the approximate exponent function using Simulink HDL Coder
V. CONCLUSION
An approximate circuit for the implementation of the SoftMax function as used in standard CNN architectures has been described in this work. For this purpose, various approximation techniques related to the range and type of operands and to the series expansion of the exponent function have been employed. The considered techniques are motivated by the actual signal statistics gathered while processing a real-world standard dataset, i.e. CIFAR-10. To test the setup, a custom CNN was trained and tested with the proposed approximations implemented in the full system. The results show that the proposed approximations lead to negligible loss in the CNN's detection accuracy as well as its confidence scores, while reducing the circuit complexity significantly.
VI. REFERENCES
[1] Super Data Science, "Convolutional Neural Networks (CNN): Softmax & Cross-Entropy," available at https://www.superdatascience.com/blogs/convolutional-neural-networks-cnn-softmax-crossentropy, accessed 28 Nov. 2020.
[2] Z. Li, H. Li, X. Jiang, B. Chen, Y. Zhang and G. Du, "Efficient FPGA Implementation of Softmax Function for
DNN Applications," 2018 12th IEEE International Conference on Anti-counterfeiting, Security, and
Identification (ASID), Xiamen, China, 2018, pp. 212-216
[3] Kouretas, I.; Paliouras, V. Hardware Implementation of a Softmax-Like Function for Deep Learning.
Technologies 2020, 8, 46
[4] Yuan, B. “Efficient hardware architecture of softmax layer in deep neural network.” 2016 29th IEEE
International System-on-Chip Conference (SOCC) (2016): 323-326.
[5] Gaoming Du, Chao Tian, Zhenmin Li, Duoli Zhang, Yongsheng Yin, and Yiming Ouyang, “Efficient
Softmax Hardware Architecture for Deep Neural Networks”, in Proceedings of the 2019 Great Lakes
Symposium on VLSI (GLSVLSI '19).
[6] R. Rekha and K. P. Menon, "FPGA implementation of exponential function using cordic IP core for
extended input range," 2018 3rd IEEE International Conference on Recent Trends in Electronics,
Information & Communication Technology (RTEICT), Bangalore, India, 2018, pp. 597-600