Design of Power Efficient Posit Multiplier
Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY SILCHAR. Downloaded on October 07,2021 at 05:42:30 UTC from IEEE Xplore. Restrictions apply.
862 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS, VOL. 67, NO. 5, MAY 2020
Fig. 3. Posit component extraction in hardware arithmetic unit.
Fig. 4. Datapath of the proposed posit multiplier.
II. POSIT NUMBER SYSTEM

The general format of a posit number is shown in Fig. 1. A posit number Posit(nb, es) is defined by the total bit-width nb and the exponent bit-width es. It has four components: sign (s), regime (rg), exponent (exp), and mantissa (frac). The component bit-widths are not constant: the regime bit-width varies for different values. The exponent and the mantissa occupy the remaining bit positions, and they are not included in the format when the regime occupies all bit positions. The value of a number represented in posit format is

$$\text{value} = (-1)^{s} \times \text{useed}^{rg} \times 2^{exp} \times (1 + frac) \tag{1}$$

where $\text{useed} = 2^{2^{es}}$.

In hardware arithmetic unit design, the extraction of components is not as straightforward as in the floating-point format. The circuit shown in Fig. 3 (except the grey module) is commonly used to extract each component of a posit number [8], [9]. The number is complemented first if it is negative. Then the regime part is extracted. The regime part is a series of ones (zeros) followed by a single zero (one) bit. Therefore, a leading zero detector (LZD) and a leading one detector (LOD) are used to count the number of leading bits. If leading ones are detected, rg equals count − 1. Otherwise, rg is −count, and a complementer, COMP, is needed to convert the positive count to a negative rg value. In addition, the regime bit-width shift_rg, which is count + 1, is also generated so that the regime can be removed by a shifting operation and the exponent and mantissa can be obtained. The regime bit-width ranges from 2 bits to (nb − 1) bits. In order to accommodate all cases, the final extracted mantissa is (nb − es)-bit. In a posit multiplier or multiply-accumulate unit design, a (nb − es)-bit mantissa multiplier will be used.

III. THE PROPOSED POSIT MULTIPLIER

The parameterized datapath of the proposed posit multiplier is shown in Fig. 4. The critical path contains posit component extraction (which is detailed in Fig. 3), mantissa multiplier, final adder and normalization, posit component packing, and rounding. In addition, the sign and exponent are also processed separately. The mantissa multiplier is a (nb − es)-bit radix-4 modified Booth multiplier [10]. The bit-widths of other components are also shown in Fig. 4.

As discussed in Section I, in posit format, the bit-width of the mantissa varies for different values. Therefore, the mantissa does not always require a (nb − es)-bit multiplier. Although the unused bits of the (nb − es)-bit mantissa are filled with zeros, those zero bits will be inverted to ones when the partial product of the Booth multiplier [10] is negative. Therefore, the circuits for these bit positions, including the partial product accumulation and the final adder, are still toggling. This leads to a waste of power and energy.

In the proposed design, two changes are made to avoid the unnecessary signal toggling in order to reduce the power consumption. The first change is the generation of the control signal for the mantissa multiplier so that only the necessary part of the multiplier is enabled. The second change is the decomposition of the mantissa multiplier so that each small portion is controlled by the control signal. The design details of these two changes are discussed below. A multiplier for Posit(16, 1) is used as an example.

A. The Decomposition of the Multiplier

For the Posit(16, 1) multiplier, a 15-bit mantissa multiplier is used. When using the radix-4 Booth multiplication algorithm [10], the partial product array is shown in Fig. 5(a). There is a total of 8 partial products and each partial product is 16-bit.

In the proposed design, the 15-bit multiplier is divided into 4 groups: the most significant 3 bits form one group, and the remaining 12 bits are divided into three 4-bit groups. Correspondingly, the 8 partial products are divided into 4 groups, RH_1, RH_2, RH_3, and RH_4, as shown in Fig. 5. If the multiplier is less than 3-bit, then only the two partial products in RH_4 are generated while all others are set to zeros. If the multiplier is more than 3-bit but less than 7-bit, then partial products in RH_3 and RH_4 are generated. If the multiplier is more
ZHANG AND KO: DESIGN OF POWER EFFICIENT POSIT MULTIPLIER 863
TABLE I
TRUTH TABLE TO GENERATE CONTROL SIGNAL
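The control signal is generated from the regime bit-width obtained during component extraction (Section II). As a rough software sketch of that extraction step and of Eq. (1), the Python function below decodes a posit bit pattern. The function name `decode_posit`, the returned dictionary keys, and the use of exact fractions are illustrative assumptions rather than part of the proposed hardware. Rejecting the all-zero and sign-only patterns follows the usual posit convention for zero and NaR, which the text above does not cover.

```python
from fractions import Fraction


def decode_posit(bits: int, nb: int, es: int) -> dict:
    """Decode an nb-bit posit pattern (hypothetical helper, not the paper's RTL)."""
    mask = (1 << nb) - 1
    bits &= mask
    # Assumption: zero (all zeros) and NaR (sign bit only) are special patterns.
    if bits == 0 or bits == 1 << (nb - 1):
        raise ValueError("zero and NaR are not handled by this sketch")

    s = bits >> (nb - 1)
    if s:
        bits = (-bits) & mask            # complement negative inputs first
    body = bits & ((1 << (nb - 1)) - 1)  # the nb-1 bits after the sign

    # Count the run of identical leading bits (the LZD/LOD role).
    lead = (body >> (nb - 2)) & 1
    count = 0
    for i in range(nb - 2, -1, -1):
        if (body >> i) & 1 != lead:
            break
        count += 1

    rg = count - 1 if lead else -count
    shift_rg = min(count + 1, nb - 1)    # regime width incl. terminating bit

    # Whatever remains after the regime holds the exponent, then the fraction.
    rem_bits = nb - 1 - shift_rg
    rem = body & ((1 << rem_bits) - 1)
    exp_bits = min(es, rem_bits)
    exp = (rem >> (rem_bits - exp_bits)) << (es - exp_bits)  # cut-off bits read as 0
    frac_bits = rem_bits - exp_bits
    frac = Fraction(rem & ((1 << frac_bits) - 1), 1 << frac_bits)

    useed = 2 ** (2 ** es)
    value = (-1) ** s * Fraction(useed) ** rg * 2 ** exp * (1 + frac)
    return {"s": s, "rg": rg, "exp": exp, "frac": frac,
            "shift_rg": shift_rg, "frac_bits": frac_bits, "value": value}


if __name__ == "__main__":
    for pattern in (0x4000, 0x4800, 0x6000, 0x0001):
        d = decode_posit(pattern, 16, 1)
        print(hex(pattern), d["rg"], d["shift_rg"], d["frac_bits"], d["value"])
```

In this sketch `frac_bits` shrinks as `shift_rg` grows, which is the property the proposed design exploits: a long regime leaves only a few meaningful mantissa bits for the multiplier.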
TABLE II
COMPARISON OF THE PROPOSED POSIT MULTIPLIER WITH STANDARD POSIT MULTIPLIER
Fig. 6. The use of ctl signal as the enable signal for partial product generation
(ppg) in mantissa multiplier (Posit(16,1) example). ppg(x,y) refers to the ppg
used to generate partial products in region RV_x and RH_y.
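The ppg cells gated by ctl generate radix-4 Booth partial products [10]. The following Python sketch, given only as an illustration, recodes a 15-bit multiplier into 8 Booth digits and shows how a negative digit turns the zero-padded low bits of a short mantissa into ones. The helper names `booth_radix4_digits` and `partial_product_bits` are hypothetical. Emitting the inverted pattern and adding the +1 correction elsewhere is a common implementation choice assumed here, not a detail confirmed by the text.

```python
def booth_radix4_digits(y: int, width: int) -> list[int]:
    """Radix-4 Booth digits, least significant first, of an unsigned multiplier."""
    n_digits = width // 2 + 1        # 15-bit multiplier -> 8 digits
    digits = []
    prev = 0                         # implicit bit to the right of bit 0
    for i in range(n_digits):
        b0 = (y >> (2 * i)) & 1
        b1 = (y >> (2 * i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)   # each digit is in {-2, -1, 0, 1, 2}
        prev = b1
    return digits


def partial_product_bits(x: int, digit: int, pp_width: int) -> int:
    """Bit pattern of one partial product row before the +1 correction.

    Assumption: negative digits are formed by bitwise inversion, with the +1
    that completes the two's complement injected separately.
    """
    mask = (1 << pp_width) - 1
    magnitude = (x * abs(digit)) & mask
    return (~magnitude & mask) if digit < 0 else magnitude


if __name__ == "__main__":
    x = 0b101_0000_0000_0000         # short mantissa, 12 low bits padded with zeros
    y = 0b110_0000_0000_0000         # multiplier that produces a negative digit
    digits = booth_radix4_digits(y, 15)
    assert sum(d * 4 ** i for i, d in enumerate(digits)) == y
    for i, d in enumerate(digits):
        print(i, d, format(partial_product_bits(x, d, 16), "016b"))
```

The rows with a zero digit stay at zero, but the row with a negative digit has ones in every padded position, so the accumulation logic for those columns still toggles unless it is disabled.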
TABLE III
AREA AND POWER COMPARISON OF EACH MODULE FOR NORMAL AND PROPOSED POSIT(16,1) MULTIPLIER

Fig. 7. Comparison of power consumption under various exponent widths.

composed of component packing and rounding modules. As shown in Table III, the proposed posit multiplier has a 4% larger area compared to the normal posit multiplier. This area overhead comes from the extra control used in the mantissa multiplier. Although an extra control signal must be generated during input processing, the logic to generate the control signal is simple, as shown in equation (3), and its area overhead is negligible.

The power reduction mainly comes from the mantissa multiplier module and the final addition module, as shown in Table III. Due to the control signal, only the required portion of the mantissa multiplier is enabled. This leads to a 28% power reduction in the proposed Posit(16, 1) design. Because the unused portion of the mantissa multiplier is disabled, the least significant part of the product becomes zeros. The signal toggling of the least significant part of the final adder can therefore also be avoided. This leads to an average 30% power reduction in the final adder. However, as shown in Fig. 2, the adder part does not contribute much to the total power consumption. Therefore, the power reduction of the whole design is still 22%.

The power consumption of designs with larger exponent bit-widths is presented in Fig. 7. When the exponent bit-width gets larger, the proposed architecture can still achieve power reduction, but the gap between the normal design and the proposed design is reduced. This is mainly due to the reduction in mantissa bit-width. However, due to the existence of the regime bits, posit formats with small exponent bit-width can provide enough dynamic range for many applications. The evaluated formats shown in Table II are widely used in many applications [3], [4], [5]. The proposed power reduction method can be effectively applied in those applications to achieve power efficient posit computation.

V. CONCLUSION

In this brief, a power efficient posit multiplier architecture is proposed. Motivated by the fact that the whole mantissa multiplier in a posit multiplier is not always fully required, the proposed design divides the mantissa multiplier into small portions. At run-time, only the required portions are enabled, which avoids unnecessary signal toggling and reduces the power consumption. Whether to enable a multiplier portion is controlled by the regime bit-width generated during component extraction. The proposed method is evaluated with 8-bit, 16-bit, and 32-bit posit multipliers, and an average of 16% power reduction can be achieved. The proposed method is suitable for use in any low power posit arithmetic unit design.

In the future, more power reduction opportunities in the posit multiplier architecture will be explored. In addition, the investigation of power efficient posit arithmetic unit design will be extended to the posit adder and the posit multiply-accumulate unit.

REFERENCES

[1] J. L. Gustafson and I. Yonemoto, “Beating floating point at its own game: Posit arithmetic,” Supercomput. Front. Innovat. Int. J., vol. 4, no. 2, pp. 71–86, Jun. 2017.
[2] IEEE Standard for Floating-Point Arithmetic, IEEE Standard 754-2008, Aug. 23, 2008, pp. 1–70.
[3] Z. Carmichael, S. H. F. Langroudi, C. Khazanov, J. Lillie, J. L. Gustafson, and D. Kudithipudi, “Deep positron: A deep neural network using the posit number system,” CoRR, vol. abs/1812.01762, pp. 1–6, Dec. 2018.
[4] J. Johnson, “Rethinking floating point for deep learning,” CoRR, vol. abs/1811.01721, pp. 1–8, Nov. 2018.
[5] M. Klöwer, P. D. Düben, and T. N. Palmer, “Posits as an alternative to floats for weather and climate models,” in Proc. Conf. Next Gener. Arithmetic, Mar. 2019, pp. 1–8.
[6] R. Chaurasiya et al., “Parameterized posit arithmetic hardware generator,” in Proc. IEEE 36th Int. Conf. Comput. Design (ICCD), Orlando, FL, USA, Oct. 2018, pp. 334–341.
[7] M. K. Jaiswal and H.-K. So, “Architecture generator for type-3 unum posit adder/subtractor,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), Florence, Italy, May 2018, pp. 1–5.
[8] H. Zhang, J. He, and S.-B. Ko, “Efficient posit multiply-accumulate unit generator for deep learning applications,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), Sapporo, Japan, May 2019, pp. 1–5.
[9] A. Podobas and S. Matsuoka, “Hardware implementation of POSITs and their application in FPGAs,” in Proc. IEEE Int. Parallel Distrib. Process. Symp. Workshops (IPDPSW), Vancouver, BC, Canada, May 2018, pp. 138–145.
[10] A. D. Booth, “A signed binary multiplication technique,” Quart. J. Mech. Appl. Math., vol. 4, no. 2, pp. 236–240, 1951.
[11] SoftPosit-Python. Accessed: Oct. 2018. [Online]. Available: https://posithub.org/docs/PositTutorial_Part1.html
[12] Z. Carmichael, H. F. Langroudi, C. Khazanov, J. Lillie, J. L. Gustafson, and D. Kudithipudi, “Performance-efficiency trade-off of low-precision numerical formats in deep neural networks,” in Proc. Conf. Next Gener. Arithmetic, Mar. 2019, pp. 1–9.