Professional Documents
Culture Documents
Hub Explanation
Hub Explanation
Abstract— This paper analyzes the benefits of using half-unit- one, the addition required for rounding up is postponed until
biased (HUB) formats to implement floating-point (FP) arithmetic the next operation. In [5], a dedicated circuit to compute the
under a round-to-nearest mode from a quantitative point of sticky bit in parallel with the main path was proposed with
view. Using the HUB formats to represent numbers allows the
removal of the rounding logic of arithmetic units, including the aim of accelerating the implementation of multiplication.
sticky-bit computation. This is shown for FP adders, multipliers, A compound adder (a circuit that, having a carry-save input,
and converters. Experimental analysis demonstrates that the delivers the results and the result plus one) was proposed in [6]
HUB formats and the corresponding arithmetic units maintain to generate the rounded result of any operation. In [7], three
the same accuracy as the conventional ones. On the other hand, different methods were compared for multipliers that simplify
the implementation of these units, based on basic architectures,
shows that the HUB formats simultaneously improve area, speed, rounding decisions and merge the rounding up with the
and power consumption. In addition, based on the data obtained computation of the operation. Similarly, Burgess [8] proposed
from the synthesis, an HUB single-precision adder is ∼14% combining rounding with the final addition to convert the
faster but consumes 38% less area and 26% less power than the carry-save solution into the conventional representation. In [9],
conventional adder. Similarly, an HUB single-precision multiplier a rounding scheme was presented for high-speed multipliers
is 17% faster, uses 22% less area, and consumes slightly less
power than the conventional multiplier. At the same speed, the based on a rounding table and prediction.
adder and the multiplier achieve area and power reductions of A totally different approach would be to use a new
up to 50% and 40%, respectively. real-number encoding, in order to simplify the implementation
Index Terms— Adders, digital arithmetic, floating-point (FP) of RN. Thus, the problem would change from optimizing the
arithmetic, multiplication, optimization, power consumption. rounding operation to dealing with arithmetic operations under
the new number representation. This proposal is found in [10]
I. I NTRODUCTION with RN representations and [11] with half-unit-biased (HUB)
formats. Together with other advantages, these new formats
T HE rounding operation is performed in almost all
arithmetic operations involving real numbers. There are
several ways to perform this operation, although unbiased
allow performing RN simply by truncation. On the other hand,
these new formats are based on the simple modifications of
rounding-to-nearest (RN) has the best characteristics [1], [2]. the conventional formats and so could be applied to practically
It provides the closest possible number to the original exact any conventional format. In this paper, we focus on the HUB
value, but if the exact value is exactly halfway between two FP formats.
The efficiency of using the HUB formats for a fixed-
numbers, then it is selected randomly. The most commonly
point representation has been demonstrated in [12] and [13].
used approach is the tie-to-even method, which is the default
By reducing the bitwidth while maintaining the same accuracy,
mode of the floating-point (FP) IEEE-754 standard (see [3]).
the area cost and delay of finite impulse response filter
However, the implementation of this rounding mode is
implementations have been dramatically reduced in [12], and
relatively complex, and the area and the delay introduced
similarly for the QR decomposition in [13].
for rounding circuits may be very large, since they normally
In this paper, we perform a quantitative estimation of the
lead in the critical path. For this reason, it is generally
benefit obtained using the HUB formats to implement the
used in the FP circuits. Many researchers have proposed
FP computation systems under RN. Some preliminary results
different architectures to reduce the impact of this delay by
for half-precision FP adders and multipliers were presented
merging rounding with other operations or removing it from
in [14]. The work in [14] shows that the area and power
the critical path. For instance, an FP adder was proposed in [4],
consumption of a basic FP adder could be improved by
wherein if the result of an addition is the input to another
up to 70% for high frequencies when using HUB formats,
Manuscript received July 17, 2015; revised October 20, 2015; accepted whereas they remain the same for the basic FP multiplier.
November 17, 2015. Date of publication December 8, 2015; date of current In addition to a deeper analysis, in this paper, we extend
version May 20, 2016. This work was supported by the Ministry of Education
and Science of Spain under Contract TIN2013-42253-P.
these results to other sizes and circuits, such as converters.
The authors are with the Department of Computer Architecture, Universidad In comparison with the work in [14], the main contributions
de Málaga, Málaga E-29071, Spain (e-mail: fjhormigo@uma.es). of this paper are as follows:
Color versions of one or more of the figures in this paper are available
online at http://ieeexplore.ieee.org. 1) a detailed architecture for basic adders and multipliers
Digital Object Identifier 10.1109/TVLSI.2015.2502318 to deal with HUB numbers;
1063-8210 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
2370 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 24, NO. 6, JUNE 2016
Fig. 2. Basic FP adder architectures for (a) standard and (b) HUB numbers.
to the left if there are leading zeros whose number is computed the fixed-point adder. Despite this, the fixed-point adder is
in the leading one detector. Besides normalization, the result slightly shorter than the one shown in Fig. 2(a). The latter
has to be rounded based on the two LSBs of the result and the requires two guard bits and the carry input for the sticky bit
sticky bit. However, no roundup is required when the result (i.e., m + 2 bits) due to the rounding operation. However, in the
has at least two leading zeros [1], [23]. Thus, left shifting and HUB approach shown in Fig. 2(b), no guard bits are required
rounding are performed in parallel paths. On the other hand, because rounding is performed by truncation. Thus, the fixed-
the new exponent is also generated in a parallel path. If the point adder has only m + 1 bits (one additional bit to support
result of addition is rounded up, an overflow may be produced, the ILSB). In fact, the ILSB is shown in the architecture to
which requires a new correction of the exponent. simplify the explanation, although, as presented in [11], this
The same basic architecture could be used for the fixed-point addition can be implemented using an m-bit adder
HUB numbers, although the significands are 1 bit larger and and an inverter.
the rounding circuit is removed. However, knowing that the Finally, the rounding path [gray in Fig. 2(a)] is removed,
ILSB always equals 1, the significand datapath is further because rounding is simply performed by truncation.
optimized, as shown in Fig. 2(b). The first difference is in Consequently, given that explicit rounding up is not performed
the right shifter used to align the significands, because the for the HUB architecture, overflow could not occur after
ILSB has to be included at the input to obtain a correct rounding. Thus, the additional correction of the exponent
result if no shifting is performed. Furthermore, the sticky-bit required in Fig. 2(a) is eliminated, which also simplifies the
computation logic has been removed. Given that the ILSBs exponent datapath.
of both the significands equals 1, the sticky bit is always one
for nonaligned significands [11]. Moreover, the sticky bit is B. FP Multiplication for HUB Numbers
not required for aligned significands because shifting is not A classic FP multiplication is simpler than FP addition [1].
performed. The new exponent is computed by adding the exponents of
In this HUB architecture, we should note that the condi- the input operands, whereas the significands are multiplied,
tional bit inverters directly perform the two’s complement, thus obtaining a value that is double the size. The result of
as explained in Section II, and no carry input is required in multiplication is normalized by shifting it 1 bit to the right,
the fixed-point adder. The conditional inverter at the output of if it is required. Finally, the resulting number is rounded to fit
the right shifter has to be modified to control when shifting is the size of the significand. The rounding requires computing
not performed. In this case, the ILSB is explicitly represented the sticky of the m − 2 LSBs of the result of the fixed-point
by the LSB of the output. It then has to be set to 1 after multiplication, and the addition of one ULP for rounding it up.
the inversion to complete the two’s complement operation Fig. 3(a) shows the significand datapath of a straightforward
(see Section II). implementation of the conventional FP multiplication.
On the other hand, Fig. 2(b) shows that the ILSB of the When the HUB numbers are used, the significands again
second operand is appended at the corresponding input of have to be appended with the hidden ILSB that equals one,
2372 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 24, NO. 6, JUNE 2016
Fig. 3. Basic FP multiplier for (a) standard and (b) HUB numbers.
Fig. 4. Basic converter from an FP format to a narrower one for (a) standard
as shown in Fig. 3(b). Thus, the fixed-point multiplier is 1 bit and (b) HUB numbers.
larger, which increases the size of the fixed-point multiplier.
In contrast, the computation of the sticky bit is again pre-
vented [11]. To simplify the explanation, let us assume that exponent by 1. However, under the HUB formats, all these
the significand is scaled such that the ILSB is the only operations are avoided, and a simple truncation of the signifi-
fractional bit. Let us call the integer part of both the input cand will produce the correctly rounded conversion. We should
significands A and B. The product of both the significands is note that the ILSB of the original number guarantees that the
then sticky bit is always 1, which means that the tie case does not
1 exist.
(A + 1/2) · (B + 1/2) = A · B + (A + B) + 1/4. (1) Apart from the dramatic simplification of the conversion
2
Given the last addend of this result, we conclude that the LSB operation, the use of the HUB formats prevents the double
of the result of multiplication is always 1. Thus, the sticky bit rounding error [11]. This well-known problem may occur
is always 1 and no logic is required to compute it. Moreover, when a number is rounded twice in a row (e.g., when the
given that the rounding is simply performed by truncation, the result of an arithmetic operation is first rounded to double pre-
cision and then to single precision). Using standard numbers,
final incrementer is not required [see Fig. 3(b)]. Therefore, as
shown in Section IV-B, although the area of the fixed-point this operation may lead to the final value having an error
multiplier increases, the overall area decreases. greater than 0.5 ULPs. However, this cannot occur when using
HUB numbers [11].
On the other hand, conversion the other way around requires
C. Conversion Between FP Numbers of expanding the significand to fit it into the new size. This is a
Different Sizes for HUB Numbers trivial operation for a standard format, because as many zeroes
Conversion between FP formats with different sizes as needed are appended to the least significant part of the
is another operation that could benefit from using the significand. In relation to the HUB formats, a similar solution
HUB formats. We will focus on the conversion of the signif- could be applied, although the MSB of the bits appended
icand, since the conversion of exponent is not affected when should be 1 (the ILSB) instead of 0. This approach is shown
using the HUB formats (because the HUB FP formats use a in Fig. 5(a).
conventional number to represent the exponent). Due to the new ILSB of the generated HUB number, the
The conversion from an FP format to another format with circuit in Fig. 5(a) produces a rounded tie-to-away significand
a narrower significand requires a rounding operation. For because the significand is always rounded up. The error
instance, the 53-bit significand of a double-precision number introduced due to this rounding is negligible, because the
has to be rounded to 24 bits to turn it into a single-precision error associated with the original FP number is usually several
number. Fig. 4 shows an example of this situation. The orders of magnitude greater. For example, the representation
standard RN tie-to-even rounding mode would require com- of the value 0.1 under the IEEE-754 single-precision standard
puting the sticky bit, determining the need for rounding up, produces an error that equals −1.4901161 × 10−9 assuming
and incrementing the truncated significand value if required. the default rounding mode. When it is extended, the same
These operations involve a considerable amount of hardware. amount of error is transferred to the double-precision number.
Moreover, the rounding-up operation may produce a Similarly, the error under the single-precision HUB format
significand overflow, which would require incrementing the is 2.2351743 × 10−9 which, in this case, is reduced
HORMIGO AND VILLALBA: MEASURING IMPROVEMENT WHEN USING HUB FORMATS TO IMPLEMENT FP SYSTEMS UNDER RN 2373
TABLE I
E XAMPLES OF C ONVERSIONS B ETWEEN THE S IGNIFICAND
OF HUB AND C ONVENTIONAL N UMBERS
Fig. 6. Histogram of the rounding error for the addition experiment. Fig. 7. Histogram of the rounding error for the multiplication experiment.
(a) HUB format. (b) IEEE standard. (a) HUB format. (b) IEEE standard.
TABLE II
S TATISTICAL PARAMETERS OF ROUNDING E RROR D ISTRIBUTION
of precision. In a first experiment, we tested the arithmetic the main statistical parameters corresponding to these errors.
operations. In these experiments, we utilized 32-bit two’s It can be seen that these parameters are very similar when
complement fixed-point numbers within the range (−1 1) both the representations are compared.
(i.e., 1 sign bit and 31 fractional bits) as exact real input A similar experiment was run for multiplication. Again,
numbers. They were converted into an internal 32-bit FP for- two-million exact multiplication results were compared with
mat with only 24 bits for the significand. Thus, a rounding was the results obtained with the FP internal representation when
required for the said conversion. Two different internal formats using the IEEE standard single precision and its equivalent
were tested: 1) the standard IEEE-754 single precision and HUB format. The input operands were also 32-bit fixed-
2) its corresponding HUB format. point numbers, although, in this case, only positive numbers
To test the addition operation, two-million exact results were used, whereas 64-bit fixed-point arithmetic was used to
corresponding to the addition of two input real numbers were calculate the exact multiplication result. Similar to the addition
calculated using 32-bit fixed-point arithmetic (excluding the experiment, although the values of the rounding error for both
results that produced overflow). Moreover, the FP results the representations are different, their statistical parameters are
of adding the same pairs of numbers that were previously very similar, as shown in Table II and Fig. 7, by the histogram
converted into single-precision FP format were computed for of the rounding error for both the HUB format and the
both the IEEE standard and the HUB format. For the former, IEEE standard.
a standard CPU was used to perform the computation, whereas In another experiment, we measured the precision of
for the latter, a field-programmable gate array implementation the conversion from double to single precision. One-million
of the HUB adder presented in Section III-A was used. These double-precision real numbers were randomly generated and
results were converted back into fixed-point representation converted into single-precision format. They were then con-
(considered exact in our experiment) and compared with the verted back into double precision and compared with the
exact results obtained using the fixed-point addition. original numbers. As expected, the statistical distributions of
The calculated error comprises the error in the operation the error for both the standard and HUB formats were once
itself and in conversions. We should note that the values of again very similar. Fig. 8 shows the histograms of the rounding
the rounding error corresponding to both the approaches are error for both the approaches, and their statistical parameters
always different. This happens because the value used for are shown in Table II.
representing the exact real number is different on each inter- The theoretical and experimental results indicate that the
nal representation (see Section II). However, the probability HUB formats could substitute for conventional FP representa-
distributions of these errors are quite similar, as shown by the tion to deal with real-number computation. Clearly, the use of
histogram of the rounding error for both the HUB format and the HUB format to solve real-number calculation would not
the IEEE standard shown in Fig. 6. Moreover, Table II shows provide exactly the same results as those obtained by the use
HORMIGO AND VILLALBA: MEASURING IMPROVEMENT WHEN USING HUB FORMATS TO IMPLEMENT FP SYSTEMS UNDER RN 2375
In any case, as stated, the area or power consumption of these [11] J. Hormigo and J. Villalba, “New formats for computing with real-
circuits is much lower than arithmetic circuits, and thus their numbers under round-to-nearest,” IEEE Trans. Comput., to be published,
doi: 10.1109/TC.2015.2479623.
impact is very low. [12] J. Hormigo and J. Villalba, “Optimizing DSP circuits by a new family
In summary, the use of the HUB FP numbers could of arithmetic operators,” in Proc. Asilomar Conf. Signals, Syst. Comput.,
significantly improve the implementation of arithmetic units. Nov. 2014, pp. 871–875.
[13] S. D. Muñoz and J. Hormigo, “Improving fixed-point implementation
In all the tested cases, except for the converter from single to of QR decomposition by rounding-to-nearest,” in Proc. 19th IEEE Int.
double precision, HUB units clearly outperform standard units, Symp. Consum. Electron. (ISCE), Jun. 2015, pp. 1–2.
in area, speed, and power consumption. Due to the nature [14] J. Hormigo and J. Villalba, “Simplified floating-point units for high
dynamic range image and video systems,” in Proc. 19th IEEE Int. Symp.
of the improvement, which is based on the simplification Consum. Electron. (ISCE), Jun. 2015, pp. 1–2.
of the FP algorithms themselves, it is expected that the [15] S. F. Oberman, “Design issues in high performance floating point
adaptation of more advanced FP units to HUB numbers should arithmetic units,” Ph.D. dissertation, Stanford Univ., Stanford, CA, USA,
1996.
result in obtaining more efficient circuits. However, we should [16] G. Gerwig et al., “The IBM eServer z990 floating-point unit,” IBM J.
recall that since the architectures used in this paper are very Res. Develop., vol. 48, nos. 3–4, pp. 311–322, 2004.
basic, the amount of improvement expected will be probably [17] P.-M. Seidel and G. Even, “Delay-optimized implementation of
IEEE floating-point addition,” IEEE Trans. Comput., vol. 53, no. 2,
less when these more advanced architectures are adapted to pp. 97–113, Feb. 2004.
HUB numbers. Therefore, the figures obtained in this study [18] M. K. Jaiswal, R. C. C. Cheung, M. Balakrishnan, and K. Paul,
should be cautiously regarded as the upper bounds of the “Unified architecture for double/two-parallel single precision floating
point adder,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 61, no. 7,
improvements that can be achieved. pp. 521–525, Jul. 2014.
[19] J. Sohn and E. E. Swartzlander, “Improved architectures for a fused
V. C ONCLUSION floating-point add-subtract unit,” IEEE Trans. Circuits Syst. I, Reg.
Papers, vol. 59, no. 10, pp. 2285–2291, Oct. 2012.
This paper presents a way to simplify the FP systems [20] E. Quinnell, E. E. Swartzlander, and C. Lemonds, “Bridge floating-point
through the use of the HUB formats. When the preferred fused multiply-add design,” IEEE Trans. Very Large Scale Integr. (VLSI)
Syst., vol. 16, no. 12, pp. 1727–1731, Dec. 2008.
rounding mode is RN, the implementation of the arithmetic [21] S.-R. Kuang, J.-P. Wang, and H.-Y. Hong, “Variable-latency floating-
unit for dealing with the HUB FP numbers is much simpler point multipliers for low-power applications,” IEEE Trans. Very Large
than when using classic units. The substitution of classic Scale Integr. (VLSI) Syst., vol. 18, no. 10, pp. 1493–1497, Oct. 2010.
[22] J. Y. F. Tong, D. Nagle, and R. A. Rutenbar, “Reducing power by
formats for the HUB formats is feasible, given that we have optimizing the necessary precision/range of floating-point arithmetic,”
theoretically and experimentally demonstrated that both the IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 8, no. 3,
approaches have the same precision assuming the same storage pp. 273–286, Jun. 2000.
[23] H. Suzuki, H. Morinaka, H. Makino, Y. Nakase, K. Mashiko, and
requirements. Using basic architectures, we have shown how T. Sumi, “Leading-zero anticipatory logic for high-speed floating point
they could be adapted to support the HUB numbers and also addition,” IEEE J. Solid-State Circuits, vol. 31, no. 8, pp. 1157–1164,
shown that the implementation figures are greatly improved Aug. 1996.
by using HUB circuits. These results should be intended as
a rough approximation of the improvement achievable for
more advanced FP architectures. We should also note that
several patent applications have been filed regarding several Javier Hormigo received the M.Sc. and
HUB circuits. Ph.D. degrees in telecommunication engineering
from the Universidad de Malaga, Málaga, Spain,
in 1996 and 2000, respectively.
R EFERENCES He was a member of the Image and Vision
[1] M. D. Ercegovac and T. Lang, Digital Arithmetic. San Mateo, CA, USA: Department with the Instituto de Optica, Madrid,
Morgan Kaufmann, 2003. Spain, in 1996. He joined the Universidad
[2] J.-M. Muller et al., Handbook of Floating-Point Arithmetic. Boston, MA, de Malaga in 1997, where he is currently an
USA: Birkhäuser, 2010. Associate Professor with the Computer Architecture
[3] IEEE Standard for Floating-Point Arithmetic, IEEE Standard 754-2008, Department. His current research interests
Aug. 2008. [Online]. Available: http://ieeexplore.ieee.org/servlet/ include computer arithmetic, specific application
opac?punumber=4610933 architectures, and field-programmable gate array.
[4] D. T. Matheny, D. V. Jaggar, and D. J. Seal, “Round increment in an
adder circuit,” U.S. Patent 6 148 314, Nov. 14, 2000.
[5] D. R. Lutz and C. N. Hinds, “Data processing apparatus and method
for performing floating point multiplication,” U.S. Patent 7 640 286,
Dec. 29, 2009. Julio Villalba (M’11) received the B.Sc. degree in
[6] B. Boswell and K. Menezes, “Interface for performing parallel arithmetic physics from the Universidad de Granada, Granada,
and round operations,” U.S. Patent 6 055 555, Apr. 25, 2000. Spain, in 1986, and the Ph.D. degree in com-
[7] G. Even and P.-M. Seidel, “A comparison of three rounding algorithms puter engineering from the Universidad de Malaga,
for IEEE floating-point multiplication,” IEEE Trans. Comput., vol. 49, Málaga, Spain, in 1995.
no. 7, pp. 638–650, Jul. 2000. He was with the Research and Development
[8] N. Burgess, “Prenormalization rounding in IEEE floating-point oper- Department, Fujitsu, Madrid, Spain, from 1986
ations using a flagged prefix adder,” IEEE Trans. Very Large Scale to 1991, where he was an Assistant Professor.
Integr. (VLSI) Syst., vol. 13, no. 2, pp. 266–277, Feb. 2005. Since 2007, he has been a Full Professor with the
[9] N. T. Quach, N. Takagi, and M. Flynn, “Systematic IEEE rounding Department of Computer Architecture, Universidad
method for high-speed floating-point multipliers,” IEEE Trans. Very de Malaga. His current research interests include
Large Scale Integr. (VLSI) Syst., vol. 12, no. 5, pp. 511–521, May 2004. computer arithmetic and specific application architectures.
[10] P. Kornerup, J.-M. Muller, and A. Panhaleux, “Performing arithmetic Dr. Villalba is currently an Associate Editor of the IEEE T RANSACTIONS
operations on round-to-nearest representations,” IEEE Trans. Comput., ON E MERGING T OPICS IN C OMPUTING and a Steering Committee Member
vol. 60, no. 2, pp. 282–291, Feb. 2011. of ARITH23.