Download as pdf or txt
Download as pdf or txt
You are on page 1of 9

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 24, NO.

6, JUNE 2016 2369

Measuring Improvement When Using HUB Formats


to Implement Floating-Point Systems
Under Round-to-Nearest
Javier Hormigo and Julio Villalba, Member, IEEE

Abstract— This paper analyzes the benefits of using half-unit- one, the addition required for rounding up is postponed until
biased (HUB) formats to implement floating-point (FP) arithmetic the next operation. In [5], a dedicated circuit to compute the
under a round-to-nearest mode from a quantitative point of sticky bit in parallel with the main path was proposed with
view. Using the HUB formats to represent numbers allows the
removal of the rounding logic of arithmetic units, including the aim of accelerating the implementation of multiplication.
sticky-bit computation. This is shown for FP adders, multipliers, A compound adder (a circuit that, having a carry-save input,
and converters. Experimental analysis demonstrates that the delivers the results and the result plus one) was proposed in [6]
HUB formats and the corresponding arithmetic units maintain to generate the rounded result of any operation. In [7], three
the same accuracy as the conventional ones. On the other hand, different methods were compared for multipliers that simplify
the implementation of these units, based on basic architectures,
shows that the HUB formats simultaneously improve area, speed, rounding decisions and merge the rounding up with the
and power consumption. In addition, based on the data obtained computation of the operation. Similarly, Burgess [8] proposed
from the synthesis, an HUB single-precision adder is ∼14% combining rounding with the final addition to convert the
faster but consumes 38% less area and 26% less power than the carry-save solution into the conventional representation. In [9],
conventional adder. Similarly, an HUB single-precision multiplier a rounding scheme was presented for high-speed multipliers
is 17% faster, uses 22% less area, and consumes slightly less
power than the conventional multiplier. At the same speed, the based on a rounding table and prediction.
adder and the multiplier achieve area and power reductions of A totally different approach would be to use a new
up to 50% and 40%, respectively. real-number encoding, in order to simplify the implementation
Index Terms— Adders, digital arithmetic, floating-point (FP) of RN. Thus, the problem would change from optimizing the
arithmetic, multiplication, optimization, power consumption. rounding operation to dealing with arithmetic operations under
the new number representation. This proposal is found in [10]
I. I NTRODUCTION with RN representations and [11] with half-unit-biased (HUB)
formats. Together with other advantages, these new formats
T HE rounding operation is performed in almost all
arithmetic operations involving real numbers. There are
several ways to perform this operation, although unbiased
allow performing RN simply by truncation. On the other hand,
these new formats are based on the simple modifications of
rounding-to-nearest (RN) has the best characteristics [1], [2]. the conventional formats and so could be applied to practically
It provides the closest possible number to the original exact any conventional format. In this paper, we focus on the HUB
value, but if the exact value is exactly halfway between two FP formats.
The efficiency of using the HUB formats for a fixed-
numbers, then it is selected randomly. The most commonly
point representation has been demonstrated in [12] and [13].
used approach is the tie-to-even method, which is the default
By reducing the bitwidth while maintaining the same accuracy,
mode of the floating-point (FP) IEEE-754 standard (see [3]).
the area cost and delay of finite impulse response filter
However, the implementation of this rounding mode is
implementations have been dramatically reduced in [12], and
relatively complex, and the area and the delay introduced
similarly for the QR decomposition in [13].
for rounding circuits may be very large, since they normally
In this paper, we perform a quantitative estimation of the
lead in the critical path. For this reason, it is generally
benefit obtained using the HUB formats to implement the
used in the FP circuits. Many researchers have proposed
FP computation systems under RN. Some preliminary results
different architectures to reduce the impact of this delay by
for half-precision FP adders and multipliers were presented
merging rounding with other operations or removing it from
in [14]. The work in [14] shows that the area and power
the critical path. For instance, an FP adder was proposed in [4],
consumption of a basic FP adder could be improved by
wherein if the result of an addition is the input to another
up to 70% for high frequencies when using HUB formats,
Manuscript received July 17, 2015; revised October 20, 2015; accepted whereas they remain the same for the basic FP multiplier.
November 17, 2015. Date of publication December 8, 2015; date of current In addition to a deeper analysis, in this paper, we extend
version May 20, 2016. This work was supported by the Ministry of Education
and Science of Spain under Contract TIN2013-42253-P.
these results to other sizes and circuits, such as converters.
The authors are with the Department of Computer Architecture, Universidad In comparison with the work in [14], the main contributions
de Málaga, Málaga E-29071, Spain (e-mail: fjhormigo@uma.es). of this paper are as follows:
Color versions of one or more of the figures in this paper are available
online at http://ieeexplore.ieee.org. 1) a detailed architecture for basic adders and multipliers
Digital Object Identifier 10.1109/TVLSI.2015.2502318 to deal with HUB numbers;
1063-8210 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
2370 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 24, NO. 6, JUNE 2016

by a bitwise inversion and RN by truncation. The unbiased


rounding may require forcing the LSB to zero, when all the
discarded bits are zero [11].

III. BASIC O PERATIONS U NDER HUB F ORMATS


The general procedure to operate with the HUB numbers
involves the following steps [11]: First, the ILSB is explicitly
appended to the significand of input operands. Second, the
operation is performed in a classic way, such that all the bits
of the significand result before rounding are obtained. Finally,
the significand is rounded simply by truncation. However,
since the ILSB is a constant value, the datapath could be
further optimized depending on the specific operation. Next,
we develop several architectures to support the HUB numbers.
These architectures are adapted from the basic architectures
Fig. 1. Example of ERNs for a conventional FP format and its HUB version. described in [1]. We are aware that many optimizations to
these architectures have been proposed in the literature, such
2) a study of the conversions between different FP formats as those presented in [1], [2], and [15]–[23]. However, none
and the corresponding architectures; can be selected as the best, since this selection depends on
3) the experimental comparison of accuracy between HUB many different factors. Even if the more relevant ones were
and conventional formats; selected, it would not be possible to review all of them
4) measures of improvements in area, speed, and power in this paper. Thus, we simply describe the adaptation of
consumption for single- and double-precision adders, these basic architectures with the aim of their being used
multipliers, and converters. as examples to investigate the implementation of other much
The rest of this paper is organized as follows. Section II more sophisticated approaches.
reviews the fundamentals of the HUB FP formats, focusing Next, we study in detail the architectures to imple-
on rounding operations. Section III addresses the archi- ment two basic operations, addition and multiplication, and
tectures for implementing the FP operations under the different conversions. For this purpose, let us consider an
HUB formats, specifically addition, multiplication, and con- m-bit significand including the leading one, which is explicitly
versions. In Section IV, we provide an experimental error represented to facilitate the explanation. In contrast, the ILSB
analysis to confirm the viability of using the HUB formats of the HUB version is not included in m. We focus on the
instead of classic ones, and also compare the results of the implementation of the significand path, because the exponent
application-specific integrated circuit (ASIC) implementation path remains practically unchanged. We will draw the attention
of a basic adder, a basic multiplier, and several converters. The to any modification required.
conclusions are presented in Section V.
A. FP Addition for HUB Numbers
II. H ALF -U NIT-B IASED FP F ORMATS A basic FP addition for conventional numbers requires
An HUB FP format should include a significand that is several steps [1], which could be implemented using the
represented using an HUB fixed-point number and an exponent significand datapath shown in Fig. 2(a). First, the operand
that is represented in any conventional way [11]. An HUB exponents (d = E x − E y ) are compared and the significands
fixed-point format is produced when the exactly represented are aligned accordingly. The latter is usually performed by
numbers (ERNs) of a conventional representation are increased right shifting (|d| bits) the significand corresponding to the
(or biased) by half unit in the last place (ULP). This shifting number with the lowest exponent, which is selected using the
of the ERNs could be seen as an implicit least significant swap module and the sign of d. The computation of the sticky
bit (ILSB) set to one. For example, the HUB version of the bit corresponding to the bits shifted beyond the precision of
IEEE-754 single precision has 25 bits for the significand, the significand is also performed. The sticky bit is required
where the first and last bits are implicit and equal to 1, but for the computation of two’s complement and rounding.
only 23 bits are stored, as in the conventional version. Second, either the effective addition or the subtraction of
Fig. 1 shows an example for a binary FP format with the aligned significands is performed for the m + 3 MSBs
3-bit significand. Given a real value, its representation using (the significand plus guard, round, and sticky bits). In general,
either conventional or HUB formats will produce different in order to perform subtraction, the significand corresponding
rounding errors, although the accuracy of both the formats is to the lower exponent is previously one’s complemented using
the same [11]. In fact, the rounding errors for both the formats the significand comparator and the conditional bit inverters.
are complementary (i.e., the addition of both the rounding Moreover, the sticky bit is introduced in the adder to complete
errors equals 0.5 ULP). the two’s complement transformation.
The main advantage of computing the HUB formats is The result of addition has to be normalized by shifting 1 bit
that the two’s complement operation is implemented simply to the right, if an overflow is produced. Otherwise, it is shifted
HORMIGO AND VILLALBA: MEASURING IMPROVEMENT WHEN USING HUB FORMATS TO IMPLEMENT FP SYSTEMS UNDER RN 2371

Fig. 2. Basic FP adder architectures for (a) standard and (b) HUB numbers.

to the left if there are leading zeros whose number is computed the fixed-point adder. Despite this, the fixed-point adder is
in the leading one detector. Besides normalization, the result slightly shorter than the one shown in Fig. 2(a). The latter
has to be rounded based on the two LSBs of the result and the requires two guard bits and the carry input for the sticky bit
sticky bit. However, no roundup is required when the result (i.e., m + 2 bits) due to the rounding operation. However, in the
has at least two leading zeros [1], [23]. Thus, left shifting and HUB approach shown in Fig. 2(b), no guard bits are required
rounding are performed in parallel paths. On the other hand, because rounding is performed by truncation. Thus, the fixed-
the new exponent is also generated in a parallel path. If the point adder has only m + 1 bits (one additional bit to support
result of addition is rounded up, an overflow may be produced, the ILSB). In fact, the ILSB is shown in the architecture to
which requires a new correction of the exponent. simplify the explanation, although, as presented in [11], this
The same basic architecture could be used for the fixed-point addition can be implemented using an m-bit adder
HUB numbers, although the significands are 1 bit larger and and an inverter.
the rounding circuit is removed. However, knowing that the Finally, the rounding path [gray in Fig. 2(a)] is removed,
ILSB always equals 1, the significand datapath is further because rounding is simply performed by truncation.
optimized, as shown in Fig. 2(b). The first difference is in Consequently, given that explicit rounding up is not performed
the right shifter used to align the significands, because the for the HUB architecture, overflow could not occur after
ILSB has to be included at the input to obtain a correct rounding. Thus, the additional correction of the exponent
result if no shifting is performed. Furthermore, the sticky-bit required in Fig. 2(a) is eliminated, which also simplifies the
computation logic has been removed. Given that the ILSBs exponent datapath.
of both the significands equals 1, the sticky bit is always one
for nonaligned significands [11]. Moreover, the sticky bit is B. FP Multiplication for HUB Numbers
not required for aligned significands because shifting is not A classic FP multiplication is simpler than FP addition [1].
performed. The new exponent is computed by adding the exponents of
In this HUB architecture, we should note that the condi- the input operands, whereas the significands are multiplied,
tional bit inverters directly perform the two’s complement, thus obtaining a value that is double the size. The result of
as explained in Section II, and no carry input is required in multiplication is normalized by shifting it 1 bit to the right,
the fixed-point adder. The conditional inverter at the output of if it is required. Finally, the resulting number is rounded to fit
the right shifter has to be modified to control when shifting is the size of the significand. The rounding requires computing
not performed. In this case, the ILSB is explicitly represented the sticky of the m − 2 LSBs of the result of the fixed-point
by the LSB of the output. It then has to be set to 1 after multiplication, and the addition of one ULP for rounding it up.
the inversion to complete the two’s complement operation Fig. 3(a) shows the significand datapath of a straightforward
(see Section II). implementation of the conventional FP multiplication.
On the other hand, Fig. 2(b) shows that the ILSB of the When the HUB numbers are used, the significands again
second operand is appended at the corresponding input of have to be appended with the hidden ILSB that equals one,
2372 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 24, NO. 6, JUNE 2016

Fig. 3. Basic FP multiplier for (a) standard and (b) HUB numbers.

Fig. 4. Basic converter from an FP format to a narrower one for (a) standard
as shown in Fig. 3(b). Thus, the fixed-point multiplier is 1 bit and (b) HUB numbers.
larger, which increases the size of the fixed-point multiplier.
In contrast, the computation of the sticky bit is again pre-
vented [11]. To simplify the explanation, let us assume that exponent by 1. However, under the HUB formats, all these
the significand is scaled such that the ILSB is the only operations are avoided, and a simple truncation of the signifi-
fractional bit. Let us call the integer part of both the input cand will produce the correctly rounded conversion. We should
significands A and B. The product of both the significands is note that the ILSB of the original number guarantees that the
then sticky bit is always 1, which means that the tie case does not
1 exist.
(A + 1/2) · (B + 1/2) = A · B + (A + B) + 1/4. (1) Apart from the dramatic simplification of the conversion
2
Given the last addend of this result, we conclude that the LSB operation, the use of the HUB formats prevents the double
of the result of multiplication is always 1. Thus, the sticky bit rounding error [11]. This well-known problem may occur
is always 1 and no logic is required to compute it. Moreover, when a number is rounded twice in a row (e.g., when the
given that the rounding is simply performed by truncation, the result of an arithmetic operation is first rounded to double pre-
cision and then to single precision). Using standard numbers,
final incrementer is not required [see Fig. 3(b)]. Therefore, as
shown in Section IV-B, although the area of the fixed-point this operation may lead to the final value having an error
multiplier increases, the overall area decreases. greater than 0.5 ULPs. However, this cannot occur when using
HUB numbers [11].
On the other hand, conversion the other way around requires
C. Conversion Between FP Numbers of expanding the significand to fit it into the new size. This is a
Different Sizes for HUB Numbers trivial operation for a standard format, because as many zeroes
Conversion between FP formats with different sizes as needed are appended to the least significant part of the
is another operation that could benefit from using the significand. In relation to the HUB formats, a similar solution
HUB formats. We will focus on the conversion of the signif- could be applied, although the MSB of the bits appended
icand, since the conversion of exponent is not affected when should be 1 (the ILSB) instead of 0. This approach is shown
using the HUB formats (because the HUB FP formats use a in Fig. 5(a).
conventional number to represent the exponent). Due to the new ILSB of the generated HUB number, the
The conversion from an FP format to another format with circuit in Fig. 5(a) produces a rounded tie-to-away significand
a narrower significand requires a rounding operation. For because the significand is always rounded up. The error
instance, the 53-bit significand of a double-precision number introduced due to this rounding is negligible, because the
has to be rounded to 24 bits to turn it into a single-precision error associated with the original FP number is usually several
number. Fig. 4 shows an example of this situation. The orders of magnitude greater. For example, the representation
standard RN tie-to-even rounding mode would require com- of the value 0.1 under the IEEE-754 single-precision standard
puting the sticky bit, determining the need for rounding up, produces an error that equals −1.4901161 × 10−9 assuming
and incrementing the truncated significand value if required. the default rounding mode. When it is extended, the same
These operations involve a considerable amount of hardware. amount of error is transferred to the double-precision number.
Moreover, the rounding-up operation may produce a Similarly, the error under the single-precision HUB format
significand overflow, which would require incrementing the is 2.2351743 × 10−9 which, in this case, is reduced
HORMIGO AND VILLALBA: MEASURING IMPROVEMENT WHEN USING HUB FORMATS TO IMPLEMENT FP SYSTEMS UNDER RN 2373

TABLE I
E XAMPLES OF C ONVERSIONS B ETWEEN THE S IGNIFICAND
OF HUB AND C ONVENTIONAL N UMBERS

tie-to-away behavior is not desired (for example, when the


sign of the input numbers is not equally distributed), a kind
of tie-to-even rounding could be easily implemented. To do
this, the explicit LSB of the significand (the second LSB,
Fig. 5. Basic converter for an FP format to a larger one for HUB numbers. if the ILSB is taken into account) is forced to 0. In this way,
(a) Biased. (b) Unbiased. significands that were initially even are rounded up, whereas
the odd ones are rounded down.
On the other hand, the conversion from the HUB FP format
to 6.94×10−18 by appending the ILSB of the double-precision
into the conventional format could also be performed without
HUB number. It can clearly be seen that this rounding-up
any explicit operation. The ILSB of the significand is virtually
operation barely affects the original error value.
removed, and then, the magnitude is implicitly rounded down.
Nevertheless, if this bias is still a problem for certain
Consequently, doing nothing to convert the numbers actually
applications, another solution is shown in Fig. 5(b). This
produces a tie-to-zero rounding. In contrast, producing a
approach selects the direction of the rounding based on the
tie-to-even rounding requires much more hardware because it
explicit LSB of the input significand. If this LSB is 0, the
involves incrementing the number if its LSB equals 1 (recall
significand is extended as in the previous solution; thus, it is
that it is known in advanced that it is a tie case, and thus
rounded up. If the LSB is 1, it is extended with a 0 followed
no other testing is required). However, tie-to-odd behaves
by 1s, which actually produces a rounding down (recall the
similar to tie-to-even rounding mode, and it is much easier
ILSB set to 1 of the input significand).
to implement simply by setting the LSB of the number to 1.
In this way, even numbers are rounded down, whereas odd
D. Conversion Between Conventional and HUB Formats numbers are rounded up. Table I shows several examples
The conversion between numbers under a conventional of these conversions using a 4-bit significand for different
format and its corresponding HUB format (i.e., both the rounding modes. The numbers between parentheses are the
formats having the same number of explicit bits) may be corresponding decimal values (recall that the HUB numbers
required when data are exchanged between systems working in add 0.5 ULP to the corresponding conventional value).
these different formats. Given that those formats do not share
any ERN (except special cases), this conversion always implies IV. I MPLEMENTATION R ESULTS AND C OMPARISON
a rounding and, consequently, a rounding error. In addition, In this section, we experimentally study the convenience
each ERN of one format is always at the midpoint between of using the proposed HUB FP formats to implement the
two ERNs of the other format, i.e., it is a tie case. This fact real-number computation. First, we provide an experimental
simplifies the hardware, because the computation of the sticky error analysis to confirm that the use of the HUB formats
bit is not required; however, the magnitude of rounding error does not damage the accuracy of the computation. Second,
is always 0.5 ULPs. we analyze the main results of the hardware implementation
The conversion from the conventional FP format to of the proposed HUB FP circuits compared with the classic
the HUB format could be performed without any explicit implementation.
operation. In this case, given that the HUB format virtually
appends the ILSB to the initial significand, an implicit
rounding up of the magnitude is always produced. Thus, A. Experimental Error Analysis
a tie-to-away is actually produced, because an effective An empirical error study is provided to demonstrate that
rounding up is obtained for positive numbers, whereas it the HUB formats could be used instead of the IEEE standard
is a rounding down for negative numbers. However, if this in FP-specific applications, while guaranteeing the same level
2374 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 24, NO. 6, JUNE 2016

Fig. 6. Histogram of the rounding error for the addition experiment. Fig. 7. Histogram of the rounding error for the multiplication experiment.
(a) HUB format. (b) IEEE standard. (a) HUB format. (b) IEEE standard.

TABLE II
S TATISTICAL PARAMETERS OF ROUNDING E RROR D ISTRIBUTION

Fig. 8. Histogram of the rounding error for double-to-single-precision


conversion. (a) HUB format. (b) IEEE standard.

of precision. In a first experiment, we tested the arithmetic the main statistical parameters corresponding to these errors.
operations. In these experiments, we utilized 32-bit two’s It can be seen that these parameters are very similar when
complement fixed-point numbers within the range (−1 1) both the representations are compared.
(i.e., 1 sign bit and 31 fractional bits) as exact real input A similar experiment was run for multiplication. Again,
numbers. They were converted into an internal 32-bit FP for- two-million exact multiplication results were compared with
mat with only 24 bits for the significand. Thus, a rounding was the results obtained with the FP internal representation when
required for the said conversion. Two different internal formats using the IEEE standard single precision and its equivalent
were tested: 1) the standard IEEE-754 single precision and HUB format. The input operands were also 32-bit fixed-
2) its corresponding HUB format. point numbers, although, in this case, only positive numbers
To test the addition operation, two-million exact results were used, whereas 64-bit fixed-point arithmetic was used to
corresponding to the addition of two input real numbers were calculate the exact multiplication result. Similar to the addition
calculated using 32-bit fixed-point arithmetic (excluding the experiment, although the values of the rounding error for both
results that produced overflow). Moreover, the FP results the representations are different, their statistical parameters are
of adding the same pairs of numbers that were previously very similar, as shown in Table II and Fig. 7, by the histogram
converted into single-precision FP format were computed for of the rounding error for both the HUB format and the
both the IEEE standard and the HUB format. For the former, IEEE standard.
a standard CPU was used to perform the computation, whereas In another experiment, we measured the precision of
for the latter, a field-programmable gate array implementation the conversion from double to single precision. One-million
of the HUB adder presented in Section III-A was used. These double-precision real numbers were randomly generated and
results were converted back into fixed-point representation converted into single-precision format. They were then con-
(considered exact in our experiment) and compared with the verted back into double precision and compared with the
exact results obtained using the fixed-point addition. original numbers. As expected, the statistical distributions of
The calculated error comprises the error in the operation the error for both the standard and HUB formats were once
itself and in conversions. We should note that the values of again very similar. Fig. 8 shows the histograms of the rounding
the rounding error corresponding to both the approaches are error for both the approaches, and their statistical parameters
always different. This happens because the value used for are shown in Table II.
representing the exact real number is different on each inter- The theoretical and experimental results indicate that the
nal representation (see Section II). However, the probability HUB formats could substitute for conventional FP representa-
distributions of these errors are quite similar, as shown by the tion to deal with real-number computation. Clearly, the use of
histogram of the rounding error for both the HUB format and the HUB format to solve real-number calculation would not
the IEEE standard shown in Fig. 6. Moreover, Table II shows provide exactly the same results as those obtained by the use
HORMIGO AND VILLALBA: MEASURING IMPROVEMENT WHEN USING HUB FORMATS TO IMPLEMENT FP SYSTEMS UNDER RN 2375

Fig. 9. Comparison of adder implementations for single precision. (a) Area.


(b) Power. Fig. 10. Comparison of adder implementations for double precision.
(a) Area. (b) Power.

of the IEEE standard, although the accuracy of this calculation


would be the same. consumes 35% more power (even working at a lower speed)
than the HUB adder. Therefore, the HUB approach simultane-
B. Implementation Results ously increases the speed and reduces the area and the power
To measure the improvements obtained by using the consumption.
HUB format and the corresponding circuits, the basic A similar comparison is provided for double-precision
circuits studied in Section III have been described in very numbers in Fig. 10. Again, an important area and power
high speed integrated circuit hardware description language consumption reduction is achieved, although a little lower,
and implemented targeting the ASIC technology. The adders, between 15% and 35%. Regarding the speed, the improvement
multipliers, and converters for the standard and HUB formats is only 8% (650 MHz for HUB adder and 600 MHz for
have been implemented using the Synopsys Design Compiler standard one). However, taking the fastest implementation of
version Z-2007.03 and the TSMC 65-nm standard cell library, both the approaches, even being faster, the HUB adder requires
targeting clock frequencies ranging from 200 MHz to 1.1 GHz 16% less area and consumes 13% less power.
in 50-MHz steps. The area and power consumption of both the As expected, improvements in the multipliers are less than
approaches are compared for the same clock frequency. those in the adders, although they are still very significant.
Fig. 9 shows the area and power consumption of the Fig. 11 shows the area required and the power consump-
standard and the HUB adders for single precision. In addition tion for both single-precision multipliers. The improvement
to the values in absolute terms, the HUB-to-standard ratio is is <10% for low frequencies, but becomes very significant
also represented to facilitate the comparison. It can clearly when the frequency increases, achieving a reduction of up
be seen that the HUB adder always requires significantly less to 38% and 35% in area and power, respectively. Similar to
area than the standard adder, especially when the clock fre- adders, the HUB multiplier is 17% faster than the conventional
quency increases. In addition, the HUB adder requires between one. If the fastest implementations of both the approaches are
∼20% and 50% less area and power than the standard adder. compared, it can be seen that the HUB multiplier requires 22%
On the other hand, the maximum frequency achievable for less area and consumes slightly less power (2%). The similar
the standard adder is only 700 MHz, whereas it is 800 MHz behavior can be seen in the double-precision implementation,
for the HUB adder. This means that the HUB adder is as shown in Fig. 12. In this case, both the area and the power
14% faster than the standard one. Furthermore, if we con- reduction increase by 32% and the fastest implementation
sider the fastest implementation in each case, besides being achieves a speedup of 14%, with 12% less area and slightly
slower, the standard adder requires 63% more area and more power consumption (2.6%).
2376 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 24, NO. 6, JUNE 2016

Fig. 11. Comparison of multiplier implementations for single precision.


(a) Area. (b) Power.

Fig. 13. Comparison of converter implementation from double to single


precision. (a) Area. (b) Power.

into single precision. It can be observed that the area of the


HUB implementation is practically constant for the ranges
of the frequencies tested. This is because the critical path is
very short for this circuit. In this case, the elimination of the
rounding logic dramatically reduces the area. The circuit for
the standard format requires between 2.5 and 4.4 as much
area as the HUB approach. Regarding power consumption,
although this reduction is slightly less, as shown in Fig. 13(b),
it is still very high. The power consumption of the standard
converter is between 2 and 3.6 times greater than the one for
the HUB circuit. However, we should note that the relative
impact of these circuits on the overall area or power is much
lower than that on the adder or the multiplier.
In contrast, the conversion from single to double precision
is simpler in the standard approach. Similar to the previous
HUB converter, the critical paths of these converters are
too short to produce significant variations in the area
at the frequencies tested. The standard converter requires
92.88 µm2 , whereas the HUB approach with tie-to-away
requires 96.12 µm2 , which is ∼3.5% more area. However,
the HUB converter with tie-to-even-like rounding requires
116.64 µm2 , which is ∼25% more area. Accordingly, the
Fig. 12. Comparison of multiplier implementations for double precision.
(a) Area. (b) Power.
tie-to-away HUB converter consumes only 1% more power
than the standard converter, but the other converter requires
Similarly, we also studied the circuits used to convert 75% more power. Clearly, the tie-to-away approach is the
between single precision and double precision. Fig. 13(a) preferred solution for these kinds of conversions, unless this
shows the area required for converting from double precision approach cannot be applied due to application restrictions.
HORMIGO AND VILLALBA: MEASURING IMPROVEMENT WHEN USING HUB FORMATS TO IMPLEMENT FP SYSTEMS UNDER RN 2377

In any case, as stated, the area or power consumption of these [11] J. Hormigo and J. Villalba, “New formats for computing with real-
circuits is much lower than arithmetic circuits, and thus their numbers under round-to-nearest,” IEEE Trans. Comput., to be published,
doi: 10.1109/TC.2015.2479623.
impact is very low. [12] J. Hormigo and J. Villalba, “Optimizing DSP circuits by a new family
In summary, the use of the HUB FP numbers could of arithmetic operators,” in Proc. Asilomar Conf. Signals, Syst. Comput.,
significantly improve the implementation of arithmetic units. Nov. 2014, pp. 871–875.
[13] S. D. Muñoz and J. Hormigo, “Improving fixed-point implementation
In all the tested cases, except for the converter from single to of QR decomposition by rounding-to-nearest,” in Proc. 19th IEEE Int.
double precision, HUB units clearly outperform standard units, Symp. Consum. Electron. (ISCE), Jun. 2015, pp. 1–2.
in area, speed, and power consumption. Due to the nature [14] J. Hormigo and J. Villalba, “Simplified floating-point units for high
dynamic range image and video systems,” in Proc. 19th IEEE Int. Symp.
of the improvement, which is based on the simplification Consum. Electron. (ISCE), Jun. 2015, pp. 1–2.
of the FP algorithms themselves, it is expected that the [15] S. F. Oberman, “Design issues in high performance floating point
adaptation of more advanced FP units to HUB numbers should arithmetic units,” Ph.D. dissertation, Stanford Univ., Stanford, CA, USA,
1996.
result in obtaining more efficient circuits. However, we should [16] G. Gerwig et al., “The IBM eServer z990 floating-point unit,” IBM J.
recall that since the architectures used in this paper are very Res. Develop., vol. 48, nos. 3–4, pp. 311–322, 2004.
basic, the amount of improvement expected will be probably [17] P.-M. Seidel and G. Even, “Delay-optimized implementation of
IEEE floating-point addition,” IEEE Trans. Comput., vol. 53, no. 2,
less when these more advanced architectures are adapted to pp. 97–113, Feb. 2004.
HUB numbers. Therefore, the figures obtained in this study [18] M. K. Jaiswal, R. C. C. Cheung, M. Balakrishnan, and K. Paul,
should be cautiously regarded as the upper bounds of the “Unified architecture for double/two-parallel single precision floating
point adder,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 61, no. 7,
improvements that can be achieved. pp. 521–525, Jul. 2014.
[19] J. Sohn and E. E. Swartzlander, “Improved architectures for a fused
V. C ONCLUSION floating-point add-subtract unit,” IEEE Trans. Circuits Syst. I, Reg.
Papers, vol. 59, no. 10, pp. 2285–2291, Oct. 2012.
This paper presents a way to simplify the FP systems [20] E. Quinnell, E. E. Swartzlander, and C. Lemonds, “Bridge floating-point
through the use of the HUB formats. When the preferred fused multiply-add design,” IEEE Trans. Very Large Scale Integr. (VLSI)
Syst., vol. 16, no. 12, pp. 1727–1731, Dec. 2008.
rounding mode is RN, the implementation of the arithmetic [21] S.-R. Kuang, J.-P. Wang, and H.-Y. Hong, “Variable-latency floating-
unit for dealing with the HUB FP numbers is much simpler point multipliers for low-power applications,” IEEE Trans. Very Large
than when using classic units. The substitution of classic Scale Integr. (VLSI) Syst., vol. 18, no. 10, pp. 1493–1497, Oct. 2010.
[22] J. Y. F. Tong, D. Nagle, and R. A. Rutenbar, “Reducing power by
formats for the HUB formats is feasible, given that we have optimizing the necessary precision/range of floating-point arithmetic,”
theoretically and experimentally demonstrated that both the IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 8, no. 3,
approaches have the same precision assuming the same storage pp. 273–286, Jun. 2000.
[23] H. Suzuki, H. Morinaka, H. Makino, Y. Nakase, K. Mashiko, and
requirements. Using basic architectures, we have shown how T. Sumi, “Leading-zero anticipatory logic for high-speed floating point
they could be adapted to support the HUB numbers and also addition,” IEEE J. Solid-State Circuits, vol. 31, no. 8, pp. 1157–1164,
shown that the implementation figures are greatly improved Aug. 1996.
by using HUB circuits. These results should be intended as
a rough approximation of the improvement achievable for
more advanced FP architectures. We should also note that
several patent applications have been filed regarding several Javier Hormigo received the M.Sc. and
HUB circuits. Ph.D. degrees in telecommunication engineering
from the Universidad de Malaga, Málaga, Spain,
in 1996 and 2000, respectively.
R EFERENCES He was a member of the Image and Vision
[1] M. D. Ercegovac and T. Lang, Digital Arithmetic. San Mateo, CA, USA: Department with the Instituto de Optica, Madrid,
Morgan Kaufmann, 2003. Spain, in 1996. He joined the Universidad
[2] J.-M. Muller et al., Handbook of Floating-Point Arithmetic. Boston, MA, de Malaga in 1997, where he is currently an
USA: Birkhäuser, 2010. Associate Professor with the Computer Architecture
[3] IEEE Standard for Floating-Point Arithmetic, IEEE Standard 754-2008, Department. His current research interests
Aug. 2008. [Online]. Available: http://ieeexplore.ieee.org/servlet/ include computer arithmetic, specific application
opac?punumber=4610933 architectures, and field-programmable gate array.
[4] D. T. Matheny, D. V. Jaggar, and D. J. Seal, “Round increment in an
adder circuit,” U.S. Patent 6 148 314, Nov. 14, 2000.
[5] D. R. Lutz and C. N. Hinds, “Data processing apparatus and method
for performing floating point multiplication,” U.S. Patent 7 640 286,
Dec. 29, 2009. Julio Villalba (M’11) received the B.Sc. degree in
[6] B. Boswell and K. Menezes, “Interface for performing parallel arithmetic physics from the Universidad de Granada, Granada,
and round operations,” U.S. Patent 6 055 555, Apr. 25, 2000. Spain, in 1986, and the Ph.D. degree in com-
[7] G. Even and P.-M. Seidel, “A comparison of three rounding algorithms puter engineering from the Universidad de Malaga,
for IEEE floating-point multiplication,” IEEE Trans. Comput., vol. 49, Málaga, Spain, in 1995.
no. 7, pp. 638–650, Jul. 2000. He was with the Research and Development
[8] N. Burgess, “Prenormalization rounding in IEEE floating-point oper- Department, Fujitsu, Madrid, Spain, from 1986
ations using a flagged prefix adder,” IEEE Trans. Very Large Scale to 1991, where he was an Assistant Professor.
Integr. (VLSI) Syst., vol. 13, no. 2, pp. 266–277, Feb. 2005. Since 2007, he has been a Full Professor with the
[9] N. T. Quach, N. Takagi, and M. Flynn, “Systematic IEEE rounding Department of Computer Architecture, Universidad
method for high-speed floating-point multipliers,” IEEE Trans. Very de Malaga. His current research interests include
Large Scale Integr. (VLSI) Syst., vol. 12, no. 5, pp. 511–521, May 2004. computer arithmetic and specific application architectures.
[10] P. Kornerup, J.-M. Muller, and A. Panhaleux, “Performing arithmetic Dr. Villalba is currently an Associate Editor of the IEEE T RANSACTIONS
operations on round-to-nearest representations,” IEEE Trans. Comput., ON E MERGING T OPICS IN C OMPUTING and a Steering Committee Member
vol. 60, no. 2, pp. 282–291, Feb. 2011. of ARITH23.

You might also like