
"Digit-Recurrence Algorithms for Division and Square Root with Limited Precision Primitives"

LITERATURE SURVEY:
We extend the digit-recurrence algorithm for division (F-DIV) to the square-root operation and discuss a combined divide/square-root scheme. The algorithms in this class are characterized by: digit-by-digit use of the operands, similar to on-line algorithms; short-precision residuals of no more than 2 radix-r digits, independent of the step; selection using a short reciprocal; and rounding compensation, needed because of the on-line mode, which requires a short-precision sum of digit-by-digit products.

In on-line algorithms, the error in the residual due to the incremental use of the operands is compensated in each step by adding a term missed in a prior step. We now introduce an algorithm for square rooting with characteristics similar to F-division. Let s = √c. We obtain a short-precision approximation a by rounding s to its p most-significant digits, and let b correspond to the remaining n − p digits of s, so that s = a + b. To compute the part b, we apply the F-division algorithm: the dividend in this case is y = c − a^2, and the (full-precision) divisor is 2a + b, since c = (a + b)^2 gives b = (c − a^2)/(2a + b). Since F-DIV uses a short divisor, it suffices to take d* = 2a. The digits of b are produced in on-line mode, beginning with the most-significant digit of b. As in F-DIV, the dividend y = c − a^2 is produced and applied digit-serially. The first two digits of y are in signed-digit form after the subtraction of a^2; for the rest, y_j = c_j. The correction terms are computed from the digits of b as they are produced. The part a, a short-precision approximation to √c, is obtained from a table TSQR using the most-significant digits of c. The F-DIV selection function uses a short reciprocal g ≈ 1/d*. This reciprocal can be stored in TSQR, or obtained from TREC using 2a as the input at the cost of an extra delay. We assume the former approach.
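The identity underlying this square-root scheme can be sketched in plain Python (a numeric illustration only, not the digit-serial hardware recurrence; the function name, the use of decimal rounding for the short approximation, and the iteration count are my choices):

```python
import math

def fsqr_sketch(c, p=3, iters=30):
    """Sketch of the F-SQR decomposition: s = sqrt(c) is split as
    s = a + b, where a is sqrt(c) rounded to p decimal digits
    (standing in for the p most-significant radix-r digits).
    Since c = (a + b)^2, the remaining part satisfies
    b = (c - a^2) / (2a + b), solved here as a fixed point."""
    a = round(math.sqrt(c), p)   # short-precision approximation (the role of TSQR)
    y = c - a * a                # the dividend, produced digit-serially in F-DIV
    b = 0.0
    for _ in range(iters):
        b = y / (2 * a + b)      # divisor 2a + b; the short divisor is d* = 2a
    return a + b
```

The fixed point satisfies b^2 + 2ab = c − a^2, i.e. (a + b)^2 = c, which is exactly why the F-DIV machinery, fed with dividend y = c − a^2 and divisor 2a + b, recovers the remaining digits of the root.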

This work presents an algorithm for square root which uses radix-r digit-recurrence division with short-precision primitives. A combined division/square-root implementation is described. The proposed scheme for radix-512 has a cycle time of (8.4 + 1.9)T ≈ 10.3T, compared with a cycle time of 8.15T for radix-512 division with pre-scaling. However, the proposed scheme uses short multipliers, which may be an advantage at the layout level. Regarding cost, the proposed scheme uses 38% less area than a combined division/square-root scheme with pre-scaling. This makes the combined F-DIV/SQR interesting for low-power designs. Since all primitive modules operate digit by digit, this class of algorithms is suitable for higher-radix implementation. Since the precision of all modules is short, designs with nonredundant outputs may be faster and simpler than implementations with redundant outputs. The scheme discussed uses the modules defined in [5] and repeated here for ease of reference. The delays are expressed in terms of T, the delay of a full adder; the cost is given in terms of K, the cost of a full adder. The delay and cost of the modules are estimated for k = 9 (r = 512).

Reciprocal table TREC: 2-digit input d*, 1 (significant) digit output g in radix-4 recoded form; the direct size is about k × 2^(2k). Square-root table TSQR: size about (k + k + 2k) × 2^(2k), producing s1, 1/s1, and s1^2. A bipartite organization or some approximation is used as necessary for large k; the delays are tREC and tSQR. Note that the TSQR width can be reduced by using TREC to obtain 1/s1, at the cost of one extra cycle of latency.
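For scale, the quoted direct-lookup sizes can be evaluated at k = 9 with a back-of-the-envelope sketch (function name mine; these are upper bounds before any bipartite or approximation reduction):

```python
def table_bits(k):
    """Direct-lookup bit counts as quoted in the text:
    TREC at about k * 2^(2k) bits, and TSQR at about
    (k + k + 2k) * 2^(2k) bits for its three outputs."""
    trec = k * 2 ** (2 * k)
    tsqr = (k + k + 2 * k) * 2 ** (2 * k)
    return trec, tsqr

trec, tsqr = table_bits(9)  # k = 9, i.e. r = 512
# roughly 2.4 Mbit and 9.4 Mbit respectively, which is why a
# bipartite organization or an approximation becomes necessary
```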
"DIGITAL ARITHMETIC":

TYPES OF ARITHMETIC ALGORITHMS:

Bottom-up development

Primitives
+ Addition/subtraction
+ Multi-operand addition
+ Arithmetic shifts
+ Multiplication by digit
+ Result-digit selection (PLA)
+ Table look-up
+ Multiplication
a) Arithmetic level

+ Reducing number of steps


Example: higher radix
Example: combinational instead of sequential

+ Reducing time of step


Example: carry-save adder instead of carry-propagate

+ Overlap steps (concurrency/pipelining)


b) Implementation level

+ Reduce number of logic levels


Measures

+ Packaging

+ Interconnection complexity

+ Number of pins

+ Number of chips and types of chips

+ Number of gates and types of gates

+ Area

+ Design cost; verification and testing cost

+ Power dissipation

+ Power consumption
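One item in the outline above, "carry-save adder instead of carry-propagate", is easy to illustrate in software (a minimal sketch; function names are mine): a 3:2 carry-save step reduces three addends to a sum/carry pair with no carry rippling across the word, so a multi-operand sum needs only one carry-propagate addition at the very end.

```python
def carry_save_add(x, y, z):
    """3:2 carry-save step: reduce three addends to a (sum, carry)
    pair using only bitwise ops, so no carry propagates across bits."""
    s = x ^ y ^ z                              # per-bit sum without carries
    c = ((x & y) | (x & z) | (y & z)) << 1     # carries, shifted into place
    return s, c

def multi_operand_add(operands):
    """Add many operands with carry-save steps; the single
    carry-propagate addition is deferred to the very end."""
    s, c = 0, 0
    for v in operands:
        s, c = carry_save_add(s, c, v)
    return s + c                               # the only carry-propagate add

# multi_operand_add([5, 7, 9, 11]) == 32
```

In hardware, each carry-save step costs one full-adder delay T regardless of word length, which is the "reducing time of step" advantage the outline refers to.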

New Metrics for the Reliability of Approximate and Probabilistic Adders


A framework based on a precise and specific implementation can still be used with a methodology that intrinsically has a lower degree of precision and an increasing uncertainty in operation. While this may be viewed as a potential conflict, such an approach brings the significant advantage of inexact computing (its inherent tolerance to some imprecision and uncertainty) to a technology platform implemented with conventional digital logic and systems. The paradigm of inexact computation relies on relaxing fully precise and completely deterministic building blocks (such as a full adder) when, for example, implementing bio-inspired systems. This allows nature-inspired computation to redirect the existing design process of digital circuits and systems by taking advantage of a decrease in complexity and cost, with a potential increase in performance and power efficiency.
In imprecise computation, however, traditional measures of performance, power, and reliability are often conflicting, so new figures of merit for assessing the trade-offs involved in such a design process are needed to better understand the operation of approximate/probabilistic circuits. One of the fundamental arithmetic operations in many applications of inexact computing is addition. Soft additions are generally based on the operation of deterministic approximate logic or probabilistic imprecise arithmetic (categorized as design-time and run-time techniques). Several recently proposed adder architectures are representative of these types. The bio-inspired lower-part OR adder (LOA) is based on approximate logic whose truth table differs slightly from the original truth table of a full adder. The approximate mirror adders (AMAs) save power by reducing the number of transistors in a mirror adder design. The use of these architectures results in approximations of the addition, making it deterministically different from the precise operational outcome. Another design is the probabilistic full adder (PFA); its implementation is based on probabilistic CMOS, a technology platform for modeling the behavior of nanometric designs as well as reducing power consumption.

The objective of this paper is to propose new metrics for assessing adder designs
with respect to reliability and power efficiency for inexact computing. A new figure of
merit referred to as error distance (ED) is initially proposed to characterize the reliability of
an output of an adder. ED is then used to obtain two new metrics: the mean error distance
(MED) and the normalized error distance (NED). The MED and NED can be obtained using
sequential probability transition matrices (SPTMs) and are able to evaluate the reliability of
both probabilistic and deterministic adders. It is shown that the MED is an effective metric
in evaluating the implementation of a multiple-bit adder. The NED is a stable metric that is
almost independent of the size of an implementation; this feature brings a new perspective
for the evaluation and comparison of different adder designs. The power and NED product is
further used to evaluate the power and precision tradeoff. An adder implementation with
reduced precision, referred to as the lower-bit ignored adder (LIA), is investigated as a
baseline design for assessing the LOA, AMAs, and PFAs. A detailed analysis and simulation results are presented to assess the performance and reliability of these adders using the proposed new metrics.
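The ED/MED/NED definitions can be made concrete with a small sketch (function names are my own; the LOA variant shown, which ORs the l lowest bits, and the choice of normalization constant are simplifying assumptions, not the paper's exact formulation):

```python
from itertools import product

def lower_or_adder(a, b, l):
    """LOA-style approximate adder: OR the l lowest bits instead of
    adding them; add the upper bits exactly."""
    mask = (1 << l) - 1
    low = (a & mask) | (b & mask)          # approximate lower part
    high = ((a >> l) + (b >> l)) << l      # exact upper part
    return high | low

def med_ned(n, l):
    """MED: mean of |exact - approximate| over all n-bit input pairs.
    NED: MED normalized by the maximum output error distance, taken
    here as 2^(n+1) - 1 for an n-bit addition (an assumption)."""
    eds = [abs((a + b) - lower_or_adder(a, b, l))
           for a, b in product(range(1 << n), repeat=2)]
    med = sum(eds) / len(eds)
    return med, med / ((1 << (n + 1)) - 1)
```

With l = 0 the adder is exact and MED = 0; as l grows, MED rises while power (not modeled here) would fall, which is the trade-off the power-NED product is meant to capture.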

Adaptive Approximation in Arithmetic Circuits: A Low-Power Unsigned Divider Design

To save additional hardware, a dynamic approximate divider (DAXD) has been designed [10]. In a 2n/n DAXD, 2k/k bits of the inputs are dynamically selected starting from the most significant '1', and the least-significant bits (LSBs) are truncated. The quotient is then obtained approximately by using a 2k/k exact divider and a shifter. As the circuit complexity and critical path of a 2n/n array divider are in O(n^2) [1], the DAXD shows a substantial improvement in speed, area, and power consumption compared with AXDnr and AXDr. However, the accuracy of the DAXD is very low due to the overflow problem caused by the truncation and the 2k/k divider. In addition to array dividers, a rounding-based approximate divider, referred to as SEERAD, has been proposed. To compute A/B, the divisor B is rounded to the form 2^(K+L)/D, where L and D are constant integers. The value of A/B is then approximated by A × D/2^(K+L). By varying D and L, four accuracy levels are devised for SEERAD. Thus, approximate division is implemented by a rounding block, a look-up table, a small multiplier, adders, and a shifter. Without using a traditional division structure, SEERAD is fast, but it incurs a substantial power dissipation and a large area due to the use of the look-up table.
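The dynamic selection in DAXD can be sketched as follows (my reading of the scheme with integer operands, and without the overflow handling whose absence the text criticizes; the function name is mine):

```python
def daxd_sketch(a, b, k=8):
    """DAXD-style approximate divide: keep a 2k-bit window of the
    dividend and a k-bit window of the divisor, each anchored at the
    operand's leading '1' (lower bits truncated), divide exactly in
    the small width, then shift to undo the normalization."""
    if b == 0:
        raise ZeroDivisionError("division by zero")
    sa = max(a.bit_length() - 2 * k, 0)  # LSBs truncated from the dividend
    sb = max(b.bit_length() - k, 0)      # LSBs truncated from the divisor
    q = (a >> sa) // (b >> sb)           # exact 2k/k divide on the windows
    shift = sa - sb                      # a ~ (a>>sa)*2^sa, b ~ (b>>sb)*2^sb
    return q << shift if shift >= 0 else q >> -shift
```

Operands that already fit the windows divide exactly (e.g. 100/7 gives 14); wider operands pick up truncation error (999999/1000 gives 996 instead of 999), which is the accuracy loss that motivates the error-correction ideas discussed below.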

In this paper, a novel approximate unsigned divider using adaptive approximation is proposed for low-power and high-performance operation. In this design, input pruning, the division strategy, and error correction work synergistically to ensure high accuracy with a very low maximum error distance.

This paper presents the following novel contributions. Adaptive pruning schemes are analyzed in detail for four different scenarios of the dividend and divisor. Based on this analysis, new division strategies are proposed to avoid the possible occurrence of overflow found in the approximate divider of [10]. Finally, an error correction circuit using OR gates is utilized to achieve high accuracy at a very small hardware overhead.

Compared with the exact 16/8 array divider, the proposed adaptive approximation-based divider (denoted AAXD) using an 8/4 divider achieves a speedup of 60.51%, a reduction in power dissipation of 65.88%, and a reduction in area of 38.63%. For a more accurate configuration using a 12/6 divider, the AAXD is 26.54% faster and 34.13% more power-efficient than the accurate design. Two image processing applications, change detection and foreground extraction, show that a higher image quality is obtained by using the proposed design than by using other approximate dividers.

ON THE DESIGN OF APPROXIMATE RESTORING DIVIDERS FOR ERROR-TOLERANT APPLICATIONS.
In this section, the error and power consumption of AXDr and AXDnr are evaluated and compared using the PTM 32 nm technology. The error distance (ED) and mean error distance (MED) have been proposed to evaluate approximate arithmetic circuits. The ED is defined as the absolute difference between the accurate and approximate output values; the MED is defined as the average of the EDs over the set of input values. The MED-power product (MPP) has been proposed to evaluate the trade-off between power and accuracy in approximate arithmetic circuits.

In the simulation, the outputs Q and R of the AXDr and EXDr are simulated exhaustively for the 8-bit and 16-bit AXDs; the 32-bit AXD has been simulated using 1 million randomly generated input patterns. The MEDs of Q and R are obtained by averaging the output error distances (i.e., the error distance corresponding to each input combination). The NED is calculated as the ratio of the MED to the maximum possible error distance (i.e., 2^(N−1) − 1, where N is the bit width). The simulation results for the static and dynamic powers are presented next. The dynamic power is measured at a frequency of 250 MHz.

To evaluate the trade-off between computation accuracy and power consumption (both static and dynamic) of the AXDs, the MED-power product of the AXDs is calculated and plotted. Although the power saving of a truncation scheme is larger than that of a replacement scheme, this is accomplished at the expense of error; thus, the triangle replacement schemes with AXDr1 have the smallest MED-power product. Hence, AXDr1 is very promising for an approximate divider design requiring both high accuracy and low power consumption. AXD2 is again shown to be the worst of the proposed designs regardless of the type of replacement. Compared to a replacement scheme, a truncation scheme is not suitable when both high accuracy and low power are required.

One of the objectives of approximate computing is to reduce the number of transistors, thus saving power. However, a reduction in the number of transistors often results in a larger MED, so there is a trade-off between power and MED. Plotting the static and dynamic power consumptions of the AXDs versus the replacement depth shows that all AXDs consume less power (a decrease of more than 50% at the largest depth) as the replacement depth increases; this is significantly lower than the power consumed by the EXDs (at depth d = 0). Vertical and horizontal approximations save the largest amount of power; moreover, the dynamic power dissipation of the truncation schemes is slightly larger than that of the replacement schemes when the depth is small. However, both the static and dynamic powers of a truncation scheme decrease at a higher rate. The static power dissipations of AXD3 and AXD1 are nearly the same, but AXD3 has a smaller dynamic power consumption.
