
Ch.5 Fixed-Point vs. Floating-Point

5.1 Q-format Number Representation on Fixed-Point DSPs
2's-complement number: $B = b_{N-1} \dots b_1 b_0$
Decimal value: $D = -b_{N-1} 2^{N-1} + \dots + b_1 2^{1} + b_0 2^{0}$
There is a dynamic range limitation: an N-bit word only covers $-2^{N-1}$ to $2^{N-1} - 1$.
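
The weighted sum above can be checked with a few lines of Python. This is only an illustrative sketch; the function names are hypothetical and not part of the original slides.

```python
def twos_complement_value(bits):
    """Decimal value of a two's-complement bit string, MSB first."""
    n = len(bits)
    # The sign bit b_{N-1} carries weight -2^(N-1); all other bits are positive.
    value = -int(bits[0]) * 2 ** (n - 1)
    for i, b in enumerate(bits[1:], start=1):
        value += int(b) * 2 ** (n - 1 - i)
    return value


def dynamic_range(n):
    """Smallest and largest integers representable in n bits."""
    return -(2 ** (n - 1)), 2 ** (n - 1) - 1


print(twos_complement_value("1000"))  # most negative 4-bit value
print(dynamic_range(16))              # range of a 16-bit word
```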

The Q-format can be used to help prevent overflow in multiplication.

5.1 Q-format
Q-format or fractional representation
Implied binary point is moved to the left.
$F(B) = -b_{N-1} 2^{0} + b_{N-2} 2^{-1} + \dots + b_1 2^{-(N-2)} + b_0 2^{-(N-1)}$
The programmer keeps track of the binary point.

Example: Q-15
16-bit numbers: 1 sign bit and 15 fractional bits.
Multiplying two such numbers gives a Q-30 number with an extended (duplicate) sign bit.
The result can be truncated back to Q-15 by dropping the extended sign bit and keeping the most significant 15 fractional bits. See Fig. 5.2.
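
As a minimal sketch of the Q-15 multiplication described above, the following Python models the 16-bit operands as plain integers. The helper names (`to_q15`, `from_q15`, `q15_mul`) are assumptions made for illustration, not routines from the slides or from any DSP library.

```python
Q15_ONE = 1 << 15  # scale factor 2^15


def to_q15(x):
    """Quantize a value in [-1, 1) to a 16-bit Q-15 integer (rounded, saturated)."""
    return max(-32768, min(32767, round(x * Q15_ONE)))


def from_q15(q):
    """Convert a Q-15 integer back to a float."""
    return q / Q15_ONE


def q15_mul(a, b):
    """Multiply two Q-15 numbers and truncate the Q-30 product back to Q-15."""
    product_q30 = a * b        # 32-bit product: 2 sign bits + 30 fractional bits
    return product_q30 >> 15   # drop the low 15 bits; the arithmetic shift keeps the sign


half = to_q15(0.5)
print(from_q15(q15_mul(half, half)))   # 0.5 * 0.5
```

One corner case is worth noting: (-1) x (-1) produces +1, which is not representable in Q-15, so `q15_mul(-32768, -32768)` returns 32768 and would need saturation on a real 16-bit result.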

Problems with Q Format


There can be precision loss with the Q-format. Figure 5-5 illustrates the concept with the Q-12 example.
Addition and subtraction can still overflow; scaling can be used to help.
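
The addition problem can be illustrated with a small sketch. The values 0.75 and 0.5 and the helper `wrap16` are hypothetical choices for illustration; `wrap16` mimics a plain 16-bit adder without saturation.

```python
def wrap16(v):
    """Wrap an integer to 16-bit two's complement, as a non-saturating adder would."""
    return (v + 0x8000) % 0x10000 - 0x8000


a = round(0.75 * 2**15)   # 0.75 in Q-15
b = round(0.50 * 2**15)   # 0.50 in Q-15

# The true sum 1.25 lies outside [-1, 1), so the 16-bit result wraps around.
wrapped = wrap16(a + b)

# Halving both operands first keeps the sum in range, at the cost of one bit of precision.
scaled = wrap16((a >> 1) + (b >> 1))

print(wrapped / 2**15)      # wrapped, wrong sign
print(scaled / 2**15 * 2)   # scaled result, multiplied back by 2
```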

5.2 Finite Word Length Effects on Fixed-Point DSPs
Coefficients in digital filters will be saved in
fixed-point formats in fixed-point DSP
implementations.
The finite word length quantization effect
is similar to input data quantization
introduced by an A/D converter.

5.2 Finite Word Length Effects (p.2)


In IIR filters, the fixed-point representation
of the coefficients can cause the poles to
shift in the z-plane.
The amount of shift due to the quantization
of a single coefficient is influenced by the
positions of all the other poles.
To reduce this effect, IIR filters are often
implemented as a cascade of 2nd order
systems.
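
The pole shift can be seen on a single 2nd order section. The sketch below quantizes the denominator coefficients of $z^2 + a_1 z + a_2$ to a given number of fractional bits and measures how far the upper-half-plane pole moves. The pole radius 0.98, the angle pi/16, and the function names are assumptions chosen for illustration.

```python
import cmath
import math


def quantize(x, frac_bits):
    """Round x to the nearest multiple of 2^-frac_bits."""
    step = 2.0 ** -frac_bits
    return round(x / step) * step


def upper_pole(a1, a2):
    """Root of z^2 + a1*z + a2 with the larger imaginary part."""
    disc = cmath.sqrt(a1 * a1 - 4.0 * a2)
    roots = ((-a1 + disc) / 2.0, (-a1 - disc) / 2.0)
    return max(roots, key=lambda z: z.imag)


def pole_shift(r, theta, frac_bits):
    """Distance the pole r*exp(j*theta) moves after coefficient quantization."""
    a1, a2 = -2.0 * r * math.cos(theta), r * r
    ideal = upper_pole(a1, a2)
    actual = upper_pole(quantize(a1, frac_bits), quantize(a2, frac_bits))
    return abs(actual - ideal)


for bits in (14, 8, 5):
    print(bits, pole_shift(0.98, math.pi / 16, bits))
```

In a cascade, each section's poles depend only on that section's two coefficients, which is why quantizing one coefficient does not disturb the poles of the other sections.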

5.2 Finite Word Length Effects (p.3)


The frequency response of the implemented
system is also affected by the quantization of
coefficients in the difference equation.
Finally, coefficient quantization can also lead to
limit cycles in IIR filters. This means that, once the
input has gone to zero, the response of a stable
system to a unit impulse could still result in
undamped oscillations.
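
A limit cycle is easy to reproduce with a first-order filter $y[n] = a\,y[n-1] + x[n]$ and a very short word length. In this sketch the oscillation comes from rounding the product in the feedback loop back to the word length; the coefficient -0.5, the 3 fractional bits, and the function names are assumptions chosen to keep the numbers small.

```python
def round_to_word(product, frac_bits):
    """Round a double-length product back to frac_bits fractional bits
    (round half away from zero)."""
    half = 1 << (frac_bits - 1)
    magnitude = (abs(product) + half) >> frac_bits
    return magnitude if product >= 0 else -magnitude


def first_order_iir(a_q, x, frac_bits):
    """y[n] = Q(a * y[n-1]) + x[n], with all values stored as integers
    scaled by 2**frac_bits."""
    y_prev, out = 0, []
    for xn in x:
        y = round_to_word(a_q * y_prev, frac_bits) + xn
        out.append(y)
        y_prev = y
    return out


FRAC_BITS = 3
a_q = -4                 # -0.5 with 3 fractional bits
x = [7] + [0] * 9        # impulse of height 7/8, then zero input
y = first_order_iir(a_q, x, FRAC_BITS)
print(y)                 # units of 1/8
```

With infinite precision the output would decay to zero because |a| < 1. Here it settles into an oscillation between +1/8 and -1/8 that never dies out.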

5.3 Floating-Point Number Representation
The C67x processor supports single-precision
and double-precision floating-point
representations.
The formats are shown in Figures 5.6 and
5.7.
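
Assuming the formats in Figures 5.6 and 5.7 are the standard IEEE 754 layouts (1 sign bit, 8 exponent bits, and 23 fraction bits for single precision; 1, 11, and 52 for double precision), the fields of a number can be inspected as follows. The function names are hypothetical.

```python
import struct


def float32_fields(x):
    """Return (sign, biased exponent, fraction) of x as an IEEE 754 single."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def float64_fields(x):
    """Return (sign, biased exponent, fraction) of x as an IEEE 754 double."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)


print(float32_fields(-0.15625))   # -1.25 * 2^-3
print(float64_fields(1.0))
```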

5.4 Overflow and Scaling


Scaling is the simplest correction method for
overflows in fixed-point implementations.
This can be implemented in most filtering and
transform applications.
The input is scaled down for processing and
the output is then scaled back up.
Right shifting (dividing by 2) is an easy way to
implement scaling.
The shifting can occur until the overflows
disappear from the computations.
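
The shift-until-no-overflow idea can be sketched with a small FIR filter on Q-15 data. The filter taps, the input values, and the function names below are assumptions made for illustration; a real DSP would detect overflow through its status flags rather than a range check.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def fir_q15(x, h, shift):
    """FIR filter on Q-15 data with the input right-shifted by `shift` bits.
    Returns (outputs, overflowed); overflowed is True if any output
    does not fit in 16 bits."""
    outputs, overflowed = [], False
    for n in range(len(x)):
        acc = 0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * (x[n - k] >> shift)   # Q-15 * Q-15 -> Q-30
        y = acc >> 15                              # back to Q-15
        if not INT16_MIN <= y <= INT16_MAX:
            overflowed = True
        outputs.append(y)
    return outputs, overflowed


def smallest_safe_shift(x, h, max_shift=15):
    """Increase the input shift until the outputs no longer overflow."""
    for shift in range(max_shift + 1):
        y, overflowed = fir_q15(x, h, shift)
        if not overflowed:
            return shift, y
    raise ValueError("overflow persists at max_shift")


c = round(0.9 * 2**15)          # 0.9 in Q-15
shift, y = smallest_safe_shift([c] * 4, [c] * 3)
print(shift, y)
```

The outputs are then scaled back up by the same number of bits (in a wider word or in the final output stage) to recover the original gain.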

5.4 Overflow and Scaling (p.2)


Scaling of filter coefficients can also be
used to avoid overflows.
It can be shown that the condition to
prevent overflow is
$\sum_{k=0}^{N} |h[k]| \le 1$

For IIR filters, N is taken large enough so that the remaining values of h[k] are negligible.
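
A minimal sketch of coefficient scaling: since $|y[n]| \le \sum_k |h[k]|\,|x[n-k]|$, dividing every coefficient by the sum of absolute values guarantees the output stays within the input range. The function name and the example filters are assumptions for illustration.

```python
def scale_for_no_overflow(h):
    """Scale coefficients so that sum(|h[k]|) <= 1.
    Returns (scaled coefficients, gain to restore at the output)."""
    l1 = sum(abs(c) for c in h)
    if l1 <= 1.0:
        return list(h), 1.0
    return [c / l1 for c in h], l1


# FIR example: three taps of 0.9.
fir_scaled, fir_gain = scale_for_no_overflow([0.9, 0.9, 0.9])

# IIR example: impulse response 0.9**k, truncated once the tail is negligible.
iir_h = [0.9 ** k for k in range(200)]
iir_scaled, iir_gain = scale_for_no_overflow(iir_h)

print(fir_gain, iir_gain)
```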
