Ch.5 Fixed-Point vs. Floating Point
5.1 Q-format
Q-format or fractional representation
The implied binary point is moved to the left, immediately to the right of the sign bit.
$F(B) = -b_{N-1} 2^{0} + b_{N-2} 2^{-1} + \cdots + b_{1} 2^{-(N-2)} + b_{0} 2^{-(N-1)}$
The programmer keeps track of the binary point.
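The weighted sum above can be checked numerically. The following is a minimal sketch, not taken from the slides: the function name `q_to_float` and the default word length of 16 bits are assumptions chosen for illustration.

```python
def q_to_float(word: int, n_bits: int = 16) -> float:
    """Evaluate F(B) for an N-bit word with 1 sign bit and N-1 fractional bits.

    `q_to_float` and `n_bits` are illustrative names, not from the source.
    """
    word &= (1 << n_bits) - 1  # keep only the low N bits
    value = 0.0
    for i in range(n_bits):
        bit = (word >> i) & 1
        # b_0 has weight 2^-(N-1); b_(N-1) has weight 2^0
        weight = 2.0 ** (i - (n_bits - 1))
        if i == n_bits - 1:
            value -= bit * weight  # the sign bit carries a negative weight
        else:
            value += bit * weight
    return value


print(q_to_float(0x4000))  # 0.5
print(q_to_float(0x8000))  # -1.0
print(q_to_float(0xC000))  # -0.5
```

With N = 16 the representable range is -1 up to 1 - 2^-15. The bit pattern itself does not record where the binary point is, which is why the programmer has to keep track of it.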
Example: Q-15
16-bit numbers: 1 sign bit and 15 fractional bits.
Multiplying two such numbers gives a 32-bit Q-30 number
(2 sign bits and 30 fractional bits).
The result can be truncated to keep the most
significant 15 fractional bits and drop the
extended sign bit. See Fig. 5.2.
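The Q-15 multiplication described here can be sketched as follows. This is an illustrative sketch under assumptions, not the code behind Fig. 5.2: the helper names `to_signed` and `q15_mul` are hypothetical, and Python integers stand in for the 16-bit and 32-bit registers of a fixed-point processor.

```python
def to_signed(word: int, n_bits: int) -> int:
    """Interpret the low n_bits of `word` as a two's-complement integer."""
    word &= (1 << n_bits) - 1
    if word & (1 << (n_bits - 1)):
        return word - (1 << n_bits)
    return word


def q15_mul(a: int, b: int) -> int:
    """Multiply two 16-bit Q-15 words and return a 16-bit Q-15 word."""
    # The full product is a Q-30 value: in 32 bits it has an extended
    # sign bit (bit 31), a sign bit (bit 30) and 30 fractional bits.
    product = to_signed(a, 16) * to_signed(b, 16)
    # Shifting right by 15 truncates the 15 least significant fractional
    # bits; masking to 16 bits drops the extended sign bit.
    return (product >> 15) & 0xFFFF


print(hex(q15_mul(0x4000, 0x4000)))  # 0x2000, i.e. 0.5 * 0.5 = 0.25
print(hex(q15_mul(0xC000, 0x4000)))  # 0xe000, i.e. -0.5 * 0.5 = -0.25
```

One case needs care: (-1) x (-1) gives +1, which Q-15 cannot represent, so `q15_mul(0x8000, 0x8000)` wraps around to `0x8000` (-1). Whether to saturate in that case is a design choice left out of this sketch.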