Pooja Vashisth

CSC 258
Week 5
Numerical Representation
1. Convert numerical data from one format to another
2. Work with numbers in a fixed-width 2’s-complement representation
3. Describe underflow, overflow, round-off, and truncation errors in data representations
4. Explain how arithmetic operations are implemented in hardware
• Numeric data formats
• Binary arithmetic and overflow
• Binary representation of negative integers
• How representations affect accuracy and precision
• Investigate the hardware implementation of various arithmetic operations
  • Addition, subtraction, multiplication, division
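Unsigned binary integers (n bits):
$x = x_{n-1}2^{n-1} + x_{n-2}2^{n-2} + \cdots + x_1 2^1 + x_0 2^0$
• Range: $0$ to $+2^n - 1$
• Example
  $0000\ 0000 \ldots 0000\ 1011_2$
  $= 0 + \cdots + 1\times 2^3 + 0\times 2^2 + 1\times 2^1 + 1\times 2^0$
  $= 0 + \cdots + 8 + 0 + 2 + 1 = 11_{10}$
• Using 64 bits: 0 to +18,446,744,073,709,551,615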
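As a quick check of the positional formula, here is a minimal Python sketch (the language and the names `unsigned_value` and `unsigned_range` are our own choices for illustration). It evaluates the sum with Horner's rule, doubling the running value and adding the next bit, which gives the same result as summing $x_i 2^i$ term by term.

```python
def unsigned_value(bits: str) -> int:
    """Value of an unsigned bit string, most significant bit first."""
    bits = bits.replace(" ", "")  # allow nibble grouping such as "0000 1011"
    value = 0
    for b in bits:
        if b not in "01":
            raise ValueError(f"not a binary digit: {b!r}")
        value = value * 2 + int(b)  # Horner's rule: same sum as the formula
    return value


def unsigned_range(n: int) -> tuple[int, int]:
    """Smallest and largest n-bit unsigned values."""
    return 0, 2**n - 1


print(unsigned_value("0000 0000 0000 1011"))  # 11
print(unsigned_range(64))                     # (0, 18446744073709551615)
```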
• Base 8
• Compact representation of bit strings
• 3 bits per octal digit

    0  000    4  100
    1  001    5  101
    2  010    6  110
    3  011    7  111

• Example: 7650 → 111 110 101 000
• Base 16
• Compact representation of bit strings
• 4 bits per hex digit

    0  0000    4  0100    8  1000    c  1100
    1  0001    5  0101    9  1001    d  1101
    2  0010    6  0110    a  1010    e  1110
    3  0011    7  0111    b  1011    f  1111

• Example: eca8 6420 → 1110 1100 1010 1000 0110 0100 0010 0000
(Practice) Quiz time!
• Convert 5ED4 from hexadecimal to binary
• Convert 001 111 011 from binary to octal
• Convert 1011 1110 from binary to decimal
• Convert 35 from octal to hexadecimal
• Convert 249 from decimal to binary
Quick reference:
• Sign-magnitude notation is the simplest method of representing positive and negative numbers.
• Negative numbers have a ‘1’ as the sign bit, whereas positive numbers have a ‘0’.
Sign-magnitude has some disadvantages:
• Zero has two valid representations: a positive zero, +0 or $0000_2$, and a negative zero, −0 or $1000_2$.
  • Hence one of the bit patterns (e.g. $1000_2$) is wasted.
• Addition doesn't work the way we want it to. Try adding +1 and −1: an ordinary binary adder gives $0001_2 + 1001_2 = 1010_2$, which sign-magnitude reads as −2, not 0.
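Both disadvantages can be reproduced in a few lines. This Python sketch (the name `sm_decode` and the 4-bit default width are our own assumptions) decodes sign-magnitude patterns and feeds +1 and −1 through a plain binary adder.

```python
def sm_decode(pattern: int, n: int = 4) -> int:
    """Decode an n-bit sign-magnitude pattern: top bit = sign, rest = magnitude."""
    sign = (pattern >> (n - 1)) & 1
    magnitude = pattern & ((1 << (n - 1)) - 1)
    return -magnitude if sign else magnitude


# Two different patterns both mean zero, so one of the 16 patterns is wasted.
print(sm_decode(0b0000), sm_decode(0b1000))  # 0 0

# Feed +1 (0001) and -1 (1001) to an ordinary binary adder, keeping 4 bits.
raw_sum = (0b0001 + 0b1001) & 0b1111         # 0b1010
print(sm_decode(raw_sum))                    # -2, not 0
```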
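Two’s-complement signed integers (n bits):
$x = -x_{n-1}2^{n-1} + x_{n-2}2^{n-2} + \cdots + x_1 2^1 + x_0 2^0$
• Range: $-2^{n-1}$ to $+2^{n-1} - 1$
• Example (32 bits)
  $1111\ 1111 \ldots 1111\ 1100_2$
  $= -1\times 2^{31} + 1\times 2^{30} + \cdots + 1\times 2^2 + 0\times 2^1 + 0\times 2^0$
  $= -2{,}147{,}483{,}648 + 2{,}147{,}483{,}644 = -4_{10}$
• Using 64 bits: −9,223,372,036,854,775,808 to 9,223,372,036,854,775,807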
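The only difference from the unsigned formula is the negative weight on the top bit. A minimal Python sketch of that evaluation (function names are illustrative, not from the course):

```python
def twos_value(pattern: int, n: int) -> int:
    """Value of an n-bit two's-complement pattern.

    The top bit has weight -2**(n-1); every other bit keeps its usual
    positive weight, exactly as in the formula above.
    """
    msb = (pattern >> (n - 1)) & 1
    rest = pattern & ((1 << (n - 1)) - 1)  # bits n-2 .. 0 as an unsigned number
    return -msb * (1 << (n - 1)) + rest


def twos_range(n: int) -> tuple[int, int]:
    """Most negative and most positive n-bit two's-complement values."""
    return -(1 << (n - 1)), (1 << (n - 1)) - 1


print(twos_value(0xFFFFFFFC, 32))  # -4
print(twos_range(64))              # (-9223372036854775808, 9223372036854775807)
```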
• The top bit is still a sign bit
  • 1 for negative numbers
  • 0 for non-negative numbers
• Non-negative numbers have the same unsigned and 2’s-complement representation
• To obtain the magnitude of a negative number, invert each bit and add one
• Some specific numbers
  • 0: 0000 0000 … 0000
  • −1: 1111 1111 … 1111
  • Most negative: 1000 0000 … 0000
  • Most positive: 0111 1111 … 1111
• Complement (invert every bit: 1 → 0, 0 → 1) and add 1
  $x + \bar{x} = 1111\ldots111_2 = -1$, so $\bar{x} + 1 = -x$
• Example: representation of −2
  • +2 = $0000\ 0000 \ldots 0010_2$
  • −2 = $1111\ 1111 \ldots 1101_2 + 1 = 1111\ 1111 \ldots 1110_2$
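A fixed-width negation has to mask to n bits, because Python integers are unbounded. This sketch (the name `negate` is hypothetical) inverts and adds 1 exactly as described, and also shows the one pattern whose negation is itself.

```python
def negate(pattern: int, n: int) -> int:
    """Two's-complement negation on n bits: invert every bit, then add 1."""
    mask = (1 << n) - 1
    inverted = ~pattern & mask    # x-bar: flip each of the n bits
    return (inverted + 1) & mask  # add 1; drop any carry out of the top bit


print(format(negate(0b0000_0010, 8), "08b"))  # 11111110, i.e. -2 on 8 bits
print(format(negate(0b1000_0000, 8), "08b"))  # 10000000: most negative value has no positive twin
```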
• Representing a number using more bits (sign extension)
  • Preserve the numeric value
  • Replicate the sign bit to the left
  • cf. unsigned values: extend with 0s
• Examples: 8-bit to 16-bit
  • +2: 0000 0010 ⇒ 0000 0000 0000 0010
  • −2: 1111 1110 ⇒ 1111 1111 1111 1110
• In the RISC-V instruction set
  • lb: sign-extends the loaded byte
  • lbu: zero-extends the loaded byte
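The two widening rules side by side, as a Python sketch. It widens 8 → 16 bits to match the examples; real `lb`/`lbu` fill a whole register, and the helper names here are our own.

```python
def sign_extend(pattern: int, from_bits: int, to_bits: int) -> int:
    """Widen by replicating the sign bit to the left (what lb does to a byte)."""
    pattern &= (1 << from_bits) - 1
    if pattern >> (from_bits - 1):                    # sign bit is 1
        upper = ((1 << to_bits) - 1) ^ ((1 << from_bits) - 1)
        pattern |= upper                              # fill the new high bits with 1s
    return pattern


def zero_extend(pattern: int, from_bits: int) -> int:
    """Widen by filling with 0s (what lbu does to a byte)."""
    return pattern & ((1 << from_bits) - 1)


print(format(sign_extend(0b1111_1110, 8, 16), "016b"))  # 1111111111111110
print(format(zero_extend(0b1111_1110, 8), "016b"))      # 0000000011111110
```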
(Practice) Quiz time!
• Convert 001 111 011 from 2’s complement to decimal
• Convert 1011 1110 from 2’s complement to decimal
• Operations on integers
  • Addition and subtraction
  • Multiplication and division
  • Dealing with overflow
• Floating-point real numbers
  • Representation and operations
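Example: 7 + 6 (shown with 8 bits): $0000\ 0111_2 + 0000\ 0110_2 = 0000\ 1101_2 = 13_{10}$
Overflow occurs if the result is too large in magnitude to be represented
• Remember: our data is stored in a fixed number of bits.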
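To see overflow concretely, this Python sketch models a fixed-width signed adder (the name `add_signed` is hypothetical). It keeps only n bits of the sum and uses the standard sign rule: adding two values of the same sign and getting a result of the opposite sign means the true sum did not fit.

```python
def add_signed(a: int, b: int, n: int) -> tuple[int, bool]:
    """Add two n-bit two's-complement values as a fixed-width adder would.

    Returns (result, overflow). The result keeps only n bits, and overflow
    is flagged when both operands have the same sign but the result's sign
    differs.
    """
    raw = (a + b) & ((1 << n) - 1)                      # keep only the low n bits
    result = raw - (1 << n) if raw >> (n - 1) else raw  # reinterpret as signed
    overflow = (a < 0) == (b < 0) and (result < 0) != (a < 0)
    return result, overflow


print(add_signed(7, 6, 8))  # (13, False)
print(add_signed(7, 6, 4))  # (-3, True): +13 does not fit in 4 bits
```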
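Multimedia extensions to standard instruction sets often offer saturating arithmetic to deal with overflow: instead of wrapping around, an overflowing result is clamped to the largest positive (or most negative) representable value.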
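A saturating add can be modeled as "compute the exact sum, then clamp". This is a Python sketch of the behavior only, not any particular ISA's instruction; the name `add_saturating` is our own.

```python
def add_saturating(a: int, b: int, n: int) -> int:
    """n-bit signed addition that clamps on overflow instead of wrapping."""
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    return max(lo, min(hi, a + b))  # exact sum, then clamp into [lo, hi]


print(add_saturating(7, 6, 4))      # 7: clamped to the largest 4-bit value
print(add_saturating(-17, -19, 6))  # -32: clamped to the most negative 6-bit value
```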
• Unsigned integers are commonly used for memory addresses, where overflows are ignored.
• Fortunately, the compiler can easily check for unsigned overflow using a branch instruction.
  • e.g., an addition has overflowed if the sum is less than either of the addends.
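The "sum is less than an addend" test, modeled in Python for 64-bit unsigned values (the name `add_u64` is hypothetical). If the true sum wrapped past $2^{64}$, the kept bits are smaller than both addends, so one comparison is enough.

```python
MASK64 = (1 << 64) - 1


def add_u64(a: int, b: int) -> tuple[int, bool]:
    """64-bit unsigned addition with the 'sum < addend' overflow check."""
    s = (a + b) & MASK64  # the hardware keeps only the low 64 bits
    # If the sum wrapped, s ends up smaller than both a and b,
    # so comparing against one of them is enough.
    return s, s < a


print(add_u64(5, 7))       # (12, False)
print(add_u64(MASK64, 1))  # (0, True)
```

In RISC-V assembly this becomes a single unsigned branch after the add, e.g. `add t0, t1, t2` followed by `bltu t0, t1, overflow` (the registers and label are illustrative).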
Arithmetic Adventure!
• We are now going to learn to work out our solutions for various binary arithmetic problems.
• For each question:
  • Perform the calculation
  • Identify whether or not there is an overflow
  • Determine the value if saturating arithmetic is applied
• The related exercises are from the Chapter 3 exercises (3.12), questions 3.6 to 3.11
Sample Problem Solution
Consider adding two 6-bit binary numbers in which the sixth (most significant) bit is the sign bit; a result greater than $+31_{10}$ or less than $-32_{10}$ causes an overflow.
Let’s add $-17_{10}$ and $-19_{10}$ to see how this overflow condition works for excessively negative numbers:
  $101111_2\ (-17_{10}) + 101101_2\ (-19_{10}) = 1\,011100_2$; dropping the carry out of the sign bit leaves $011100_2 = +28_{10}$.
Since the true sum ($-36_{10}$) is less than the allowable limit for our designated bit field, we have an overflow error.
The (incorrect) answer is positive twenty-eight. The real sum of $-17_{10}$ and $-19_{10}$ has too large a magnitude to be properly represented with a five-bit magnitude field and a sixth sign bit. Applying saturating arithmetic, the result is $-32_{10}$.
• Start with the long-multiplication approach

    multiplicand    1000
    multiplier    × 1001
                  ------
                    1000
                   0000
                  0000
                 1000
                 -------
    product      1001000

• Length of product is the sum of operand lengths
[Figure: multiplication hardware; the Product register is initially 0]
A Multiplication Algorithm
Example
Using 4-bit numbers to save space, multiply 2ten × 3ten, or 0010two × 0011two.
Answer
The bit examined to determine the next step is circled in color.
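The algorithm in this example can be traced with a short Python sketch. It models the registers as plain integers (the register and function names are our own labels) and prints their contents after each of the n iterations for $0010_2 \times 0011_2$.

```python
def multiply_trace(multiplicand: int, multiplier: int, n: int = 4) -> int:
    """Shift-and-add multiplication of two n-bit unsigned numbers.

    Models three registers: a 2n-bit Multiplicand that shifts left, an
    n-bit Multiplier that shifts right, and a 2n-bit Product that starts at 0.
    """
    product = 0
    for step in range(1, n + 1):
        if multiplier & 1:               # examine Multiplier bit 0
            product += multiplicand      # add Multiplicand to Product
            action = "add Mcand to Prod"
        else:
            action = "no operation"
        multiplicand <<= 1               # shift Multiplicand left 1 bit
        multiplier >>= 1                 # shift Multiplier right 1 bit
        print(f"iteration {step}: {action:<18} Mplier={multiplier:0{n}b} "
              f"Mcand={multiplicand:0{2 * n}b} Prod={product:0{2 * n}b}")
    return product


multiply_trace(0b0010, 0b0011)  # 2 x 3: the final Prod is 00000110 (6)
```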

• Perform steps in parallel: add/shift
• One cycle per partial-product addition
  • That’s OK if the frequency of multiplications is low
• Uses multiple adders
  • Cost/performance tradeoff
• Can be pipelined
  • Several multiplications performed in parallel
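The slide does not fix how the adders are arranged. This Python sketch assumes a simple binary tree: all partial products are formed at once, and each level adds disjoint pairs in parallel, so 64 partial products need $\log_2 64 = 6$ adder levels rather than one addition per multiplier bit. The name `tree_multiply` is hypothetical.

```python
def tree_multiply(a: int, b: int, n: int) -> tuple[int, int]:
    """Multiply two n-bit unsigned numbers by adding all partial products
    in a binary tree of adders. Returns (product, adder levels)."""
    # One partial product per multiplier bit: a shifted by i if bit i is 1.
    level = [(a << i) if (b >> i) & 1 else 0 for i in range(n)]
    depth = 0
    while len(level) > 1:
        # All adders in one level work in parallel on disjoint pairs;
        # an odd leftover value passes straight to the next level.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
        depth += 1
    return level[0], depth


print(tree_multiply(2, 3, 4))              # (6, 2)
print(tree_multiply(2**64 - 1, 3, 64)[1])  # 6 adder levels for 64 bits
```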
• Four multiply instructions:
  • mul: multiply
    • Gives the lower 64 bits of the product
  • mulh: multiply high
    • Gives the upper 64 bits of the product, assuming the operands are signed
  • mulhu: multiply high unsigned
    • Gives the upper 64 bits of the product, assuming the operands are unsigned
  • mulhsu: multiply high signed/unsigned
    • Gives the upper 64 bits of the product, assuming one operand is signed and the other unsigned
  • Use the mulh result to check for 64-bit overflow
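A Python model of what each instruction returns, assuming RV64 (64-bit registers) and operands given as 64-bit register patterns. The function names mirror the mnemonics, but this is only a behavioral sketch. The overflow check uses the fact that a signed product fits in 64 bits exactly when the upper half is the sign extension of the lower half.

```python
XLEN = 64                      # assumed: RV64, 64-bit registers
MASK = (1 << XLEN) - 1


def to_signed(x: int) -> int:
    """Reinterpret a 64-bit register pattern as a signed value."""
    return x - (1 << XLEN) if x >> (XLEN - 1) else x


def mul(rs1: int, rs2: int) -> int:
    """Lower 64 bits of the product (identical for signed and unsigned)."""
    return (rs1 * rs2) & MASK


def mulh(rs1: int, rs2: int) -> int:
    """Upper 64 bits of the 128-bit product, both operands signed."""
    # Python's >> on a negative int is an arithmetic shift.
    return ((to_signed(rs1) * to_signed(rs2)) >> XLEN) & MASK


def mulhu(rs1: int, rs2: int) -> int:
    """Upper 64 bits, both operands unsigned."""
    return ((rs1 * rs2) >> XLEN) & MASK


def mulhsu(rs1: int, rs2: int) -> int:
    """Upper 64 bits, rs1 signed and rs2 unsigned."""
    return ((to_signed(rs1) * rs2) >> XLEN) & MASK


def signed_mul_overflows(rs1: int, rs2: int) -> bool:
    """True if the signed product does not fit in 64 bits, i.e. mulh is
    not the sign extension of mul's top bit (all 0s or all 1s)."""
    sign_fill = MASK if mul(rs1, rs2) >> (XLEN - 1) else 0
    return mulh(rs1, rs2) != sign_fill


minus_one = MASK                                             # -1 as a 64-bit pattern
print(mul(minus_one, minus_one), mulh(minus_one, minus_one))  # 1 0
print(hex(mulhu(minus_one, minus_one)))                      # 0xfffffffffffffffe
print(signed_mul_overflows(2**62, 2))                        # True: 2**63 is out of range
```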
• Check for 0 divisor
• Long division approach
  • If divisor ≤ dividend bits
    • 1 bit in quotient, subtract
  • Otherwise
    • 0 bit in quotient, bring down next dividend bit
• Restoring division
  • Do the subtract, and if remainder goes < 0, add divisor back
• Signed division
  • Divide using absolute values
  • Adjust sign of quotient and remainder as required

              1001      quotient
    1000 ) 1001010      divisor ) dividend
          -1000
              10
              101
              1010
             -1000
                10      remainder

n-bit operands yield n-bit quotient and remainder
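The long-division rules can be sketched in Python as follows (function names are hypothetical). The slide only says to adjust signs "as required"; this sketch assumes the usual convention that the quotient is negative when the operand signs differ and the remainder takes the sign of the dividend, which keeps dividend = quotient × divisor + remainder.

```python
def unsigned_long_division(dividend: int, divisor: int, n: int) -> tuple[int, int]:
    """Restoring long division of n-bit unsigned numbers, one quotient bit per step."""
    if divisor == 0:
        raise ZeroDivisionError("divisor is 0")  # check for 0 divisor first
    quotient = remainder = 0
    for i in range(n - 1, -1, -1):
        remainder = (remainder << 1) | ((dividend >> i) & 1)  # bring down next bit
        remainder -= divisor                                   # do the subtract
        if remainder < 0:
            remainder += divisor       # went negative: add divisor back, bit is 0
            quotient <<= 1
        else:
            quotient = (quotient << 1) | 1                     # quotient bit is 1
    return quotient, remainder


def signed_division(dividend: int, divisor: int, n: int) -> tuple[int, int]:
    """Divide absolute values, then fix the signs of quotient and remainder."""
    q, r = unsigned_long_division(abs(dividend), abs(divisor), n)
    if (dividend < 0) != (divisor < 0):
        q = -q
    if dividend < 0:
        r = -r
    return q, r


print(unsigned_long_division(0b1001010, 0b1000, 7))  # (9, 2): 1001 remainder 10
print(signed_division(-7, 2, 4))                     # (-3, -1)
```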
[Figure: division hardware; the Divisor register initially holds the divisor in its left half, and the Remainder register initially holds the dividend]
A Divide Algorithm
Example
Using a 4-bit version of the algorithm to save pages, let’s try
dividing 7ten by 2ten, or 0000 0111two by 0010two.
Answer
The bit examined to determine the next step is circled in color.
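This example can be traced with a Python sketch of the same hardware layout: the divisor starts in the left half of a 2n-bit Divisor register and shifts right, the Remainder register starts as the dividend, and the loop runs n + 1 times. Register names are modeled as plain integers, and the function name is our own.

```python
def divide_trace(dividend: int, divisor: int, n: int = 4) -> tuple[int, int]:
    """Restoring division with a 2n-bit Divisor register that shifts right."""
    if divisor == 0:
        raise ZeroDivisionError("divisor is 0")
    div_reg = divisor << n       # divisor in the left half
    rem = dividend               # Remainder register initially holds the dividend
    quotient = 0
    for step in range(1, n + 2):
        rem -= div_reg                       # Rem = Rem - Div
        if rem >= 0:
            quotient = (quotient << 1) | 1   # shift Quotient left, new bit 1
        else:
            rem += div_reg                   # restore Rem; shift Quotient left, new bit 0
            quotient <<= 1
        div_reg >>= 1                        # shift Divisor right 1 bit
        print(f"iteration {step}: Quot={quotient:0{n}b} "
              f"Div={div_reg:0{2 * n}b} Rem={rem:0{2 * n}b}")
    return quotient, rem


divide_trace(0b0111, 0b0010)  # 7 / 2: final Quot=0011 (3), Rem=00000001 (1)
```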

• One cycle per partial-remainder subtraction
• Looks a lot like a multiplier!
  • Same hardware can be used for both
• Can’t use parallel hardware as in the multiplier
  • Subtraction is conditional on the sign of the remainder
• Faster dividers (e.g. SRT division) generate multiple quotient bits per step
  • Still require multiple steps
• Four instructions:
  • div, rem: signed divide, remainder
  • divu, remu: unsigned divide, remainder
• Overflow and division-by-zero don’t produce errors
  • Just return defined results
  • Faster for the common case of no error
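The "defined results" can be made concrete with a Python model, assuming RV64 and operands given as 64-bit register patterns. The special cases follow the RISC-V unprivileged specification: dividing by zero gives an all-ones quotient and returns the dividend as the remainder, and the one signed overflow case ($-2^{63} / -1$) gives the dividend as the quotient and 0 as the remainder. Function names mirror the mnemonics but are only a sketch.

```python
XLEN = 64                          # assumed: RV64
MASK = (1 << XLEN) - 1
MIN_SIGNED = -(1 << (XLEN - 1))


def to_signed(x: int) -> int:
    """Reinterpret a 64-bit register pattern as a signed value."""
    return x - (1 << XLEN) if x >> (XLEN - 1) else x


def div(rs1: int, rs2: int) -> int:
    a, b = to_signed(rs1), to_signed(rs2)
    if b == 0:
        return MASK                # divide by zero: quotient = -1 (all ones)
    if a == MIN_SIGNED and b == -1:
        return rs1                 # overflow: quotient = the dividend
    q = abs(a) // abs(b)           # magnitude of the quotient
    return (-q if (a < 0) != (b < 0) else q) & MASK  # rounds toward zero


def rem(rs1: int, rs2: int) -> int:
    a, b = to_signed(rs1), to_signed(rs2)
    if b == 0:
        return rs1                 # divide by zero: remainder = the dividend
    if a == MIN_SIGNED and b == -1:
        return 0                   # overflow: remainder = 0
    r = abs(a) % abs(b)
    return (-r if a < 0 else r) & MASK  # remainder has the dividend's sign


def divu(rs1: int, rs2: int) -> int:
    return MASK if rs2 == 0 else rs1 // rs2  # divide by zero: all ones


def remu(rs1: int, rs2: int) -> int:
    return rs1 if rs2 == 0 else rs1 % rs2    # divide by zero: the dividend


print(hex(div(7, 0)), rem(7, 0))             # 0xffffffffffffffff 7
print(to_signed(div((-7) & MASK, 2)), to_signed(rem((-7) & MASK, 2)))  # -3 -1
```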
This material gets into the representation of real numbers, and you should know these terms:
• Overflow / Underflow
• Precision / Accuracy
• Assume a 32-bit format that reserves 1 bit for the sign, 15 bits for the integer part, and 16 bits for the fractional part.
• Then −43.625 is represented as follows:

    1 | 000000000101011 | 1010000000000000

  • 1 represents the negative sign
  • 000000000101011 is the 15-bit binary value of decimal 43
  • 1010000000000000 is the 16-bit binary value of the fraction 0.625
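The encoding can be reproduced by scaling the magnitude by $2^{16}$ (moving the radix point 16 places right) and writing the result as a 31-bit integer. This is a Python sketch of the sign-magnitude format on the slide; the name `encode_fixed` and the choice to truncate extra fraction bits are our own assumptions.

```python
def encode_fixed(value: float, int_bits: int = 15, frac_bits: int = 16) -> str:
    """Encode value as 'sign integer fraction' in sign-magnitude fixed point.

    Fraction bits beyond frac_bits are truncated, not rounded.
    """
    sign = "1" if value < 0 else "0"
    scaled = int(abs(value) * (1 << frac_bits))  # shift radix point right; int() truncates
    if scaled >= 1 << (int_bits + frac_bits):
        raise OverflowError(f"{value} needs more than {int_bits} integer bits")
    bits = format(scaled, f"0{int_bits + frac_bits}b")
    return f"{sign} {bits[:int_bits]} {bits[int_bits:]}"


print(encode_fixed(-43.625))  # 1 000000000101011 1010000000000000
```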
For 32-bit positive numbers:

    Sign bit | 15 bits (integer) | 16 bits (fraction)

What are the values of the smallest and the largest 32-bit positive numbers for the above format?
• The smallest positive number is:
  • $2^{-16} \approx 0.00001525878$
• The largest positive number is:
  • $(2^{15} - 1) + (1 - 2^{-16}) = 2^{15} - 2^{-16} \approx 32767.99998474121$
• Moving the radix point left results in a loss of range but increased precision.
• Moving the radix point right results in a loss of precision but increased range.
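The same calculation works for any split of the 31 magnitude bits, which makes the range/precision tradeoff visible. This Python sketch uses `fractions.Fraction` so the limits are exact rather than rounded floats; the function name is hypothetical.

```python
from fractions import Fraction


def fixed_limits(int_bits: int, frac_bits: int) -> tuple[Fraction, Fraction]:
    """Smallest and largest positive values with the given number of integer
    and fraction bits (the sign bit is not counted)."""
    smallest = Fraction(1, 2**frac_bits)          # one unit in the last place
    largest = (2**int_bits - 1) + (1 - smallest)  # every magnitude bit set
    return smallest, largest


# Same 31 magnitude bits, with the radix point in different places.
for int_bits, frac_bits in [(23, 8), (15, 16), (7, 24)]:
    small, large = fixed_limits(int_bits, frac_bits)
    print(f"{int_bits:2} integer / {frac_bits:2} fraction bits: "
          f"step = {float(small):.3g}, largest = {float(large):.10g}")
```

Moving the radix point left (more fraction bits) shrinks both the step between neighboring values and the largest value.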
• The advantage of using a fixed-point representation is performance: the integer hardware is sufficient.
• The disadvantage is the relatively limited range and precision of the values that can be represented.
  • Fixed point is usually inadequate for numerical analysis, as it does not represent enough numbers with enough accuracy.
  • A number that needs more integer bits than are allocated cannot be stored at all (overflow), and a number that needs more fraction bits than are allocated must be rounded or truncated, i.e., stored inexactly.
• We will learn more in the Gallery Walk
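Both points (integer hardware suffices, and some values are stored inexactly) show up in a small Python sketch. It assumes 16 fraction bits and stores values as ordinary signed integers scaled by $2^{16}$ rather than in the sign-magnitude layout above; it also ignores the 32-bit container width, so no overflow checking is done. All names are hypothetical.

```python
FRAC_BITS = 16          # assumed: 16 fraction bits, as in the format above
ONE = 1 << FRAC_BITS    # the real value 1.0, stored as an integer


def to_fixed(x: float) -> int:
    """Round x to the nearest multiple of 2**-16, stored as a scaled integer."""
    return round(x * ONE)


def to_real(f: int) -> float:
    return f / ONE


def fixed_add(a: int, b: int) -> int:
    return a + b                      # ordinary integer addition


def fixed_mul(a: int, b: int) -> int:
    # The integer product has 32 fraction bits; shifting right by 16 drops
    # the extra low bits (truncation, rounding toward minus infinity).
    return (a * b) >> FRAC_BITS


print(to_real(fixed_mul(to_fixed(1.5), to_fixed(2.25))))  # 3.375, exact

tenth = to_fixed(0.1)   # 0.1 has no finite binary expansion, so it is rounded
total = 0
for _ in range(10):
    total = fixed_add(total, tenth)
print(to_real(tenth), to_real(total))  # neither is exactly 0.1 or 1.0
```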
We will take a Gallery Walk to discuss fixed-point applications in software.
• Goal: identify problems that may result from accuracy issues
• Spend 2 minutes at each gallery and write down your comments/observations
• Then discuss/share your findings.
We’ve presented a fixed-point representation of (some) real numbers.
One alternative is floating point.
• In floating point, the number of digits used to represent the integer and fractional parts may vary.
• Floating-point operations tend to require more complex hardware but allow a wider range of values.
• There’s a shift in topics after next week.
  • We’re moving away from assembly and toward computer organization.
• Submit READY? quizzes before class
• Submit your group Homework 1 this week
• Participate in peer discussion and Q&A every week
• Check your lab schedule (Lab C)
• Study for the first term test. All the best!
• Submit homework online this week (Week 5)
• Do the practice questions for the week
  • Exercises: 3.1, 3.6, 3.11*, 3.13*
• Floating-point representation
• Floating-point arithmetic