
CSC 258

Pooja Vashisth
Week 5

Numerical Representation

1. Convert numerical data from one format to another
2. Interact with numbers in a fixed-width 2’s-complement
representation
3. Describe underflow, overflow, round off, and truncation errors
in data representations
4. Arithmetic operations on hardware

 Numeric data formats
 Binary Arithmetic and overflow
 Negative integers' binary representation
 How representations affect accuracy and precision
 Investigate hardware implementation of various arithmetic
operations
 Addition, subtraction, multiplication, division

$$x = x_{n-1}2^{n-1} + x_{n-2}2^{n-2} + \cdots + x_1 2^1 + x_0 2^0$$

 Range: $0$ to $+2^n - 1$
 Example
 0000 0000 … 0000 1011₂
$= 0 + \cdots + 1\times2^3 + 0\times2^2 + 1\times2^1 + 1\times2^0$
$= 0 + \cdots + 8 + 0 + 2 + 1 = 11_{10}$
 Using 64 bits: 0 to +18,446,744,073,709,551,615
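
As a quick check of the positional formula above, here is a minimal Python sketch. The helper name `unsigned_value` is made up for this illustration and is not part of the course material.

```python
def unsigned_value(bits: str) -> int:
    """Evaluate an unsigned bit string with the sum x_i * 2^i (spaces are ignored)."""
    digits = bits.replace(" ", "")
    n = len(digits)
    # The leftmost character is x_{n-1}, the rightmost is x_0.
    return sum(int(d) << (n - 1 - i) for i, d in enumerate(digits))


example = unsigned_value("0000 0000 0000 1011")
largest_64_bit = unsigned_value("1" * 64)

print(example)         # 11
print(largest_64_bit)  # 18446744073709551615
```

The second value is $2^{64} - 1$, the top of the 64-bit unsigned range.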

 Base 8
 Compact representation of bit strings
 3 bits per octal digit
0 000 4 100
1 001 5 101
2 010 6 110
3 011 7 111
 Example: 7650₈
 111 110 101 000₂

 Base 16
 Compact representation of bit strings
 4 bits per hex digit

0 0000    4 0100    8 1000    c 1100
1 0001    5 0101    9 1001    d 1101
2 0010    6 0110    a 1010    e 1110
3 0011    7 0111    b 1011    f 1111

 Example: eca8 6420₁₆
 1110 1100 1010 1000 0110 0100 0010 0000₂

(Practice) Quiz time!

 Convert 5ED4 from hexadecimal to binary
 Convert 001 111 011 from binary to octal
 Convert 1011 1110 from binary to decimal
 Convert 35 from octal to hexadecimal
 Convert 249 from decimal to binary

Quick reference: [figure not recovered]

 Sign-magnitude notation is the simplest method of representing positive and negative numbers.
 Negative numbers have a ‘1’ as sign bit whereas positive numbers have a ‘0’.

Sign-magnitude has some disadvantages:

 Zero has two representations: +0 (0000₂) and −0 (1000₂). Both are valid.
 Hence one of the bit patterns, e.g. 1000₂, is wasted.
 Addition doesn't work the way we want it to. Try adding +1 and −1: 0001₂ + 1001₂
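
To see what goes wrong, the sketch below feeds the two 4-bit sign-magnitude patterns to an ordinary binary adder and decodes the result. The 4-bit width matches the example; the helper name `sm_decode` is an assumption for this illustration.

```python
WIDTH = 4


def sm_decode(bits: int, width: int = WIDTH) -> int:
    """Interpret `bits` as a sign-magnitude integer: top bit is the sign."""
    sign = (bits >> (width - 1)) & 1
    magnitude = bits & ((1 << (width - 1)) - 1)
    return -magnitude if sign else magnitude


plus_one = 0b0001
minus_one = 0b1001

# Plain binary addition, truncated to WIDTH bits, as a simple adder would do.
raw_sum = (plus_one + minus_one) & ((1 << WIDTH) - 1)

print(format(raw_sum, "04b"), sm_decode(raw_sum))  # 1010 -2
print(sm_decode(0b0000), sm_decode(0b1000))        # 0 0
```

The adder produces 1010₂, which decodes as −2 rather than 0, and both 0000₂ and 1000₂ decode to zero.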

$$x = -x_{n-1}2^{n-1} + x_{n-2}2^{n-2} + \cdots + x_1 2^1 + x_0 2^0$$

 Range: $-2^{n-1}$ to $+2^{n-1} - 1$
 Example
 1111 1111 … 1111 1100₂
$= -1\times2^{31} + 1\times2^{30} + \cdots + 1\times2^2 + 0\times2^1 + 0\times2^0$
= −2,147,483,648 + 2,147,483,644 = −4₁₀
 Using 64 bits: −9,223,372,036,854,775,808 to 9,223,372,036,854,775,807

 The top bit is still a sign bit.
 1 for negative numbers
 0 for non-negative numbers

 Non-negative numbers have the same unsigned and 2s-complement representation
 To obtain the magnitude of a negative number, negate each bit and add one.
 Some specific numbers
 0: 0000 0000 … 0000
 –1: 1111 1111 … 1111
 Most-negative: 1000 0000 … 0000
 Most-positive: 0111 1111 … 1111

 Negate and add 1 (here $\bar{x}$ is $x$ with every bit inverted)

$$x + \bar{x} = 1111\ldots111_2 = -1$$

$$\bar{x} + 1 = -x$$

 Example: representation of −2
 +2 = 0000 0000 … 0010₂
 −2 = 1111 1111 … 1101₂ + 1
= 1111 1111 … 1110₂
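
The "invert and add 1" rule and the signed-value formula can be tried out with a small Python sketch. The function names `negate` and `to_signed` are illustrative, and the widths (8 and 32 bits) are chosen only to keep the examples short.

```python
def negate(bits: int, width: int) -> int:
    """Two's-complement negation: invert every bit, add 1, keep `width` bits."""
    mask = (1 << width) - 1
    return ((bits ^ mask) + 1) & mask


def to_signed(bits: int, width: int) -> int:
    """Apply the formula: the top bit has weight -2^(width-1), the rest are positive."""
    sign_weight = 1 << (width - 1)
    return (bits & (sign_weight - 1)) - (bits & sign_weight)


minus_two = negate(0b0000_0010, 8)
print(format(minus_two, "08b"), to_signed(minus_two, 8))  # 11111110 -2
print(to_signed(0xFFFF_FFFC, 32))                         # -4

# The most-negative pattern has no positive counterpart: negating it returns itself.
print(format(negate(0b1000_0000, 8), "08b"))              # 10000000
```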

 Representing a number using more bits
 Preserve the numeric value

 Replicate the sign bit to the left


 c.f. unsigned values: extend with 0s

 Examples: 8-bit to 16-bit


 +2: 0000 0010 => 0000 0000 0000 0010
 –2: 1111 1110 => 1111 1111 1111 1110

 In the RISC-V instruction set


 lb: sign-extend loaded byte
 lbu: zero-extend loaded byte
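
Sign extension and zero extension can be sketched in Python as below. This only models the bit manipulation; it is not how `lb` and `lbu` are implemented in hardware, and the function names are assumptions for this illustration.

```python
def sign_extend(bits: int, from_width: int, to_width: int) -> int:
    """Replicate the sign bit of a `from_width`-bit value into the new upper bits."""
    bits &= (1 << from_width) - 1
    if (bits >> (from_width - 1)) & 1:
        extension = ((1 << (to_width - from_width)) - 1) << from_width
        return bits | extension
    return bits


def zero_extend(bits: int, from_width: int, to_width: int) -> int:
    """Fill the new upper bits with 0s (used for unsigned values)."""
    return bits & ((1 << from_width) - 1)


print(format(sign_extend(0b0000_0010, 8, 16), "016b"))  # 0000000000000010
print(format(sign_extend(0b1111_1110, 8, 16), "016b"))  # 1111111111111110
print(format(zero_extend(0b1111_1110, 8, 16), "016b"))  # 0000000011111110
```

The same byte 1111 1110₂ becomes −2 after sign extension and 254 after zero extension, which is the difference between loading it with `lb` and with `lbu`.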
(Practice) Quiz time!

 Convert 001 111 011 from 2’s complement to decimal


 Convert 1011 1110 from 2’s complement to decimal

 Operations on integers
 Addition and subtraction
 Multiplication and division
 Dealing with overflow

 Floating-point real numbers


 Representation and operations

Example: 7 + 6

Overflow occurs if the result is too large (or too negative) to fit in the available bits.
 Remember: our data is stored in a fixed number of bits.
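
Whether 7 + 6 overflows depends on the width of the representation. The sketch below assumes 2's-complement operands and compares an 8-bit and a 4-bit field; both widths and the name `add_signed` are assumptions chosen for illustration.

```python
def add_signed(a: int, b: int, width: int) -> tuple[int, bool]:
    """Add two signed values in a `width`-bit 2's-complement field.

    Returns (stored result, overflow flag).
    """
    mask = (1 << width) - 1
    raw = (a + b) & mask                        # keep only `width` bits
    sign_bit = raw >> (width - 1)
    result = raw - (1 << width) if sign_bit else raw
    overflow = result != a + b                  # stored value differs from true sum
    return result, overflow


print(add_signed(7, 6, 8))  # (13, False)
print(add_signed(7, 6, 4))  # (-3, True)
```

With 4 bits the range is −8 to +7, so the true sum 13 does not fit and the stored pattern 1101₂ reads back as −3.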

Multimedia extensions to standard instruction sets often offer
Saturating Arithmetic to deal with overflow.
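
A minimal sketch of saturating addition, assuming signed 8-bit values (a width chosen only for illustration): instead of wrapping around, the result is clamped to the nearest representable limit.

```python
def saturating_add(a: int, b: int, width: int = 8) -> int:
    """Signed addition that clamps to the representable range instead of wrapping."""
    lowest = -(1 << (width - 1))
    highest = (1 << (width - 1)) - 1
    return max(lowest, min(highest, a + b))


print(saturating_add(100, 100))    # 127
print(saturating_add(-100, -100))  # -128
print(saturating_add(10, 20))      # 30
```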

 Unsigned integers are commonly used for memory addresses where overflows are ignored.
 Fortunately, the compiler can easily check for unsigned overflow using a branch instruction.
 e.g., Addition has overflowed if the sum is less than either of the addends.
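
The "sum is less than an addend" test can be sketched in Python by masking the sum to a fixed width. The 64-bit default and the name `add_unsigned` are assumptions for this illustration; in assembly the comparison would be a single branch such as `bltu`.

```python
def add_unsigned(a: int, b: int, width: int = 64) -> tuple[int, bool]:
    """Wrapping unsigned addition plus the compare-based overflow check."""
    mask = (1 << width) - 1
    total = (a + b) & mask
    overflow = total < a   # equivalently: total < b
    return total, overflow


print(add_unsigned(5, 7))               # (12, False)
print(add_unsigned(2**64 - 1, 1))       # (0, True)
print(add_unsigned(200, 100, width=8))  # (44, True)
```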

Arithmetic Adventure!
 We are now going to learn to work out our solutions for various binary arithmetic problems.
 For each question:
 Perform the calculation
 Identify whether there is an overflow or not
 Determine the value in case saturated arithmetic is applied
 The related exercises are from the chapter 3 exercises (3.12): questions 3.6 to 3.11

Sample Problem Solution
Consider an addition problem with two binary numbers in a six-bit field, where the sixth bit is used for sign. If the result exceeds +31₁₀ or is less than −32₁₀, there is an overflow.
Let’s try adding −17₁₀ and −19₁₀ to see how this overflow condition works for excessive negative numbers:

    101111   (−17)
  + 101101   (−19)
  --------
   1011100   (only the low six bits, 011100, are kept)

Since the sum (−36₁₀) is less than the allowable limit for our designated bit field, we have an overflow error.

The (incorrect) answer is a positive twenty-eight. The real sum of −17₁₀ and −19₁₀ has too large a magnitude to be properly represented with a five-bit magnitude field and a sixth sign bit. Applying saturated arithmetic, the result is −32₁₀.
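
The sample problem can be replayed in Python to confirm the wrapped and saturated answers. This is a sketch using a six-bit 2's-complement field as in the problem statement; the helper names `encode` and `decode` are illustrative.

```python
WIDTH = 6
MASK = (1 << WIDTH) - 1
LOWEST, HIGHEST = -(1 << (WIDTH - 1)), (1 << (WIDTH - 1)) - 1   # -32, +31


def encode(value: int) -> int:
    """Six-bit 2's-complement bit pattern of `value`."""
    return value & MASK


def decode(bits: int) -> int:
    """Signed value of a six-bit 2's-complement pattern."""
    return bits - (1 << WIDTH) if bits & (1 << (WIDTH - 1)) else bits


a, b = -17, -19
wrapped_bits = (encode(a) + encode(b)) & MASK
wrapped = decode(wrapped_bits)
overflow = not (LOWEST <= a + b <= HIGHEST)
saturated = max(LOWEST, min(HIGHEST, a + b))

print(format(encode(a), "06b"), format(encode(b), "06b"))  # 101111 101101
print(format(wrapped_bits, "06b"), wrapped)                # 011100 28
print(overflow, saturated)                                 # True -32
```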

 Start with long-multiplication approach

      1000   multiplicand
    × 1001   multiplier
    ------
      1000
     0000
    0000
   1000
   -------
   1001000   product

Length of product is the sum of operand lengths
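
The long-multiplication layout above can be reproduced with a short Python sketch that lists one shifted partial product per multiplier bit and then sums them. The function name `partial_products` is made up for this illustration.

```python
def partial_products(multiplicand: int, multiplier: int) -> list[int]:
    """One partial product per multiplier bit, least significant bit first."""
    products = []
    position = 0
    while multiplier >> position:
        bit = (multiplier >> position) & 1
        # A 1 bit contributes the multiplicand shifted left; a 0 bit contributes 0.
        products.append((multiplicand << position) if bit else 0)
        position += 1
    return products


parts = partial_products(0b1000, 0b1001)
product = sum(parts)

print([format(p, "b") for p in parts])  # ['1000', '0', '0', '1000000']
print(format(product, "b"))             # 1001000
```

The 7-bit result fits within 4 + 4 = 8 bits, in line with the rule that the product needs at most the sum of the operand lengths.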

[Figure: multiplication hardware (image not recovered); one register is labeled "Initially 0"]

A Multiplication Algorithm
Example
Using 4-bit numbers to save space, multiply 2₁₀ × 3₁₀, or 0010₂ × 0011₂.

Answer

The bit examined to determine the next step is circled in color.
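
Since the answer table did not survive extraction, here is a sketch of a sequential shift-and-add multiplier that traces the same 2 × 3 example. It assumes the usual three-register arrangement (multiplicand shifts left, multiplier shifts right, product starts at 0); the register names and the `trace` list are illustrative.

```python
def multiply(multiplicand: int, multiplier: int, width: int = 4):
    """Sequential shift-and-add multiplication of two `width`-bit unsigned values.

    Returns the product and a trace of (iteration, multiplier, multiplicand, product).
    """
    product = 0                       # product register starts at 0
    trace = []
    for iteration in range(1, width + 1):
        if multiplier & 1:            # examine the lowest multiplier bit
            product += multiplicand   # 1 -> add the multiplicand to the product
        multiplicand <<= 1            # shift multiplicand left
        multiplier >>= 1              # shift multiplier right
        trace.append((iteration, multiplier, multiplicand, product))
    return product & ((1 << (2 * width)) - 1), trace


product, trace = multiply(0b0010, 0b0011)
for iteration, mplier, mcand, prod in trace:
    print(iteration, format(mplier, "04b"), format(mcand, "08b"), format(prod, "08b"))
print(product)  # 6
```

Only the first two iterations add anything, because only the two lowest bits of the multiplier 0011₂ are 1.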


 Perform steps in parallel: add/shift

 One cycle per partial-product addition


 That’s ok, if frequency of multiplications is low
 Uses multiple adders
 Cost/performance tradeoff

 Can be pipelined
 Several multiplications performed in parallel
 Four multiply instructions:
 mul: multiply
 Gives the lower 64 bits of the product
 mulh: multiply high
 Gives the upper 64 bits of the product, assuming the operands are signed
 mulhu: multiply high unsigned
 Gives the upper 64 bits of the product, assuming the operands are unsigned
 mulhsu: multiply high signed/unsigned
 Gives the upper 64 bits of the product, assuming one operand is signed and the other unsigned
 Use mulh result to check for 64-bit overflow
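
The following Python sketch models what `mul`, `mulh`, and `mulhu` return for 64-bit registers and shows one way the `mulh` result can be used to detect signed overflow. It is a behavioural model only; the Python function names mirror the instruction mnemonics, and `signed_mul_overflows` is a hypothetical helper.

```python
XLEN = 64
MASK = (1 << XLEN) - 1


def mul(a: int, b: int) -> int:
    """Lower 64 bits of the product (same bits for signed and unsigned operands)."""
    return (a * b) & MASK


def mulh(a: int, b: int) -> int:
    """Upper 64 bits of the 128-bit product of two signed values."""
    return ((a * b) >> XLEN) & MASK


def mulhu(a: int, b: int) -> int:
    """Upper 64 bits of the product, treating both operands as unsigned."""
    return (((a & MASK) * (b & MASK)) >> XLEN) & MASK


def signed_mul_overflows(a: int, b: int) -> bool:
    """The product fits in 64 signed bits only if the upper half is pure sign extension."""
    low, high = mul(a, b), mulh(a, b)
    expected_high = MASK if low >> (XLEN - 1) else 0
    return high != expected_high


print(signed_mul_overflows(-3, 5))          # False
print(signed_mul_overflows(2**62, 2))       # True
print(signed_mul_overflows(2**32, 2**32))   # True
```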

                   1001      quotient
divisor  1000 ) 1001010      dividend
               -1000
                   10
                   101
                   1010
                  -1000
                     10      remainder

n-bit operands yield n-bit quotient and remainder

 Check for 0 divisor
 Long division approach
 If divisor ≤ dividend bits
 1 bit in quotient, subtract
 Otherwise
 0 bit in quotient, bring down next dividend bit
 Restoring division
 Do the subtract, and if remainder goes < 0, add divisor back
 Signed division
 Divide using absolute values
 Adjust sign of quotient and remainder as required
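
The bullets above can be sketched in Python as a bit-by-bit long division plus a signed wrapper. The sign rule used here (quotient negative when the operand signs differ, remainder taking the sign of the dividend) is one common convention and is an assumption of this sketch; the function names are illustrative.

```python
def long_divide(dividend: int, divisor: int) -> tuple[int, int]:
    """Unsigned binary long division; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor is 0")
    quotient = 0
    remainder = 0
    for position in reversed(range(dividend.bit_length())):
        # Bring down the next dividend bit.
        remainder = (remainder << 1) | ((dividend >> position) & 1)
        quotient <<= 1
        if divisor <= remainder:      # 1 bit in quotient, subtract
            remainder -= divisor
            quotient |= 1
    return quotient, remainder


def signed_divide(dividend: int, divisor: int) -> tuple[int, int]:
    """Divide using absolute values, then adjust the signs of the results."""
    quotient, remainder = long_divide(abs(dividend), abs(divisor))
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient
    if dividend < 0:
        remainder = -remainder
    return quotient, remainder


print(long_divide(0b1001010, 0b1000))  # (9, 2)
print(signed_divide(-7, 2))            # (-3, -1)
```

The first call reproduces the worked example: 1001010₂ ÷ 1000₂ gives quotient 1001₂ and remainder 10₂.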

[Figure: division hardware (image not recovered); labels: "Initially divisor in left half", "Initially dividend"]

A Divide Algorithm
Example
Using a 4-bit version of the algorithm to save pages, let’s try
dividing 7₁₀ by 2₁₀, or 0000 0111₂ by 0010₂.

Answer

The bit examined to determine the next step is circled in color.
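
Since the answer table did not survive extraction, here is a sketch of a register-level restoring divider for the same 7 ÷ 2 example. It assumes the arrangement suggested by the figure labels: the divisor starts in the left half of a double-width register, the remainder register starts with the dividend, and the loop runs width + 1 times. Variable names are illustrative.

```python
def divide(dividend: int, divisor: int, width: int = 4) -> tuple[int, int]:
    """Restoring division of two `width`-bit unsigned values; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor is 0")
    divisor_reg = divisor << width    # divisor starts in the left half
    remainder = dividend              # remainder register starts with the dividend
    quotient = 0
    for _ in range(width + 1):
        remainder -= divisor_reg      # trial subtraction
        if remainder >= 0:
            quotient = (quotient << 1) | 1   # it fit: shift in a 1
        else:
            remainder += divisor_reg         # restore, then shift in a 0
            quotient <<= 1
        divisor_reg >>= 1             # shift divisor right
    return quotient & ((1 << width) - 1), remainder


print(divide(0b0111, 0b0010))  # (3, 1)
```

For 7 ÷ 2 the first three trial subtractions fail and are restored; the last two succeed, giving quotient 0011₂ and remainder 0001₂.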


 One cycle per partial-remainder subtraction
 Looks a lot like a multiplier!
 Same hardware can be used for both

 Can’t use parallel hardware as in multiplier
 Subtraction is conditional on sign of remainder

 Faster dividers (e.g. SRT division) generate multiple quotient bits per step
 Still require multiple steps

 Four instructions:
 div, rem: signed divide, remainder
 divu, remu: unsigned divide, remainder

 Overflow and division-by-zero don’t produce errors


 Just return defined results
 Faster for the common case of no error
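
As an illustration of "defined results", the sketch below models the values the RISC-V M extension specifies for 64-bit `div` and `rem` in the two special cases: division by zero yields a quotient with all bits set (−1 when read as signed) and a remainder equal to the dividend, and the overflow case (most-negative value divided by −1) yields the dividend as quotient and 0 as remainder. The Python function names `rv_div` and `rv_rem` are made up for this sketch.

```python
XLEN = 64
MIN_INT = -(1 << (XLEN - 1))


def rv_div(a: int, b: int) -> int:
    """Model of signed `div`: truncates toward zero, never traps."""
    if b == 0:
        return -1                      # all bits set
    if a == MIN_INT and b == -1:
        return MIN_INT                 # overflow: quotient equals the dividend
    quotient = abs(a) // abs(b)
    return -quotient if (a < 0) != (b < 0) else quotient


def rv_rem(a: int, b: int) -> int:
    """Model of signed `rem`: the remainder has the sign of the dividend."""
    if b == 0:
        return a                       # remainder equals the dividend
    if a == MIN_INT and b == -1:
        return 0
    remainder = abs(a) % abs(b)
    return -remainder if a < 0 else remainder


print(rv_div(7, 0), rv_rem(7, 0))    # -1 7
print(rv_div(-7, 2), rv_rem(-7, 2))  # -3 -1
print(rv_div(MIN_INT, -1) == MIN_INT, rv_rem(MIN_INT, -1))  # True 0
```

Software that cares about these cases has to test for them explicitly, for example with a branch on a zero divisor before the divide.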

This material gets into representation of real numbers, and you
should know these terms:

 Overflow / Underflow
 Precision / Accuracy

 Assume a 32-bit format which reserves 1 bit for the sign, 15 bits for the integer part and 16 bits for the fractional part.
 Then −43.625 is represented as follows:

Sign   Integer (15 bits)   Fraction (16 bits)
1      000000000101011     1010000000000000

 1 is used to represent the negative sign
 000000000101011 is the 15-bit binary value for decimal 43
 1010000000000000 is the 16-bit binary value for the fraction 0.625
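
The encoding above can be reproduced with a small Python sketch. It assumes the sign, integer, and fraction fields are stored exactly as described (a sign bit followed by the magnitude), and the function name `encode_fixed` is illustrative.

```python
from fractions import Fraction

INT_BITS, FRAC_BITS = 15, 16


def encode_fixed(value: float) -> str:
    """Encode `value` as 'sign integer fraction' bit fields (1 + 15 + 16 bits)."""
    sign = 1 if value < 0 else 0
    scaled = abs(Fraction(value)) * (1 << FRAC_BITS)
    if scaled.denominator != 1:
        raise ValueError("value is not exactly representable with 16 fraction bits")
    magnitude = int(scaled)
    if magnitude >= 1 << (INT_BITS + FRAC_BITS):
        raise OverflowError("integer part needs more than 15 bits")
    integer_part = magnitude >> FRAC_BITS
    fraction_part = magnitude & ((1 << FRAC_BITS) - 1)
    return f"{sign} {integer_part:0{INT_BITS}b} {fraction_part:0{FRAC_BITS}b}"


print(encode_fixed(-43.625))  # 1 000000000101011 1010000000000000
```

The fraction field is 1010…0 because 0.625 = 1/2 + 1/8, i.e. 0.101₂.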

For 32-bit positive numbers:

Sign bit | Integer (15 bits) | Fraction (16 bits)

What are the values of the smallest and the largest 32-bit positive number for the above format?

 The smallest positive number is:
 $2^{-16} \approx 0.00001525878$
 The largest positive number is:
 $(2^{15} - 1) + (1 - 2^{-16}) = 2^{15} - 2^{-16} \approx 32767.99998474121$
 Moving the radix point left results in the loss of range with an increased accuracy.
 Moving the radix point right results in loss of accuracy with an increased range.
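
The trade-off between range and resolution can be checked with exact arithmetic. The sketch below computes the smallest and largest positive values for a given split of the 31 magnitude bits; the function name `positive_range` and the alternative 23/8 split are assumptions for illustration.

```python
from fractions import Fraction


def positive_range(int_bits: int, frac_bits: int) -> tuple[Fraction, Fraction]:
    """Smallest and largest positive values of an unsigned-magnitude fixed-point field."""
    step = Fraction(1, 1 << frac_bits)                 # value of the lowest fraction bit
    largest = Fraction((1 << (int_bits + frac_bits)) - 1, 1 << frac_bits)
    return step, largest


smallest, largest = positive_range(15, 16)
print(float(smallest))                    # 1.52587890625e-05
print(largest == 2**15 - Fraction(1, 2**16))  # True

# Moving the radix point right by 8 places: more range, coarser steps.
coarse_step, coarse_largest = positive_range(23, 8)
print(float(coarse_step))                 # 0.00390625
print(coarse_largest > largest)           # True
```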

 The advantage of using a fixed-point representation is performance. The hardware for integers is sufficient.
 The disadvantage is a relatively limited range of values and precision of values that can be represented.
 Fixed point is usually inadequate for numerical analysis as it does not allow enough numbers and accuracy.
 Any number whose representation requires more than the set number of bits allocated to the integer would have to be stored inexactly.
 We will learn more in the Gallery Walk
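
To make "stored inexactly" concrete, the sketch below stores the decimal value 0.1 with 16 fraction bits, once by truncation and once by rounding, and reports the exact error of each. The 16-bit fraction matches the format used earlier; the function name `quantize` is illustrative and the sketch handles non-negative values only.

```python
from fractions import Fraction

FRAC_BITS = 16
SCALE = 1 << FRAC_BITS


def quantize(value: str, mode: str) -> Fraction:
    """Nearest representable fixed-point value at or below (`truncate`) or nearest (`round`)."""
    scaled = Fraction(value) * SCALE
    if mode == "truncate":
        steps = scaled.numerator // scaled.denominator   # drop the extra bits
    else:
        steps = round(scaled)                            # round to the nearest step
    return Fraction(steps, SCALE)


exact = Fraction("0.1")
truncated = quantize("0.1", "truncate")
rounded = quantize("0.1", "round")

print(truncated, float(exact - truncated))  # truncation error, always >= 0
print(rounded, float(exact - rounded))      # round-off error, at most half a step
```

Neither stored value equals 0.1, because 1/10 has no finite binary expansion; rounding keeps the error within half of the smallest step, while truncation can be off by almost a full step.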

We will take a Gallery Walk to discuss fixed point applications in software
 Goal: Identify problems that may result due to accuracy issues
 Spend 2 minutes in each gallery and pen down your comments/observations
 Then discuss/share your findings.

We’ve presented a fixed-point representation of (some) real numbers.

One alternative is floating point.
 In floating point, the number of digits used to represent the integer and fractional parts may vary.
 Floating-point operations tend to require more complex hardware but allow a wider range of values.

 There’s a shift in topics after next week.
 We’re moving away from assembly and toward computer organization.

• Submit READY? Quizzes before classes
• Submit your group homework1 this week
• Participate in Peer discussion and Q/A every week
• Check your labs schedule… (Lab C)
• Study for the first term test …. All the Best!!!

• Submit Homework online this week (week 5)
• Do the practice questions for the week
 #?: #3.1, 3.6, 3.11*, 3.13*

 Floating point representation
 Floating point arithmetic
