
MODULE-1

DATA REPRESENTATION

Binary Number Representation


Variables such as integers can be represented in two ways: signed and
unsigned. Signed numbers use a sign flag to distinguish between negative
and positive values, whereas unsigned numbers can store only positive
numbers, not negative ones.
Number representation systems such as binary, octal, decimal, and
hexadecimal can represent numbers in both signed and unsigned ways. The
binary number system is one such representation technique; it is the most
popular and is the one used in digital systems.
Representation of Binary Numbers:

Binary numbers can be represented in signed or unsigned form. Unsigned
binary numbers do not have a sign bit, whereas signed binary numbers use a
sign bit to distinguish between positive and negative numbers. A signed
binary number is the data type used for signed variables.

1. Unsigned Numbers:

Unsigned numbers do not carry any sign; they contain only the magnitude of the
number, so unsigned binary representation covers positive numbers only. This
mirrors how positive decimal numbers are positive by default: we simply assume
an implicit positive sign in front of every number.

Representation of Unsigned Binary Numbers:


Since there is no sign bit, an N-bit unsigned binary number represents its
magnitude only. Zero (0) is also an unsigned number, and this representation
has only one zero, which is always non-negative. Every number in unsigned
representation has exactly one binary equivalent, so this is an unambiguous
representation technique.
Every computer programmer must understand signed and unsigned numbers
and their significance. Positive numbers can be represented as unsigned
numbers, so we do not need a + sign in front of them. For negative numbers,
however, we use a - sign to show that the number is negative and distinct from
its positive unsigned counterpart. This is why such values are represented
as signed numbers.
In mathematics we assume we have as many bits as needed to represent a
number, but in computers we have a fixed number of bits to represent a value.
These bit sizes are typically 8, 16, 32, or 64 bits, usually multiples of 8
because system memory is organized on an 8-bit byte basis.
The number of bits used to represent a number determines the range of values
that can be represented. For example, there are 256 possible combinations of
8 bits, so an 8-bit number can represent 256 distinct values, typically the
range 0-255; we cannot represent numbers larger than 255 using an 8-bit
number. Similarly, 16 bits allows a range of 0-65535.
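The ranges above can be checked with a short sketch (plain Python arithmetic, nothing machine-specific):

```python
# Illustrative check: the range of values an N-bit unsigned field can hold.
for bits in (8, 16, 32):
    count = 2 ** bits                      # number of distinct bit patterns
    print(f"{bits}-bit unsigned: 0 to {count - 1} ({count} values)")
```

Running this prints `8-bit unsigned: 0 to 255 (256 values)` and `16-bit unsigned: 0 to 65535 (65536 values)`, matching the ranges quoted above.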

2. Signed Numbers:
Signed numbers contain a sign flag, so this representation distinguishes
positive from negative numbers. The technique stores both the sign bit and the
magnitude of a number. For example, to represent a negative decimal number,
we put a negative symbol in front of it.
Representation of Signed Binary Numbers:
There are three representations for signed binary numbers: Sign-Magnitude
form, 1's complement form, and 2's complement form, explained below. Because
of the extra sign bit, the binary number zero has two representations in the
first two forms, positive (0) and negative (1), making them ambiguous. The 2's
complement representation is unambiguous because there is no double
representation of the number 0.
 Sign Magnitude
Sign magnitude is a very simple representation of negative numbers: the first
bit is dedicated to representing the sign and is hence called the sign bit.
Sign bit '1' represents a negative sign.
Sign bit '0' represents a positive sign.

In sign-magnitude representation of an n-bit number, the first bit represents
the sign and the remaining n-1 bits represent the magnitude of the number.

For example,
 +25 = 011001
where 11001 = 25 and 0 stands for '+'.
 -25 = 111001
where 11001 = 25 and 1 stands for '-'.
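The encoding above can be sketched as a small helper (an illustrative function, not a standard library one; the 6-bit width matches the example):

```python
def sign_magnitude(value, bits=6):
    """Encode an integer in sign-magnitude form: 1 sign bit + (bits-1) magnitude bits."""
    sign = '1' if value < 0 else '0'
    magnitude = format(abs(value), f'0{bits - 1}b')   # magnitude in bits-1 bits
    if len(magnitude) != bits - 1:
        raise ValueError("magnitude does not fit in the field")
    return sign + magnitude

print(sign_magnitude(+25))   # 011001
print(sign_magnitude(-25))   # 111001
```

Note that `sign_magnitude(0)` and `sign_magnitude(-0)` both give `000000` in Python only because Python has no negative integer zero; in hardware, the patterns 000000 and 100000 are the two zeros that make this form ambiguous.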

 2’s complement method

To represent a negative number in this form, we first take the 1's complement
of the number in its simple positive binary form and then add 1 to it.
For example:
(8)10 = (1000)2
1's complement of 1000 = 0111
Adding 1 to it: 0111 + 1 = 1000

So, (-8)10 = (1000)2

Do not be confused that (8)10 = 1000 and (-8)10 = 1000 look alike: with 4 bits
we cannot represent a positive number greater than 7, so in 4-bit 2's
complement the pattern 1000 represents -8 only.
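The same computation can be sketched with a masking trick (an illustrative helper; Python's `&` with a 4-bit mask yields exactly the 2's complement bit pattern):

```python
def twos_complement(value, bits=4):
    """2's complement bit pattern of value in the given width (illustrative helper)."""
    if not -(2 ** (bits - 1)) <= value < 2 ** (bits - 1):
        raise ValueError("value out of range for this width")
    # Masking to `bits` bits is equivalent to invert-and-add-one for negatives.
    return format(value & (2 ** bits - 1), f'0{bits}b')

print(twos_complement(-8))   # 1000
print(twos_complement(7))    # 0111
```

Here the same pattern 1000 appears for -8, confirming the note above, and there is only one zero (0000), which is why this form is unambiguous.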
Fixed-Point Representation −
This representation has a fixed number of digits for the integer part and for
the fractional part. For example, if a fixed-point format is IIII.FFFF (four
decimal digits on each side of the point), then the smallest nonzero value
that can be stored is 0000.0001 and the largest is 9999.9999. A fixed-point
number representation has three parts: the sign field, the integer field, and
the fractional field.

We can represent these numbers using:

 Sign-magnitude representation: range from -(2^(k-1) - 1) to (2^(k-1) - 1), for k bits.
 1's complement representation: range from -(2^(k-1) - 1) to (2^(k-1) - 1), for k bits.
 2's complement representation: range from -2^(k-1) to (2^(k-1) - 1), for k bits.

The 2's complement representation is preferred in computer systems because it
is unambiguous and makes arithmetic operations easier.
Example − Assume a number uses a 32-bit format that reserves 1 bit for the
sign, 15 bits for the integer part, and 16 bits for the fractional part.
Then -43.625 is represented as follows:
sign = 1, integer field = 000000000101011, fractional field = 1010000000000000.
Here 1 is used to represent − (and 0 would represent +), 000000000101011 is
the 15-bit binary value for decimal 43, and 1010000000000000 is the 16-bit
binary value for the fraction 0.625.
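The split into sign, integer, and fraction fields can be sketched as follows (an illustrative encoder for this 1+15+16 layout, not a standard format):

```python
def fixed_point(value, int_bits=15, frac_bits=16):
    """Split value into (sign, integer field, fractional field) bit strings."""
    sign = '1' if value < 0 else '0'
    scaled = round(abs(value) * 2 ** frac_bits)        # shift fraction into an integer
    int_part, frac_part = divmod(scaled, 2 ** frac_bits)
    return (sign,
            format(int_part, f'0{int_bits}b'),
            format(frac_part, f'0{frac_bits}b'))

print(fixed_point(-43.625))
# ('1', '000000000101011', '1010000000000000')
```

The output reproduces the three fields of the -43.625 example above: 43 in the integer field and 0.625 × 2^16 = 40960 in the fractional field.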
The advantage of fixed-point representation is performance; the disadvantage
is the relatively limited range of values it can represent. It is therefore
usually inadequate for numerical analysis, as it does not allow enough range
or accuracy: a number whose representation exceeds 32 bits would have to be
stored inexactly.

For the 32-bit format given above, the smallest positive number is
2^-16 ≈ 0.000015 and the largest positive number is
(2^15 - 1) + (1 - 2^-16) = 2^15 - 2^-16 ≈ 32768, and the gap between
consecutive representable numbers is 2^-16.
The radix point can be moved left or right only by changing how the bits are
split between the integer and fractional fields.

Introduction to Floating Point Representation

The objective of this section is to provide a brief introduction to the floating
point format. The following description explains the terminology and primary
details of IEEE 754 binary floating point representation; the discussion is
confined to the single and double precision formats.
Depending on the base and the number of bits used to encode the various
components, the IEEE 754 standard defines five basic formats. Among them, the
binary32 and binary64 formats are the single precision and double precision
formats respectively, in which the base is 2.
Table – 1 Precision Representation

Precision          Base   Sign   Exponent   Significand
Single precision     2      1       8          23+1
Double precision     2      1      11          52+1

Single Precision Format:

As mentioned in Table 1, the single precision format has 23 bits for the
significand (the +1 represents the implied bit, detailed below), 8 bits for the
exponent, and 1 bit for the sign.
For example, the rational number 9÷2 can be converted to single precision float
format as follows:
9(10) ÷ 2(10) = 4.5(10) = 100.1(2)
The result is said to be normalized if it is represented with a leading 1 bit,
i.e. 1.001(2) × 2^2. (Similarly, when the number 0.000000001101(2) × 2^3 is
normalized, it appears as 1.101(2) × 2^-6.) Omitting this implied 1 at the left
extreme gives us the mantissa of the float number. A normalized number provides
more accuracy than a corresponding de-normalized number, and because the
implied most significant bit need not be stored, the significand effectively
carries 23 + 1 = 24 bits. Floating point numbers are normally represented in
normalized form.
Subnormal numbers fall into the category of de-normalized numbers. The
subnormal representation slightly reduces the exponent range and cannot be
normalized, since that would result in an exponent which does not fit in the
field. Subnormal numbers are less accurate than normalized numbers, i.e. they
have less room for nonzero bits in the fraction field; indeed, the accuracy
drops as the size of the subnormal number decreases. However, the subnormal
representation is useful for filling the gaps of the floating point scale near
zero.
In other words, the above result can be written as (-1)^0 × 1.001(2) × 2^2,
which yields the components s = 0, b = 2, significand (m) = 1.001,
mantissa = 001, and e = 2. The corresponding single precision floating point
number can be represented in binary as
0 | 10000001 | 00100000000000000000000

where the exponent field, whose true value is 2, is encoded as 129 (127 + 2),
called the biased exponent. The exponent field is stored in plain binary
format, and the bias is the encoding that lets it represent negative exponents
(as opposed to sign magnitude, 1's complement, 2's complement, etc.). The
biased exponent has an advantage over other negative-number representations:
it allows two floating point numbers to be compared bitwise, as if they were
plain unsigned integers.
A bias of (2^(n-1) - 1), where n is the number of bits used in the exponent,
is added to the exponent (e) to get the biased exponent (E). So the biased
exponent (E) of a single precision number can be obtained as
E = e + 127
The range of the unbiased exponent in single precision format is -126 to +127;
the biased values 0 and 255 are reserved for special values (zeros, subnormals,
infinities, and NaNs).
Note: When we unpack a floating point number, the exponent obtained is the
biased exponent; subtracting 127 from it extracts the unbiased exponent.
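As a sketch of the unpacking described in the note (using Python's `struct` module to read the machine's own binary32 bits for 4.5):

```python
import struct

# Unpack the single-precision bits of 4.5 = 1.001(2) x 2^2 and recover
# the biased exponent E = e + 127 = 129.
bits = struct.unpack('>I', struct.pack('>f', 4.5))[0]   # raw 32-bit pattern
sign = bits >> 31
biased_e = (bits >> 23) & 0xFF
mantissa = bits & 0x7FFFFF
print(sign, biased_e, biased_e - 127)   # 0 129 2
print(format(mantissa, '023b'))         # 00100000000000000000000
```

The recovered fields match the worked example: sign 0, biased exponent 129, unbiased exponent 2, and mantissa 001 followed by zeros.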

Double Precision Format:

As mentioned in Table – 1, the double precision format has 52 bits for the
significand (plus 1 implied bit), 11 bits for the exponent, and 1 bit for the
sign. All other definitions are the same as for the single precision format,
except for the sizes of the various components.
Precision:
The smallest change that can be represented in a floating point representation
is called its precision. The fractional part of a single precision normalized
number has exactly 23 bits of resolution (24 bits with the implied bit). This
corresponds to log10(2^23) = 6.924 ≈ 7 decimal digits of accuracy. Similarly,
for double precision numbers the precision is log10(2^52) = 15.654 ≈ 16
decimal digits.
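The decimal-digit estimates quoted above can be reproduced directly:

```python
import math

# Decimal digits of accuracy carried by the fraction field.
single = math.log10(2 ** 23)   # ~6.92  -> about 7 decimal digits
double = math.log10(2 ** 52)   # ~15.65 -> about 16 decimal digits
print(round(single, 3), round(double, 3))   # 6.924 15.654
```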

1. To convert a floating point number to decimal, we have 3 elements in a
32-bit floating point representation:
i) Sign
ii) Exponent
iii) Mantissa

 The sign bit is the first bit of the binary representation: '1' implies a
negative number and '0' implies a positive number.
Example: 11000001110100000000000000000000 is a negative number.
 The exponent is decided by the next 8 bits of the binary representation.
127 is the bias for 32-bit floating point representation; it is determined by
2^(k-1) - 1, where 'k' is the number of bits in the exponent field. There are
3 exponent bits in an 8-bit representation and 8 exponent bits in a 32-bit
representation.
Thus
bias = 3 for 8-bit conversion (2^(3-1) - 1 = 4 - 1 = 3)
bias = 127 for 32-bit conversion (2^(8-1) - 1 = 128 - 1 = 127)
 Example: 01000001110100000000000000000000
10000011 = (131)10
131 - 127 = 4
Hence the exponent of 2 is 4, i.e. 2^4 = 16.
 The mantissa is calculated from the remaining 23 bits of the binary
representation. It consists of an implied '1' plus a fractional part whose
bits have weights 1/2, 1/4, 1/8, ...
Example:
01000001110100000000000000000000
The fractional part of the mantissa is given by:
1×(1/2) + 0×(1/4) + 1×(1/8) + 0×(1/16) + … = 0.625
Thus the mantissa is 1 + 0.625 = 1.625.
The decimal number is hence given by
(-1)^sign × 2^exponent × mantissa = (-1)^0 × 16 × 1.625 = 26.
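The decode steps above can be sketched in Python (an illustrative decode of the same bit string):

```python
# Decode 01000001110100000000000000000000 field by field.
bits = int('01000001110100000000000000000000', 2)
sign = (-1) ** (bits >> 31)                        # sign bit
exponent = 2 ** (((bits >> 23) & 0xFF) - 127)      # 2^(131-127) = 16
mantissa = 1 + (bits & 0x7FFFFF) / 2 ** 23         # 1 + 0.625
print(sign * exponent * mantissa)   # 26.0
```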

2. To convert a decimal number to floating point, we have 3 elements in a
32-bit floating point representation:
i) Sign (MSB)
ii) Exponent (8 bits after the MSB)
iii) Mantissa (remaining 23 bits)

 The sign bit is the first bit of the binary representation: '1' implies a
negative number and '0' implies a positive number.
Example: To convert -17 into 32-bit floating point representation, sign bit = 1.
 The exponent is decided by the nearest power of 2 smaller than or equal to
the number. For 17, the nearest such power is 16, so the exponent of 2 is 4,
since 2^4 = 16. 127 is the bias for 32-bit floating point representation,
determined by 2^(k-1) - 1, where 'k' is the number of bits in the exponent
field.
Thus bias = 127 for 32 bits (2^(8-1) - 1 = 128 - 1 = 127).
 Now, 127 + 4 = 131, i.e. 10000011 in binary representation.
 Mantissa: 17 in binary = 10001.
Move the binary point so that there is only one bit to the left of it, and
adjust the exponent of 2 so that the value does not change; this is called
normalizing the number: 1.0001 × 2^4. Now take the fractional part and
represent it as 23 bits by appending zeros:
00010000000000000000000
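Assembling the three fields and checking them against the machine's own encoding of -17.0 (a sketch using Python's `struct` module):

```python
import struct

# Hand-assembled fields for -17 = -1.0001(2) x 2^4.
sign = '1'
exp_field = format(127 + 4, '08b')                 # biased exponent 131
mantissa_field = '00010000000000000000000'         # fraction bits of 1.0001
pattern = sign + exp_field + mantissa_field

# Machine encoding of -17.0 as a 32-bit pattern.
machine = struct.unpack('>I', struct.pack('>f', -17.0))[0]
print(pattern == format(machine, '032b'))   # True
```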
Ripple carry adder circuit.

Multiple full adder circuits can be cascaded in parallel to add an N-bit
number; for an N-bit parallel adder, there must be N full adder circuits. A
ripple carry adder is a logic circuit in which the carry-out of each full
adder is the carry-in of the next most significant full adder. It is called a
ripple carry adder because each carry bit ripples into the next stage. In a
ripple carry adder, the sum and carry-out bits of any stage are not valid
until the carry-in of that stage has arrived; propagation delays inside the
logic circuitry are the reason for this. Propagation delay is the time elapsed
between the application of an input and the occurrence of the corresponding
output. Consider a NOT gate: when the input is "0" the output will be "1" and
vice versa. The time taken for the NOT gate's output to become "0" after
logic "1" is applied to its input is the propagation delay here. Similarly,
the carry propagation delay is the time elapsed between the application of
the carry-in signal and the occurrence of the carry-out (Cout) signal. The
circuit diagram of a 4-bit ripple carry adder is shown below.

Ripple carry adder


Sum output S0 and carry output Cout of Full Adder 1 are valid only after the
propagation delay of Full Adder 1. In the same way, sum output S3 of Full
Adder 4 is valid only after the joint propagation delays of Full Adder 1
through Full Adder 4. In simple words, the final result of the ripple carry
adder is valid only after the joint propagation delays of all the full adder
circuits inside it.
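A minimal gate-level sketch of the 4-bit ripple carry adder, where each full adder's carry-out feeds the carry-in of the next stage (an illustrative simulation, not hardware):

```python
def full_adder(a, b, cin):
    """One full adder stage: returns (sum bit, carry-out bit)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_carry_add(a_bits, b_bits):
    """a_bits/b_bits are lists of bits, least significant bit first."""
    carry, sum_bits = 0, []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)   # carry ripples to the next stage
        sum_bits.append(s)
    return sum_bits, carry

# 6 + 7 = 13: bits are LSB-first, so 6 = [0,1,1,0] and 7 = [1,1,1,0].
print(ripple_carry_add([0, 1, 1, 0], [1, 1, 1, 0]))   # ([1, 0, 1, 1], 0)
```

The sequential loop mirrors the hardware's timing problem: stage i cannot compute its outputs until stage i-1 has produced its carry.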

In ripple carry adders, for each adder block, the two bits to be added are
available instantly. However, each adder block waits for the carry to arrive
from the previous block, so it is not possible to generate the sum and carry
of any block until the input carry is known: each block waits for the
preceding block to produce its carry. There is therefore a considerable time
delay, called the carry propagation delay.

Consider the above 4-bit ripple carry adder. Each sum bit is produced by the
corresponding full adder as soon as its input signals are applied, but the
carry input of the second stage does not reach its final steady-state value
until the first stage's carry is available at its steady-state value;
similarly the third stage's carry-in depends on the second's, and so on.
Therefore the carry must propagate through all the stages before the final
sum output and carry settle to their final steady-state values.
The propagation time is equal to the propagation delay of each adder block
multiplied by the number of adder blocks in the circuit. For example, if each
full adder stage has a propagation delay of 20 nanoseconds, then the last
carry-in will reach its final correct value after 60 (20 × 3) nanoseconds.
The situation gets worse as we extend the number of stages to add more bits.
Carry Look-ahead Adder :
A carry look-ahead adder reduces the propagation delay by introducing more
complex hardware. In this design, the ripple carry design is suitably
transformed such that the carry logic over fixed groups of bits of the adder
is reduced to two-level logic. Let us discuss the design in detail.

Consider the full adder circuit shown above with its corresponding truth
table. We define two variables, 'carry generate' (Gi) and 'carry propagate'
(Pi), as
Gi = Ai · Bi
Pi = Ai ⊕ Bi
so that each stage's carry-out obeys Ci+1 = Gi + Pi · Ci. Expanding this
recurrence from the carry-in C1 gives:
C2 = G1 + P1·C1
C3 = G2 + P2·G1 + P2·P1·C1
C4 = G3 + P3·G2 + P3·P2·G1 + P3·P2·P1·C1

From the above Boolean equations we can observe that C4 does not have to
wait for C3 and C2 to propagate; in fact C4 is produced at the same time as
C3 and C2. Since the Boolean expression for each carry output is a sum of
products, each can be implemented with one level of AND gates followed by
an OR gate.
The implementation of the three Boolean functions for the carry outputs (C2,
C3 and C4) of a carry look-ahead carry generator is shown in the figure below.
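The generate/propagate equations can be sketched as follows (an illustrative simulation using 0-based indexing with carry-in C0; the recurrence is computed iteratively here for clarity, whereas hardware expands each carry into the two-level sum-of-products shown above):

```python
def cla_carries(a_bits, b_bits, c0=0):
    """Carries of a carry look-ahead adder; inputs are LSB-first bit lists."""
    g = [a & b for a, b in zip(a_bits, b_bits)]   # carry generate Gi = Ai AND Bi
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # carry propagate Pi = Ai XOR Bi
    c = [c0]
    for i in range(len(a_bits)):
        c.append(g[i] | (p[i] & c[i]))            # C(i+1) = Gi + Pi*Ci
    return c                                      # [C0, C1, C2, C3, C4]

# Carries for 6 + 7 (LSB-first: [0,1,1,0] + [1,1,1,0]).
print(cla_carries([0, 1, 1, 0], [1, 1, 1, 0]))    # [0, 0, 1, 1, 0]
```

Because every Ci depends only on the Gs, Ps, and C0, all carries can be produced simultaneously in hardware, which is the whole point of the look-ahead design.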
Advantages and Disadvantages of Carry Look-ahead Adder :

Advantages –
 The propagation delay is reduced.
 It provides the fastest addition logic.

Disadvantages –
 The carry look-ahead adder circuit gets more complicated as the number of
variables increases.
 The circuit is costlier as it involves more hardware.
