
3) Data Representation

Last update: 2020 September 21 Dr. E. Lang


© E. Lang

Table of Contents
3.1 Data Types
3.2 Integers
3.3 Integers—Two’s Complement
3.4 Floating-Point Numbers
3.5 Logical Data
3.6 Character Data
3.7 ASCII Editors
3.8 Summary

3.1) Data Types
Recall the basic data types:

1) integers
2) floating-point (real) numbers
3) logical (Boolean) data
4) character data
5) addresses (pointers)

We can also have data structures, or complex data, which are ordered
sets of the basic data types, such as arrays of a single data type. We
can also have complex data that are ordered sets of mixed data types.

3.1) Data Types
But in MATLAB most data types are represented as arrays. This is
not standard for programming languages, but MATLAB is not a
programming language; it is a mathematical package with some
programming features.
The following data types are arrays of primitive data in MATLAB:

• numeric (several types)


• logical
• char (character)

3.1) Data Types
The following are the data structures that are arrays of mixed data
types in MATLAB:

• table
• cell
• struct

And there is one scalar data type in MATLAB:

• function handles

3.1) Data Types
MATLAB stands for MATrix LABoratory, so in MATLAB everything is an
array except for function handles. In MATLAB the data types are classes
(a concept from object-oriented languages such as C++).
There are 16 fundamental classes:

• Integers (uint8, uint16, uint32, uint64, int8, int16, int32, int64)


• Floating-point numbers (single, double)
• Logical
• Character (char)
• Structures (table, cell, struct)
• Function handles (address reference to functions in memory)

3.1) Data Types
In this course we will cover:

• numeric (all of the numeric types)


• logical
• char
• function handles

Again note that, with the exception of the function handle, all of these are arrays in MATLAB.
So for the following, x is classed as a 1x1 matrix in the Workspace Window:

x = 5.2;

However, even though x is stored as a 1x1 matrix, MATLAB treats it mathematically as a scalar. This holds only for the 1x1 case.
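
One way to confirm this at the Command Window is sketched below. The variable name x follows the slide; the built-in functions size, class and isscalar are used here only for illustration.

```matlab
x = 5.2;            % stored as a 1x1 array of class double
sz = size(x);       % [1 1]: one row, one column
cls = class(x);     % 'double': the default numeric class
tf = isscalar(x);   % true: a 1x1 array is treated as a scalar
```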

3.1) Data Types
The smallest addressable unit of a computer is the byte:

1 byte = 8 bits

So the various types of computer data are going to be multiples of a byte. For
example, the MATLAB data type int8 is an 8 bit (signed) integer.

One important thing to understand about computer data is that no matter how
many bytes are used, all computer data is finite. This is the primary limit on data
manipulation.

In addition, for a given data type all the bits are used. So, for example, a zero
represented as a 64-bit integer takes up 64 bits in memory; in this case 64 zeroes.
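
As a small check of the last point, the sketch below asks MATLAB how much memory a 64-bit zero occupies. The variable names z, info and nbytes are illustrative only.

```matlab
z = int64(0);          % zero stored as a 64-bit integer
info = whos('z');      % query the Workspace entry for z
nbytes = info.bytes;   % 8 bytes = 64 bits, even though the value is zero
```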

3.2) Integers
The standard size of integer data for most computers is

4 bytes = 32 bits

However, most programming languages allow alternate sizes, usually referred to
as “long” or “short” depending on whether they are greater than or less than 32
bits. Long integers allow larger numbers to be used, if needed, at the cost of
more memory; short integers restrict the size of the numbers but have the
advantage of using less memory.
In addition, integers may be

• Signed: positive and negative numbers
• Unsigned: non-negative numbers only (zero and positive)
3.2) Integers
The types of integers allowed on a computer are system and software
dependent.
In MATLAB, numeric data is by default a double-precision floating-point
number.
So to use integers in MATLAB, the data must be explicitly declared as
integer. There are 8 different types of integer data in MATLAB as given in the
following table.
In programming languages such as C, C++ and Java, the programmer must
declare the variable type before it can be used. For example, the following
declares the variable named count as a signed integer in C++, which is 4 bytes (32 bits) on most current systems:

int count;

3.2) Integers
Integer Type              Size     Signedness   MATLAB
Short unsigned integer    8 bit    unsigned     uint8
Short unsigned integer    16 bit   unsigned     uint16
Unsigned integer          32 bit   unsigned     uint32
Long unsigned integer     64 bit   unsigned     uint64
Short signed integer      8 bit    signed       int8
Short signed integer      16 bit   signed       int16
Integer                   32 bit   signed       int32
Long signed integer       64 bit   signed       int64

3.2) Integers
Since numeric data in MATLAB is double precision by default, you must
explicitly declare a value to be an integer. For example to set x to “10” as an 8-
bit signed integer, you use
x = int8(10)
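
The effect of the explicit conversion can be seen by comparing classes, as in the sketch below. The names y, z, cx and cy are illustrative; the last line shows that combining an int8 with a double scalar gives an int8 result rounded to the nearest integer.

```matlab
x = int8(10);        % explicitly an 8-bit signed integer
y = 10;              % no conversion: double precision by default
cx = class(x);       % 'int8'
cy = class(y);       % 'double'
z = x + 2.7;         % result is int8; 12.7 is rounded to 13
```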

Again you must be aware of the fact that any type of integer will have a
minimum and maximum value. The following table gives the minimum and
maximum values for each integer type. You should note that this table is not restricted to
MATLAB.

3.2) Integers
Type      Minimum                        Maximum
uint8     0                              255
uint16    0                              65,535
uint32    0                              4,294,967,295
uint64    0                              18,446,744,073,709,551,615
int8      -128                           127
int16     -32,768                        32,767
int32     -2,147,483,648                 2,147,483,647
int64     -9,223,372,036,854,775,808     9,223,372,036,854,775,807

3.2) Integers
There are MATLAB functions to obtain these values for integers. For 32-bit integers:
intmax
intmin
or
intmax('int32')
intmin('int32')

For other types, just substitute the appropriate character array as the function argument. For
example,

intmax('int64')
intmin('int64')
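
As a sketch, the loop below prints the limits for a few of the integer types so they can be compared with the table of minimum and maximum values. The cell array types and the loop variables are illustrative choices.

```matlab
types = {'uint8', 'int8', 'int16', 'int32'};
for k = 1:numel(types)
    t = types{k};
    % double(...) is used only so that fprintf formats the limits uniformly
    fprintf('%-6s  min = %d  max = %d\n', t, double(intmin(t)), double(intmax(t)));
end
```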

3.2) Integers
From statistics, we have the following theorem:
Theorem (Fundamental Principle of Counting):
Given a sequence of m events that can be performed as follows:

First event: n_1 ways
Second event: n_2 ways
...
mth event: n_m ways

then the number of ways the sequence can be performed, N, is:

N = n_1 × n_2 × ... × n_m

3.2) Integers
Examples: Fundamental Principle of Counting
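
One illustration of the counting principle (an assumed example, not necessarily the one presented in class) is counting bit patterns: each bit is an event that can occur in 2 ways, so n bits give 2 × 2 × ... × 2 = 2^n patterns. The sketch below enumerates them for n = 3.

```matlab
n = 3;                           % number of bits (events), each with 2 ways
N = 2^n;                         % counting principle: 2 x 2 x 2 = 8
patterns = dec2bin(0:N-1, n);    % every pattern as a row of '0'/'1' characters
count = size(patterns, 1);       % number of rows actually generated
```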

3.2) Integers
As we have seen in the examples, for n bits there are 2^n possible bit patterns.
So for n = 8, there are 2^8 = 256 possible bit patterns.
For unsigned integers we have a maximum positive value of 2^n - 1, with 0000
… 0000 as the extra bit pattern representing zero.
So for n = 8, the maximum positive number is

2^8 - 1 = 256 - 1 = 255

and 0000 0000 would be zero.

3.2) Integers
For unsigned integers, the decimal value is determined by the same method we
used in the previous slide set.
For example, for an 8-bit unsigned integer we have:

0000 0110 = 1x2^2 + 1x2^1 + 0x2^0 = 4 + 2 + 0 = 6

3.2) Integers
Examples: Unsigned Integers
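
As a sketch of the conversion (an assumed example), the code below evaluates an 8-bit unsigned pattern by summing place values and compares the result with MATLAB's built-in bin2dec. The variable names are illustrative.

```matlab
bits = '00000110';                       % 8-bit unsigned pattern
weights = 2.^(numel(bits)-1:-1:0);       % place values 128, 64, ..., 2, 1
value = sum((bits == '1') .* weights);   % add the weights where the bit is 1
check = bin2dec(bits);                   % built-in conversion, for comparison
back = dec2bin(255, 8);                  % largest 8-bit value as a bit pattern
```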

3.3) Integers
For signed integers, the positive and negative numbers must be distributed
among the 2^n bit patterns. The distribution is as follows:

• 2^(n-1) - 1 positive integers
• 2^(n-1) negative integers
• 0000 … 0000 for zero

So for n = 8,
minimum: -2^(8-1) = -2^7 = -128
maximum: 2^(8-1) - 1 = 2^7 - 1 = 128 - 1 = 127

Note that there is one more negative number than there are positive numbers. This
distribution is based on the two’s complement representation.
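
The limits for n = 8 can be checked against MATLAB's own values for int8, as in this short sketch (variable names are illustrative).

```matlab
n = 8;
lo = -2^(n-1);          % most negative value: -128
hi = 2^(n-1) - 1;       % most positive value: 127
total = hi - lo + 1;    % 127 positive + 128 negative + zero = 256 = 2^n
```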
3.3) Integers—Two’s Complement
Representing unsigned (non-negative) integers is straightforward. They
are the numbers given by the previously covered binary-to-decimal
and decimal-to-binary conversions.
But computer representation of signed numbers (positive and
negative) is more involved. There are several ways to represent integers
for computer manipulation, but the most popular is the two’s
complement method.
The two’s complement standard is hardwired into the computer’s
processor, that is, into its integer registers.

3.3) Integers—Two’s Complement
Two’s Complement Algorithm
To obtain the two’s complement of a binary number with a specified
number of binary digits:

1) Reverse all bits (0 → 1 and 1 → 0)
2) Add 1

Note: it is critical to specify the number of bits in the number first and
not deviate from that number when applying the algorithm.
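
The two steps can be sketched in MATLAB as below. This is a minimal illustration that holds the bit pattern as a non-negative number and assumes a word size of n = 4 bits; the variable names are illustrative. The mod in step 2 keeps the result within n bits.

```matlab
n = 4;                           % word size in bits, fixed throughout
x = 5;                           % bit pattern 0101 held as a non-negative value
flipped = bitxor(x, 2^n - 1);    % step 1: reverse all bits -> 1010
tc = mod(flipped + 1, 2^n);      % step 2: add 1, discarding any carry out of n bits
pattern = dec2bin(tc, n);        % '1011', the 4-bit pattern for -5
```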

3.3) Integers—Two’s Complement
The two’s complement of a positive number maps to the negative of
that number; the two’s complement of a negative number maps to the
positive of that number. So they cycle back and forth, with two
exceptions:

1) The two’s complement of 0 maps to itself.
2) The two’s complement of the most negative number maps to
itself (remember there is one more negative number than positive
numbers).
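
Both the back-and-forth cycle and the two exceptions can be checked with a small sketch. The anonymous function tc is a hypothetical helper written for this illustration; it works on bit patterns held as non-negative numbers for an n-bit word.

```matlab
% two's complement of pattern x in an n-bit word: flip the bits, add 1, keep n bits
tc = @(x, n) mod(bitxor(x, 2^n - 1) + 1, 2^n);

roundTrip = tc(tc(3, 4), 4);   % 0011 -> 1101 -> 0011, back to 3
zeroCase  = tc(0, 4);          % 0000 maps to itself
minCase   = tc(8, 4);          % 1000 (-8) maps to itself
```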

3.3) Integers—Two’s Complement
Examples: 4-bit, Two’s Complement

Note: To make the calculations easier we will consider 4-bit integers.
Recall that for 4 bits, we have 2^4 = 16 possible bit patterns.

3.3) Integers—Two’s Complement
So to summarize, we have the following table, which gives the
complete set of values for 4-bit, two’s complement integers (note that
negative numbers all have a 1 in the leftmost bit, while zero and the positive numbers have a 0):

Decimal   Binary     Decimal   Binary
  -8      1000          0      0000
  -7      1001          1      0001
  -6      1010          2      0010
  -5      1011          3      0011
  -4      1100          4      0100
  -3      1101          5      0101
  -2      1110          6      0110
  -1      1111          7      0111
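
The same table can be generated programmatically. The sketch below relies on the fact that an n-bit pattern p with its leftmost bit set represents p - 2^n; the loop variable names are illustrative.

```matlab
n = 4;
for p = 0:2^n - 1
    % patterns with the leftmost bit set represent p - 2^n
    value = p - 2^n * (p >= 2^(n-1));
    fprintf('%3d  %s\n', value, dec2bin(p, n));
end
```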
3.3) Integers—Two’s Complement
Examples: Adding two’s complement numbers
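
As one assumed example, (+3) + (-5) in 4 bits is 0011 + 1011 = 1110, which is -2. The sketch below performs the same addition on bit patterns held as non-negative numbers; toPattern and toValue are hypothetical helpers defined only for this illustration.

```matlab
n = 4;
toPattern = @(v) mod(v, 2^n);                 % signed value -> n-bit pattern
toValue   = @(p) p - 2^n * (p >= 2^(n-1));    % n-bit pattern -> signed value

a = 3;  b = -5;
s = mod(toPattern(a) + toPattern(b), 2^n);    % binary addition, carry out discarded
result = toValue(s);                          % -2
bitsOut = dec2bin(s, n);                      % '1110'
```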

3.3) Integers—Two’s Complement
So adding integers that would take the calculation outside the set of
valid integers will just cycle through the integers:

(+1) + (+7) = 0001 + 0111 = 1000 = -8

And in fact, unsigned integers also behave as a cycle. So for 4-bit
unsigned integers:

(+1) + (+15) = 0001 + 1111 = 0000 = 0
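
Both cycles can be reproduced with modular arithmetic on 4-bit patterns. This is a sketch of the wraparound only; MATLAB's own integer types do not behave this way.

```matlab
n = 4;
% signed: 0001 + 0111 = 1000, which reads as -8
s = mod(1 + 7, 2^n);
signedResult = s - 2^n * (s >= 2^(n-1));
% unsigned: 0001 + 1111 = 1 0000; the carry is dropped, leaving 0000
unsignedResult = mod(1 + 15, 2^n);
```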

3.3) Integers—Two’s Complement
Integer Overflow
The computer works with a fixed number of bits. When an arithmetic
operation goes outside of the range that can be represented (greater
than or less than the range), the result is integer overflow.

Note: if integer overflow occurs, the computer will send no warning
message; the calculation will proceed, in error.

3.3) Integers—Two’s Complement
Integer overflow that results in the calculation cycling through the numbers
is the typical behaviour in programming languages such as C++.
But MATLAB behaves differently. MATLAB will just set the result to the
highest or lowest number (saturation).
If integer overflow occurs, the computer will not warn you. You, the
programmer, are responsible for making sure this is not an issue in your programs.
Example: For 8-bit, signed integers we have the following:
C++ (int8_t):
127 + 1 = -128 (no warning)
MATLAB (int8):
127 + 1 = 127 (no warning)
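
MATLAB's saturation can be seen directly, and the cycling result can be emulated for comparison. In the sketch below the last line reduces the sum modulo 2^8 and reinterprets the resulting byte as a signed value with typecast; this is only an emulation of wraparound, not how MATLAB integer arithmetic works.

```matlab
a = int8(127) + 1;      % saturates at intmax('int8'): 127
b = int8(-128) - 1;     % saturates at intmin('int8'): -128
% emulate wraparound: reduce modulo 2^8, then reinterpret the byte as signed
w = typecast(uint8(mod(127 + 1, 256)), 'int8');   % -128
```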

3.4) Floating-Point Numbers
The advantage of integers is that they are exact. But most technical
calculations are done in the real number system. Floating-point
numbers (FPN) are used to approximate the real number system.
There are two commonly used floating-point number types:

• Single precision: 32 bits (4 bytes)
• Double precision: 64 bits (8 bytes)

For MATLAB the default data type is 64-bit double precision.

3.4) Floating-Point Numbers
The most common way to represent floating-point numbers is the
IEEE 754-2008 standard (revised from IEEE 754-1985). This standard is
“hardwired” into the processor just as the two’s complement standard is
“hardwired” into the processor.

                   Bits   Sign    Fraction   Exponent
Single Precision   32     1 bit   23 bits    8 bits
Double Precision   64     1 bit   52 bits    11 bits

3.4) Floating-Point Numbers
The range of IEEE 754 floating-point numbers is given in the following
table:

                   Bits   Approximate     Approximate      Significant
                          Largest         Smallest         Decimal Digits
Single Precision   32     ±3.4×10^38      ±1.18×10^-38     7
Double Precision   64     ±1.80×10^308    ±2.23×10^-308    15

3.4) Floating-Point Numbers
There are MATLAB functions to obtain these values. For double precision:

realmax
realmin
or
realmax('double')
realmin('double')

For single precision:

realmax('single')
realmin('single')
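
The values returned by these functions can be compared with the table of ranges, as in the sketch below. The variable names are illustrative, and the values in the comments are rounded.

```matlab
dmax = realmax;             % about 1.7977e+308
dmin = realmin;             % about 2.2251e-308 (smallest normalized double)
smax = realmax('single');   % about 3.4028e+38
smin = realmin('single');   % about 1.1755e-38
```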

3.4) Floating-Point Numbers
As with integers, MATLAB allows specific declaration of a variable to be
32-bit floating-point using

x = single(5)

It is also possible to convert a value that is not double precision to double
precision. For example, to convert the single precision x to a double
precision number use

y = double(x)
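
One point worth checking is that converting to double does not restore digits that were lost in single precision. The sketch below uses 0.1, which cannot be represented exactly in either format; the names a, b and same are illustrative.

```matlab
x = single(5);        % 32-bit floating-point value
y = double(x);        % converted to 64-bit; 5 is exact in both formats
a = single(0.1);      % 0.1 rounded to single precision
b = double(a);        % conversion cannot restore the lost digits
same = (b == 0.1);    % false: b is about 0.100000001490116
```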

3.4) Floating-Point Numbers
As with integers, there are limitations on the data that can be represented in the
floating-point systems.
Recall that for n bits, we can have 2^n bit patterns. This means that for 32 bits we
have

2^32 = 4,294,967,296 (about 4x10^9 = 4 G)

possible bit patterns, and so at most that many distinct numbers. For 64-bit double precision we have

2^64 = 18,446,744,073,709,551,616 (about 18x10^18 = 18 E)
(E = exa = 10^18)

possible bit patterns.
3.4) Floating-Point Numbers
Calculations are normally done in the infinite real number system.
For example, calculus is based on the real number system. The real
number system is continuous, that is, it has no “holes” and is therefore
considered a complete number system.
The infinite rational number system is a subset of the real number
system: the real numbers minus the irrational numbers. Rational
numbers are not complete; there are “holes” where the missing
irrational numbers would be.
32-bit or 64-bit FPN are really a subset of the rational numbers.
Because FPN are limited to 32 or 64 bits, they form a finite and not an
infinite set of numbers. So a floating-point number system has “holes”
where the missing irrational numbers should be and “holes” where the
missing rational numbers should be.

3.4) Floating-Point Numbers

Floating-point numbers are an approximation of the real number system.
This causes problems with round-off errors.

Integers do not have round-off errors.

3.4) Floating-Point Numbers
Round-off is a loss of precision, which is caused by using a finite number
of digits, n, to represent a number that has infinitely many digits, or finitely
many but more than n.
As a simple example, consider a decimal number system that only allows
4 digits after the decimal point. So we have

1/3 = 0.3333
therefore
3x0.3333 = 0.9999

Of course the correct value should be 1. The difference is the round-off error
(in this case 0.0001).
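
The same effect appears in binary double precision, where 0.1, 0.2 and 0.3 cannot be represented exactly. The sketch below is a commonly used illustration; the variable names are illustrative.

```matlab
s = 0.1 + 0.2;
isExact = (s == 0.3);    % false: the stored values differ slightly
err = abs(s - 0.3);      % the round-off error, about 5.6e-17
```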

3.4) Floating-Point Numbers
Round-off is a significant problem in single-precision calculations;
therefore, it is best to use double precision.
In the past, computer processors had 32-bit floating-point registers. As a
result, single-precision calculations were done in hardware. For
64-bit calculations it was necessary to use software to manipulate two
32-bit registers.
Calculations using hardware are fast; calculations using software are
slow. Therefore, with 32-bit processors, double-precision calculations
were slow and only done if absolutely necessary.
However, today 64-bit registers are the standard, so double-precision
calculations are now done in hardware and are therefore
fast. So to reduce round-off error, use double-precision calculations
unless space is an issue. MATLAB’s default is double precision.
3.4) Floating-Point Numbers
As mentioned above, there are gaps (“holes”) in the FPN system,
that is, given any number there will be a gap to the next number in the
FPN. In an effort to measure round-off error it is useful to have a
measure of this gap. Unfortunately, this gap is not constant since it
increases in size with the size of the number. To create a standardized
measure of the size of this gap, the machine epsilon is defined.

Definition (Machine Epsilon): The machine epsilon is the gap from the
FPN 1.0 to the next FPN, so it is the smallest number that when added
to one (1) will give a result different from one (1). This definition only
applies to FPNs.

3.4) Floating-Point Numbers
MATLAB has a function that returns the machine epsilon for 64-bit
and 32-bit FPN:

eps
eps('single')

In addition, the eps function will also return the gap from any number:

eps(x)
eps(single(x))
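
A short sketch of how eps behaves is given below. It shows that 1 + eps is distinguishable from 1 while adding half the gap is not, and that the gap grows with the size of the number. The variable names are illustrative.

```matlab
e64 = eps;                      % 2^-52, about 2.2204e-16
e32 = eps('single');            % 2^-23, about 1.1921e-07
changed   = (1 + e64) > 1;      % true: 1 + eps is the next double after 1
unchanged = (1 + e64/2) == 1;   % true: half the gap rounds back to 1
gap1000 = eps(1000);            % the gap above 1000 is larger: 2^-43
```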

3.4) Floating-Point Numbers
The IEEE-754 standard specifies certain special computational
situations:

Definition (Overflow): In terms of exponents, there are largest negative
and positive numbers. Overflow occurs if a calculation takes a number
beyond those largest numbers. If an overflow occurs, IEEE-754
specifies that the floating-point register be set to a reserved bit
pattern and the following will be displayed:

Inf or -Inf

3.4) Floating-Point Numbers
Definition (Underflow): There is a smallest number after 0 and a gap
between this number and 0. Underflow occurs if a calculation results in
a number inside this gap.
There are different ways to handle underflow. MATLAB will set the
result to zero (0), but there will be no warning that this has occurred.

3.4) Floating-Point Numbers
Floating Point Number System

3.4) Floating-Point Numbers
Definition (Not a Number, NaN): Certain calculations are indeterminate.
For example,

0/0 and ∞ - ∞

If an indeterminate calculation occurs, IEEE-754 specifies that the
floating-point register be set to a reserved bit pattern and the following
will be displayed:
NaN

3.4) Floating-Point Numbers
Examples:
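
As an illustration of the three special cases defined in this section (overflow, underflow and NaN), a short MATLAB sketch with assumed values follows. No warning is issued for any of these lines.

```matlab
big   = realmax * 2;        % overflow: Inf
small = -realmax * 2;       % overflow: -Inf
tiny  = 1e-200 * 1e-200;    % underflow: the exact result 1e-400 is stored as 0
u1 = 0 / 0;                 % indeterminate: NaN
u2 = Inf - Inf;             % indeterminate: NaN
```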

3.5) Logical Data
Logical (Boolean) data has only two values: true and false. The
computer represents these as 1 and 0 and uses 8 bits (one byte) to store
each value.
Logical data:

• true (1)
• false (0)

3.5) Logical Data
Although logical data is represented as

1 or true
0 or false

the computer will in fact interpret 0 as false and any non-zero number
as true.
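
This behaviour, together with the one-byte storage, can be checked in MATLAB with a short sketch such as the following. The variable names are illustrative.

```matlab
t = true;
info = whos('t');
nbytes = info.bytes;     % 1 byte per logical value
a = logical(5);          % any non-zero number converts to true
b = logical(0);          % zero converts to false
msg = 'zero';
if -3                    % a non-zero condition is taken as true
    msg = 'non-zero';
end
```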

3.6) Character Data
Character data is the computer coding of various symbols. MATLAB has two methods
to deal with character data:

• Character Arrays
• Strings

Character arrays are enclosed in single quotes, such as 'a' and 'John'; strings are
enclosed in double quotes, such as "a" and "John".
Characters in a character array can be referenced using standard array indexing. But
strings must be processed using MATLAB’s string functions. Strings have much more storage
overhead than character arrays.
Note that in C++, a single character is enclosed in single quotes, but multiple characters,
that is strings, are enclosed in double quotes: 'a' and "John". This is an example of a
syntax difference between MATLAB and C++.
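
The difference between the two can be sketched as follows, assuming a MATLAB release that supports double-quoted strings (R2017a or later). The variable names are illustrative.

```matlab
c = 'John';            % character array: 1x4 char
s = "John";            % string: 1x1 string
first = c(1);          % 'J', ordinary array indexing
nc = numel(c);         % 4 elements, one per character
ns = numel(s);         % 1 element: the whole string is a single value
len = strlength(s);    % 4, obtained through a string function
```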
3.6) Character Data
There are several different standards for coding character data; the
most common is ASCII (American Standard Code for Information
Interchange). This is also referred to as text data.
Recall that 1 byte = 8 bits. ASCII codes each character with 1 byte.
However, only the first 7 bits are used; the eighth bit is called the parity
bit and is used for error detection during transmission.
Since only 7 bits are used, there are 2^7 = 128 possible ASCII
characters. In decimal numbers these range from 0 to 127.

3.6) Character Data
The following tables give the ASCII computer coding. Note the following (all codes here are decimal):

• 000 is the null character, which is just a place holder in memory with nothing there
• 032 is the space (blank) character, which represents a space and is not the same as
the null character
• 001 to 031 and 127 are control characters; for example, controlling the
movement of the cursor (e.g. LF, Line Feed, is 010)
• 027 is the ESC (escape) character; it is used with a second character to indicate
that the meaning of the following character has been modified, i.e., the second character
“escapes” its usual meaning
• 003 is the ETX (end of text) character, used to terminate an operation; it is the control-C
character
• The rest represent symbols (e.g. ' is 039)

Dec Hex Key Symbol Dec Hex Key Symbol Dec Hex Key Symbol
000 00 [Ctrl] @ NULL 011 0B [Ctrl] K VT 022 16 [Ctrl] V SYN
001 01 [Ctrl] A SOH 012 0C [Ctrl] L FF 023 17 [Ctrl] W ETB
002 02 [Ctrl] B STX 013 0D [Ctrl] M CR 024 18 [Ctrl] X CAN
003 03 [Ctrl] C ETX 014 0E [Ctrl] N SO 025 19 [Ctrl] Y EM
004 04 [Ctrl] D EOT 015 0F [Ctrl] O SI 026 1A [Ctrl] Z SUB
005 05 [Ctrl] E ENQ 016 10 [Ctrl] P DLE 027 1B [Ctrl] [ ESC
006 06 [Ctrl] F ACK 017 11 [Ctrl] Q DC1 028 1C [Ctrl] \ FS
007 07 [Ctrl] G BEL 018 12 [Ctrl] R DC2 029 1D [Ctrl] ] GS
008 08 [Ctrl] H BS 019 13 [Ctrl] S DC3 030 1E [Ctrl] ^ RS
009 09 [Ctrl] I HT 020 14 [Ctrl] T DC4 031 1F [Ctrl] _ US
010 0A [Ctrl] J LF 021 15 [Ctrl] U NAK

Dec Hex Symbol Dec Hex Symbol Dec Hex Symbol
032 20 (space) 043 2B + 054 36 6
033 21 ! 044 2C , 055 37 7
034 22 " 045 2D - 056 38 8
035 23 # 046 2E . 057 39 9
036 24 $ 047 2F / 058 3A :
037 25 % 048 30 0 059 3B ;
038 26 & 049 31 1 060 3C <
039 27 ' 050 32 2 061 3D =
040 28 ( 051 33 3 062 3E >
041 29 ) 052 34 4 063 3F ?
042 2A * 053 35 5 064 40 @

Dec Hex Symbol Dec Hex Symbol Dec Hex Symbol

065 41 A 076 4C L 087 57 W
066 42 B 077 4D M 088 58 X
067 43 C 078 4E N 089 59 Y
068 44 D 079 4F O 090 5A Z
069 45 E 080 50 P 091 5B [
070 46 F 081 51 Q 092 5C \
071 47 G 082 52 R 093 5D ]
072 48 H 083 53 S 094 5E ^
073 49 I 084 54 T 095 5F _
074 4A J 085 55 U 096 60 `
075 4B K 086 56 V

Dec Hex Symbol Dec Hex Symbol Dec Hex Symbol

097 61 a 108 6C l 119 77 w
098 62 b 109 6D m 120 78 x
099 63 c 110 6E n 121 79 y
100 64 d 111 6F o 122 7A z
101 65 e 112 70 p 123 7B {
102 66 f 113 71 q 124 7C |
103 67 g 114 72 r 125 7D }
104 68 h 115 73 s 126 7E ~
105 69 i 116 74 t 127 7F DEL
106 6A j 117 75 u
107 6B k 118 76 v

3.6) Character Data
To the computer these symbols do not exist; they are just strings of
binary digits. So for example,

'A' → 41 hex = 0100 0001
'a' → 61 hex = 0110 0001

When the computer compares characters it is comparing the binary code.
So if you place 'aa' and 'Aa' in alphabetic order, you would get:

Aa (0100 0001 0110 0001)
aa (0110 0001 0110 0001)
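
These codes and the resulting ordering can be checked in MATLAB, as in the sketch below. The variable names are illustrative.

```matlab
codeA = double('A');                   % 65
hexA  = dec2hex(codeA);                % '41'
binA  = dec2bin(codeA, 8);             % '01000001'
binLowerA = dec2bin(double('a'), 8);   % '01100001'
ordered = sort({'aa', 'Aa'});          % {'Aa', 'aa'}: 'A' (65) sorts before 'a' (97)
```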

3.6) Character Data
The standard ASCII set is limited by the number of characters that can be
represented. There are other character codes that can be used that allow more
characters to be defined.
The Unicode (Unique, Universal, and Uniform character encoding) standard was
developed for the internet and to represent characters in languages other than English.
Unicode uses UTF (Unicode Transformation Format) encodings. The advantage of
these character codes is that they allow more characters; the disadvantage is that
they can require more storage space.

• ASCII: 7 bits, with 2^7 = 128 characters
• ASCII extended: 8 bits, with 2^8 = 256 characters
• UTF-8: 1 to 4 bytes per character (8-bit code units)
• UTF-16: 2 or 4 bytes per character (16-bit code units)
• UTF-32: 4 bytes per character (one 32-bit code unit)

All three UTF encodings cover the same Unicode character set, which allows 1,114,112 code points.

3.6) Character Data
MS-Windows uses the character code:

Windows-1252 or CP-1252

This is an 8-bit (1 byte) encoding that is the default encoding of
MATLAB. Each character is stored in 2 bytes even though only 1 byte is used.
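
The 2-byte storage of a character array can be confirmed with whos, as sketched below. The variable names are illustrative.

```matlab
c = 'John';
info = whos('c');
bytesPerChar = info.bytes / numel(c);   % 2 bytes for each of the 4 characters
```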

3.7) ASCII Editors
When compilers and interpreters translate computer programs to the machine
language instructions that will run on the computer, they expect to see ASCII code. In
fact, computer programs are written with a subset of the ASCII characters. Therefore it
is important to always use an ASCII editor.

Computer programs are written in ASCII (aka text) data.

MATLAB has a built-in ASCII editor. Notepad is also an ASCII editor. MS-Word is not.

Never, never ever use a word processor to write computer
programs. Word processors create binary files, not ASCII (text)
files. They have special formatting codes (called metadata) that
are non-ASCII. The compiler will try to interpret these as ASCII
and the program will not compile.
3.8) Summary
We have discussed the various basic data types and how they are
represented in the computer:

• integers
• real numbers
• logical data
• character data

Each of these data types has specific uses, but they are also constrained
by limitations.

3.8) Summary
Integers are used mainly for counting in the repetition (looping)
structures. Since integers are exact, there is no round-off problem. Never
use fractional data in a repetition structure because round-off errors can
give inconsistent results. Always use whole numbers.
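
The risk can be seen in a small sketch: a whole-number counter runs exactly 10 times, but a fractional value accumulated inside the loop does not land exactly on 1. The variable names are illustrative.

```matlab
x = 0;
for k = 1:10          % whole-number loop counter: exactly 10 passes
    x = x + 0.1;      % fractional accumulation picks up round-off
end
isOne = (x == 1);     % false: x is about 0.9999999999999999
```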

Real numbers are used for engineering and scientific calculations.

Logical data is used in the control structures to make decisions.

Character data is used to manipulate non-numeric data.

3.8) Summary
Recall from the first lecture that “1” is represented in all data types;
however, to the computer the binary code of each of these is
completely different and is therefore manipulated in different ways.
• integers (1 is a two’s complement integer)
• real numbers (1.0 is an IEEE 754-2008 number)
• logical data (1 is an integer 1 representing true)
• character data ('1' is the ASCII code 31 hex = 49 decimal)
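
The different codes behind “1” can be inspected in MATLAB, as in the sketch below. The integer pattern is shown for an 8-bit word as an assumed example, and the variable names are illustrative.

```matlab
i1 = dec2bin(1, 8);      % '00000001': 1 as an 8-bit integer pattern
f1 = num2hex(1.0);       % '3ff0000000000000': 1.0 as an IEEE 754 double
l1 = class(true);        % 'logical': true is stored as the value 1
c1 = double('1');        % 49: the ASCII code of the character '1'
```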

3.8) Summary
What you are expected to know:

• Data types
• Fundamental Principle of Counting
• Integer representation and two’s complement
• Working with and adding two’s complement integers
• Floating-point number systems and special representations
• ASCII character representation and conversions between symbols and
hex data