Professional Documents
Culture Documents
Lecture 05
Lecture 05
https://xkcd.com/1953/
1
LEARNING OBJECTIVES
• Text compression
• Huffman encoding
• Run-length encoding
• Lossy compression
2
CHARACTER ENCODING
3
ASCII
4
5
THINGS TO NOTE ABOUT ASCII
• Uppercase letters 'A' to 'Z' are in order with encodings 0x41 to 0x5A
• Lower case letters 'a' to 'z' have encodings 0x61 to 0x7A (hence only one bit
difference between an upper and a lowercase letter)
• ASCII has been superseded, mostly by UTF-8 which keeps the ASCII code for 8-bit
values from 0x00 to 0x7F
6
UTF-8
7
EMOJIS HAVE UTF-8 ENCODINGS TOO
• 📱
• mobile phone
• 🇳🇿
8
COMPRESSION
• https://www.youtube.com/watch?v=JsTptu56GM8
9
HUFFMAN TREE
• This tree from Wikipedia is the Huffman tree for the text "this is
an example of a huffman tree".
Letter Code
e 000
• It can also be represented as a table a 010
space 111
n 0010
t 0110
m 0111
i 1000
h 1010
s 1011
f 1101
o 00110
u 00111
x 10010
p 10011
r 11000
l 11001
10
THE ENCODING
• 0110 1010 1000 1011 111 1000 1011 111 010 0010 111 000 10010
010 0111 10011 11001 000 111 00110 1101 111 010 111 1010
00111 1101 1101 0111 010 0010 111 0110 11000 000 000
• With the example on the previous slide the number of bits to represent the message is 135
and the uncompressed ASCII equivalent (one char per byte) is 36 × 8 = 288. This assumes
the tree or table doesn’t have to be transmitted.
• But if we only had the 16 characters in the message to encode in a fixed length encoding
we could do that with 4 bits per character.
• This would give a compression ratio of (36 x 4) / 135 = 144/135 which is pretty close to not
being compressed at all.
12
COMPRESSION WHY BOTHER?
https://xkcd.com/1381/
13
RUN LENGTH ENCODING (RLE)
• Not really useful for text, but for certain types of graphics.
• Values repeated in a row are counted and we store the value and the count
• Does RLE help with the sentence "this is an example of a huffman tree"?
14
SOUNDS AND IMAGES
15
SOUND WAVE
16
ANALOGUE TO DIGITAL
• Quality is determined by
• Sampling rate: number of samples per second
• More samples ▶ more accurate waveform
• Bit depth: number of bits per sample
• More bits ▶ more accurate amplitude
17
SAMPLING
18
TOO MUCH DATA
• https://www.youtube.com/watch?v=KGZ0een8vSE (1:20)
19
IMAGES
20
DIGITIZED IMAGE
21
USE MORE BITS FOR MORE COLOURS
22
IMAGES AND MOVIES
23
HOW MUCH DATA?
24