CMPUT 204 Week 7 Handout: QuickSort, Sorting Lower Bound, BST, Hash Table


Faculty of Science

Department of Computing Science

CMPUT 204
ALGORITHMS I
Winter 2023

Instructors: Jia You (B1) and Xiaoqi Tan (B2)


Lecture slides by Xiaoqi Tan (https://xiaoqitan.org). Please direct your comments and suggestions to
xiaoqi.tan@ualberta.ca

Week 7
QuickSort; Sorting Lower Bound
BST & Balanced BST; Hash Table

QuickSort
Introduction; Examples
QuickSort: Divide-and-Conquer

Example input: 1 4 2 5 8 6 7 9

Divide: Partition the array into two subarrays around a pivot, s.t. elements
in the lower subarray ≤ pivot ≤ elements in the upper subarray.

Conquer: Recursively sort the two subarrays.

Combine: Trivial (automatic).
QuickSort: Pseudo Code

    Quicksort(A, p, r):
        if p < r:
            q = Partition(A, p, r)     // A[p..q − 1] ≤ pivot = A[q] ≤ A[q + 1..r]
            Quicksort(A, p, q − 1)
            Quicksort(A, q + 1, r)

This is the key step: Partition rearranges A[p..r] in place around the pivot
A[q], so that the lower subarray is ≤ pivot and the upper subarray is ≥ pivot.

Partition(A, p, r): Pseudo Code

    Partition(A, p, r):
        pivot = A[r]                   // our choice of pivot: the last element
        i = p − 1
        for j = p to r − 1:
            if A[j] ≤ pivot:
                i = i + 1
                swap A[i] and A[j]
        swap A[i + 1] and A[r]         // put the pivot into its final position
        return i + 1
Partition(A, p, r): Example (p = 1, r = 8, pivot = A[r] = 5)

    3 1 7 6 4 8 2 5    i=0  j=1
    3 1 7 6 4 8 2 5    i=1  j=2
    3 1 7 6 4 8 2 5    i=2  j=3
    3 1 7 6 4 8 2 5    i=2  j=4
    3 1 7 6 4 8 2 5    i=2  j=5
    3 1 4 6 7 8 2 5    i=3  j=5    (at the end of iteration j = 5)
    3 1 4 6 7 8 2 5    i=3  j=6
    3 1 4 6 7 8 2 5    i=3  j=7
    3 1 4 2 7 8 6 5    i=4  j=7    (at the end of iteration j = 7)
    3 1 4 2 5 8 6 7    i=4  j=8    (after the final swap of A[i + 1] and A[r])
Quicksort(A, p, q − 1): Example (p = 1, q = 5)

    3 1 4 2 5 8 6 7    recurse on the left subarray A[1..4], pivot = A[4] = 2

    3 1 4 2 5 8 6 7    i=0  j=1
    3 1 4 2 5 8 6 7    i=0  j=2
    1 3 4 2 5 8 6 7    i=1  j=2    (at the end of iteration j = 2)
    1 3 4 2 5 8 6 7    i=1  j=3
    1 2 4 3 5 8 6 7    i=1  j=4    (after the final swap of A[i + 1] and A[r])
Quicksort(A, q + 1, r): Example (q = 5, r = 8)

    1 2 4 3 5 8 6 7    recurse on the right subarray A[6..8], pivot = A[8] = 7

    1 2 4 3 5 8 6 7    i=5  j=6
    1 2 4 3 5 8 6 7    i=5  j=7
    1 2 4 3 5 6 8 7    i=6  j=7    (at the end of iteration j = 7)
    1 2 4 3 5 6 7 8    i=6  j=8    (after the final swap of A[i + 1] and A[r])
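As a concrete reference, here is a direct transcription of the pseudocode
above into Python (a sketch, not from the slides; it uses 0-based indexing, so
the trace indices shift down by one):

    def partition(A, p, r):
        """Lomuto partition around pivot = A[r]; returns the pivot's final index q."""
        pivot = A[r]
        i = p - 1
        for j in range(p, r):               # j = p, ..., r - 1
            if A[j] <= pivot:
                i += 1
                A[i], A[j] = A[j], A[i]
        A[i + 1], A[r] = A[r], A[i + 1]     # place the pivot between the subarrays
        return i + 1

    def quicksort(A, p=0, r=None):
        """Sorts A[p..r] in place."""
        if r is None:
            r = len(A) - 1
        if p < r:
            q = partition(A, p, r)
            quicksort(A, p, q - 1)
            quicksort(A, q + 1, r)

    A = [3, 1, 7, 6, 4, 8, 2, 5]            # the example array traced above
    quicksort(A)
    print(A)                                # [1, 2, 3, 4, 5, 6, 7, 8]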
Partition(A, p, r): Correctness (Brief)

Loop invariant (for the for-loop in Partition(A, p, r)): at the start of each
iteration, for any array index k,

    1. if p ≤ k ≤ i, then A[k] ≤ pivot;
    2. if i + 1 ≤ k ≤ j − 1, then A[k] > pivot;
    3. if k = r, then A[k] = pivot.

Initialization (i = p − 1, j = p) makes all three conditions hold trivially;
each iteration maintains them; at termination (j = r) the array is partitioned
as claimed (cf. CLRS).
QuickSort on the Example Input

    Input:  3 1 7 6 4 8 2 5
    1) Partition around pivot 5:           3 1 4 2 | 5 | 8 6 7
    2) Partition 3 1 4 2 around pivot 2:   1 | 2 | 4 3
    3) Partition 4 3 around pivot 3:       3 | 4
    4) Partition 8 6 7 around pivot 7:     6 | 7 | 8
    Output: 1 2 3 4 5 6 7 8
QuickSort Analysis

Each call partitions A[p..r] around the pivot A[r] into a lower subarray
(≤ pivot) and an upper subarray (≥ pivot). One partition of k elements costs
Θ(k); the overall cost depends on how balanced the splits are.
QuickSort Recursion Tree

[Figure: the recursion tree. Each node is one Quicksort call, labelled by its
pivot; its two children are the recursive calls on the lower and upper
subarrays.]
QuickSort Recursion Tree: Example

[Figure: the recursion tree for the keys 1..11, with pivot 6 at the root,
pivots 3 and 9 at the next level, and so on until every key has served as a
pivot.]
QuickSort Recursion Tree: BST

The tree of pivots is a binary search tree (BST): every pivot is larger than
all pivots in its left subtree and smaller than all pivots in its right
subtree.
QuickSort: RT Analysis

How do we estimate what n1 is going to be? The first partition splits the n
elements into a lower subarray of size n1 and an upper subarray of size
n − n1 − 1, where 0 ≤ n1 ≤ n − 1, handled by Quicksort(A, p, q − 1) and
Quicksort(A, q + 1, r) respectively. This gives the recurrence

    T(n) = T(n1) + T(n − n1 − 1) + Θ(n).
QuickSort: Worst Case (WC)

The worst case is a maximally unbalanced split at every level: n1 = 0 or
n1 = n − 1, so one of Quicksort(A, p, q − 1) and Quicksort(A, q + 1, r) gets
an empty subarray and the other gets all remaining elements.
QuickSort: WC Running Time

What is the WC instance? With a maximally unbalanced split at every level,

    T(n) = 0                  if n = 1,
    T(n) = T(n − 1) + n − 1   if n ≥ 2,

which solves to Θ(n²). The Master Theorem does not apply here: it requires
b > 1, while this recurrence has a = 1, b = 1, f(n) = n − 1.
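Unrolling the recurrence makes the Θ(n²) bound explicit (a standard
calculation, added for completeness):

\begin{align*}
T(n) &= T(n-1) + (n-1)\\
     &= T(n-2) + (n-2) + (n-1)\\
     &= \cdots = T(1) + \sum_{k=1}^{n-1} k = \frac{n(n-1)}{2} = \Theta(n^2).
\end{align*}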
QuickSort: Three WC Instances

    Case-1: Ascending order      1 2 3 4 5 6 7 8
    Case-2: Descending order     8 7 6 5 4 3 2 1
    Case-3: Identical elements   4 4 4 4 4 4 4 4

Case-3 is a special case of the first two.

QuickSort: Almost WC

Almost-worst-case splits: every partition peels off only a single element,
leaving subarrays of sizes n − 2 and 1 (or 1 and n − 1).

QuickSort: Almost WC Running Time

Even these almost-worst-case splits still cost Θ(n²).
QuickSort: Best Case (BC)

The best case is when the recursion tree is as balanced as possible.

[Figure: for the keys 1..11, pivot 6 at the root, pivots 3 and 9 at the next
level, and so on; every partition is a bipartition.]
QuickSort: BC Running Time

Can be estimated as

    T(n) = 2T(n/2) + Θ(n)    (CLRS pp. 175)

which, by the Master Theorem, solves to Θ(n log n).

QuickSort: Almost BC

This is not uncommon!! Suppose every partition is only mildly unbalanced,
splitting n elements into αn and (1 − α)n for some constant 0 < α < 1. The
running time is still Θ(n log n):

    T(n) = T(αn) + T((1 − α)n) + cn.

Every level of the recursion tree costs at most cn, and with α = 1/10 (the
case the tree illustrates) the number of levels is between log_{10} n and
log_{10/9} n: the shallowest branch shrinks by a factor of 10 per level, the
deepest by a factor of 10/9. Hence

    cn log_{10} n ≤ T(n) ≤ cn log_{10/9} n + O(n),   i.e.,   T(n) = Θ(n log n).
QuickSort: Average Case (AC)

QuickSort: AC Running Time

Suppose n1 can be 0, 1, ⋯, n − 1 with equal probability. Averaging the
recurrence relation over all splits gives an expected running time of
O(n log n).

QuickSort: Space Requirement

In-place sorting: Θ(1) extra space. MergeSort, in contrast, needs to create a
new result list of size n, i.e., Θ(n) extra space.
Sorting: Time and Space

    InsertionSort:  Θ(n²)        In-place: Θ(1)
    MergeSort:      Θ(n log n)   Space: Θ(n)
    HeapSort:       Θ(n log n)   In-place: Θ(1)
    QuickSort:      good AC      In-place: Θ(1)

The BC is not rare in practice (good AC performance with balanced
αn / (1 − α)n splits), and QuickSort can be further optimized, e.g., by a
random choice of pivot.
Worst case vs. normal cases:

    Worst case:    1 2 3 4 5 6 7 8   (already sorted; pivot = last element)
    Normal cases:  balanced αn / (1 − α)n splits
    Best case:     perfect bipartitions

Don't bet on any ONE of them ONLY!! Randomization hedges across all inputs.

Random QuickSort

Choose the pivot uniformly at random from the current subarray; the expected
running time is Θ(n log n) on every input.
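A minimal sketch of the randomized variant in Python (illustrative; it reuses
the partition function from the earlier sketch by swapping a random element
into position r before partitioning):

    import random

    def randomized_partition(A, p, r):
        """Choose a uniform random pivot, then run the usual Lomuto partition."""
        k = random.randint(p, r)       # random pivot index in [p, r]
        A[k], A[r] = A[r], A[k]        # move it to the end...
        return partition(A, p, r)      # ...and reuse partition() from above

    def randomized_quicksort(A, p=0, r=None):
        if r is None:
            r = len(A) - 1
        if p < r:
            q = randomized_partition(A, p, r)
            randomized_quicksort(A, p, q - 1)
            randomized_quicksort(A, q + 1, r)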
Sorting: Time and Space (Again)

    InsertionSort:     Θ(n²)        In-place: Θ(1)
    MergeSort:         Θ(n log n)   Space: Θ(n)
    HeapSort:          Θ(n log n)   In-place: Θ(1)
    Random QuickSort:  Θ(n log n)   In-place: Θ(1)   (expected running time)
Random QuickSort: Expected Running Time (toy example, input 1 2 3 4 5 6 7 8)

    Bad pivot (pivot = 8):     T(n) = 204
    Normal pivot (pivot = 5):  T(n) = 20.4
    Good pivot (pivot = 4):    T(n) = 2.04

    E[T(n)] = (2.04 + 20.4 + 204) / 3
Two Types of Randomness

    Average case:          randomness of the input, e.g., each instance is
                           equiprobable.
    Randomized algorithm:  randomness of the algorithm, e.g., the pivot is
                           chosen randomly.
34

Deterministic algorithms (pivot is the last element):

    WCRT: the input array is ascending, descending, or all elements identical.
    BCRT: the input array is such that each partition is a bipartition.
    ACRT: randomness of the input, e.g., each instance is equiprobable.

Randomized algorithms (randomness of the algorithm, e.g., the pivot is chosen
randomly):

    Expected WCRT; Expected BCRT; Expected ACRT. For the expected ACRT,
    E[E[⋅]] takes the expectation over both types of randomness.

Sorting Lower Bound


Sorting: Can We Do Better?

MergeSort, HeapSort, and (expected) Random QuickSort all run in Θ(n log n).
Can we outperform Θ(n log n)?
What Are Lower Bounds?

An upper bound such as QuickSort's WC/BC cost c ⋅ n log n + o(n log n) is a
statement about one algorithm. A lower bound Ω( f(n)) is a statement about
every algorithm, including those not invented yet (e.g., "UofASort"): for any
algorithm, there exists some input instance whose cost is at least
c ⋅ f(n) + o( f(n)).

Two Useful Trees: Recursion Tree

[Figure: QuickSort's recursion tree of pivots, with 06 at the root, 04 and 09
as its children, and 02, 05, 01, 03 below: a BST.]
Two Useful Trees: Decision Tree

[Figure: a decision tree for sorting three elements A[1], A[2], A[3]; e.g.,
the input 2 0 4 follows its comparison outcomes down to the leaf giving the
sorted order 0 2 4.]
Sorting Lower Bounds: Proof

Any comparison-based sorting algorithm corresponds to a decision tree: each
internal node is a comparison and each leaf is an output ordering. To sort
A[1], A[2], A[3] correctly (e.g., 2 0 4 ↦ 0 2 4), the tree must contain a
leaf for every one of the n! possible permutations of the input. A binary
tree of height h has at most 2^h leaves, so 2^h ≥ n!, i.e.,

    h ≥ log(n!) = Ω(n log n).

Sorting Lower Bounds: Remarks

MergeSort and HeapSort (Θ(n log n)) are asymptotically optimal, in the sense
of Big-O notation and WCRT. Always remember that Big-O and WC analysis have
their own limitations. Power of randomization!!

Sorting Algorithms (So Far)


InsertionSort; MergeSort; HeapSort; QuickSort
Sorting #1: Insertion Sort

Θ(n²)
Sorting #2: Merge Sort

Θ(n log n)

Design idea matters!!


Sorting #3: HeapSort

Θ(n log n). Data structure matters!!

[Figure: a max-heap is built from the unsorted array 4 1 7 9 3 10 14 8 2 16;
repeatedly extracting the maximum yields the sorted array
1 2 3 4 7 8 9 10 14 16.]
Sorting #4: QuickSort

Sorting in Different Types

    InsertionSort; MergeSort; HeapSort; QuickSort

    Hash table; Binary search tree

Binary Search Tree (BST)


Recall: From QuickSort To BST

[Figure: QuickSort's recursion tree of pivots (06 at the root, 04 and 09 as
children, 02, 05, 01, 03 below) is a BST.]
Rooted Tree: Data Structure

Tree Terminology

Binary Search Tree: Definitions

BST property: for every node, all keys in its left subtree are ≤ its key, and
all keys in its right subtree are ≥ its key.

[Figure: three BSTs of different shapes storing the keys 1, 2, 3.]
BST Operations: Search for A Key

Finding a key x: compare x with the current node's key and go left (smaller)
or right (larger).

[Figure: searching for x = 16 in a BST with root 22 follows 22 → 18 → 15 → 16.]
Search for A Key: RT Analysis

Runtime for a tree of height h: finding a key x costs O(h).

BST Operations: FindMin and Insert

Both cost O(h): FindMin follows left children down to the leftmost node;
Insert walks down as in Search and attaches the new key as a leaf.
BST Operations: Delete

Costs O(h). Easy case: the node has at most one child, so we splice it out.
Otherwise the node has two children, and we replace it with its successor.

BST: Delete (Simple Example)

BST: Delete (Complex Example)
BST: Outputting Sorted Sequence

An in-order traversal (left subtree, node, right subtree) of a BST outputs
the keys in sorted order.

[Figure: two BSTs of different shapes whose in-order traversals both produce
the same sorted sequence.]
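A compact Python sketch of the operations above (an illustration, not the
slides' code; Delete is omitted for brevity):

    class Node:
        def __init__(self, key):
            self.key = key
            self.left = None
            self.right = None

    def search(root, x):
        """Return the node with key x, or None. Cost: O(h)."""
        while root is not None and root.key != x:
            root = root.left if x < root.key else root.right
        return root

    def insert(root, x):
        """Insert key x as a new leaf; return the (possibly new) root. O(h)."""
        if root is None:
            return Node(x)
        if x < root.key:
            root.left = insert(root.left, x)
        else:
            root.right = insert(root.right, x)
        return root

    def find_min(root):
        """The leftmost node holds the minimum key. O(h)."""
        while root.left is not None:
            root = root.left
        return root

    def inorder(root, out):
        """In-order traversal appends the keys in sorted order. Θ(n)."""
        if root is not None:
            inorder(root.left, out)
            out.append(root.key)
            inorder(root.right, out)

    root = None
    for x in [22, 18, 26, 15, 21, 25, 16]:   # the example keys from the search slide
        root = insert(root, x)
    out = []
    inorder(root, out)
    print(out)                                # [15, 16, 18, 21, 22, 25, 26]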
Insert 255 keys in a BST in random order, then repeatedly delete and insert
keys at random (animation: https://algs4.cs.princeton.edu/32bst/).

Skewed? So what??
Balancing BST
AVL Trees

Sorted Lists and Arrays

We've been talking about sorting for so long, but how are sorted elements
stored so that they are efficient to manipulate (Delete; Insert; Search)?

    (Sorted) Linked Lists:  HEAD → 1 2 3 4 7 8 9 10
    (Sorted) Arrays:        1 2 3 4 7 8 9 10

What's the difference?
(Sorted) Linked Lists

Delete/Insert: O(1), supposing we already have the address of the pointer to
the node of the delete/insert (e.g., inserting 6 between 4 and 7).

Search: O(n); we must traverse the list from HEAD: 1 → 2 → 3 → 4 → 7 → ⋯
(Sorted) Arrays

Delete/Insert: O(n); inserting element 6 in a given position means shifting
all elements after it: 1 2 3 4 [6] 7 8 9 10.

Search: O(log n) by binary search, e.g., to see if 7 is in the array.
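For reference, the O(log n) binary search on a sorted array (a standard
textbook version, not from the slides):

    def binary_search(A, x):
        """Return an index of x in sorted array A, or -1 if absent."""
        lo, hi = 0, len(A) - 1
        while lo <= hi:
            mid = (lo + hi) // 2        # halve the search range each step
            if A[mid] == x:
                return mid
            elif A[mid] < x:
                lo = mid + 1
            else:
                hi = mid - 1
        return -1

    print(binary_search([1, 2, 3, 4, 7, 8, 9, 10], 7))    # 4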
Data Structures (So Far)

                   Arrays         Linked Lists   Binary Search Trees
    Search         O(log n) 👍    O(n) 👎        O(h)
    Insert/Delete  O(n) 👎        O(1) 👍        O(h)

If the BST is balanced: h = log n 👍

Balanced BSTs: Motivation

[Figure: the keys 1..7 stored in a skewed BST of height h = Θ(n) versus a
balanced BST of height h = log(n).] Balance the BST!!
Balancing BSTs: AVL Trees

Balance the BST so that h = log(n)!!

[Figure: a balanced BST with root 4, children 2 and 6, and leaves 1, 3, 5, 7.]

AVL Trees: named after inventors Adelson-Velsky and Landis.
Key Operations: Rotations

Key Operations: Rotations (Cont'd)

A rotation rearranges a constant number of pointers, so it costs O(1).
AVL Trees: Definitions

An AVL tree is a BST where for any node we have | hL − hR | ≤ 1.

[Figure: a BST with root 7; the left subtree rooted at 4 (children 2, 6;
leaves 1, 3, 5) has height hL = 2, and the right subtree rooted at 9
(children 8, 10) has height hR = 1.]
AVL Trees: Examples

An AVL tree is a BST where for any node we have | hL − hR | ≤ 1.

[Figure: two BSTs on the keys 15, 16, 18, 21, 22, 25, 26; the skewed one
violates the AVL condition at some node, while the balanced one is an AVL
tree.]
AVL Trees: h ≤ O(log n)

An AVL tree of height h contains at least F(h) ≈ 1.618^h nodes, where F(h) is
the h-th Fibonacci number. Hence n ≥ F(h) implies h ≤ O(log n), so it's
important to keep the AVL property.
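Spelling the bound out (a standard argument, where N(h) denotes the minimum
number of nodes in an AVL tree of height h):

\begin{align*}
N(h) &= 1 + N(h-1) + N(h-2) \;\ge\; F(h) \approx \phi^{h},
  \qquad \phi = \tfrac{1+\sqrt{5}}{2} \approx 1.618,\\
n \;\ge\; N(h) &\;\Longrightarrow\; h \;\le\; \log_{\phi} n = O(\log n).
\end{align*}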
Maintain the AVL Property

Each rotation costs O(1). After an insert or delete, we rebalance along one
root-to-leaf path, i.e., O(h) = O(log n) work in total.
Single Rotation (Case-1): z ≤ y ≤ x

[Figure: before, z is the root with subtree T0 and right child y, which has
subtree T1 and right child x (subtrees T2, T3). After a single rotation, y is
the root with children z and x; note z ≤ T1 ≤ y.]

Single Rotation (Case-2): x ≤ y ≤ z

[Figure: the mirror image of Case-1. After a single rotation, y is the root
with children x and z; note y ≤ T2 ≤ z.]

Double Rotation (Case-3): z ≤ x ≤ y

[Figure: after a double rotation, x is the root with children z and y; note
z ≤ T1 ≤ x and x ≤ T2 ≤ y.]

Double Rotation (Case-4): y ≤ x ≤ z

[Figure: the mirror image of Case-3. After a double rotation, x is the root
with children y and z; note y ≤ T2 ≤ x and x ≤ T1 ≤ z.]
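A sketch of the two rotation primitives in Python (illustrative, assuming
each node stores its height; the four cases above are combinations of these
two):

    class AVLNode:
        def __init__(self, key):
            self.key = key
            self.left = None
            self.right = None
            self.height = 1

    def height(node):
        return node.height if node else 0

    def update(node):
        node.height = 1 + max(height(node.left), height(node.right))

    def rotate_left(z):
        """Case-1: z's right child y becomes the new root of this subtree."""
        y = z.right
        z.right = y.left      # subtree T1 (z <= T1 <= y) moves under z
        y.left = z
        update(z)
        update(y)
        return y

    def rotate_right(z):
        """Case-2 (mirror): z's left child y becomes the new root."""
        y = z.left
        z.left = y.right      # subtree T2 (y <= T2 <= z) moves under z
        y.right = z
        update(z)
        update(y)
        return y

    # Case-3 (z <= x <= y): z.right = rotate_right(z.right), then rotate_left(z).
    # Case-4 (y <= x <= z): z.left = rotate_left(z.left), then rotate_right(z).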
AVL Trees: “Best of Both Worlds”

                   Arrays         Linked Lists   Balanced BSTs (e.g., AVL trees)
    Search         O(log n) 👍    O(n) 👎        O(log n) 👍
    Insert/Delete  O(n) 👎        O(1) 👍        O(log n) 👍

Always Ask: Can We Do Better?

                   Arrays        Linked Lists  Balanced BSTs  Hash Table
    Search         O(log n) 👍   O(n) 👎       O(log n) 👍    O(1)
    Insert/Delete  O(n) 👎       O(1) 👍       O(log n) 👍    O(1)

The O(1) bounds for the hash table are average-case, not worst-case.

Hash Tables

Direct-address Tables

[Figure: a direct-address table T. Each actual key k ∈ K = {2, 3, 5, 7, 10204}
from the universe U of keys is stored directly in slot T[k].]

Delete/Insert/Search: O(1). 🙀 The catch: the table needs one slot for every
possible key in the universe (the key 10204 alone forces a huge table).

Hash Tables

With a hash table T[0..m − 1] and a hash function h : U → {0, 1, ⋯, m − 1},
key k is stored at T[h(k)].

[Figure: keys k1, …, k5 from the universe U hash into slots h(k1), h(k3),
h(k5), …; here h(k2) = h(k4). Collision!! 😛]

Collision Resolution: Chaining

Chaining: each slot of the hash table T[0..m − 1] holds a linked list of all
keys hashed to it, e.g., k2 and k4 with h(k2) = h(k4) share one list. The
chains can be singly or doubly linked lists.
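A minimal chained hash table in Python (a sketch for illustration; it uses
Python's built-in hash modulo m as the hash function):

    class ChainedHashTable:
        def __init__(self, m=8):
            self.m = m
            self.table = [[] for _ in range(m)]    # one chain per slot

        def _h(self, k):
            return hash(k) % self.m                # h : U -> {0, ..., m-1}

        def insert(self, k, v):
            self.table[self._h(k)].append((k, v))  # O(1): append to the chain

        def search(self, k):
            for key, v in self.table[self._h(k)]:  # O(1 + chain length)
                if key == k:
                    return v
            return None

        def delete(self, k):
            chain = self.table[self._h(k)]
            self.table[self._h(k)] = [(key, v) for key, v in chain if key != k]

    T = ChainedHashTable()
    for k in [2, 3, 5, 7, 10204]:    # the keys from the direct-address example
        T.insert(k, str(k))
    print(T.search(7))               # '7'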

Hash Table: Insert/Delete/Search

For insert and remove there is no need to traverse the list if we are handed
the element's position directly (CLRS, pp. 258). h is often a simple
function, so computing the slot is O(1). Search is dominated by the longest
list in T.

Hash Table: Worst Case

Worst case: all n keys hash to the same slot, and the hash table is reduced
to a single linked list of n elements.

WC Runtime by Chaining

Chaining places all elements hashed to the same slot into the same linked
list. (If using singly linked lists, can we still get O(1)?) Search then
costs the length of that list, and this is complicated: what is the length
of the list, and in what sense (WC, AC, or BC)?

Analysis of Hashing by Chaining

With n keys stored in a hash table T[0..m − 1] (hash function
h : U → {0, 1, ⋯, m − 1}), define the load factor

    α = n/m    (the average number of elements per slot).

Searching: O(1) AC Runtime

On average, searching costs Θ(1 + α); when the load factor α = n/m is a
constant, this is O(1). 👍👍

Simple Uniform Hashing: Intuition

Simple uniform hashing: on average, all the lists have the same length. When
the load factor α = n/m is a constant, search is O(1). 👍👍

Proof: Unsuccessful Search

An unsuccessful search computes h(k) in Θ(1) and scans one whole chain, whose
expected length is α, hence Θ(1 + α) expected time (CLRS, p. 259).
Proof: Successful Search

Expected time: Θ(1 + α) (CLRS, p. 260). Each key is equally likely to be the
one being searched. The expected number of elements added to the list after
the key being searched (i.e., after the i-th element inserted into the table
was added to its list) is (n − i)/m, so the expected number of elements
examined is

    (1/n) Σ_{i=1}^{n} ( 1 + Σ_{j=i+1}^{n} 1/m ).
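Evaluating the sum (a standard calculation; it completes the Θ(1 + α) claim):

\begin{align*}
\frac{1}{n}\sum_{i=1}^{n}\Bigl(1 + \sum_{j=i+1}^{n}\frac{1}{m}\Bigr)
  &= 1 + \frac{1}{nm}\sum_{i=1}^{n}(n-i)
   = 1 + \frac{1}{nm}\cdot\frac{n(n-1)}{2}\\
  &= 1 + \frac{\alpha}{2} - \frac{\alpha}{2n} = \Theta(1+\alpha).
\end{align*}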

Simple Uniform Hashing: Example

[Figure: a hash table whose chains all have roughly the same length under
simple uniform hashing.]
Universal Hashing

To maintain the same expected length for all the lists: randomization of the
hash function.
Universal Hashing: ACRT

By cleverly randomizing the choice of hash function at run time, we guarantee
that we can process every sequence of operations with a good average-case
runtime (ACRT).
Universal Hashing: Example

An example universal family of hash functions, with proof (CLRS, pp. 267-268).
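The family analyzed in CLRS (pp. 267-268) is h_{a,b}(k) = ((a·k + b) mod p)
mod m, where p is a prime larger than every key. A sketch in Python (p = 10007
is an assumed example prime, not from the slides):

    import random

    def make_universal_hash(m, p=10007):
        """Draw h_{a,b}(k) = ((a*k + b) % p) % m from the universal family.
        Assumes all keys are < p (p prime)."""
        a = random.randint(1, p - 1)   # a drawn from {1, ..., p-1}
        b = random.randint(0, p - 1)   # b drawn from {0, ..., p-1}
        return lambda k: ((a * k + b) % p) % m

    h = make_universal_hash(m=8)
    print([h(k) for k in [2, 3, 5, 7, 204]])   # slot indices in {0, ..., 7}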
This Week

QuickSort; data structures for fast delete/insert/search (BSTs and Hash
Tables).

Next Week
Greedy Methods
