Algorithms and Data Structures: Dictionaries

Marcin Sydow
Web Mining Lab, PJWSTK
Topics covered by this lecture:
Dictionary
Hashtable
Dynamic Ordered Set
Binary Search Tree (BST)
AVL Tree
Summary
Dictionary
A dictionary is an abstract data structure that supports the following operations:

search(K key) — returns the value associated with the given key (search can return a special value if the key is absent from the dictionary)
insert(K key, V value)
delete(K key)

Each element stored in a dictionary is identified by a key of type K. A dictionary represents a mapping from keys to values.
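This interface can be sketched in a few lines of Python, using the built-in dict as backing storage (the class and its method names simply mirror the three operations above; the sketch is illustrative, not part of the lecture):

```python
class Dictionary:
    """Minimal dictionary ADT: search / insert / delete over (key, value) pairs."""

    def __init__(self):
        self._data = {}

    def insert(self, key, value):
        self._data[key] = value

    def search(self, key):
        # returns a special value (None) if the key is absent
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)
```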
Examples
contact book — key: name of a person; value: telephone number
Implementation by a Sequence

Elements of a dictionary can be kept in a sequence (linked list or array). (Data size: number of elements n; dominating operation: key comparison.)

unordered: search O(n); insert O(1); delete O(n)
ordered array: search O(log n); insert O(n); delete O(n)
ordered linked list: search O(n); insert O(n); delete O(n)
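The ordered-array variant can be sketched as follows: binary search gives the O(log n) search, while insert pays O(n) for shifting elements (module-level lists stand in for the dictionary's storage; the sketch is illustrative):

```python
import bisect

keys, values = [], []   # kept sorted by key

def insert(key, value):
    i = bisect.bisect_left(keys, key)       # binary search: O(log n)
    if i < len(keys) and keys[i] == key:
        values[i] = value                   # key already present: overwrite
    else:
        keys.insert(i, key)                 # shifting elements: O(n)
        values.insert(i, value)

def search(key):
    i = bisect.bisect_left(keys, key)       # O(log n)
    if i < len(keys) and keys[i] == key:
        return values[i]
    return None                             # special value for an absent key
```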
Direct Addressing

Assume potential keys are numbers from some universe U ⊆ N. An element with key k ∈ U can be kept under index k in a |U|-element array.

All dictionary operations then take O(1) time, but the array needs |U| cells, which is usually prohibitively large.
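Direct addressing is a few lines of code (the universe size of 100 is an arbitrary illustrative choice):

```python
# Direct-address table: a key k from U = {0, ..., U_SIZE - 1}
# is stored under its own index k.
U_SIZE = 100
table = [None] * U_SIZE

def insert(k, value):
    table[k] = value        # O(1)

def search(k):
    return table[k]         # O(1); None means "absent"

def delete(k):
    table[k] = None         # O(1)
```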
Hashtable

The idea is simple. Elements are kept in an m-element array [0, ..., m − 1], where m << |U|.

The index of a key is computed by a fast hash function h : U → [0..m − 1]. For a given key k, its position h(k) is computed before each dictionary operation.
Hashing Non-integer Keys
What if the type of the key is not an integer? An additional step is needed: before computing the hash function, the key should be transformed into an integer.
Important properties of an ideal hash function h : U → [0, ..., m − 1]:

uniform load on each index 0 ≤ i < m (i.e. each of the m possible values is equally likely for a random key)
fast (constant-time) computation
different values even for very similar keys
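As a sketch, a hash function for string keys might first transform the key into an integer (here a polynomial code over the characters) and then reduce it modulo m. The constants (m = 1024, base 31) are illustrative assumptions, not values from the lecture:

```python
M = 1024        # table size m (illustrative)
BASE = 31       # multiplier for the polynomial code (a common illustrative choice)

def to_int(key: str) -> int:
    """Transform a non-integer (string) key into an integer."""
    h = 0
    for ch in key:
        h = h * BASE + ord(ch)
    return h

def h(key: str) -> int:
    """Hash function h : U -> [0..m-1]."""
    return to_int(key) % M
```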
Collisions

Assume a new key k arrives at position h(k) that is not free. Two common ways of dealing with collisions in hash tables are:

k is added to a list l(h(k)) kept at position h(k) (chaining method)
other indexes are scanned (in a repeatable way) until a free index is found (open hashing)
Chain Method
n — the number of elements kept.

compute h(k): O(1)
insert: compute h(k) and add the new element to the list at position h(k): O(1)
search/delete: compute h(k) and scan the list at position h(k)
If the hash function satisfies the uniform load assumption, the chain method guarantees an average of O(1 + α) comparisons for all dictionary operations, where α = n/m is the load factor. Thus, if m = O(n), the chain method results in average O(1) time for all dictionary operations.

Proof: Assume a random key k is to be hashed. Let X denote the random variable representing the length of the list l(h(k)). Any operation needs constant time to compute h(k) and then linearly scans the list l(h(k)), and thus costs O(1 + E[X]). Let S be the set of elements kept in the hashtable, and for e ∈ S let X_e denote the indicator random variable such that X_e = 1 iff h(k) = h(e), and 0 otherwise (shortly: X_e = [h(k) = h(e)]). We have X = Σ_{e∈S} X_e. Now,

E[X] = E[Σ_{e∈S} X_e] = Σ_{e∈S} E[X_e] = Σ_{e∈S} P(X_e = 1) = Σ_{e∈S} 1/m = |S|/m = n/m = α

Thus O(1 + E[X]) = O(1 + α).
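The chain method can be sketched as follows, with Python's built-in hash playing the role of h (the class itself is illustrative):

```python
class ChainHashTable:
    """Hashtable resolving collisions by chaining: one list l(h(k)) per slot."""

    def __init__(self, m=8):
        self.m = m
        self.slots = [[] for _ in range(m)]

    def _h(self, key):
        return hash(key) % self.m               # compute h(k): O(1)

    def insert(self, key, value):
        slot = self.slots[self._h(key)]
        for i, (k, _) in enumerate(slot):
            if k == key:
                slot[i] = (key, value)          # key present: overwrite
                return
        slot.append((key, value))               # append to the chain: O(1)

    def search(self, key):
        # scan the chain at h(key): average O(1 + alpha)
        for k, v in self.slots[self._h(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        j = self._h(key)
        self.slots[j] = [(k, v) for (k, v) in self.slots[j] if k != key]
```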
Universal Hashing
A family H of hash functions into the range 0, ..., m − 1 is called c-universal, for c > 0, if for a randomly chosen hash function h ∈ H any two distinct keys i, j collide with probability

P(h(i) = h(j)) ≤ c/m

The family H is called universal if c = 1.

To avoid malicious data, the hash function can be first randomly picked from a c-universal hashing family.
Open Hashing

In open hashing, there is exactly one element at each position. Consider the insert operation: if, for a new key k, position h(k) is already in use, the entries are scanned in a specified (and repeatable) order π(k) = (h(k, 0), h(k, 1), ..., h(k, m − 1)) until a free place is found. Search is analogous; delete additionally needs to restore the hash table after removing the element.

In open hashing, under the assumption that all scan orders are equally likely, the average cost of the operations is bounded in terms of the load factor α = n/m < 1.
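A minimal sketch of open hashing, with linear probing as one possible choice of scan order π(k) (delete is omitted because, as noted above, it needs extra care to restore the table):

```python
M = 8                       # table size m (illustrative)
table = [None] * M          # exactly one element per position

def probe(key, i):
    """Scan order pi(k): linear probing, h(k, i) = (h(k) + i) mod m."""
    return (hash(key) + i) % M

def insert(key, value):
    for i in range(M):
        j = probe(key, i)
        if table[j] is None or table[j][0] == key:
            table[j] = (key, value)
            return
    raise RuntimeError("table full (load factor alpha must stay below 1)")

def search(key):
    for i in range(M):
        j = probe(key, i)
        if table[j] is None:        # a free slot ends the scan: key absent
            return None
        if table[j][0] == key:
            return table[j][1]
    return None
```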
Perfect Hashing

The previous methods guarantee expected constant time for dictionary operations. Perfect hashing is a scheme that guarantees worst-case constant time.

It is possible to construct a perfect hashing function, for a given set of n elements to be hashed, in expected (i.e. average) linear time O(n). (The construction can be based on some family of 2-universal hash functions; Fredman, Komlós, Szemerédi 1984.)
Dynamic Ordered Set
A dynamic ordered set extends the dictionary with order-based operations:

search(K key)
insert(K key, V value)
delete(K key)
minimum()
maximum()
successor(K key)
predecessor(K key)
Binary Search Tree (BST)

A BST is a binary tree with the following property: for each node, the key contained in this node is higher than or equal to all the keys contained in the left subtree of this node, and lower than or equal to all keys in its right subtree.

Where is the minimum key? Where is the maximum key?
Search Operation
searchRecursive(node, key): // called with node == root
  if ((node == null) or (node.key == key)) return node
  if (key < node.key) return searchRecursive(node.left, key)
  else return searchRecursive(node.right, key)

searchIterative(node, key): // called with node == root
  while ((node != null) and (node.key != key))
    if (key < node.key) node = node.left
    else node = node.right
  return node
Minimum and Maximum
minimum(node): // called with node == root
  while (node.left != null) node = node.left
  return node

maximum(node): // called with node == root
  while (node.right != null) node = node.right
  return node

successor(node):
  if (node.right != null) return minimum(node.right)
  p = node.parent
  while ((p != null) and (node == p.right))
    node = p
    p = p.parent
  return p
Insert Operation

insert(node, key): // called with node == root
  if (key < node.key) then
    if node.left == null:
      n = create new node with key
      node.left = n
    else: insert(node.left, key)
  else: // (key >= node.key)
    if node.right == null:
      n = create new node with key
      node.right = n
    else: insert(node.right, key)
Example delete Implementation
// delete1: for nodes having only 1 son
procedure delete1(node)
begin
  subtree = null
  parent = node.parent
  if (node.left != null)
    subtree = node.left
  else
    subtree = node.right
  if (parent == null)
    root = subtree
  else if (parent.left == node) // node is a left son
    parent.left = subtree
  else // node is a right son
    parent.right = subtree
end
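The pseudocode above can be assembled into one runnable Python sketch (keys only, mirroring the lecture's pseudocode; the full delete operation is left out):

```python
class Node:
    def __init__(self, key, parent=None):
        self.key = key
        self.left = self.right = None
        self.parent = parent

class BST:
    def __init__(self):
        self.root = None

    def search(self, key):
        """Iterative search: returns the node, or None if the key is absent."""
        node = self.root
        while node is not None and node.key != key:
            node = node.left if key < node.key else node.right
        return node

    def insert(self, key):
        if self.root is None:
            self.root = Node(key)
            return
        node = self.root
        while True:
            if key < node.key:
                if node.left is None:
                    node.left = Node(key, node)
                    return
                node = node.left
            else:                       # key >= node.key
                if node.right is None:
                    node.right = Node(key, node)
                    return
                node = node.right

    def minimum(self, node=None):
        """Leftmost node of the subtree rooted at node (default: whole tree)."""
        node = node or self.root
        while node.left is not None:
            node = node.left
        return node

    def successor(self, node):
        if node.right is not None:
            return self.minimum(node.right)
        p = node.parent
        while p is not None and node is p.right:
            node, p = p, p.parent
        return p                        # None if node holds the maximum key
```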
BST: Average Case Analysis
On average, all dictionary operations on a BST take O(log n) time.

Two cases for operations concerning a key k:

k is not present in the BST: in this case the complexities are bounded by the average height of the BST
k is present in the BST: in this case the complexities of operations are bounded by the average depth of a node in the BST

The expected height of a random-permutation-model BST can be proved to be O(log n) by analogy to QuickSort (the proof is omitted in this lecture).

Note: if we assume another model, i.e. that every n-element BST is equally likely, the average height is Θ(√n). This model seems to be less natural.
(*) Average Depth of a Node in a BST (random permutation model)

For a sequence of keys <k_i> inserted into a BST define:

G_j = {k_i : 1 ≤ i < j and k_l > k_i > k_j for all l < i such that k_l > k_j}
L_j = {k_i : 1 ≤ i < j and k_l < k_i < k_j for all l < i such that k_l < k_j}

Observe that the path from the root to k_j consists exactly of G_j ∪ L_j, so that the depth of k_j is d(k_j) = |G_j| + |L_j|.

G_j consists of the keys that arrived before k_j and are its direct successors (in the current subsequence). The i-th element of a random permutation is a current minimum with probability 1/i, so the expected number of updates of the minimum in an n-element random permutation is Σ_{i=1}^{n} 1/i = H_n = O(log n). Hence E[|G_j|] = O(log n); by a symmetric argument E[|L_j|] = O(log n), so the expected depth of a node is O(log n). (Some details are omitted here, but they can be easily derived from the explanation.)
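This conclusion can be checked empirically. The sketch below (an illustration, not from the lecture) inserts a random permutation of n keys into a plain BST and returns the average node depth, which stays within a small multiple of log2 n:

```python
import math
import random

def average_depth(n, seed=0):
    """Build a BST from a random permutation of n keys; return the mean depth."""
    rng = random.Random(seed)
    keys = list(range(n))
    rng.shuffle(keys)                     # random-permutation model
    root = None                           # a node is a list [key, left, right]
    total = 0
    for key in keys:
        depth = 0
        if root is None:
            root = [key, None, None]
        else:
            node = root
            while True:
                depth += 1
                child = 1 if key < node[0] else 2
                if node[child] is None:
                    node[child] = [key, None, None]
                    break
                node = node[child]
        total += depth
    return total / n
```

For n = 2000 the average depth comes out far below the linear worst case, consistent with the O(log n) bound.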
BST: Summary

data size: number of elements in the dictionary (n)
dominating operation: comparison of keys

Average-case complexities:

search: Θ(log n)
insert: Θ(log n)
delete: Θ(log n)
minimum/maximum: Θ(log n)
successor/predecessor: Θ(log n)
AVL Tree

AVL is an extension of the BST dictionary that guarantees O(log n) worst-case height.

AVL is defined as follows: an AVL tree is a BST with the additional condition that for each node the difference of the heights of its left and right subtrees is not greater than 1.
Maximum Height of an AVL Tree
Let T_h be the minimum number of nodes in an AVL tree that has height h.

Observe that: T_0 = 1, T_1 = 2, and T_h = T_{h−1} + T_{h−2} + 1 for h ≥ 2 (the smallest tree of height h has a root and minimal subtrees of heights h − 1 and h − 2).

Thus T_h ≥ F_h (the h-th Fibonacci number). Recall that the h-th Fibonacci number grows exponentially in h. Since the minimum number of nodes in an AVL tree grows at least exponentially with the height of the tree (h), the height of an AVL tree grows at most logarithmically with the number of nodes.
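The recurrence and the comparison with the Fibonacci numbers can be verified numerically (a small check, not from the lecture):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def T(h):
    """Minimum number of nodes in an AVL tree of height h."""
    if h == 0:
        return 1
    if h == 1:
        return 2
    return T(h - 1) + T(h - 2) + 1

@lru_cache(maxsize=None)
def fib(h):
    """h-th Fibonacci number (F_0 = 0, F_1 = 1)."""
    return h if h < 2 else fib(h - 1) + fib(h - 2)
```

Indeed T_h ≥ F_h for every h; for example T_10 = 232, so any AVL tree with fewer than 232 nodes has height below 10.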
Operations on an AVL Tree

All the dictionary operations on an AVL tree begin in the same way as in a BST. However, after each modifying operation the balance factor (bf) values are re-computed (bottom-up), and rotations are applied where the AVL condition is violated.

There are 2 kinds of AVL rotations, single and double, and both have 2 mirror variants: left and right. Each rotation has O(1) time complexity.
To summarise:

each rotation has O(1) complexity
(as in a BST) the complexities of operations are bounded by the height of the tree
an n-element AVL tree has at most logarithmic height

Thus: all dictionary operations have guaranteed O(log n) worst-case complexities on an AVL tree.

Note: the maximum number of rotations after a single delete operation can be logarithmic in n, though. (This may happen on a Fibonacci tree; for an example see Donald Knuth, The Art of Computer Programming, vol. 3: Sorting and Searching.)
Self-organising BST (or Splay Trees)

A self-organising BST guarantees that any sequence of m dictionary operations has total complexity O(m log n).

splay(k): by a sequence of rotations, bring to the root either k (if it is present in the tree) or its direct successor or predecessor.

insert(k): splay(k) (to bring the successor (predecessor) k′ of k to the root), then make k′ the right (left) son of k.

delete(k): splay(k) (to bring k to the root), remove the root (leaving two separate subtrees), splay(k) again on the left (right) subtree (to bring the predecessor (successor) k′ of k to the root), and make k′ the parent of the right (left) orphaned subtree.

It can be proved that the insert and delete operations (described above) have amortised logarithmic time complexities.
Large on-disk dictionaries
There are special data structures designed for implementing a dictionary in case it does not fit into memory (and is mostly kept on disk).

Example: B-trees (and variants). The key idea: minimise disk read/write activity (a node should fit in a single disk block).

Used in DB implementations (among others).
Dictionaries Implementations: Brief Summary of the Lecture

Hashtables provide very fast operations but do not support ordering-based operations (such as successor, minimum, etc.)
BST is the simplest implementation of an ordered dictionary that guarantees average logarithmic complexities, but has linear pessimistic complexities
AVL is an extension of BST that guarantees even worst-case logarithmic complexities
Lecture outline:

Dictionary
Hashing
  Chain Method
  Open Hashing
  Universal Hashing
  Perfect Hashing
Ordered Dynamic Set
  BST
  AVL
  Self-organising BST
Thank you for your attention