Chapter 12
Hash

Data Structures and Algorithms

Dr. Rang Nguyen
Faculty of Computer Science and Engineering
University of Technology, VNU-HCM

Outcomes

• L.O.5.1 - Depict the following concepts: hash table, key, collision, and collision resolution.
• L.O.5.2 - Describe hash functions using pseudocode and give examples to show their algorithms.
• L.O.5.3 - Describe collision resolution methods using pseudocode and give examples to show their algorithms.
• L.O.5.4 - Implement hash tables using C/C++.
• L.O.5.5 - Analyze the complexity and develop experiments (programs) to evaluate the methods supplied for hash tables.
• L.O.1.2 - Analyze algorithms and use Big-O notation to characterize the computational complexity of algorithms composed by using the following control structures: sequence, branching, and iteration (not recursion).

Contents

1 Basic concepts

2 Hash functions
    Direct Hashing
    Modulo division
    Digit extraction
    Mid-square
    Folding

3 Collision resolution
    Open addressing
    Linked list resolution

Basic concepts

• Sequential search: O(n)
• Binary search: O(log2 n)
→ Both require several key comparisons before the target is found.

Search complexity (number of key comparisons):

Size         Binary    Sequential (Average)    Sequential (Worst Case)
16           4         8                       16
50           6         25                      50
256          8         128                     256
1,000        10        500                     1,000
10,000       14        5,000                   10,000
100,000      17        50,000                  100,000
1,000,000    20        500,000                 1,000,000

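The figures in the table can be reproduced with a few lines of C++. This is a minimal sketch, assuming the binary column is the smallest c with 2^c ≥ n and the sequential average is n / 2. The function names are illustrative and do not come from the slides.

```cpp
#include <cstdint>

// Comparisons for binary search as tabulated above:
// the smallest c such that 2^c >= n (assumed reading of the table).
int binaryComparisons(std::uint64_t n) {
    int c = 0;
    for (std::uint64_t span = 1; span < n; span *= 2) {
        ++c;
    }
    return c;
}

// Average comparisons for sequential search: about n / 2.
std::uint64_t sequentialAverage(std::uint64_t n) { return n / 2; }

// Worst case for sequential search: every element is inspected.
std::uint64_t sequentialWorst(std::uint64_t n) { return n; }
```

For example, binaryComparisons(1000000) gives 20 while sequentialAverage(1000000) gives 500000, which is the gap that motivates looking for an O(1) search.
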
Is there a search algorithm whose complexity is O(1)?

YES

Figure: Each key has only one address

• Home address: the address produced by a hash function.
• Prime area: the memory that contains all the home addresses.
• Synonyms: a set of keys that hash to the same location.
• Collision: the location of the data to be inserted is already occupied by synonym data.
• Ideal hashing:
    • No location collision
    • Compact address space

Hash functions

• Direct hashing
• Modulo division
• Digit extraction
• Mid-square
• Folding
• Rotation
• Pseudo-random

Direct Hashing

The address is the key itself:
hash(Key) = Key

• Advantage: there is no collision.
• Disadvantage: the address space (storage size) is as large as the key space.

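Direct hashing can be sketched as a table with one slot per possible key. The following is a minimal illustration, assuming integer keys in the range 0 to keySpace - 1 and string values; the type DirectTable and its members are hypothetical names, not part of the slides.

```cpp
#include <string>
#include <vector>

// One slot per possible key, so the storage is as large as the key space.
struct DirectTable {
    std::vector<std::string> value;  // payload stored at each address
    std::vector<bool> used;          // marks which addresses are occupied

    explicit DirectTable(int keySpace)
        : value(keySpace), used(keySpace, false) {}

    // hash(Key) = Key
    static int hash(int key) { return key; }

    void insert(int key, const std::string& v) {
        int address = hash(key);
        value[address] = v;
        used[address] = true;
    }

    bool contains(int key) const { return used[hash(key)]; }
};
```

Two different keys can never share an address here, which is the advantage noted above. The cost is visible in the constructor: the vectors grow with the key space, not with the number of stored items.
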
Modulo division

Address = Key mod listSize

• Fewer collisions if listSize is a prime number.
• Example:
    Numbering system to handle 1,000,000 employees
    Data space to store up to 300 employees
    With listSize = 307 (the first prime above 300):
    hash(121267) = 121267 mod 307 = 2

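The modulo-division example can be checked directly in C++. This is a minimal sketch, assuming non-negative keys and a positive list size; hashModulo is an illustrative name.

```cpp
// Address = Key mod listSize.
// Assumes key >= 0 and listSize > 0.
int hashModulo(long key, int listSize) {
    return static_cast<int>(key % listSize);
}
```

hashModulo(121267, 307) returns 2, matching the example. Any key that differs from 121267 by a multiple of 307, such as 121574, maps to the same address, so the two keys are synonyms.
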
Digit extraction

Address = selected digits from Key

Example (first, third, and fourth digits):
379452 → 394
121267 → 112
378845 → 388
160252 → 102
045128 → 051

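The examples are consistent with selecting the first, third, and fourth digits of a six-digit key. A minimal sketch under that assumption follows; the key is passed as a string so that a leading zero survives, and extractDigits is an illustrative name.

```cpp
#include <string>

// Digit extraction as inferred from the examples above: keep the
// 1st, 3rd and 4th digits. The key is a string so that a leading
// zero (as in "045128") is preserved. at() throws std::out_of_range
// if the key has fewer than four digits.
std::string extractDigits(const std::string& key) {
    std::string address;
    address += key.at(0);
    address += key.at(2);
    address += key.at(3);
    return address;
}
```

For instance, extractDigits("379452") yields "394" and extractDigits("045128") yields "051".
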
Mid-square

Address = middle digits of Key^2

Example:
9452 * 9452 = 89340304 → 3403

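A minimal C++ sketch of the mid-square example, assuming a four-digit key whose square is treated as eight digits and a four-digit address taken from the middle. The name midSquare and these digit counts are assumptions chosen to match the example.

```cpp
#include <cstdint>

// Mid-square for a four-digit key: square it, view the result as
// eight digits, and keep the middle four.
// Assumes 0 <= key <= 9999.
int midSquare(int key) {
    std::uint64_t square = static_cast<std::uint64_t>(key) * key;
    // Dropping the last two digits and keeping the next four
    // selects digits 3..6 of the eight-digit square.
    return static_cast<int>((square / 100) % 10000);
}
```

midSquare(9452) squares the key to 89340304 and returns 3403.
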
• Disadvantage: Key^2 can be too large.
• Variation: square only a portion of the key.

Example (square the first three digits, then take the middle three digits of the six-digit result):
379452: 379 * 379 = 143641 → 364
121267: 121 * 121 = 014641 → 464
045128: 045 * 045 = 002025 → 202

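The variation can be sketched as follows, assuming six-digit keys, a three-digit prefix, and an address formed from digits 3 to 5 of the square padded to six digits, which is what the three examples show. midSquarePrefix is an illustrative name.

```cpp
// Mid-square on a portion of the key.
// Assumes 0 <= key <= 999999 (a six-digit key, leading zeros allowed).
int midSquarePrefix(int key) {
    int prefix = key / 1000;       // first three digits of the key
    int square = prefix * prefix;  // at most 998001, so six digits
    // Drop the last digit, then keep three: digits 3..5 of the square.
    return (square / 10) % 1000;
}
```

Note that the key 045128 must be written as 45128 in C++ source, because a leading 0 marks an octal literal. midSquarePrefix(45128) returns 202.
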
Folding

The key is divided into parts whose size matches the address size.

Example: Key = 123|456|789
• Fold shift (the parts are added as they are; the carry digit is dropped):
    123 + 456 + 789 = 1368 → 368
• Fold boundary (the outer parts are reversed before adding):
    321 + 456 + 987 = 1764 → 764

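Both folding variants can be sketched in C++ for the nine-digit key used in the example. The sketch assumes three parts of three digits each and a three-digit address; the function names are illustrative.

```cpp
// Reverse the digits of a three-digit part, e.g. 123 -> 321.
int reverse3(int part) {
    return (part % 10) * 100 + (part / 10 % 10) * 10 + part / 100;
}

// Fold shift: add the three parts and drop any carry digit.
// Assumes 0 <= key <= 999999999.
int foldShift(long key) {
    int left = static_cast<int>(key / 1000000);
    int middle = static_cast<int>(key / 1000 % 1000);
    int right = static_cast<int>(key % 1000);
    return (left + middle + right) % 1000;
}

// Fold boundary: reverse the two outer parts before adding.
int foldBoundary(long key) {
    int left = static_cast<int>(key / 1000000);
    int middle = static_cast<int>(key / 1000 % 1000);
    int right = static_cast<int>(key % 1000);
    return (reverse3(left) + middle + reverse3(right)) % 1000;
}
```

With key 123456789, foldShift returns 368 and foldBoundary returns 764, as in the example.
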
Collision resolution

• Except for direct hashing, none of the hash functions is a one-to-one mapping.
→ Collision resolution methods are required.
• Each collision resolution method can be used independently of the hash function.

• Closed Hashing
    • Open addressing
    • Bucket hashing
• Open Hashing
    • Linked list resolution

Open addressing

When a collision occurs, the table is searched for an unoccupied element in which to place the new element.

Hash function:
h : U → {0, 1, 2, ..., m − 1}
where U is the set of keys and {0, 1, 2, ..., m − 1} is the set of addresses.

Hash and probe function:
hp : U × {0, 1, 2, ..., m − 1} → {0, 1, 2, ..., m − 1}
where U is the set of keys, the second argument is the probe number, and the result is an address.

Algorithm hashInsert(ref T <array>, val k <key>)
Inserts key k into table T.
    i = 0
    while i < m do
        j = hp(k, i)
        if T[j] = nil then
            T[j] = k
            return j
        else
            i = i + 1
        end
    end
    return error: "hash table overflow"
End hashInsert

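The hashInsert pseudocode translates almost line by line into C++. The sketch below makes several assumptions that the slides leave open: keys are non-negative integers, the value -1 (named NIL here) plays the role of nil, the table size m is the vector's size, overflow is reported by returning -1, and hp uses linear probing over h(k) = k mod m.

```cpp
#include <vector>

const int NIL = -1;  // assumption: keys are non-negative, -1 marks an empty slot

// Assumed probe function: linear probing over h(k) = k mod m.
int hp(int k, int i, int m) { return (k % m + i) % m; }

// Inserts key k into table T.
// Returns the slot where k was stored, or -1 on "hash table overflow".
int hashInsert(std::vector<int>& T, int k) {
    int m = static_cast<int>(T.size());
    for (int i = 0; i < m; ++i) {   // try probe numbers 0 .. m-1
        int j = hp(k, i, m);
        if (T[j] == NIL) {          // unoccupied element found
            T[j] = k;
            return j;
        }
    }
    return -1;                      // every probed slot was occupied
}
```

With a table of seven NIL slots, inserting 10, 17, and 24 (all with home address 3) places them in slots 3, 4, and 5.
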
Algorithm hashSearch(val T <array>, val k <key>)
Searches for key k in table T.
    i = 0
    while i < m do
        j = hp(k, i)
        if T[j] = k then
            return j
        else if T[j] = nil then
            return nil
        else
            i = i + 1
        end
    end
    return nil
End hashSearch

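A matching C++ sketch of hashSearch, under the same assumptions as a plain integer table: non-negative keys, -1 (NIL) standing for nil, a return value of -1 for "not found", and linear probing over h(k) = k mod m as the probe function. None of these choices is fixed by the slides.

```cpp
#include <vector>

const int NIL = -1;  // assumption: keys are non-negative, -1 marks an empty slot

// Assumed probe function: linear probing over h(k) = k mod m.
int hp(int k, int i, int m) { return (k % m + i) % m; }

// Searches for key k in table T.
// Returns the slot holding k, or -1 if k is not in the table.
int hashSearch(const std::vector<int>& T, int k) {
    int m = static_cast<int>(T.size());
    for (int i = 0; i < m; ++i) {
        int j = hp(k, i, m);
        if (T[j] == k) {
            return j;    // key found
        }
        if (T[j] == NIL) {
            return -1;   // an empty slot ends the probe sequence
        }
    }
    return -1;           // all m probes examined without success
}
```

Stopping at the first empty slot is only safe when keys are never deleted. If a slot were emptied after later keys had probed past it, those keys could no longer be found.
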
There are different methods:
• Linear probing
• Quadratic probing
• Double hashing
• Key offset

Linear Probing

• When a home address is occupied, go to the next address (the current address + 1):
hp(k, i) = (h(k) + i) mod m

• Advantages:
    • quite simple to implement
    • data tend to remain near their home address (significant for disk addresses)
• Disadvantages:
    • produces primary clustering

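Primary clustering can be observed with a small experiment. The sketch below assumes h(k) = k mod m, non-negative integer keys, and -1 as the empty marker; linearProbe and insertLinear are illustrative names.

```cpp
#include <vector>

const int EMPTY = -1;  // assumption: keys are non-negative

// hp(k, i) = (h(k) + i) mod m with the assumed h(k) = k mod m.
int linearProbe(int k, int i, int m) { return (k % m + i) % m; }

// Inserts k using linear probing.
// Returns the slot used, or -1 if the table is full.
int insertLinear(std::vector<int>& table, int k) {
    int m = static_cast<int>(table.size());
    for (int i = 0; i < m; ++i) {
        int j = linearProbe(k, i, m);
        if (table[j] == EMPTY) {
            table[j] = k;
            return j;
        }
    }
    return -1;
}
```

In a table of size 7, the synonyms 10, 17, and 24 (home address 3) fill slots 3, 4, and 5. Key 11 has a different home address, 4, yet it has to walk through the same run and lands in slot 6. The run of occupied slots grows with every such insertion, which is the clustering effect named above.
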
Quadratic Probing

• The address increment is the collision probe number squared:
hp(k, i) = (h(k) + i^2) mod m

• Advantages:
    • works much better than linear probing
• Disadvantages:
    • time required to square numbers
    • produces secondary clustering:
      h(k1) = h(k2) → hp(k1, i) = hp(k2, i)

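The quadratic probe sequence and the secondary clustering condition can be checked with a short sketch. It assumes h(k) = k mod m and non-negative keys; quadraticProbe is an illustrative name.

```cpp
// hp(k, i) = (h(k) + i^2) mod m with the assumed h(k) = k mod m.
// Assumes k >= 0, i >= 0, m > 0.
int quadraticProbe(int k, int i, int m) {
    return (k % m + i * i) % m;
}
```

For k = 3 and m = 7 the probes for i = 0 to 6 are 3, 4, 0, 5, 5, 0, 4, so with this table size the sequence reaches only four distinct slots. The synonyms 10 and 17 both have home address 3 and therefore follow exactly the same sequence, which is the secondary clustering stated above.
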
Double Hashing

• Using two hash functions:
hp(k, i) = (h1(k) + i * h2(k)) mod m

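The slides do not say which two hash functions to use. The sketch below assumes one common textbook pairing, h1(k) = k mod m and h2(k) = 1 + (k mod (m − 1)), purely for illustration; the function names are hypothetical.

```cpp
// Assumed primary hash: h1(k) = k mod m.
int h1(int k, int m) { return k % m; }

// Assumed secondary hash: h2(k) = 1 + (k mod (m - 1)).
// It is never 0, so every probe moves to a different slot. Requires m > 1.
int h2(int k, int m) { return 1 + k % (m - 1); }

// hp(k, i) = (h1(k) + i * h2(k)) mod m
int doubleHashProbe(int k, int i, int m) {
    return (h1(k, m) + i * h2(k, m)) % m;
}
```

With m = 7, keys 10 and 17 share the home address 3 but have step sizes 5 and 6, so their second probes go to slots 1 and 2 respectively. Synonyms no longer follow the same probe sequence, unlike in quadratic probing.
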
Linked List Resolution

• Major disadvantage of Open Addressing: each collision resolution increases the probability of future collisions.
→ Use linked lists to store synonyms.

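Linked list resolution can be sketched with one singly linked list per home address. The example assumes non-negative integer keys and h(k) = k mod m; ChainedTable and its members are illustrative names, and std::forward_list stands in for a hand-written linked list.

```cpp
#include <algorithm>
#include <forward_list>
#include <vector>

// Each home address owns a linked list that holds all of its synonyms.
struct ChainedTable {
    std::vector<std::forward_list<int> > buckets;

    explicit ChainedTable(int m) : buckets(m) {}

    // Assumed hash function: h(k) = k mod m, for k >= 0.
    int h(int k) const { return k % static_cast<int>(buckets.size()); }

    // A collision never displaces another home address:
    // the new key is simply linked into its own list.
    void insert(int k) { buckets[h(k)].push_front(k); }

    bool contains(int k) const {
        const std::forward_list<int>& chain = buckets[h(k)];
        return std::find(chain.begin(), chain.end(), k) != chain.end();
    }
};
```

Inserting the synonyms 10 and 17 into a table of size 7 puts both in the list at address 3 and leaves addresses 4 and 5 free for keys whose home address they are.
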