
Data Structure

Hashing and Collision

Week 6
Program Studi Teknik Informatika
Fakultas Teknik – Universitas Surabaya
Review Exercise Week 5
Exercise

1. Illustrate (simulate) the conversion of the following infix expression to a
postfix expression.
Infix : ( 2 + 3 ) * ( 4 - 5 )

2. Using the postfix expression obtained in exercise 1, illustrate (simulate) the
postfix calculation.

Infix to Postfix
( 2 + 3 ) * ( 4 - 5 )

Iteration-1
• ( is an opening bracket (5th A Rule), so push it to the stack
• Postfix =
• Stack = ( (tos)

Iteration-2
• 2 is an operand (1st Rule), so put it in Postfix
• Postfix = 2
• Stack = ( (tos)

Infix to Postfix
( 2 + 3 ) * ( 4 - 5 )

Iteration-3
• + is an operator and tos < + (3rd A Rule), so push it to the stack
• Postfix = 2
• Stack = ( + (tos)

Iteration-4
• 3 is an operand (1st Rule), so put it in Postfix
• Postfix = 2 3
• Stack = ( + (tos)

Infix to Postfix
( 2 + 3 ) * ( 4 - 5 )

Iteration-5
• ) is a closing bracket (5th B Rule), so pop + to Postfix and remove the matching (
• Postfix = 2 3 +
• Stack =

Iteration-6
• * is an operator and the stack is empty (2nd Rule), so push it to the stack
• Postfix = 2 3 +
• Stack = * (tos)

Infix to Postfix
( 2 + 3 ) * ( 4 - 5 )

Iteration-7
• ( is an opening bracket (5th A Rule), so push it to the stack
• Postfix = 2 3 +
• Stack = * ( (tos)

Iteration-8
• 4 is an operand (1st Rule), so put it in Postfix
• Postfix = 2 3 + 4
• Stack = * ( (tos)

Infix to Postfix
( 2 + 3 ) * ( 4 - 5 )

Iteration-9
• - is an operator and tos < - (3rd A Rule), so push it to the stack
• Postfix = 2 3 + 4
• Stack = * ( - (tos)

Iteration-10
• 5 is an operand (1st Rule), so put it in Postfix
• Postfix = 2 3 + 4 5
• Stack = * ( - (tos)

Infix to Postfix
( 2 + 3 ) * ( 4 - 5 )

Iteration-11
• ) is a closing bracket (5th B Rule), so pop - to Postfix and remove the matching (
• Postfix = 2 3 + 4 5 -
• Stack = * (tos)

No More Item in Infix
• 4th Rule: pop all items from the stack and put them in the postfix

Postfix = 2 3 + 4 5 - *
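
The walk-through above can be sketched in C# as follows. This is a minimal sketch, not the course's reference solution: the class name InfixConverter, the Precedence helper, and the restriction to single-digit operands with the operators + - * / are all assumptions made for illustration, and the numbered rules from Week 5 are only mirrored informally in the comments.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class; the name is not from the slides.
public static class InfixConverter
{
    // Assumed precedence: * and / bind tighter than + and -.
    private static int Precedence(char op) => (op == '*' || op == '/') ? 2 : 1;

    // Converts an infix expression with single-digit operands,
    // the operators + - * / and round brackets into postfix.
    public static string ToPostfix(string infix)
    {
        var postfix = new List<string>();
        var stack = new Stack<char>();

        foreach (char c in infix)
        {
            if (char.IsWhiteSpace(c)) continue;

            if (char.IsDigit(c))
            {
                postfix.Add(c.ToString());      // operand: goes straight to Postfix
            }
            else if (c == '(')
            {
                stack.Push(c);                  // opening bracket: always pushed
            }
            else if (c == ')')
            {
                // closing bracket: pop operators until the matching '('
                while (stack.Count > 0 && stack.Peek() != '(')
                    postfix.Add(stack.Pop().ToString());
                if (stack.Count == 0)
                    throw new ArgumentException("Unbalanced brackets");
                stack.Pop();                    // discard the '('
            }
            else if ("+-*/".IndexOf(c) >= 0)
            {
                // operator: first pop operators of equal or higher precedence
                while (stack.Count > 0 && stack.Peek() != '(' &&
                       Precedence(stack.Peek()) >= Precedence(c))
                    postfix.Add(stack.Pop().ToString());
                stack.Push(c);
            }
            else
            {
                throw new ArgumentException("Unexpected character: " + c);
            }
        }

        // no more items in the infix: pop everything that is left
        while (stack.Count > 0)
        {
            char op = stack.Pop();
            if (op == '(') throw new ArgumentException("Unbalanced brackets");
            postfix.Add(op.ToString());
        }

        return string.Join(" ", postfix);
    }
}
```

Calling InfixConverter.ToPostfix("( 2 + 3 ) * ( 4 - 5 )") returns "2 3 + 4 5 - *", the same postfix expression obtained in the iterations above.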

Calculate Postfix
2 3 + 4 5 - *

Iteration-1
• 2 is an operand, so push it to the stack
• Stack = 2 (tos)

Iteration-2
• 3 is an operand, so push it to the stack
• Stack = 2 3 (tos)

Iteration-3
• + is an operator, so pop 2 operands from the stack
• Calculate: 2 + 3 = 5, then push the result to the stack
• Stack = 5 (tos)

Calculate Postfix
2 3 + 4 5 - *

Iteration-4
• 4 is an operand, so push it to the stack
• Stack = 5 4 (tos)

Iteration-5
• 5 is an operand, so push it to the stack
• Stack = 5 4 5 (tos)

Iteration-6
• - is an operator, so pop 2 operands from the stack
• Calculate: 4 - 5 = -1, then push the result to the stack
• Stack = 5 -1 (tos)

Calculate Postfix
2 3 + 4 5 - *

Iteration-7
• * is an operator, so pop 2 operands from the stack
• Calculate: 5 * -1 = -5, then push the result to the stack
• Stack = -5 (tos)

Because all items in the expression have been traversed, pop the stack to get the result.
Result = -5
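
The postfix calculation can be sketched in C# in the same spirit. This is a minimal sketch under stated assumptions: the class name PostfixEvaluator is hypothetical, tokens are assumed to be separated by spaces, and operands are assumed to be integers. Note that the first operand popped is the right-hand side of the operation, which is why iteration 6 computes 4 - 5 and not 5 - 4.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class; the name is not from the slides.
public static class PostfixEvaluator
{
    // Evaluates a space-separated postfix expression with integer operands.
    public static int Evaluate(string postfix)
    {
        var stack = new Stack<int>();
        string[] tokens = postfix.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string token in tokens)
        {
            if (int.TryParse(token, out int value))
            {
                stack.Push(value);              // operand: push to the stack
                continue;
            }

            // operator: pop 2 operands, calculate, push the result
            if (stack.Count < 2)
                throw new ArgumentException("Malformed expression");
            int right = stack.Pop();            // popped first, so it is the right operand
            int left = stack.Pop();

            switch (token)
            {
                case "+": stack.Push(left + right); break;
                case "-": stack.Push(left - right); break;
                case "*": stack.Push(left * right); break;
                case "/": stack.Push(left / right); break;
                default: throw new ArgumentException("Unknown token: " + token);
            }
        }

        // after the traversal, the single remaining item is the result
        if (stack.Count != 1)
            throw new ArgumentException("Malformed expression");
        return stack.Pop();
    }
}
```

PostfixEvaluator.Evaluate("2 3 + 4 5 - *") returns -5, matching the result of the exercise.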
Let’s start the material
Hashing Concept
• Hashing is a technique for mapping keys and their values into a hash
table by using a hash function.
• The efficiency of the mapping depends on the hash function used.
• For example, the University of Surabaya uses a 9-digit student ID with the
following meaning:
Digit   Meaning
1–2     Faculty code (11 = Pharmacy, 12 = xxx, 16 = Engineering, 21 = yyyy, etc.)
3–4     Study program (in Engineering, 04 = Informatics, etc.)
5–6     Year of entry
7–9     Order of registration
Hashing Concept
• For our example, the hash function H can be described as:

H(x) = last four digits of x, where x is a 9-digit student ID

• Next, we will store the following data in a hash table.
ID          Name      Address          Phone
120114002   Naufal    Manukan 100      031333333
160414001   Njoto     Rungkut 10       0311111111
230114017   Vincent   Gununganyar 22   031222222
Hashing Concept
• The result after hashing (see the difference):
Index / ID   Name      Address          Phone
4001         Njoto     Rungkut 10       0311111111
4002         Naufal    Manukan 100      031333333
4017         Vincent   Gununganyar 22   031222222
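
The hash function used in this example can be written in a few lines of C#. This is only an illustrative sketch: the class name StudentHash and the method name LastFourDigits are assumptions, and the ID is assumed to arrive as a 9-character string of digits.

```csharp
using System;

// Hypothetical helper class; the names are not from the slides.
public static class StudentHash
{
    // H(x) = last four digits of x, where x is a 9-digit student ID.
    public static int LastFourDigits(string studentId)
    {
        if (studentId == null || studentId.Length != 9)
            throw new ArgumentException("Expected a 9-digit student ID");

        // Characters 5..8 (zero-based) are the last four digits.
        return int.Parse(studentId.Substring(5));
    }
}
```

For the three students above, StudentHash.LastFourDigits returns 4002, 4001 and 4017. It also returns 4002 for 220114002, which shares its last four digits with 120114002.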

• Question: Can you spot any problem with the previous hash function?
Collision
• If more than one item maps to the same index of the array, a collision
occurs.

• For example, using the hash function defined in the previous slide, a
collision will occur for the following student IDs:
– 120114002
– 220114002
– 130114002
– Etc.
Collision Avoidance
• Collisions force us to take corrective action when storing an item.
• Because of the increased cost associated with collisions, we should try
to avoid them as best we can.
• The number of collisions is directly correlated to:
– the hash function used, and
– the distribution of the data passed to the hash function
Collision Avoidance
• For example, if we know that our student ID data will consist of
50% students from Industrial Engineering (student IDs beginning
with 1603) and 50% from Informatics Engineering (student IDs
beginning with 1604), then using the last 4 digits of the student ID
as the index will likely cause many collisions.

• Choosing an appropriate hash function is referred to as collision
avoidance, but this is more difficult to do.

• Easier way? Collision Resolution


Collision Resolution
• Finding another location for an item because its computed location
is already taken is called Collision Resolution.

• One of the simplest approaches is called Linear Probing, and it
works as follows:
1. Find the location of the item to be inserted using the hash function.
2. If the location has not been taken, insert the item in that
location.
3. If the location has been taken, search sequentially for the next
empty location. Once found, store the item in that empty
location.
CASE STUDY (Linear Probing)
A hashtable is created to store Surabaya school student data
(Student ID, Name). The hash function takes the last 2
digits of the student ID.

Describe the contents of the hashtable using linear probing if the
following student data are inserted sequentially into the hashtable.

ID      Name
61011   Abby
91112   Benny
91011   Kelly
21013   Donna
52011   Dinna
61013   Moana
11111   Ester
10111   Noel
61114   Zach
81112   Levi
81110   Jack
71010   Geena
CASE STUDY (Linear Probing)
Note: We assume we use an array with unlimited capacity

ID      Name    H(x)
61011   Abby    11
91112   Benny   12
91011   Kelly   11
21013   Donna   13
52011   Dinna   11
61013   Moana   13
11111   Ester   11
10111   Noel    11
61114   Zach    14
81112   Levi    12
81110   Jack    10
71010   Geena   10

Hashtable
Index   Name    Note
10      Jack    H(x)
11      Abby    H(x)
12      Benny   H(x)
13      Kelly   H(x) + 2
14      Donna   H(x) + 1
15      Dinna   H(x) + 4
16      Moana   H(x) + 3
17      Ester   H(x) + 6
18      Noel    H(x) + 7
19      Zach    H(x) + 5
20      Levi    H(x) + 8
21      Geena   H(x) + 11
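
The placement in this case study can be reproduced with a short C# sketch of linear probing. The names LinearProbingDemo, Hash and Insert are assumptions made for illustration. A Dictionary keyed by index stands in for the "array with unlimited capacity" that the note assumes, which is also an assumption of this sketch and not how a real hash table is laid out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class; the names are not from the slides.
public static class LinearProbingDemo
{
    // Hash function from the case study: the last 2 digits of the student ID.
    public static int Hash(int studentId) => studentId % 100;

    // Stores the name and returns the index where it ended up.
    // 'table' maps index -> name and models an unbounded array.
    public static int Insert(IDictionary<int, string> table, int studentId, string name)
    {
        int index = Hash(studentId);

        // Location taken: move to the next location sequentially.
        while (table.ContainsKey(index))
            index++;

        table[index] = name;
        return index;
    }
}
```

Inserting the twelve students in the listed order into an empty Dictionary<int, string> places Kelly at index 13, Dinna at 15 and Geena at 21, in line with the Hashtable column above.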
Linear Probing
• All of the data above form a cluster around a certain
location.

• This clustering forces us to perform many probes, both
when storing and when retrieving the data.

• A possible solution is Quadratic Probing


Quadratic Probing
• For speedy lookups, we should try to store the data uniformly distributed across
the hash table, not clustered around certain points.

• Instead of looking for an empty location at the next position as in Linear Probing,
Quadratic Probing looks for an empty location for a particular item at
the following indexes:
Attempt 1: idx + 1² (idx is calculated from the hash function)
Attempt 2: idx - 1²
Attempt 3: idx + 2²
Attempt 4: idx - 2²
Attempt 5: and so on.
This sequence is restarted for each new item.
CASE STUDY (Quadratic Probing)
A hashtable is created to store Surabaya school student data
(Student ID, Name). The hash function takes the last 2
digits of the student ID.

Describe the contents of the hashtable using quadratic probing if the
following student data are inserted sequentially into the hashtable.

ID      Name
61011   Abby
91112   Benny
91011   Kelly
21013   Donna
52011   Dinna
61013   Moana
11111   Ester
10111   Noel
61114   Zach
81112   Levi
81110   Jack
71010   Geena
CASE STUDY (Quadratic Probing)
Note: We assume we use an array with unlimited capacity

ID      Name    H(x)
61011   Abby    11
91112   Benny   12
91011   Kelly   11
21013   Donna   13
52011   Dinna   11
61013   Moana   13
11111   Ester   11
10111   Noel    11
61114   Zach    14
81112   Levi    12
81110   Jack    10
71010   Geena   10

Hashtable
Index   Name    Note
6       Geena   H(x) - 2²
7       Ester   H(x) - 2²
9       Jack    H(x) - 1²
10      Kelly   H(x) - 1²
11      Abby    H(x)
12      Benny   H(x)
13      Donna   H(x)
14      Moana   H(x) + 1²
15      Dinna   H(x) + 2²
16      Levi    H(x) + 2²
18      Zach    H(x) + 2²
20      Noel    H(x) + 3²
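
The same case study can be sketched for quadratic probing. As before, the names QuadraticProbingDemo, Hash and Insert are assumptions, and a Dictionary keyed by index models the unbounded array. One further assumption is labelled in the code: candidate indexes below 0 are skipped, a situation that does not arise with the data in this case study.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class; the names are not from the slides.
public static class QuadraticProbingDemo
{
    // Hash function from the case study: the last 2 digits of the student ID.
    public static int Hash(int studentId) => studentId % 100;

    // Stores the name and returns the index where it ended up.
    public static int Insert(IDictionary<int, string> table, int studentId, string name)
    {
        int home = Hash(studentId);
        if (!table.ContainsKey(home))
        {
            table[home] = name;
            return home;
        }

        // Probe home + 1², home - 1², home + 2², home - 2², and so on.
        for (int step = 1; ; step++)
        {
            int offset = step * step;

            int plus = home + offset;
            if (!table.ContainsKey(plus))
            {
                table[plus] = name;
                return plus;
            }

            int minus = home - offset;
            // Assumption: negative indexes are not valid and are skipped.
            if (minus >= 0 && !table.ContainsKey(minus))
            {
                table[minus] = name;
                return minus;
            }
        }
    }
}
```

Inserting the twelve students in the listed order places Kelly at index 10, Dinna at 15, Noel at 20 and Geena at 6, in line with the Hashtable column above.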
Quadratic Probing
Although it is better than Linear Probing, Quadratic Probing can still lead to
clustering. What if the array has limited capacity?
Example
The data are:
• 110114002, 110214002, 110314002, 110414002, and 110514002
• 110114003, 110214003, 110314003, 110414008, and 110514005

• Hash function: take the last 2 digits of the student ID

• Array size: 100 (index: 0 – 99)

• Use the "circular" array concept to find the actual new index from
quadratic/linear probing (for example: calculated idx: 102, actual idx: 2,
as long as index 2 is empty)
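
The "circular" index calculation can be sketched with the remainder operator. The class name CircularIndex and the method name Wrap are assumptions. The slide only gives the case of an index past the end (102 becomes 2); wrapping a negative index around to the end of the array, as quadratic probing's minus attempts may require, is an assumption of this sketch.

```csharp
using System;

// Hypothetical helper class; the names are not from the slides.
public static class CircularIndex
{
    // Maps a calculated index (possibly negative, or >= size) into 0..size-1.
    public static int Wrap(int index, int size)
    {
        if (size <= 0)
            throw new ArgumentException("Array size must be positive");

        // In C#, % keeps the sign of the left operand, so -2 % 100 is -2.
        int remainder = index % size;

        // Assumption: negative results wrap around to the end of the array.
        return remainder < 0 ? remainder + size : remainder;
    }
}
```

With an array of size 100, CircularIndex.Wrap(102, 100) gives 2, and under the assumption above CircularIndex.Wrap(-2, 100) gives 98. The probing code still has to check that the wrapped index is empty before storing the item there.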
Hashing in .Net
• The .NET Framework already implements the concept of hashing and
provides a hash table through the Hashtable class.

• Data is added to the Hashtable with the Add method, which
stores a key and an item; each can be of any data type.

Example:

using System.Collections; // the Hashtable class is in this namespace

Hashtable students = new Hashtable();
students.Add("110114002", "Miftah");
students.Add("110214002", "Vincent");
Get Data From Hashtable
• Data is retrieved from a Hashtable with the square bracket
("[ ]") operator.

• The index used inside the square brackets is the key of the
data.

• Remember, because all data is stored as type object, you must
explicitly cast the retrieved data.

Example:
string name = (string)students["110214002"];
Get Data From Hashtable
• Before you get data from a Hashtable, it is advisable to check that
the key exists by using the ContainsKey method.

Example:

string name = "";

if (students.ContainsKey("110214002"))
{
    name = (string)students["110214002"];
}
Get All Data From Hashtable
• The Hashtable also has a Keys property that returns a collection of the keys in
the Hashtable.

• You can use this property to enumerate all the items in the Hashtable, such as:
foreach (string key in students.Keys)
{
    listBox1.Items.Add("key = " + key + " Name = " + students[key]);
}

Note: the order of the items retrieved may not be the same as the order in
which they were added
QUESTIONS??
