Data Structures (Struktur Data): by Sri Rezeki Candra Nursari

DATA STRUCTURES
By: Sri Rezeki Candra Nursari
2 credits (SKS)
Topics
1. Data and Data Structures
2. Array
3. Structures and Records
4. Pointer
5. Linked List
6. Stack
7. Queue
8. Tree
9. AVL Tree
10. Heap and B-Tree
11. Sorting
12. Search
13. Hashing
14. Graph
HASH
Session 15
2 credits (SKS)
Outline
Hashing
Definition
Hash function
Collision resolution
Open hashing
Separate chaining
Closed hashing (Open addressing)
Linear probing
Quadratic probing
Double hashing
Primary Clustering, Secondary Clustering
Access: insert, find, delete
Hash Tables
Hashing is used to store relatively large amounts of data in a table called a hash table (the hash table ADT).
The hash table usually has a fixed size, H-size, which is larger than the amount of data that we want to store.
We define the load factor (λ) as the ratio of the number of stored items n to the size of the hash table: λ = n / H-size.
Figure: the hash function maps an item (its key) into an index of the hash table, in the range 0 to H-1.
Hash Tables (2)
Hashing is a technique used to perform insertions,
deletions, and finds in constant average time.
To insert or find an item, we assign a key to each element and use a function, called the hash function, to determine the location of the element within the table.
A hash table is a fixed-size array of cells containing data, or keys corresponding to data.
For each key, we use the hash function to map the key into some number in the range 0 to H-size-1.
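A minimal sketch of this mapping, assuming integer keys and the plain remainder as the hash function; the class and method names (`HashTableBasics`, `index`, `loadFactor`) are hypothetical, not from the slides:

```java
// Hypothetical helper class: maps int keys into cells 0 .. H-size-1
// and computes the load factor lambda = n / H-size.
class HashTableBasics {
    // Math.floorMod keeps the result non-negative even for negative keys.
    static int index(int key, int hSize) {
        return Math.floorMod(key, hSize);
    }

    // Load factor: number of stored items divided by the table size.
    static double loadFactor(int itemCount, int hSize) {
        return (double) itemCount / hSize;
    }

    public static void main(String[] args) {
        int hSize = 10;
        int[] keys = {89, 18, 49, 58, 69};
        for (int k : keys) {
            System.out.println(k + " -> cell " + index(k, hSize));
        }
        System.out.println("load factor = " + loadFactor(keys.length, hSize));
    }
}
```

With H-size = 10, the keys 89, 49 and 69 all land in cell 9 and λ = 0.5; cells shared like this are the collisions discussed later.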
Hash Function
A hash function should have the following features:
Easy to compute.
Two distinct keys map to two different cells in the array. This is not true in general: there are usually far more possible keys than cells, so by the pigeonhole principle some distinct keys must share a cell.
This can be guaranteed only by a direct-address table, when the universal set of keys is reasonably small.
Distributes the keys evenly among the cells.
One simple hash function takes the key mod a prime number, used as the table size.
Any manipulation of the digits that is cheap to compute and gives a good distribution can be used.
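A small illustration of why a prime table size helps, assuming keys that share a common factor with the table size (the key values and the class name `PrimeModDemo` are chosen for illustration only):

```java
import java.util.Arrays;

// Compares the cells chosen by key mod tableSize for a non-prime
// and a prime table size.
class PrimeModDemo {
    static int[] cells(int[] keys, int tableSize) {
        int[] result = new int[keys.length];
        for (int i = 0; i < keys.length; i++) {
            result[i] = Math.floorMod(keys[i], tableSize);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] keys = {10, 20, 30, 40, 50};
        // Table size 10 (not prime): every key lands in cell 0.
        System.out.println(Arrays.toString(cells(keys, 10)));
        // Table size 11 (prime): the same keys spread over five cells.
        System.out.println(Arrays.toString(cells(keys, 11)));
    }
}
```

The first line prints `[0, 0, 0, 0, 0]` and the second `[10, 9, 8, 7, 6]`: keys that are multiples of 10 all collide when the table size is 10.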
Hash Function: Truncation
Part of the key is simply ignored, and the remaining digits are truncated or concatenated to form the index.
Phone no: index
731-3018 338
539-2309 329
428-1397 217
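A sketch of the truncation example. The rule it uses (keep the 2nd, 4th and 7th digits of the phone number) is an inference that reproduces all three rows of the table, not a rule stated on the slide; the class name `TruncationHash` is hypothetical:

```java
// Truncation: ignore most digits of the phone number and keep only
// the 2nd, 4th and 7th digits (assumed rule, matches the table above).
class TruncationHash {
    static int truncate(String phone) {
        String digits = phone.replace("-", "");      // "731-3018" -> "7313018"
        String kept = "" + digits.charAt(1) + digits.charAt(3) + digits.charAt(6);
        return Integer.parseInt(kept);               // "338" -> 338
    }

    public static void main(String[] args) {
        for (String p : new String[] {"731-3018", "539-2309", "428-1397"}) {
            System.out.println(p + " -> " + truncate(p));
        }
    }
}
```

This prints 338, 329 and 217, so the indexes need a table of 1000 cells.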
Hash Function: Folding
The data can be split up into smaller chunks
which are then folded together in some form.
Phone no   3 groups added   index
7313018 73+13+018 104
5392309 53+92+309 454
4281397 42+81+397 520
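A sketch of the folding example, assuming the 7-digit number is split into groups of 2, 2 and 3 digits exactly as in the table; the class name `FoldingHash` is hypothetical:

```java
// Folding: split the 7-digit number into groups of 2, 2 and 3 digits
// and add the groups together.
class FoldingHash {
    static int fold(String digits) {
        int a = Integer.parseInt(digits.substring(0, 2));   // "73"
        int b = Integer.parseInt(digits.substring(2, 4));   // "13"
        int c = Integer.parseInt(digits.substring(4));      // "018" -> 18
        return a + b + c;
    }

    public static void main(String[] args) {
        for (String d : new String[] {"7313018", "5392309", "4281397"}) {
            System.out.println(d + " -> " + fold(d));
        }
    }
}
```

The folded sum can reach 99 + 99 + 999 = 1197, so the table needs at least 1198 cells, or the sum must be reduced further, for example modulo the table size.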
Hash Function: Modular arithmetic
Convert the data into an integer, divide by the
size of the hash table, and take the remainder
as the index.
Phone no   sum of parts          index (sum % 100)
731-3018   731 + 3018 = 3749     49
539-2309   539 + 2309 = 2848     48
428-1397   428 + 1397 = 1825     25
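A sketch of the modular-arithmetic example with the table size 100 used on the slide (a prime size would usually be preferred); the class name `ModularHash` is hypothetical:

```java
// Modular arithmetic: turn the phone number into an integer by adding
// its two parts, then take the remainder modulo the table size.
class ModularHash {
    static int modHash(String phone, int tableSize) {
        String[] parts = phone.split("-");                  // "731-3018" -> {"731", "3018"}
        int sum = Integer.parseInt(parts[0]) + Integer.parseInt(parts[1]);
        return sum % tableSize;                             // 3749 % 100 = 49
    }

    public static void main(String[] args) {
        for (String p : new String[] {"731-3018", "539-2309", "428-1397"}) {
            System.out.println(p + " -> " + modHash(p, 100));
        }
    }
}
```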
Choosing a hash function
A good hash function should satisfy two criteria:
1. It should be quick to compute
2. It should minimize the number of collisions
Example of hash function
Hash function for string
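For a string whose character codes are A3, A2, A1, A0 (A3 being the first character) and X = 128, the hash value is the polynomial below, which can be evaluated with Horner's rule:
$$A_3 X^3 + A_2 X^2 + A_1 X^1 + A_0 X^0 = ((A_3 X + A_2) X + A_1) X + A_0$$
The value of this polynomial is much larger than the size of the table, so we take it modulo the size of the hash table.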
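// Horner's rule with X = 128. Reducing modulo tableSize at every step
// keeps hashVal below tableSize, so no int overflow occurs
// (for table sizes up to about 16 million).
static int hash(String key, int tableSize)
{
    int hashVal = 0;
    for (int i = 0; i < key.length(); i++)
        hashVal = (hashVal * 128 + key.charAt(i)) % tableSize;
    return hashVal;   // already in 0 .. tableSize-1
}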
Modulo
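$(A + B) \bmod C = ((A \bmod C) + (B \bmod C)) \bmod C$
$(A \cdot B) \bmod C = ((A \bmod C) \cdot (B \bmod C)) \bmod C$
These identities (for non-negative A and B) are why the loop in the string hash function may reduce modulo tableSize at every step and still give the same result as reducing the full polynomial once.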
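A quick check of these identities: a hypothetical `ModuloCheck` class compares reducing at every step (as in the string hash function with X = 128) against evaluating the exact polynomial with `java.math.BigInteger` and reducing once. The key and table size are arbitrary:

```java
import java.math.BigInteger;

// Compares step-by-step reduction with exact evaluation of the polynomial.
class ModuloCheck {
    // Reduce at every step, relying on the (A + B) and (A * B) identities.
    static int stepwise(String key, int tableSize) {
        int h = 0;
        for (int i = 0; i < key.length(); i++) {
            h = (h * 128 + key.charAt(i)) % tableSize;
        }
        return h;
    }

    // Evaluate the full polynomial with arbitrary precision, then reduce once.
    static int exact(String key, int tableSize) {
        BigInteger x = BigInteger.valueOf(128);
        BigInteger h = BigInteger.ZERO;
        for (int i = 0; i < key.length(); i++) {
            h = h.multiply(x).add(BigInteger.valueOf(key.charAt(i)));
        }
        return h.mod(BigInteger.valueOf(tableSize)).intValue();
    }

    public static void main(String[] args) {
        String key = "hallmark";
        int tableSize = 10007;
        // Both calls print the same index.
        System.out.println(stepwise(key, tableSize) + " " + exact(key, tableSize));
    }
}
```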
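// Horner's rule with multiplier 37. The loop lets hashVal overflow
// (int arithmetic wraps around), so it may become negative;
// the final correction moves the result back into 0 .. tableSize-1.
static int hash(String key, int tableSize)
{
    int hashVal = 0;
    for (int i = 0; i < key.length(); i++)
        hashVal = hashVal * 37 + key.charAt(i);
    hashVal %= tableSize;
    if (hashVal < 0)
        hashVal += tableSize;
    return hashVal;
}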
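// Adds up the character codes. Simple, but it ignores character order,
// and short keys only produce small sums, so keys crowd into low cells.
static int hash(String key, int tableSize)
{
    int hashVal = 0;
    for (int i = 0; i < key.length(); i++)
        hashVal += key.charAt(i);
    return hashVal % tableSize;
}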
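A short demonstration of the weakness of the additive hash: because it only sums character codes, anagrams always collide. The class name `AdditiveHashDemo` and the table size 10007 are illustrative choices:

```java
// The additive hash: sum of character codes modulo the table size.
class AdditiveHashDemo {
    static int addHash(String key, int tableSize) {
        int hashVal = 0;
        for (int i = 0; i < key.length(); i++)
            hashVal += key.charAt(i);
        return hashVal % tableSize;
    }

    public static void main(String[] args) {
        int tableSize = 10007;
        System.out.println(addHash("dawn", tableSize));   // 426
        System.out.println(addHash("wand", tableSize));   // 426: same letters, same cell
    }
}
```

An 8-letter lowercase key sums to at most 8 × 122 = 976, so with 10007 cells most of the table is never used.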
When two keys map into the same cell, we get a collision.
Collisions can occur on insertion, so we need a procedure (collision resolution) to resolve them.
Closed Hashing
If a collision occurs, try alternative cells within the table.
Closed hashing is also known as open addressing.
For insertion, we try cells in sequence using an incremented function:
$h_i(x) = (hash(x) + f(i)) \bmod H\text{-size}$, with $f(0) = 0$
The function f is the collision resolution strategy.
The table must be bigger than the number of data items.
Different methods for choosing the function f:
Linear probing
Quadratic probing
Double hashing
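A minimal sketch of the probe sequence h_i(x) with a pluggable f, assuming hash(x) = 7 and H-size = 11; the value hash2(x) = 3 on the double-hashing line is an arbitrary illustrative value, and the class name `ProbeSequence` is hypothetical:

```java
import java.util.Arrays;
import java.util.function.IntUnaryOperator;

// Lists the cells tried by h_i(x) = (hash(x) + f(i)) mod H-size
// for i = 0 .. count-1, with f supplied as a function.
class ProbeSequence {
    static int[] probes(int home, IntUnaryOperator f, int hSize, int count) {
        int[] cells = new int[count];
        for (int i = 0; i < count; i++) {
            cells[i] = Math.floorMod(home + f.applyAsInt(i), hSize);
        }
        return cells;
    }

    public static void main(String[] args) {
        int hSize = 11;
        int home = 7;                                   // assumed value of hash(x)
        System.out.println("linear    " + Arrays.toString(probes(home, i -> i, hSize, 5)));
        System.out.println("quadratic " + Arrays.toString(probes(home, i -> i * i, hSize, 5)));
        System.out.println("double    " + Arrays.toString(probes(home, i -> i * 3, hSize, 5))); // hash2(x) = 3
    }
}
```

This prints `[7, 8, 9, 10, 0]`, `[7, 8, 0, 5, 1]` and `[7, 10, 2, 5, 8]`: the three strategies differ only in f.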
Linear probing
Use a linear function f(i) = i
Finds the first free cell at or after the key's home position, wrapping around at the end of the table.
Least complex function.
May result in primary clustering:
elements that hash to different locations probe the same alternative cells.
The complexity of this probing depends on the value of λ (load factor).
We do not use this probing if λ > 0.5.
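A minimal sketch of insert and find with linear probing, assuming non-negative int keys, hash(x) = x mod H-size, and no deletion or resizing; the class `LinearProbingTable` and its methods are hypothetical:

```java
// Closed hashing with linear probing, f(i) = i, for int keys.
class LinearProbingTable {
    private final Integer[] cells;                    // null marks an empty cell

    LinearProbingTable(int hSize) {
        cells = new Integer[hSize];
    }

    // Cell that holds key, or the empty cell where it would go;
    // -1 if every cell was probed without success (table full).
    private int findPos(int key) {
        int pos = Math.floorMod(key, cells.length);   // home cell
        for (int i = 0; i < cells.length; i++) {
            if (cells[pos] == null || cells[pos] == key) {
                return pos;
            }
            pos = (pos + 1) % cells.length;           // next cell, wrapping around
        }
        return -1;
    }

    boolean insert(int key) {
        int pos = findPos(key);
        if (pos == -1) {
            return false;                             // table is full
        }
        cells[pos] = key;                             // no change if key is already there
        return true;
    }

    // Cell that holds key, or -1 if key is absent.
    int cellOf(int key) {
        int pos = findPos(key);
        return (pos != -1 && cells[pos] != null) ? pos : -1;
    }

    public static void main(String[] args) {
        LinearProbingTable t = new LinearProbingTable(10);
        for (int k : new int[] {89, 18, 49, 58, 69}) {
            t.insert(k);
        }
        // 49 collides with 89 in cell 9 and wraps to 0;
        // 58 probes 8, 9, 0 and lands in 1; 69 probes 9, 0, 1 and lands in 2.
        System.out.println(t.cellOf(49) + " " + t.cellOf(58) + " " + t.cellOf(69));
        System.out.println(t.cellOf(99));             // absent: -1
    }
}
```

The keys 58 and 69 have different home cells but pile into the same run of cells 9, 0, 1, 2: this is primary clustering.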
Hashing - insert
0 alpha
1
2 crystal
3 dawn
4 emerald
5 flamingo
6
7 hallmark
8
9
10
11
12 moon   (insert marigold: it also hashes to 12)
13
14
15
...
Hashing - lookup
0 alpha
1
2 crystal   (find cobalt: hashes to 2)
3 dawn
4 emerald
5 flamingo
6
7 hallmark
8
9
10
11
12 moon   (find marigold: hashes to 12)
13 marigold
14
15 private   (find private: hashes to 15)
...
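Hashing - delete (lazy deletion: why?)
A deleted cell is only marked as deleted, not emptied. If it were emptied, a later search for marigold (home cell 12, stored in cell 13) would stop at the empty cell 12 and wrongly report marigold as missing.
0 alpha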
1
2 crystal
3 dawn
4 emerald   (delete: mark as deleted)
5 flamingo
6
7 hallmark
8
9
10
11
12 moon   (delete: mark as deleted)
13 marigold
14
15 private
...
Hashing - operation after delete
0 alpha
1
2 crystal   (insert custom: hashes to 2)
3 dawn
4
5 flamingo
6
7 hallmark
8
9
10
11
12   (find marigold: home cell 12 is marked deleted, so the search continues to cell 13)
13 marigold
14
15 private
...
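A minimal sketch of lazy deletion with linear probing, assuming non-negative int keys and hash(x) = x mod H-size; the class `LazyDeletionTable` and its three cell states are hypothetical names for the idea on the slides:

```java
// Linear probing with lazy deletion: a deleted cell is marked, not emptied,
// so searches keep probing past it, while inserts may reuse it.
class LazyDeletionTable {
    private static final byte EMPTY = 0, ACTIVE = 1, DELETED = 2;
    private final int[] keys;
    private final byte[] state;                       // all cells start EMPTY

    LazyDeletionTable(int hSize) {
        keys = new int[hSize];
        state = new byte[hSize];
    }

    // Cell holding key, or -1. Only a truly EMPTY cell stops the search.
    int find(int key) {
        int pos = Math.floorMod(key, keys.length);
        for (int i = 0; i < keys.length && state[pos] != EMPTY; i++) {
            if (state[pos] == ACTIVE && keys[pos] == key) {
                return pos;
            }
            pos = (pos + 1) % keys.length;
        }
        return -1;
    }

    // Store key in the first EMPTY or DELETED cell on its probe path.
    boolean insert(int key) {
        if (find(key) != -1) {
            return true;                              // already present
        }
        int pos = Math.floorMod(key, keys.length);
        for (int i = 0; i < keys.length; i++) {
            if (state[pos] != ACTIVE) {
                keys[pos] = key;
                state[pos] = ACTIVE;
                return true;
            }
            pos = (pos + 1) % keys.length;
        }
        return false;                                 // table is full
    }

    boolean delete(int key) {
        int pos = find(key);
        if (pos == -1) {
            return false;
        }
        state[pos] = DELETED;                         // mark only; do not empty the cell
        return true;
    }

    public static void main(String[] args) {
        LazyDeletionTable t = new LazyDeletionTable(10);
        t.insert(12);                                 // home cell 2
        t.insert(22);                                 // cell 2 taken, stored in 3
        t.delete(12);                                 // cell 2 marked DELETED
        System.out.println(t.find(22));               // 3: search probes past cell 2
        t.insert(32);                                 // reuses the deleted cell 2
        System.out.println(t.find(32));               // 2
    }
}
```

Key 22 plays the role of marigold in the figures: had cell 2 been emptied, `find(22)` would stop there and return -1.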
Primary Clustering
Elements that hash to different locations probe the same alternative cells.
Figure: before/after snapshots of a linear-probing table; keys: alpha, cobalt, crystal, canary, dawn, dark, custom, flamingo, hallmark, marigold, private.
Quadratic probing
Eliminates primary clustering by selecting $f(i) = i^2$.
Quadratic probing has more problems when the hash table is more than half full: an empty cell may no longer be found.
You have to select an appropriate table size: for example, with H-size = 16, $i^2 \bmod 16$ takes only the values 0, 1, 4 and 9, so most cells are never probed.
We can prove that quadratic probing with a prime table size, and a table at least half empty, will always find a location for a new element.
Probe positions can be computed incrementally, without multiplication, by noting that $f(i) = i^2 = f(i-1) + 2i - 1$.
However, elements that hash to the same location will probe the same alternative cells (secondary clustering).
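A minimal sketch of quadratic probing that uses the incremental form f(i) = f(i-1) + 2i - 1, assuming non-negative int keys, a prime table size and hash(x) = x mod H-size; the class `QuadraticProbingTable` is hypothetical:

```java
// Closed hashing with quadratic probing, f(i) = i^2, for int keys.
// Each step adds the next odd number (1, 3, 5, ...) to the position.
class QuadraticProbingTable {
    private final Integer[] cells;                    // null marks an empty cell

    QuadraticProbingTable(int primeSize) {
        cells = new Integer[primeSize];
    }

    // Cell that holds key, or the empty cell where it would go; -1 if none
    // was found. With a prime size and a table at least half empty, an empty
    // cell is reached within the first (size + 1) / 2 probes.
    private int findPos(int key) {
        int pos = Math.floorMod(key, cells.length);   // home + 0^2
        int offset = 1;                               // 2i - 1 for i = 1
        for (int i = 0; i < cells.length; i++) {
            if (cells[pos] == null || cells[pos] == key) {
                return pos;
            }
            pos = (pos + offset) % cells.length;      // now home + (i + 1)^2
            offset += 2;
        }
        return -1;
    }

    boolean insert(int key) {
        int pos = findPos(key);
        if (pos == -1) {
            return false;
        }
        cells[pos] = key;
        return true;
    }

    int cellOf(int key) {
        int pos = findPos(key);
        return (pos != -1 && cells[pos] != null) ? pos : -1;
    }

    public static void main(String[] args) {
        QuadraticProbingTable t = new QuadraticProbingTable(11);
        // All four keys have home cell 2 and follow the same
        // probe sequence 2, 3, 6, 0 (secondary clustering).
        for (int k : new int[] {13, 24, 35, 46}) {
            t.insert(k);
        }
        System.out.println(t.cellOf(13) + " " + t.cellOf(24) + " " + t.cellOf(35) + " " + t.cellOf(46));
    }
}
```

This prints `2 3 6 0`.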
Double hashing
The collision resolution function uses another hash function: $f(i) = i \cdot hash_2(x)$.
Each probe adds another hash2(x) to the probe position.
Be careful when choosing the second hash function: it must never evaluate to zero, and it must allow all cells to be probed.
It is essential to have a prime table size.
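A minimal sketch of double hashing, assuming non-negative int keys and the common textbook choice hash2(x) = R - (x mod R) with a prime R smaller than the prime table size (an assumption, not stated on the slides); the class `DoubleHashingTable` is hypothetical:

```java
// Closed hashing with double hashing, f(i) = i * hash2(x), for int keys.
// Assumed hash2(x) = R - (x mod R): its range is 1 .. R, so it is never zero.
class DoubleHashingTable {
    private final Integer[] cells;                    // null marks an empty cell
    private final int r;

    DoubleHashingTable(int primeSize, int r) {
        cells = new Integer[primeSize];
        this.r = r;
    }

    int hash2(int key) {
        return r - Math.floorMod(key, r);
    }

    // Probe home, home + step, home + 2*step, ... (mod table size).
    // The size is prime and 1 <= step < size, so every cell is reachable.
    private int findPos(int key) {
        int home = Math.floorMod(key, cells.length);
        int step = hash2(key);
        for (int i = 0; i < cells.length; i++) {
            int pos = (int) ((home + (long) i * step) % cells.length);
            if (cells[pos] == null || cells[pos] == key) {
                return pos;
            }
        }
        return -1;
    }

    boolean insert(int key) {
        int pos = findPos(key);
        if (pos == -1) {
            return false;
        }
        cells[pos] = key;
        return true;
    }

    int cellOf(int key) {
        int pos = findPos(key);
        return (pos != -1 && cells[pos] != null) ? pos : -1;
    }

    public static void main(String[] args) {
        DoubleHashingTable t = new DoubleHashingTable(11, 7);
        // 3, 14, 25 and 36 all have home cell 3 but steps 4, 7, 3 and 6.
        for (int k : new int[] {3, 14, 25, 36}) {
            t.insert(k);
        }
        System.out.println(t.cellOf(3) + " " + t.cellOf(14) + " " + t.cellOf(25) + " " + t.cellOf(36));
    }
}
```

This prints `3 10 6 9`: keys with the same home cell follow different probe sequences, which avoids the secondary clustering of quadratic probing.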
Double Hashing
Figure: before/after snapshots of the table under double hashing; keys: alpha, cobalt, crystal, canary, dawn, dark, done, custom, flamingo, hallmark, marigold, private.
Open Hashing
Collision problems are solved by inserting all elements that hash to the same bucket into a single collection of values.
Open hashing keeps a linked list of all the elements that hash to the same cell (separate chaining).
Each cell in the hash table contains a pointer to a linked list containing the data.
Functions and analysis of open hashing:
Inserting a new element into the table: we add the element at the beginning or the end of the appropriate linked list.
The choice depends on whether you want to check for duplicates, and on how often you expect to access the most recently added elements.
Open Hashing
For search, we use the hash function to determine
which linked list holds the element, and then traverse
the linked list to find the element.
Deletion is done to the element in the appropriate
linked list after we find the element to be deleted.
Instead of a linked list, each cell could hold another structure, such as a tree or another hash table, to resolve collisions.
The main advantage of this method is that it can handle any amount of data (dynamic expansion).
The main disadvantage of this method is the extra memory used by the list in each cell.
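A minimal sketch of separate chaining, assuming String keys and, for brevity, Java's built-in `String.hashCode()` instead of the hash functions above; the class `SeparateChainingTable` and its methods are hypothetical:

```java
import java.util.LinkedList;

// Separate chaining: each cell holds a linked list of the keys that hash there.
class SeparateChainingTable {
    private final LinkedList<String>[] lists;
    private int size = 0;

    @SuppressWarnings("unchecked")
    SeparateChainingTable(int hSize) {
        lists = (LinkedList<String>[]) new LinkedList[hSize];
        for (int i = 0; i < hSize; i++) {
            lists[i] = new LinkedList<>();
        }
    }

    // Pick the list for a key; floorMod handles negative hashCode values.
    private LinkedList<String> listFor(String key) {
        return lists[Math.floorMod(key.hashCode(), lists.length)];
    }

    // Insert at the front of the list, skipping duplicates.
    void insert(String key) {
        LinkedList<String> list = listFor(key);
        if (!list.contains(key)) {
            list.addFirst(key);
            size++;
        }
    }

    // Hash to the right list, then traverse it.
    boolean contains(String key) {
        return listFor(key).contains(key);
    }

    // Find the key in its list and unlink it.
    void delete(String key) {
        if (listFor(key).remove(key)) {
            size--;
        }
    }

    // Average list length = n / H-size = load factor.
    double loadFactor() {
        return (double) size / lists.length;
    }

    public static void main(String[] args) {
        SeparateChainingTable t = new SeparateChainingTable(7);
        for (String k : new String[] {"alpha", "crystal", "dawn", "emerald",
                                      "flamingo", "hallmark", "moon", "marigold"}) {
            t.insert(k);
        }
        t.delete("moon");
        System.out.println(t.contains("marigold") + " " + t.contains("moon") + " " + t.loadFactor());
    }
}
```

After deleting "moon", seven keys remain in seven lists, so λ = 1.0; the program prints `true false 1.0`.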
Analysis of Open Hash
In general, the average length of a list is the load factor λ.
The cost of insertion depends on the hash function and on where in the list the insertion is done, but in general it is the cost of inserting into a linked list plus the time to evaluate the hash function.
For search, the time is the constant time to evaluate the hash function plus the time to traverse the list.
Worst case O(n) for search.
Average case depends on λ.
The general rule for open hashing is to make λ ≈ 1.
Used for data of dynamic size.
Issues
Other issues common to all closed hashing resolutions:
Deletion is confusing (it requires lazy deletion).
Simpler than open hashing.
Good if we do not expect too many collisions.
If a search is unsuccessful, we may have to search the whole table.
Requires a large table compared with the number of data items expected.
Summary
Hash tables: array
Hash function: function that maps key into
number in the range [0, size of hash table)
Open hashing
Separate chaining
Closed hashing (Open addressing)
Linear probing
Quadratic probing
Double hashing
Primary Clustering, Secondary Clustering
Summary
Advantage
Running time
O(1) + O(Collision resolution)
Disadvantage
Difficult (not efficient) to print all elements in the hash table
Inefficient to find the minimum or maximum element
Not growable (for closed hash/open addressing)
Waste some space (load factor)
