
Hashing Strategies

1. Introduction to Hashing:
 Hashing is a technique used to map data of arbitrary size to fixed-size values,
typically integers, known as hash codes.
 The purpose of hashing in data structures like sets and dictionaries is to enable
efficient access to elements or keys.
 In an ideal scenario, each key would map to a unique position in an underlying array,
allowing for constant-time access.
2. Key-to-Address Transformation:
 Hashing involves applying a function to a key to determine its position or address in
the underlying array.
 For example, if the first key is 15,000 and subsequent keys are consecutive, the
position of a given key could be computed using the expression key - 15000.
 This transformation process is crucial for achieving efficient access to elements based
on their keys.
3. Hash Function:
 The function responsible for transforming keys into array positions is called a hash
function.
 A good hash function should distribute keys evenly across the array to minimize
collisions, where multiple keys map to the same position.
 Ideally, a hash function should execute in constant time; otherwise computing
the address alone would keep insertions, accesses, and removals from being O(1).
4. Hash Table:
 The underlying array used with a hashing strategy is called a hash table.
 Each position in the hash table corresponds to a possible address computed by the
hash function.
 Hash tables are typically allocated with more cells than there are keys, which
gives the hash function room to spread keys out and keeps collisions manageable.
5. Efficiency of Hashing:
 When implemented properly, hashing allows for fast retrieval of elements or keys
from data structures like sets and dictionaries.
 The efficiency of hashing depends on the quality of the hash function and the
management of collisions.
 With a good hash function and appropriate collision resolution techniques,
insertions, accesses, and removals run in constant time on average, providing
excellent performance in practice.

In summary, hashing is a fundamental technique for achieving efficient access to data in sets and
dictionaries. By transforming keys into array positions using hash functions and utilizing hash tables,
hashing strategies enable fast and scalable data retrieval operations.
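The key - 15000 transformation from item 2 can be sketched as a small direct-address table. This is a minimal sketch that assumes the keys are consecutive integers starting at 15,000. The names BASE_KEY, address, and DirectAddressTable are hypothetical and do not come from the original text.

```python
BASE_KEY = 15000  # assumed first key of a consecutive range (from the example above)

def address(key):
    """Map a key to its array position by subtracting the first key."""
    return key - BASE_KEY

class DirectAddressTable:
    """Hypothetical table storing one value per key at slots[address(key)]."""

    def __init__(self, capacity):
        self.slots = [None] * capacity

    def put(self, key, value):
        self.slots[address(key)] = value  # O(1): one subtraction, one index

    def get(self, key):
        return self.slots[address(key)]   # O(1): no searching needed

table = DirectAddressTable(capacity=5)
table.put(15000, "first")
table.put(15003, "fourth")
print(address(15003))    # 3
print(table.get(15003))  # fourth
```

This works only because the keys are dense and consecutive; hash functions exist for the general case where they are not.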
FIRST EXAMPLE DIAGRAM

In Figure 11-4, the placement of keys 3, 5, 8, and 10 using the hashing function key % 4
is illustrated. Let's break down what this means:

1. Hashing Function:
 The hashing function used here is key % 4, which computes the remainder
of dividing the key by 4.
 This function ensures that each key is mapped to one of four possible
positions in the array, represented by indices 0, 1, 2, and 3.
2. Keys and Positions:
 Each key (3, 5, 8, and 10) is processed by the hashing function to
determine its position in the array.
 The result of applying the hashing function to each key is as follows:
 Key 3: 3 % 4 = 3 (position 3)
 Key 5: 5 % 4 = 1 (position 1)
 Key 8: 8 % 4 = 0 (position 0)
 Key 10: 10 % 4 = 2 (position 2)
3. Placement:
 Each key is placed at its corresponding position in the array, based on the
result of the hashing function.
 Key 3 is placed at index 3, key 5 at index 1, key 8 at index 0, and key 10 at
index 2.
4. Distribution:
 The hashing function aims to evenly distribute keys across the array to
minimize collisions and ensure efficient access.
 In this example, the four keys land in four different positions (0, 1, 2,
and 3), so no collisions occur and the distribution is perfectly balanced.

In summary, the hashing function key % 4 is used to map keys to array positions,
resulting in the placement of keys 3, 5, 8, and 10 at specific indices in the array. This
technique facilitates efficient access to keys in data structures like sets and dictionaries.
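The placement shown in Figure 11-4 can be reproduced in a few lines of Python. This is a minimal sketch. The helper name placeKeys is hypothetical, and it does no collision handling (a later key simply overwrites an earlier one). That is harmless here because these four keys never collide.

```python
def placeKeys(keys, size):
    """Store each key at index key % size in a list of the given size."""
    table = [None] * size
    for key in keys:
        table[key % size] = key  # no collision handling: later keys overwrite
    return table

print(placeKeys([3, 5, 8, 10], 4))  # [8, 5, 10, 3]
```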
SECOND DIAGRAM EXPLANATION

In Figure 11-5, the placement of keys 3, 4, 8, and 10 using the hashing function key % 4
is illustrated. Let's break down what this means:

1. Hashing Function:
 The hashing function used here is key % 4, which computes the remainder
of dividing the key by 4.
 This function aims to map each key to one of four possible positions in the
array, represented by indices 0, 1, 2, and 3.
2. Keys and Positions:
 Each key (3, 4, 8, and 10) is processed by the hashing function to
determine its position in the array.
 The result of applying the hashing function to each key is as follows:
 Key 3: 3 % 4 = 3 (position 3)
 Key 4: 4 % 4 = 0 (position 0)
 Key 8: 8 % 4 = 0 (position 0)
 Key 10: 10 % 4 = 2 (position 2)
3. Placement:
 Each key is placed at its corresponding position in the array, based on the
result of the hashing function.
 Key 3 is placed at index 3, key 4 and key 8 both hash to index 0, and key
10 is placed at index 2.
4. Collision:
 The hashing of keys 4 and 8 to the same index (index 0) results in a
collision.
 A collision occurs when two or more keys are hashed to the same index in
the array.
5. Consequences:
 Collisions can lead to inefficiencies in data retrieval and storage, as they
require additional handling to resolve.
 In the context of hashing, strategies need to be employed to minimize
collisions and maintain efficient access to items in unordered collections.

In summary, while hashing functions aim to evenly distribute keys across the array to
minimize collisions, occurrences of collisions, as seen with keys 4 and 8 in this example,
necessitate the implementation of collision resolution techniques to ensure efficient
data storage and retrieval.
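One way to spot the collision in Figure 11-5 is to group the keys by their home index. The sketch below uses the same key % 4 hashing function. The helper names groupByIndex and collisions are hypothetical.

```python
def groupByIndex(keys, size):
    """Map each home index (key % size) to the list of keys that hash there."""
    groups = {}
    for key in keys:
        groups.setdefault(key % size, []).append(key)
    return groups

def collisions(keys, size):
    """Keep only the indexes that more than one key hashes to."""
    return {index: group for index, group in groupByIndex(keys, size).items()
            if len(group) > 1}

print(groupByIndex([3, 4, 8, 10], 4))  # {3: [3], 0: [4, 8], 2: [10]}
print(collisions([3, 4, 8, 10], 4))    # {0: [4, 8]}
print(collisions([3, 5, 8, 10], 4))    # {}
```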

The Relationship of Collisions to Density


Let's break down the concept and the provided code:

1. Relationship of Collisions to Density:
 The density of an array refers to the ratio of occupied cells (containing
data) to the total number of cells in the array.
 In hashing, collisions occur when multiple keys are hashed to the same
index in the array. The likelihood of collisions increases as the density of
the array increases; the density itself approaches 1 as the array fills up.
 Therefore, collisions are more likely to occur as the array becomes full,
i.e., when there are fewer spare cells available for data storage.

2. keysToIndexes Function:
 This function takes two parameters: keys, which is a list of keys, and n,
which is the length of the array.
 It uses the map function along with a lambda function to apply the hashing
function key % n to each key in the list.
 The result is a list of indexes corresponding to the keys in the array.
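The function's source is not reproduced in these notes. The following is a reconstruction from the description above (map plus a lambda that computes key % n) and is not necessarily the book's exact code. The expected results follow directly from the key % n arithmetic.

```python
def keysToIndexes(keys, n):
    """Return the array index key % n for each key, for an array of length n."""
    return list(map(lambda key: key % n, keys))

print(keysToIndexes([3, 5, 8, 10], 4))  # [3, 1, 0, 2]
print(keysToIndexes([3, 4, 8, 10], 4))  # [3, 0, 0, 2]  (4 and 8 collide)
print(keysToIndexes([3, 5, 8, 10], 8))  # [3, 5, 0, 2]
print(keysToIndexes([3, 4, 8, 10], 8))  # [3, 4, 0, 2]
```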

A closer look at the code:

1. Function Definition:
 The keysToIndexes function is defined with two parameters: keys and n.
 keys is a list of positive integers representing keys.
 n is the length of the array.
2. Mapping Function:
 Inside the function, map is used to apply a function to each element of the
keys list.
 The function being applied is a lambda function: lambda key: key % n.
3. Index Calculation:
 This lambda function calculates the index in the array for each key by
taking the remainder of the key divided by the length of the array (key %
n).
 So, for each key in the keys list, this lambda function computes the index
where the key should be placed in the array.
4. Conversion to List:
 The result of the map function is initially an iterator, so it is converted to a
list using list() to obtain a list of indexes corresponding to the keys.
5. Session Output:
 The session demonstrates the function's behavior for two sets of keys, each
hashed into arrays of two different lengths.
 Each function call passes a list of keys ([3, 5, 8, 10] or [3, 4, 8, 10]) and an
array length (4 or 8) as arguments to keysToIndexes.
 The output of each function call is the resulting list of indexes for the given
keys and array length, computed with the hashing function key % n.
6. Interpretation of Session Output:
 In the first session output (keysToIndexes([3, 5, 8, 10], 4) returns [3, 1, 0, 2]),
no collisions occur because each key maps to a unique index in the array of length 4.
 In the second session output (keysToIndexes([3, 4, 8, 10], 4) returns [3, 0, 0, 2]),
there is one collision because keys 4 and 8 both hash to index 0.
 Increasing the array length to 8 (keysToIndexes([3, 5, 8, 10], 8) returns
[3, 5, 0, 2] and keysToIndexes([3, 4, 8, 10], 8) returns [3, 4, 0, 2]) eliminates
the collisions because there are more available slots in the array.

Overall, the keysToIndexes function is a simple utility that shows how a hashing function
maps keys to array indexes, and its session output illustrates how the array length
affects whether collisions occur.
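To connect the session output back to density, the sketch below computes the ratio of keys to array length alongside a count of collisions for each call. It assumes density is measured as len(keys) / n, that is, every key eventually occupies its own cell. The helpers density and collisionCount are hypothetical, and keysToIndexes is repeated so the block runs on its own.

```python
def keysToIndexes(keys, n):
    """Return key % n for each key."""
    return list(map(lambda key: key % n, keys))

def density(keys, n):
    """Ratio of keys to cells, assuming each key ends up in its own cell."""
    return len(keys) / n

def collisionCount(keys, n):
    """Number of keys whose index was already claimed by an earlier key."""
    indexes = keysToIndexes(keys, n)
    return len(indexes) - len(set(indexes))

for keys in ([3, 5, 8, 10], [3, 4, 8, 10]):
    for n in (4, 8):
        print(keys, n, density(keys, n), collisionCount(keys, n))
# [3, 5, 8, 10] 4 1.0 0
# [3, 5, 8, 10] 8 0.5 0
# [3, 4, 8, 10] 4 1.0 1
# [3, 4, 8, 10] 8 0.5 0
```

Halving the density from 1.0 to 0.5 removes the one collision here. With other keys, a lower density only makes collisions less likely, not impossible.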
Hashing with Nonnumeric Keys
The content and explanation, step by step:

1. Introduction:
 The passage discusses the challenge of generating integer keys for non-
numeric data, such as strings.
 It highlights the limitation of using simple methods like summing ASCII
values, as it may produce the same keys for anagrams and be biased
towards certain letters.
2. Approach for Generating Keys from Strings:
 To address the limitations mentioned, a more sophisticated approach is
proposed.
 Ideally, each unique string would yield a unique integer key, although no
simple scheme can guarantee this.
 One proposed method involves adjusting the sum of ASCII values based
on the length of the string and characteristics of the first and last
characters.
3. Key Generation Function: stringHash:
 The function stringHash is defined to implement the proposed approach for
generating integer keys from strings.
 It takes a string as input and returns an integer key.
4. Algorithm Explanation:
 If the length of the string exceeds a certain threshold, the first character is
dropped before computing the sum of ASCII values.
 Additionally, if the string is longer than another threshold, twice the ASCII
value of the last character is subtracted from the sum.
 These adjustments aim to reduce biases introduced by the first and last
characters and mitigate the effect of anagrams on key generation.

Overall, the passage explains the challenges of hashing non-numeric keys,
proposes a way to improve key generation for strings, and outlines the stringHash
function that implements it. The approach aims to spread integer keys more evenly and
to reduce, though not eliminate, collisions between different strings.
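The anagram problem with a plain sum of ASCII values is easy to confirm. The helper name asciiSum is hypothetical. It implements the naive approach the passage warns against, not stringHash.

```python
def asciiSum(text):
    """Naive string key: add up the ASCII (Unicode) codes of all characters."""
    return sum(ord(ch) for ch in text)

print(asciiSum("cinema"), asciiSum("iceman"))  # 621 621
```

Because addition ignores order, any two anagrams receive the same key under this scheme.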

CODE
1. Definition of stringHash function:
 The stringHash function is defined to generate an integer key from a string.
 It adjusts the sum of ASCII values based on the length of the string and
characteristics of the first and last characters, as discussed earlier.
 If the string is longer than 4 characters and its first character is
alphabetic, it drops the first character.
 It computes the sum of ASCII values for the remaining characters in the
string.
 If the length of the string is greater than 2, it subtracts twice the ASCII
value of the last character from the total sum.
2. Demonstration with Anagrams:
 The function stringHash is tested with two anagram strings: "cinema" and
"iceman".
 Despite being anagrams, the generated integer keys for these strings are
different due to the adjustments made in the stringHash function.
 This shows how the adjustments can separate anagrams, although they do not
guarantee distinct keys for every pair (for example, anagrams that share the
same first and last characters still receive the same key).
3. Enhancing keysToIndexes function:
 The keysToIndexes function is updated to accept a hashing function as an
optional third argument.
 If provided, this hashing function is applied to each key before computing
the array index.
 The default hashing function simply returns the key itself, which is suitable
for integer keys.
4. Testing with Different Key Types:
 The keysToIndexes function is tested with lists of integer keys and strings.
 For strings, the stringHash function is used as the hashing function.
 The output shows how the hashing function and the array length together
determine where keys land: "cinema" and "iceman" both map to index 0 in an
array of length 2, but to indexes 1 and 2 in an array of length 3.

Overall, this example illustrates how to implement and use a custom hashing function
for generating integer keys from strings, addressing issues such as anagrams and biased
key distributions. It also demonstrates how to integrate this hashing function into the
keysToIndexes function for handling different types of keys.
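The stringHash code itself is not reproduced in these notes. The sketch below is a reconstruction from the description in these notes: drop the first letter of strings longer than 4 characters, sum the ASCII codes, and subtract twice the last code when more than 2 characters remain. It reproduces the keys 328 and 296 quoted for "cinema" and "iceman".

```python
def stringHash(item):
    """Generate an integer key from a string (reconstructed from the description)."""
    # Drop the first character of longer strings that start with a letter
    if len(item) > 4 and (item[0].islower() or item[0].isupper()):
        item = item[1:]
    total = 0
    for ch in item:
        total += ord(ch)  # add each character's ASCII (Unicode) code
    # For strings longer than 2 characters, subtract twice the last code
    if len(item) > 2:
        total -= 2 * ord(item[-1])
    return total

print(stringHash("cinema"))  # 328
print(stringHash("iceman"))  # 296
```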

CODE
1. Definition of stringHash function:
 This function is designed to generate an integer key from a string.
 It accepts a single argument item, which represents the input string.
 The function first checks if the length of the string is greater than 4
characters and if the first character is alphabetic (either lowercase or
uppercase). If both conditions are met, it drops the first character from the
string.
 Next, it initializes a variable total to store the sum of ASCII values of the
characters in the string.
 Then, it iterates over each character ch in the modified string item and adds
the ASCII value of each character to the total.
 If the length of the string is greater than 2 characters, it subtracts twice the
ASCII value of the last character from the total.
 Finally, it returns the computed total as the integer key for the input string.
2. Demonstration with Anagrams:
 Two strings, "cinema" and "iceman", are passed to the stringHash function
to demonstrate how it handles anagrams.
 Despite "cinema" and "iceman" being anagrams, the resulting integer keys
(328 and 296, respectively) are different due to the adjustments made in
the stringHash function, so these two anagrams are hashed to distinct keys.
3. Enhancement of keysToIndexes function:
 The keysToIndexes function is updated to accept an additional argument
hashFunc, which represents the hashing function to be applied to each key.
 Inside the function, hashFunc is applied to each key before the remainder
with n is taken. If no hashFunc is provided, the default simply returns the
key itself, so the index is key % n.
 The computed array indexes based on the hashed keys are returned as a
list.
4. Testing with Different Key Types:
 The keysToIndexes function is tested with both lists of integer keys and
strings.
 For the string keys, the stringHash function is passed as the hashFunc
argument.
 The output shows that the resulting indexes depend on both the hashing
function and the array length; the hashing function alone does not resolve
collisions or guarantee an even distribution.

In summary, the code showcases how to implement a custom hashing function for
strings to generate integer keys, address issues like anagrams, and integrate this
function into a generalized function for mapping keys to array indexes.
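The generalized keysToIndexes can be sketched the same way. This reconstruction follows the description above: an optional hashFunc argument that defaults to returning the key itself. stringHash is repeated in condensed form so the block runs on its own.

```python
def stringHash(item):
    """Integer key from a string, using the rules described above."""
    if len(item) > 4 and (item[0].islower() or item[0].isupper()):
        item = item[1:]                   # drop the first letter
    total = sum(ord(ch) for ch in item)   # sum the ASCII codes
    if len(item) > 2:
        total -= 2 * ord(item[-1])        # subtract twice the last code
    return total

def keysToIndexes(keys, n, hashFunc=lambda key: key):
    """Return hashFunc(key) % n for each key; by default a key is its own hash."""
    return list(map(lambda key: hashFunc(key) % n, keys))

print(keysToIndexes([3, 5, 8, 10], 4))                    # [3, 1, 0, 2]
print(keysToIndexes(["cinema", "iceman"], 2, stringHash))  # [0, 0]
print(keysToIndexes(["cinema", "iceman"], 3, stringHash))  # [1, 2]
```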
The paragraph and the provided code break down as follows:

1. Standard Hash Function:
 Python provides a built-in hash function that returns an integer hash value for
any hashable Python object. The value is not unique: different objects can share
a hash value, and unhashable objects such as lists raise a TypeError.
 The hash function takes a Python object as an argument and returns an
integer hash value.
 It's important to note that the integer returned by hash might be negative,
so taking its absolute value ensures a non-negative integer.
 This integer can then be used in hashing applications, typically by applying
the remainder operator to compute an index within a fixed-size array or
hash table.
2. Comparison of hash with stringHash:
 The code snippet compares the results of using Python's built-in hash
function with the custom stringHash function for the strings "cinema" and
"iceman".
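 For "cinema" and "iceman", the hash function produced the hash values
[1338503047, 1166902005] in the session shown, while the stringHash function
produces the integer keys [328, 296]. Because Python randomizes string hashes
per process (unless PYTHONHASHSEED is set), rerunning the code gives different
hash values, whereas stringHash always gives the same keys.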
 The differences in the generated hash values and integer keys
demonstrate the distinct hashing behaviors of the two functions.
3. Usage in keysToIndexes function:
 The keysToIndexes function is tested with both the hash function and the
custom stringHash function as the hashing method.
 When using hash as the hashing function with an array of length 3, the
resulting array indexes for the keys "cinema" and "iceman" are [1, 0] (for the
hash values above).
 When using stringHash with the same array length, the indexes are [1, 2],
demonstrating how different hashing functions can lead to different index
distributions for the same keys.
4. Sophisticated Hashing Functions:
 The paragraph acknowledges that more sophisticated hashing functions
exist, but they are typically the subject of advanced courses and are not
covered in this book.
 For the purposes of the chapter, Python's built-in hash function and the
remainder method (using %) are used for simplicity.

In summary, the paragraph and the provided code illustrate the use of Python's built-in
hash function for generating hash values, compare its behavior with a custom hashing
function (stringHash), and demonstrate its usage in the context of mapping keys to array
indexes (keysToIndexes). It emphasizes the differences in hashing behavior and index
distributions between different hashing functions.
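Here is a short sketch of the abs(hash(item)) % n pattern using Python's built-in hash. The helper name hashIndex is hypothetical. Because string hashes are randomized per process, the sketch checks properties (indexes stay in range, equal strings hash equally, lists are unhashable) rather than printing specific values like the ones quoted above.

```python
def hashIndex(item, n):
    """Map a hashable object to an index in range(n) using Python's built-in hash."""
    return abs(hash(item)) % n

indexes = [hashIndex(word, 3) for word in ["cinema", "iceman"]]
print(indexes)  # values in range(3); they can differ from run to run

# Equal objects always have equal hashes within one process
print(hash("cinema") == hash("cine" + "ma"))  # True

# In CPython, small integers hash to themselves, so their indexes are predictable
print(hashIndex(10, 4))  # 2

# Mutable built-ins such as lists are unhashable
try:
    hash([1, 2, 3])
except TypeError as err:
    print("TypeError:", err)
```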

LINEAR PROBING
Linear probing is a technique used to resolve collisions in hash tables. When two or
more keys hash to the same index in the hash table (a collision), linear probing helps
find an alternative index to store the conflicting key-value pair.

Here's how linear probing works:

1. Hashing:
 Each key is hashed to produce an index in the hash table where the
corresponding value will be stored. Hashing is the process of converting a
key into an integer index.
2. Collision:
 When multiple keys hash to the same index, a collision occurs. This means
that the position in the hash table is already occupied by another key-
value pair.
3. Resolution:
 Linear probing provides a simple resolution strategy for collisions. When
the hashed index is already occupied, linear probing checks the subsequent
positions in the hash table, one at a time, until it finds an empty slot.
4. Probing:
 Linear probing involves probing or searching for the next available
position in a linear manner. Starting from the hashed index, linear probing
checks each successive position until an empty slot is found.
5. Wraparound:
 If linear probing reaches the end of the hash table without finding an
empty slot, it wraps around to the beginning of the table and continues
the search until it finds an available position. This ensures that the entire
hash table is searched for an empty slot.
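The probing order with wraparound from steps 4 and 5 can be expressed as a small generator. This is a sketch with a few assumptions: linearProbe and firstFree are hypothetical names, and None stands in for an empty slot.

```python
def linearProbe(homeIndex, size):
    """Yield every index once, starting at homeIndex and wrapping past the end."""
    for step in range(size):
        yield (homeIndex + step) % size

def firstFree(table, homeIndex):
    """Return the first empty (None) slot in probe order, or None if the table is full."""
    for index in linearProbe(homeIndex, len(table)):
        if table[index] is None:
            return index
    return None

print(list(linearProbe(6, 8)))  # [6, 7, 0, 1, 2, 3, 4, 5]

table = ["a", None, None, None, None, None, "b", "c"]
print(firstFree(table, 6))  # 1: slots 6 and 7 are taken, wrap to 0 (taken), then 1
```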

Now, let's dissect the provided paragraph using the understanding of linear probing:

"For insertions, the simplest way to resolve a collision is to search the array, starting
from the collision spot, for the first available position; this process is referred to as linear
probing."

 This sentence describes linear probing as a strategy used during insertions to
handle collisions. When a collision occurs (i.e., when the desired position for
insertion is already occupied), linear probing starts searching from the collided
spot and moves sequentially through the array until it finds the first available
position.

"Each position in the array is in one of three distinguishable states: occupied, never
occupied, or previously occupied."

 It mentions the three states that a position in the array can be in: occupied
(contains a key-value pair), never occupied (empty), or previously occupied (was
occupied before but now vacant, usually marked as DELETED).

"A position is considered to be available for the insertion of a key if it has never been
occupied or if a key has been deleted from it (previously occupied)."

 This explains when a position is considered available for insertion: either it has
never been occupied, or it was previously occupied but is now vacant due to
deletion.

"The values EMPTY and DELETED designate these two states, respectively."
 It clarifies that the states of never occupied and previously occupied are
represented by special values EMPTY and DELETED, respectively, in the array.

"At start-up, the array cells are filled with the EMPTY value. The value of a cell is set to
DELETED when a key is removed."

 This describes the initialization of the hash table: all positions are initially marked
as EMPTY, and when a key is deleted, the corresponding position is marked as
DELETED.

"At the start of an insertion, the hashing function is run to compute the home index of
the item."

 It explains that during insertion, the hashing function is used to determine the
ideal or "home" index for the item in the hash table.

"The home index is the position where the item should go if the hash function works
perfectly (this position will be unoccupied in this case)."

 It defines the home index as the ideal position for the item if there were no
collisions, i.e., if the hash function perfectly distributed keys across the hash table.

"If the cell at the home index is not available, the algorithm moves the index to the right
to probe for an available cell."

 This describes the process of linear probing: if the home index is already
occupied, the algorithm checks subsequent positions (moving to the right in the
array) until it finds an available cell.

"When the search reaches the last position of the array, the probing wraps around to
continue from the first position."

 It mentions the wraparound behavior of linear probing: if the search reaches the
end of the array without finding an available cell, it wraps around to the
beginning and continues the search until an available cell is found.
NEXT PARAGRAPH
1. Insertions:
 The paragraph explains the process of inserting a new item into the hash
table. It starts by determining the home index for the item using the hash
function.
 If the cell at the home index is not available (i.e., it's already occupied),
linear probing is employed to find the next available cell.
 The code provided implements this process. It initializes the index to the
home index and then enters a loop that continues until an available cell is
found.
 The loop condition checks whether the current cell is neither EMPTY nor
DELETED. While that holds, the index is incremented (with wraparound) to
probe the next cell.
 Once an available cell is found, the item is stored in that cell.
2. Retrievals:
 Retrievals involve searching for a target item in the hash table. The
probing process starts at the home index determined by the hash function.
 The search continues until a cell marked EMPTY is encountered or the
target item is found. Cells marked DELETED and cells holding other items
are probed past rather than ending the search, because the target may
have been placed beyond them before a removal.
 The paragraph explains this concept briefly, and the retrieval process is
similar to insertion, but instead of inserting an item, the algorithm is
searching for it.
3. Removals:
 Removals also involve probing through the hash table, similar to retrievals.
 If the target item is found during probing, its cell is marked as DELETED to
indicate that it was previously occupied but is now vacant.
 The code for removals would follow a similar logic to retrievals, but when
the target item is found, the corresponding cell would be marked as
DELETED.
4. Implementation Details:
 The provided code snippet is a concise representation of how linear
probing can be implemented for insertions in a hash table.
 It demonstrates the use of a while loop to continuously probe through the
hash table until an empty cell is found.
 The modulo operation (% len(table)) ensures that the index wraps around to
the beginning of the array if it reaches the end.
 The code efficiently handles collisions by searching for the next available
cell and storing the item there.
5. Handling State Changes:
 The paragraph mentions the use of special values EMPTY and DELETED to
represent different states of cells in the hash table.
 When a cell is marked as DELETED, it indicates that it was previously
occupied but is now vacant. This helps in maintaining the integrity of the
hash table during removals.

Overall, the explanation and code show how linear probing is used to handle insertions,
retrievals, and removals in a hash table, so that items can still be stored and found in the
presence of collisions.
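# Illustrative setup (assumed, not part of the original snippet): EMPTY and
# DELETED are sentinel values, table is a list of cells, item is the value to add
EMPTY, DELETED = None, "DELETED"
table = [EMPTY] * 8
item = 42
# Get the home index
index = abs(hash(item)) % len(table)
# Stop searching when an EMPTY or DELETED cell is encountered
while not table[index] in (EMPTY, DELETED):
    # Increment the index and wrap around to first
    # position if necessary
    index = (index + 1) % len(table)
# An available cell is found, so store the item
table[index] = item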

1. Home Index Calculation:
 The code starts by calculating the home index for the given item using the
hash function.
 The hash() function is applied to the item to generate its hash value.
 Since hash values can be negative, abs() is used to ensure a non-negative
index.
 The modulo operator % is then applied with the length of the table
(len(table)) to ensure that the index falls within the bounds of the table.
2. Linear Probing Loop:
 The code enters a while loop to search for an empty cell in the hash table.
 The loop continues until it finds a cell that is either EMPTY or DELETED,
indicating it is available for insertion.
 The condition not table[index] in (EMPTY, DELETED) is true when the current
cell is neither EMPTY nor DELETED; in that case, the loop continues.
3. Probing and Wrapping:
 Inside the loop, the index is incremented to probe the next cell in the hash
table.
 The increment operation (index + 1) moves to the next cell.
 The modulo operation (index + 1) % len(table) ensures that if the index
reaches the end of the table, it wraps around to the beginning,
maintaining the circular behavior of linear probing.
4. Insertion:
 Once an EMPTY or DELETED cell is found, the loop exits, and the item is
stored in that cell.
 The line table[index] = item assigns the item to the table at the index where
an empty cell was found.

This code handles the insertion of an item into a hash table using linear probing. It
inserts the item in the first available cell at or after the home index, and the use of
modulo makes the search wrap around to the beginning if needed. Note that if the table
has no EMPTY or DELETED cell left, the loop never terminates, so an implementation must
resize (rehash) the table before it becomes full. Insertion stays fast as long as the load
factor is kept low enough that probe sequences remain short.
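The insertion snippet can be extended to cover the retrievals and removals described earlier. The class below is a minimal sketch, not the book's implementation. The name LinearProbingSet, the object() sentinels for EMPTY and DELETED, the duplicate check, and the RuntimeError on a full table are all assumptions.

```python
EMPTY = object()    # assumed sentinel: cell never occupied
DELETED = object()  # assumed sentinel: cell previously occupied

class LinearProbingSet:
    """Hypothetical fixed-size set of hashable items using linear probing."""

    def __init__(self, capacity=8):
        self.table = [EMPTY] * capacity

    def _probe(self, item):
        """Yield each index in probe order, starting at the home index."""
        home = abs(hash(item)) % len(self.table)
        for step in range(len(self.table)):
            yield (home + step) % len(self.table)

    def find(self, item):
        """Return the index holding item, or None. EMPTY ends the search; DELETED does not."""
        for index in self._probe(item):
            cell = self.table[index]
            if cell is EMPTY:
                return None
            if cell is not DELETED and cell == item:
                return index
        return None

    def add(self, item):
        """Insert item into the first EMPTY or DELETED cell in probe order."""
        if self.find(item) is not None:
            return  # already present
        for index in self._probe(item):
            if self.table[index] is EMPTY or self.table[index] is DELETED:
                self.table[index] = item
                return
        raise RuntimeError("table is full")  # a real table would rehash instead

    def remove(self, item):
        """Mark the item's cell as DELETED so later searches probe past it."""
        index = self.find(item)
        if index is None:
            raise KeyError(item)
        self.table[index] = DELETED

s = LinearProbingSet(8)
for key in (3, 11, 19):   # all three have home index 3 in a table of length 8
    s.add(key)
print(s.find(19))         # 5
s.remove(11)              # index 4 becomes DELETED
print(s.find(19))         # 5: the search probes past the DELETED cell
s.add(27)                 # reuses the DELETED cell at index 4
print(s.find(27))         # 4
```

Marking removed cells as DELETED rather than EMPTY is what keeps find(19) working after 11 is removed: an EMPTY cell at index 4 would end the search too early.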

1. Handling Deletions:
 The paragraph highlights a common issue with linear probing: after several
insertions and removals, deleted cells may accumulate between an item
and its home index.
 This situation increases the distance between the item and its home index,
leading to longer access times.
 To address this, one approach is to shift items on the right side of the
deleted cell to the left until reaching an empty cell, a currently occupied
cell, or the home indexes of each item.
 This shifting process helps close the gaps left by removed items and
reduces the distance between items and their home indexes.
2. Regular Rehashing:
 Another strategy to mitigate the impact of deletions and clustering is
regular rehashing of the hash table.
 Rehashing involves reconstructing the hash table by resizing it or
redistributing its contents when certain conditions are met, such as when
the load factor (ratio of items to table size) exceeds a certain threshold
(e.g., 0.5).
 When rehashing, previously occupied cells are either marked as currently
occupied or emptied, effectively removing any clusters that may have
formed.
 If the hash table has information about the frequency of access to items,
reinserting items in decreasing order of frequency during rehashing can
further optimize access times by placing frequently accessed items closer
to their home indexes.
3. Preference for Rehashing:
 The paragraph suggests that the second strategy of regular rehashing may
be preferred because rehashing is already necessary when the array
becomes full or when its load factor exceeds an acceptable limit.
 By incorporating rehashing into the regular maintenance of the hash table,
both the issues of deletions and clustering can be effectively addressed.
4. Clustering:
 The paragraph also mentions another issue with linear probing called
clustering.
 Clustering occurs when items that cause collisions are repeatedly relocated
to the same region (cluster) within the array.
 The example provided illustrates clustering after several insertions of keys
(20, 30, 40, 50, 60, 70) in the hash table, where a cluster forms at the
bottom of the array.
 Clustering can lead to degraded performance as it increases the likelihood
of further collisions, exacerbating the issues associated with linear probing.

Overall, the paragraph discusses strategies to mitigate the issues of deletions and
clustering in hash tables using linear probing. These strategies aim to improve the
efficiency and performance of hash table operations by reducing the impact of collisions
and optimizing access times. Regular rehashing is particularly emphasized as a proactive
approach to maintain the integrity and performance of the hash table over time.
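The clustering effect and the rehashing remedy can be simulated directly. This sketch assumes integer keys hashed with key % len(table), a table of length 10 that is rehashed into one of length 20, and None and "DELETED" as the EMPTY and DELETED markers. These sizes are illustrative and are not taken from the book's figure.

```python
EMPTY, DELETED = None, "DELETED"  # assumed markers; keys here are integers

def insert(table, key):
    """Linear probing insert with hash key % len(table).
    Loops forever if the table has no EMPTY or DELETED cell."""
    index = key % len(table)
    while table[index] not in (EMPTY, DELETED):
        index = (index + 1) % len(table)
    table[index] = key

def rehash(table, newSize):
    """Reinsert only the live keys into a fresh table; DELETED markers disappear."""
    fresh = [EMPTY] * newSize
    for cell in table:
        if cell not in (EMPTY, DELETED):
            insert(fresh, cell)
    return fresh

table = [EMPTY] * 10
for key in (20, 30, 40, 50, 60, 70):  # every key has home index 0
    insert(table, key)
print(table)  # [20, 30, 40, 50, 60, 70, None, None, None, None]: one cluster

table[2] = DELETED                # remove 40 by marking its cell
bigger = rehash(table, 20)
print(bigger[:3], bigger[10:13])  # [20, 60, None] [30, 50, 70]
```

After rehashing, the DELETED marker is gone and the former six-cell cluster becomes two short runs (indexes 0 to 1 and 10 to 12).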

QUADRATIC PROBING
The following explains quadratic probing, a technique used to resolve collisions in hash tables, step by step:

1. Objective:
 The main goal of quadratic probing is to avoid clustering, which is a
common issue associated with linear probing.
 Clustering occurs when consecutive collisions cause items to be inserted
into nearby locations in the hash table, leading to inefficient access times.
2. Incrementing the Search Distance:
 In quadratic probing, instead of incrementing the index linearly as in linear
probing, the index is incremented quadratically.
 This means that on attempt d (d = 1, 2, 3, ...), the probe lands d^2
positions past the home index rather than d positions.
3. Formula:
 The formula used to calculate the probe index on each attempt is:
index = (home_index + d^2) % table_size
 Here, home_index is the initial index calculated by the hash function
for the item.
 d represents the distance from the home index.
 The % table_size ensures that the index wraps around to the
beginning if it exceeds the size of the hash table.
4. Incrementing the Distance:
 If the initial attempt to find an empty position fails, the distance d is
incremented, and the probing process continues.
 The distance d is typically incremented by 1 on each attempt, but it could
be incremented by any other suitable value depending on the
implementation.
5. Example:
 Let's say we have a home index k and an initial distance d.
 On each attempt, the probe index is calculated using the formula
index = (k + d^2) % table_size.
 If probing is necessary, the probe starts at the home index plus 1, and then
moves distances of 4 (2^2), 9 (3^2), 16 (4^2), and so on from the home
index.
6. Benefits:
 Quadratic probing helps distribute collided items more evenly throughout
the hash table compared to linear probing.
 By incrementing the distance quadratically, quadratic probing spreads
collided items over a larger area of the hash table, which reduces the
clustering seen with linear probing and tends to improve access times.

Overall, quadratic probing provides an alternative to linear probing for resolving
collisions in hash tables. By incrementing the probe distance quadratically, it helps
distribute items more evenly and reduces the impact of clustering, thereby improving
the efficiency of hash table operations.
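The probe order from the Example step can be listed with a one-line helper. The names quadraticProbe and linearProbeSteps and the sample table length 11 are assumptions for illustration. The same home index is shown with linear probing for contrast.

```python
def quadraticProbe(homeIndex, size, attempts):
    """First `attempts` probe indexes (homeIndex + d**2) % size, for d = 1, 2, ..."""
    return [(homeIndex + d ** 2) % size for d in range(1, attempts + 1)]

def linearProbeSteps(homeIndex, size, attempts):
    """Linear probing for comparison: (homeIndex + d) % size."""
    return [(homeIndex + d) % size for d in range(1, attempts + 1)]

print(quadraticProbe(3, 11, 5))    # [4, 7, 1, 8, 6]
print(linearProbeSteps(3, 11, 5))  # [4, 5, 6, 7, 8]
```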
Code
1. Initialization:
 The code starts by calculating the hash value of the item and taking its
absolute value to ensure it's non-negative.
 It then sets the initial distance to 1 and calculates the home index for the
item using the modulo operator % with the length of the table.
2. Quadratic Probing Loop:
 The code enters a while loop to search for an unoccupied cell in the hash
table.
 The loop continues until it finds a cell that is either EMPTY or DELETED,
indicating it is available for insertion.
 Inside the loop, the index is calculated using quadratic probing:
 (homeIndex + distance ** 2) % len(table) calculates the index based on
the home index and the square of the distance.
 The square of the distance ensures that the probing process jumps
over cells, avoiding the linear progression seen in linear probing.
 The modulo operation % len(table) ensures that the index wraps
around to the beginning if it exceeds the size of the table.
3. Incrementing Distance:
 After calculating the index, the distance is incremented by 1 for the next
iteration.
 As a result, the offset from the home index grows quadratically (1, 4, 9,
16, ...) on successive attempts to find an empty cell.
4. Insertion:
 Once an empty cell is found, the loop exits, and the item is stored in that
cell.
 The line table[index] = item assigns the item to the table at the calculated
index.
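The steps above can be sketched in Python as follows. This is a minimal sketch and not the original code. It assumes that EMPTY and DELETED are sentinel objects and that the table always contains an unoccupied cell the probe sequence can reach. The names `insert_quadratic`, `EMPTY`, and `DELETED` are assumptions that mirror the description. The sketch sets `index` to the home index before the loop, so the first check is made at the home cell.

```python
EMPTY = object()    # assumed sentinel: cell has never held an item
DELETED = object()  # assumed sentinel: cell's item was removed


def insert_quadratic(table, item):
    """Store item using quadratic probing and return the index used.

    Assumes a reachable EMPTY or DELETED cell exists; otherwise the loop never ends.
    """
    home_index = abs(hash(item)) % len(table)   # non-negative hash, reduced to a position
    index = home_index                          # start the search at the home index
    distance = 1
    # Keep probing while the cell holds a live item
    while table[index] is not EMPTY and table[index] is not DELETED:
        index = (home_index + distance ** 2) % len(table)
        distance += 1
    table[index] = item                         # an unoccupied cell was found
    return index


table = [EMPTY] * 11
print([insert_quadratic(table, key) for key in (3, 14, 25)])  # [3, 4, 7]
```

All three keys share home index 3 in CPython, where `hash(n) == n` for small non-negative integers. The second key therefore lands 1 cell away and the third 4 cells away.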

A concern with this approach:

 Quadratic probing may skip over some cells, so the search can miss cells
that are available for insertion.
 This can lead to wasted space in the hash table, as some cells may remain
unoccupied even though an insertion fails to reach them.
 The issue arises because the sequence home_index + d^2 (mod table_size)
reaches only a subset of the table's positions. For a prime table size, it
reaches only about half of them. The problem becomes serious when the
table is densely populated.
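To see which cells are skipped, the following Python sketch lists every position the quadratic sequence can reach from a given home index. The function name `reachable_cells` and the table sizes 7 and 8 are illustrative choices.

```python
def reachable_cells(home_index, table_size):
    """Return every cell the sequence (home_index + d**2) % table_size can reach."""
    # (d + table_size)**2 % table_size == d**2 % table_size,
    # so distances 0 .. table_size - 1 already cover every reachable position.
    return sorted({(home_index + d * d) % table_size for d in range(table_size)})


print(reachable_cells(0, 7))  # [0, 1, 2, 4]
print(reachable_cells(0, 8))  # [0, 1, 4]
```

From home index 0, a table of 7 cells only ever probes cells 0, 1, 2 and 4, and a table of 8 cells only probes cells 0, 1 and 4. If those cells are full, the insertion cannot reach the remaining empty cells, and a loop that stops only at an unoccupied cell would never terminate. A widely used guideline is to make the table size prime and keep the load factor at or below 0.5. Under those conditions, quadratic probing is guaranteed to find an empty cell.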

Overall, while quadratic probing helps mitigate clustering compared to linear probing, it
may still lead to inefficiencies due to potentially missed cells. Therefore, it's important to
carefully consider the trade-offs between space efficiency and collision resolution
strategies when implementing quadratic probing in hash tables.

Chaining
1. Overview:
 Chaining is a collision-resolution technique where items that hash to the
same index are stored in a linked list associated with that index.
 Instead of storing items directly in the hash table array, each slot in the
array contains a reference to the head of a linked list.
 When a collision occurs, the item is added to the linked list at the
corresponding index, creating a chain of items.
2. Storage Structure:
 In chaining, the hash table is typically implemented as an array of linked
lists (or sometimes other types of dynamic data structures like trees).
 Each element in the array corresponds to a bucket or index, and it holds
the head pointer of the linked list associated with that bucket.
3. Insertion:
 When inserting a new item into the hash table:
1. Compute the hash value of the item's key to determine its home
index in the array.
2. If the item's key already exists in the linked list at that index, the
existing entry may be replaced or updated, depending on the specific
requirements of the application.
3. Otherwise, add the item to the linked list at that index.
4. Retrieval and Removal:
 Retrieval and removal operations involve searching for the target item in
the linked list at its home index:
1. Compute the hash value of the item's key to determine its home
index in the array.
2. Traverse the linked list at that index, searching for the item.
3. If the item is found, it can be returned (for retrieval) or removed
from the linked list (for removal).
5. Advantages:
 Chaining provides a simple and efficient solution to handle collisions.
 It allows for an unlimited number of items to hash to the same index, as
they can all be stored in the same linked list.
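 Chaining also degrades gracefully as the load factor of the hash table
increases. The average chain length equals the load factor, so performance
falls off gradually rather than sharply, and the load factor may even exceed 1.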
6. Example:
 Consider an array of linked lists with five buckets and eight items.
 Each bucket in the array contains the head of a linked list, and items that
hash to the same index are stored in the corresponding linked list.
 Retrieval and removal operations involve traversing the linked list at the
target index to find the desired item.
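The insertion, retrieval, and removal steps, together with a five-bucket, eight-item example, can be sketched in Python. This is a minimal sketch that assumes a `Node` class with `data` and `next` attributes, as described later in these notes. The helper names `chain_add`, `chain_contains`, `chain_remove`, and `chain_items` are hypothetical, and the eight integer items are made up for illustration. Small non-negative integers hash to themselves in CPython, so the bucket contents are predictable.

```python
class Node:
    """Linked-list node: `data` holds the item, `next` the following node (or None)."""

    def __init__(self, data, next_node=None):
        self.data = data
        self.next = next_node


def home_index(table, item):
    return abs(hash(item)) % len(table)


def chain_add(table, item):
    """Insertion: the new node becomes the head of its bucket's chain (no duplicate check)."""
    index = home_index(table, item)
    table[index] = Node(item, table[index])


def chain_contains(table, item):
    """Retrieval: walk the chain at the item's home index."""
    node = table[home_index(table, item)]
    while node is not None:
        if node.data == item:
            return True
        node = node.next
    return False


def chain_remove(table, item):
    """Removal: unlink the first node holding item; return True if one was found."""
    index = home_index(table, item)
    previous, node = None, table[index]
    while node is not None:
        if node.data == item:
            if previous is None:
                table[index] = node.next    # removing the head of the chain
            else:
                previous.next = node.next   # bypass a node further along the chain
            return True
        previous, node = node, node.next
    return False


def chain_items(node):
    """Collect a chain's items from head to tail."""
    items = []
    while node is not None:
        items.append(node.data)
        node = node.next
    return items


table = [None] * 5                            # five buckets
for item in (3, 8, 10, 14, 19, 21, 25, 30):   # eight items
    chain_add(table, item)

print([chain_items(head) for head in table])
# [[30, 25, 10], [21], [], [8, 3], [19, 14]]
print(chain_contains(table, 25), chain_remove(table, 25), chain_contains(table, 25))
# True True False
```

Bucket 0 ends up with a chain of three items while bucket 2 stays empty. Each lookup or removal only walks the one chain at the item's home index.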

Overall, chaining is a versatile and effective collision-resolution strategy that allows hash
tables to efficiently handle collisions by storing colliding items in separate chains. Its
performance degrades gradually, rather than abruptly, as the load factor grows.

CODE
Let's break down the code for inserting an item using chaining in a hash table:

```python
# Get the home index
index = abs(hash(item)) % len(table)
# Access a bucket and store the item at the head
# of its linked list
table[index] = Node(item, table[index])
```
1. Home Index Calculation:
 The code starts by computing the home index for the item in the hash
table.
 It calculates the hash value of the item using the hash() function and takes
the absolute value to ensure a non-negative index.
 The modulo operator % is then applied with the length of the table
(len(table)) to ensure that the index falls within the bounds of the table.
2. Insertion:
 Once the home index is computed, the code accesses the bucket (array
cell) at that index in the hash table.
 It then creates a new Node object with the item and links it to the head of
the linked list at the bucket's location.
 If there are no items at the bucket yet (i.e., it's empty), the new Node
becomes the head of the linked list at that location.
 If there are existing items at the bucket, the new Node is inserted at the
head of the linked list, effectively chaining it to the existing items.
3. Node Class:
 The Node class is a data structure used to represent each item in the linked
list.
 It typically contains two attributes: data to store the item, and next to store
the reference to the next node in the linked list.
 In the provided code, the Node constructor is called to create a new node,
passing the item and the current head of the linked list as arguments.

Overall, this code efficiently inserts an item into the hash table using chaining. By
computing the home index and accessing the appropriate bucket, it ensures that the
item is added to the head of the linked list at the correct location in the hash table. This
approach facilitates efficient retrieval and removal operations, making chaining a robust
collision resolution strategy in hash tables.

EXERCISE
Let's tackle these exercises one by one:

1. How hashing provides constant-time access to a data structure:
 Hashing allows for constant-time access to a data structure by using a
hash function to map keys to indices in an array.
 When an item needs to be accessed or stored, its key is hashed to
determine its index in the array.
 Since array access is constant-time (O(1)), accessing or storing an item
is also O(1), provided that the hash function runs in constant time and few
collisions occur.
 In practice, this is an average-case bound. Its efficiency depends on the
quality of the hash function and the handling of collisions, and in the worst
case (every key colliding) an access can degrade to O(n).
2. Home index:
 The home index is the index in the array where an item would ideally be
stored based on its key. It is not the same as the hash code, which is the
number the hash function produces for the key.
 It is calculated by applying a hash function to the item's key to obtain a
hash code and then reducing that code to a valid array position, typically
with the modulo operator (hash code % table length).
 The home index determines the initial position in the hash table where an
item should be placed to achieve efficient retrieval.
3. Causes of collisions:
 Collisions occur when two or more items with different keys have the same
home index. Their hash codes may differ and still reduce to the same index.
 Common causes of collisions include:
 Imperfect hash functions that generate the same hash value for
different keys.
 Limited range of hash values compared to the number of possible
keys.
 Collisions are unavoidable whenever there are more possible keys than
table positions, so effective collision resolution strategies are needed.
4. Linear method of resolving collisions:
 The linear method, also known as linear probing, resolves collisions by
sequentially searching for the next available slot in the hash table.
 When a collision occurs, the algorithm probes the next position in the
table until it finds an empty slot.
 This probing process continues linearly until an empty slot is found,
effectively resolving the collision.
 However, linear probing can lead to clustering, where consecutive
collisions cause items to be stored close to each other, potentially
affecting the efficiency of retrieval.
5. Causes of clustering:
 Clustering occurs when multiple collisions cause items to be stored in
close proximity to each other within the hash table.
 It typically arises in collision resolution strategies like linear probing, where
consecutive collisions result in items being placed adjacent to each other.
 Clustering can lead to increased search times and reduced performance of
the hash table.
6. Quadratic method of resolving collisions and mitigation of clustering:
 The quadratic method, or quadratic probing, resolves collisions by
incrementing the probe distance quadratically instead of linearly.
 This means that the algorithm probes positions using a quadratic
sequence (e.g., 1^2, 2^2, 3^2, etc.) to search for the next available slot.
 Quadratic probing helps mitigate clustering by spreading out collided
items more evenly throughout the hash table. This reduces the long runs
of adjacent occupied cells (primary clustering) that linear probing produces.
7. Load factors calculation (load factor = number of items / length of array):
a. 10 / 30 = 1/3 ≈ 0.33
b. 30 / 30 = 1
c. 100 / 30 ≈ 3.33 (possible only with chaining, since an open-addressing
table holds at most one item per cell)
8. Explanation of chaining:
 Chaining is a collision resolution technique where each bucket (array slot)
in the hash table contains a linked list of items that hash to the same
index.
 When a collision occurs, the new item is inserted at the head of the linked
list corresponding to its home index.
 This allows multiple items with the same home index to be stored together
in the same bucket, ensuring efficient retrieval and storage.
 Chaining handles collisions gracefully by dynamically adjusting the size of
the linked lists as needed, making it a flexible and effective collision
resolution strategy.
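Several of the answers above (home index, collisions, linear probing, clustering, and load factor) can be seen together in one small Python sketch. It uses small integer keys so that, in CPython, `hash(key) == key` and the results are predictable. The function names and keys are illustrative assumptions, and the sketch assumes the table always has a free cell.

```python
def home_index(key, table_size):
    """Hash code first, then reduce it to a position in 0 .. table_size - 1."""
    return abs(hash(key)) % table_size


def insert_linear(table, key):
    """Linear probing: step one cell at a time (wrapping) until a None cell is found.

    Returns (index, probes). Assumes the table still has a free cell.
    """
    index = home_index(key, len(table))
    probes = 1
    while table[index] is not None:
        index = (index + 1) % len(table)   # wrap around at the end of the array
        probes += 1
    table[index] = key
    return index, probes


def load_factor(table):
    """Number of stored items divided by the length of the array."""
    return sum(cell is not None for cell in table) / len(table)


table = [None] * 10
for key in (5, 15, 25, 6):
    print(key, "home", home_index(key, 10), "->", insert_linear(table, key))
print("load factor:", load_factor(table))
# 5 home 5 -> (5, 1)
# 15 home 5 -> (6, 2)
# 25 home 5 -> (7, 3)
# 6 home 6 -> (8, 3)
# load factor: 0.4
```

Keys 15 and 25 collide with 5 at home index 5. Key 6 has a different home index (6) but still needs three probes, because the run of occupied cells 5 to 7 has grown over its home cell. This is the clustering described in answer 5.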
