Hash tables store objects in buckets using a hash function to map objects to integers corresponding to buckets. To solve the two-sum problem of finding all pairs of integers that sum to a target value t in an array A of length n, a hash table can be used to store each element x and check for t - x in O(n) time, faster than the O(n log n) time needed to sort and binary search. Pathological data sets that isolate few buckets can reduce hash table efficiency to linear time, but cryptographic hashes and randomizing the hash function choice can prevent this exploit.
Given a universe U of objects, a hash table maintains an evolving set S ⊆ U by keeping n buckets, where n ≈ |S|. To map objects to buckets, it uses a hash function h : U → {0, 1, 2, . . . , n − 1} and an array A of length n, storing an object x in A[h(x)].
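The bucket structure above can be sketched as a small chained hash table (class and method names here are illustrative, not from the notes); collisions within a bucket are handled with a linked list, modeled as a Python list:

```python
class ChainedHashTable:
    """Minimal sketch: n buckets, object x stored in bucket A[h(x)]."""

    def __init__(self, n=8):
        self.n = n
        self.buckets = [[] for _ in range(n)]  # array A of n buckets

    def _h(self, x):
        # hash function h : U -> {0, 1, ..., n - 1}
        return hash(x) % self.n

    def insert(self, x):
        bucket = self.buckets[self._h(x)]
        if x not in bucket:            # keep S a set: no duplicates
            bucket.append(x)

    def contains(self, x):
        # expected O(1) when the load |S| / n stays bounded
        return x in self.buckets[self._h(x)]
```

A real implementation would also resize (rehash into more buckets) as |S| grows, to keep n ≈ |S|.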
1.1 Two-Sum Problem
Input: An array of integers A[1..n] and a target integer t.
Output: All pairs (a, b) such that a + b = t.
Reducing the problem to sorting yields an O(n log n) solution: sort the array, then for each element x binary search for t − x. Using a hash table yields an O(n) solution: insert each element into the table, then for each x look up t − x.
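The hash-table approach can be sketched as follows (a minimal version; the function name and the use of a multiset to handle duplicates are my choices, not from the notes):

```python
from collections import Counter

def two_sum_pairs(A, t):
    """Return all distinct pairs (a, b), a <= b, with a + b = t.

    Building the Counter and probing it are both expected O(n),
    versus O(n log n) for the sort-and-binary-search reduction.
    """
    counts = Counter(A)              # hash table: element -> multiplicity
    pairs = set()
    for x in counts:
        y = t - x
        # x pairs with itself only if it appears at least twice in A
        if y in counts and (x != y or counts[x] >= 2):
            pairs.add((min(x, y), max(x, y)))
    return sorted(pairs)
```

For example, `two_sum_pairs([1, 2, 3, 4], 5)` returns `[(1, 4), (2, 3)]`.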
Hashing and Pathological Data Sets
The load α of a hash table is defined as

    α = |S| / n.    (1)
α = O(1) is a necessary condition for constant-time hash table operations, and α << 1 is necessary for open addressing, where collisions are resolved by probing a sequence of alternative slots. For good performance and space efficiency with linked-list (chaining) buckets, keep α close to 1.

A pathological data set is a data set that maps almost all of its elements into one or very few buckets, degrading the hash table to linear-time performance; such a data set exists for every fixed hash function. Pathological data sets can paralyze real-world systems, and there are usually two ways to prevent this exploit:
1. Cryptographic hash functions (e.g., SHA-2; MD5 is no longer considered collision-resistant) are practically immune to pathological data sets, since constructing one would require finding many collisions.
2. Hash randomization: pick a single hash function h at random from a family H of hash functions at runtime, so an adversary cannot know in advance which function to attack.
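Hash randomization can be sketched with the multiply-shift family, h_a(x) = ((a · x) mod 2^64) >> (64 − b): drawing the multiplier a at random at startup means an adversary who prepares inputs offline cannot target the function actually in use. The constants and names below are illustrative, not from the notes:

```python
import random

BITS = 10                                # 2^10 = 1024 buckets
MASK = 2**64 - 1                         # work in 64-bit arithmetic

# Random odd 64-bit multiplier, chosen once at runtime. Each choice of
# `a` selects one hash function from the multiply-shift family H.
a = random.randrange(1, 2**64, 2)

def h(x: int) -> int:
    """Map integer x to a bucket index in {0, ..., 2**BITS - 1}."""
    return ((a * x) & MASK) >> (64 - BITS)
```

CPython applies the same idea to `str` and `bytes` hashing: the seed is randomized per process (controllable via `PYTHONHASHSEED`), precisely to blunt pathological inputs against dictionaries.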