Overview
This chapter presents hash tables, one of the most practical and widely used data structures for implementing dictionaries (supporting INSERT, SEARCH, and DELETE). While worst-case search time can degrade to Θ(n), hash tables deliver O(1) average-case performance under reasonable assumptions. The chapter progresses from direct-address tables through chaining and open addressing, covers the design of hash functions (division, multiplication, and universal hashing), and concludes with perfect hashing for static key sets.
Key Concepts
- Direct-address tables: When the universe of keys U = {0, 1, ..., m−1} is small, store elements directly in an array
T[0..m−1] indexed by key. All dictionary operations run in O(1) worst-case time. Impractical when |U| is large.
- Hash tables: Map keys from a large universe U into a table of size m using a hash function
h: U → {0, 1, ..., m−1}. The table size m is typically proportional to the number of keys actually stored, not the universe size. Element with key k is stored at slot h(k).
- Collisions: When two distinct keys k₁ ≠ k₂ satisfy
h(k₁) = h(k₂). Collisions are unavoidable when |U| > m (pigeonhole principle).
- Load factor: α = n/m, where n is the number of stored elements and m is the table size. A critical parameter governing performance.
- Simple uniform hashing assumption: Each key is equally likely to hash to any of the m slots, independently of other keys.
- Chaining: Each slot
T[j] holds a linked list of all elements that hash to slot j. Supports all dictionary operations.
- Open addressing: All elements reside within the hash table array itself (no external lists). The hash function produces a probe sequence
⟨h(k,0), h(k,1), ..., h(k,m−1)⟩ that is a permutation of ⟨0, 1, ..., m−1⟩. Load factor α ≤ 1.
- Universal hashing: A family H of hash functions such that for any two distinct keys k, l, at most |H|/m functions in H cause k and l to collide. Choosing h randomly from H defeats adversarial key distributions.
- Perfect hashing: A two-level hashing scheme for static key sets that guarantees O(1) worst-case search time using O(n) total space.
Algorithms and Techniques
Chaining
- CHAINED-HASH-INSERT(T, x): Insert x at the head of list
T[h(x.key)]. Runs in O(1).
- CHAINED-HASH-SEARCH(T, k): Search for key k in list
T[h(k)]. Worst case O(n) but Θ(1 + α) on average.
- CHAINED-HASH-DELETE(T, x): Delete element x from its list. O(1) with doubly linked lists (given a pointer to x).
Open Addressing
- HASH-INSERT(T, k): Probe slots in order
h(k,0), h(k,1), ... until an empty slot is found; store k there. Signals overflow if table is full.
- HASH-SEARCH(T, k): Probe the same sequence used for insertion. Returns the slot index if k is found, or NIL if an empty slot is encountered.