Hash Table

Data structure that maps an inserted object through a hash function, producing a numeric hash code, which a reduction function then converts into an index used to store the object in a particular bin, a location in a dynamic backing array.

All objects in Java have a hashCode method, which produces an integer representation of the object it is invoked on. The reduction function maps this representation to a range of indices corresponding to the bins. To distribute objects evenly, this function is typically chosen to be reduction modulo a prime, since this minimizes collisions (events where distinct objects map to the same bin). In the separate chaining scheme, a data structure (typically a linked list for small bins and an LLRB otherwise) stores the contents of each bin. In open addressing schemes, collisions are instead handled with probing functions, which search other indices in the hash table for an empty bin.
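
As a concrete sketch of these two steps (the bin count and the use of Math.floorMod here are illustrative assumptions, not a specific library's implementation):

```java
public class ReductionDemo {
    public static void main(String[] args) {
        int M = 13;                           // number of bins (a prime, for this demo)
        String key = "example";
        int hashCode = key.hashCode();        // hash function: object -> integer hash code
        int bin = Math.floorMod(hashCode, M); // reduction function: hash code -> bin index
        // floorMod (rather than %) keeps the index non-negative even
        // when the hash code is negative.
        System.out.println(key + " -> hash " + hashCode + " -> bin " + bin);
    }
}
```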

The load factor of a hash table with $N$ elements and $M$ bins is defined as the ratio $\frac{N}{M}$, the average bin size. In Java, if the load factor ever exceeds $0.75$, the backing array is resized to have twice as many bins and each object is rehashed. This ensures the average bin size is always constant in $N$ (since resizing maintains $\frac{N}{M} \le 0.75$, we have $\mathcal{O}(\frac{N}{M}) = \mathcal{O}(1)$), meaning searching through a bin is constant on average.

Given this, and assuming a uniform distribution of elements, indexing into a position in the backing array and traversing the stored data structure takes constant time on average, so the contains and insert operations have constant amortized time complexity.
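
A minimal separate-chaining sketch under these assumptions is below; the class and method names are hypothetical, and real implementations such as java.util.HashMap differ in detail:

```java
import java.util.LinkedList;

public class ChainingHashSet<T> {
    private LinkedList<T>[] bins;
    private int size = 0;

    @SuppressWarnings("unchecked")
    public ChainingHashSet() {
        bins = (LinkedList<T>[]) new LinkedList[16];
        for (int i = 0; i < bins.length; i++) bins[i] = new LinkedList<>();
    }

    /** Reduction function: maps an item's hash code to a bin index. */
    private int binOf(T item, int numBins) {
        return Math.floorMod(item.hashCode(), numBins);
    }

    public boolean contains(T item) {
        return bins[binOf(item, bins.length)].contains(item);
    }

    public void insert(T item) {
        if (contains(item)) return;
        bins[binOf(item, bins.length)].add(item);
        size += 1;
        // Resize once the load factor exceeds 0.75, rehashing every item.
        if ((double) size / bins.length > 0.75) resize(2 * bins.length);
    }

    @SuppressWarnings("unchecked")
    private void resize(int newNumBins) {
        LinkedList<T>[] newBins = (LinkedList<T>[]) new LinkedList[newNumBins];
        for (int i = 0; i < newNumBins; i++) newBins[i] = new LinkedList<>();
        for (LinkedList<T> bin : bins) {
            for (T item : bin) newBins[binOf(item, newNumBins)].add(item);
        }
        bins = newBins;
    }
}
```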

Aside. Prime moduli are ideal for a modular reduction function because the first $p$ multiples of a hash code are pairwise distinct modulo a prime $p$ (a result from number theory which holds for any modulus coprime to that hash code). Formally, for any prime $p$ and any $a \not\equiv 0 \pmod{p}$, the map $f_a(x) \equiv ax \pmod{p}$ is a bijection on $\text{GF}(p)$, which implies that collisions occur less frequently since the entire table is utilized at regular intervals.
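
As a worked example (the specific numbers are illustrative), take $p = 7$ and $a = 3$: the multiples $3 \cdot 0, 3 \cdot 1, \dots, 3 \cdot 6$ are congruent to $0, 3, 6, 2, 5, 1, 4 \pmod{7}$, hitting every residue exactly once. With the composite modulus $8$ and $a = 2$, the multiples cycle through $0, 2, 4, 6$ and only ever reach half the table.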

However, the reduction modulus itself is rarely prime, since it must equal the number of bins, which is conventionally a power of 2. For this reason, a prime modulus is typically applied to the hash code before reduction, which helps produce an even distribution of hash codes. An appreciable proportion of these hash codes will be odd, and thus coprime to the power-of-2 reduction modulus, so the resulting distribution over the bins will be more uniform.
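
The sketch below illustrates this two-step scheme; the particular prime and bin count are assumptions for the demo, not a standard library's choices:

```java
public class TwoStepReductionDemo {
    public static void main(String[] args) {
        int p = 1_000_003;                  // a prime modulus, applied before reduction
        int M = 16;                         // number of bins: a power of 2
        int h = "example".hashCode();
        int premixed = Math.floorMod(h, p); // prime modulus step
        int bin = premixed & (M - 1);       // power-of-2 reduction via bitmask,
                                            // equivalent to floorMod(premixed, M)
        System.out.println("hash " + h + " -> " + premixed + " -> bin " + bin);
    }
}
```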


Priority Queue

Abstract data type for a queue where items are ordered by priority.

This course focuses on a min-priority queue, which has operations getSmallest, removeSmallest and add.
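
A hedged sketch of the corresponding interface follows; the generic signature is an assumption, not the course's exact definition:

```java
public interface MinPQ<T extends Comparable<T>> {
    /** Adds an item to the priority queue. */
    void add(T item);

    /** Returns the smallest item without removing it. */
    T getSmallest();

    /** Removes and returns the smallest item. */
    T removeSmallest();
}
```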

Problem. Operations on an ordered array require shifting over all of the elements in the worst case. Balanced BSTs are nice, but do not support duplicate items easily. Hash tables do not store items in an ordered way, so we would be required to check every item to locate a minimum/maximum.

Solution. Heaps. Traversal is logarithmic, since a complete tree is a bushy tree, and the smallest element is always the label of the root.

Heap

Data structure where every node $X$ of a complete binary tree satisfies the property that all of $X$'s children have keys less than or equal to (max-heap) or greater than or equal to (min-heap) the key of $X$.
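
As a small illustration of this ordering property (the Node class here is hypothetical, and real heaps are usually stored in backing arrays rather than linked nodes):

```java
public class HeapPropertyDemo {
    static class Node {
        int key;
        Node left, right;
    }

    /** True iff every node's children have keys at least as large as
     *  its own, i.e. the subtree is min-heap ordered. Note this checks
     *  the ordering property only, not completeness. */
    static boolean isMinHeapOrdered(Node x) {
        if (x == null) return true;
        if (x.left != null && x.left.key < x.key) return false;
        if (x.right != null && x.right.key < x.key) return false;
        return isMinHeapOrdered(x.left) && isMinHeapOrdered(x.right);
    }
}
```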

Completeness. A complete tree is a tree in which every level is completely filled, except possibly the last, whose nodes are as far left as possible.

*The following only considers min-heaps; however, max-heaps are implemented very similarly.*