Both TreeMap and HashMap inherit from AbstractMap, but it should be noted that TreeMap also implements the NavigableMap interface and the SortedMap interface (NavigableMap itself extends SortedMap).
Implementing the NavigableMap interface gives TreeMap the ability to navigate the collection and search for elements by key, for example finding the entry whose key is closest to a given key.
The NavigableMap interface provides rich methods for exploring and operating on key-value pairs:
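As an illustration, here is a minimal sketch exercising a few of these methods (all method names below come from the standard java.util.NavigableMap API; the class name is made up for the demo):

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class NavigableMapDemo {
    public static void main(String[] args) {
        NavigableMap<Integer, String> map = new TreeMap<>();
        map.put(1, "one");
        map.put(3, "three");
        map.put(5, "five");

        // Closest-match lookups
        System.out.println(map.floorKey(4));    // greatest key <= 4 → 3
        System.out.println(map.ceilingKey(4));  // smallest key >= 4 → 5
        System.out.println(map.higherKey(3));   // strictly greater than 3 → 5
        System.out.println(map.lowerKey(3));    // strictly less than 3 → 1

        // Boundary entries and a reversed view of the whole map
        System.out.println(map.firstEntry());    // 1=one
        System.out.println(map.lastEntry());     // 5=five
        System.out.println(map.descendingMap()); // {5=five, 3=three, 1=one}
    }
}
```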
Implementing the SortedMap interface gives TreeMap the ability to sort the elements in the collection by key. By default, it sorts keys in ascending order, but we can also specify a comparator for sorting.
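For example, here is a sketch comparing the default ascending key order with a custom Comparator that orders String keys by length (the class name and keys are made up for illustration):

```java
import java.util.Comparator;
import java.util.TreeMap;

public class TreeMapOrderingDemo {
    public static void main(String[] args) {
        // Default: natural ascending order of the keys
        TreeMap<String, Integer> natural = new TreeMap<>();
        natural.put("banana", 2);
        natural.put("apple", 1);
        natural.put("cherry", 3);
        System.out.println(natural.keySet()); // [apple, banana, cherry]

        // Custom comparator: order keys by their length instead
        TreeMap<String, Integer> byLength = new TreeMap<>(Comparator.comparingInt(String::length));
        byLength.put("banana", 2);
        byLength.put("apple", 1);
        byLength.put("fig", 3);
        System.out.println(byLength.keySet()); // [fig, apple, banana]
    }
}
```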
In summary, compared with HashMap, TreeMap mainly adds the ability to sort elements in the collection by key and the ability to search elements within the collection.
Before JDK 1.8
Before JDK 1.8, the underlying implementation of HashMap was a combination of array and linked list, that is, separate chaining. HashMap first passes the key's hashCode() through a disturbance function to obtain a hash value, then uses (n - 1) & hash to determine the array index where the entry should be stored (where n is the length of the array). If an element already exists at that position, HashMap checks whether its hash value and key are the same as those of the entry being inserted. If they are, the existing entry is overwritten directly; if not, the conflict is resolved by chaining the new entry onto the linked list at that bucket.
The disturbance function (hash method) in HashMap is used to optimize the distribution of hash values. By performing additional processing on the original hashCode(), the disturbance function can reduce collisions caused by poor hashCode() implementations, thereby improving the uniformity of data distribution.
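For reference, the JDK 1.8 disturbance function simply XORs the high 16 bits of hashCode() into the low 16 bits, so the high bits can still influence the bucket index even when the table is small (the hash method below mirrors the java.util.HashMap source; the demo class around it is made up):

```java
public class HashDisturbDemo {
    // Equivalent to HashMap.hash(Object) in JDK 1.8: spreads the high
    // 16 bits of the hashCode into the low 16 bits via XOR.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        String key = "hello";
        int h = key.hashCode();
        System.out.printf("hashCode = %08x, disturbed = %08x%n", h, hash(key));
        // With a table of length 16, only the low 4 bits pick the bucket;
        // the XOR lets the high bits of hashCode() affect those low bits.
        System.out.println("index = " + ((16 - 1) & hash(key)));
    }
}
```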
After JDK 1.8
There was a major change in resolving hash collisions: when the length of a linked list in a bucket exceeds the threshold (8 by default), the linked list is converted into a red-black tree. (Before converting, HashMap first checks the array length: if it is less than 64, it resizes the array instead of treeifying.)
The purpose of this is to reduce search time: the query efficiency of a linked list is O(n) (where n is the length of the linked list), while a red-black tree is a self-balancing binary search tree with query efficiency of O(log n). When the linked list is short, the performance difference between O(n) and O(log n) is not obvious. However, when the linked list becomes longer, query performance will decline significantly.
To make HashMap access efficient and reduce collisions, we need to ensure that data is distributed as evenly as possible across the buckets. Hash values in Java are represented by int, whose range is -2147483648 ~ 2147483647, giving roughly 4.2 billion possible values in total. As long as the hash function maps keys relatively evenly and sparsely over this range, collisions are generally unlikely in typical applications. The problem is that an array with 4 billion slots cannot fit in memory, so the hash value cannot be used directly as an index. Before using it, we need to reduce it modulo the array length; the resulting remainder can then be used as the storage position, that is, the corresponding array index.
How should this algorithm be designed?
We may first think of using the % modulo operation to implement it. But here comes the key point: if the divisor in a modulo (%) operation is a power of 2, the operation is equivalent to a bitwise AND (&) with the divisor minus one; that is, hash % length == hash & (length - 1) holds (for a non-negative hash) on the premise that length is a power of 2. In addition, the binary & operation is cheaper to compute than %, and unlike Java's % (which keeps the sign of the dividend), it never produces a negative result, so the index is always valid.
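A quick sketch verifying this identity for a power-of-two length, and showing why & is also safer than % for negative hash values in Java (the class name is illustrative):

```java
public class ModVsAndDemo {
    public static void main(String[] args) {
        int length = 16; // a power of 2
        int[] hashes = {0, 1, 15, 16, 17, 12345, Integer.MAX_VALUE};

        for (int hash : hashes) {
            // For non-negative hash and power-of-two length, the two agree.
            System.out.println(hash + ": " + (hash % length) + " == " + (hash & (length - 1)));
        }

        // For a negative hash, Java's % keeps the sign of the dividend,
        // while & always produces a valid non-negative index.
        int negative = -5;
        System.out.println(negative % length);       // prints -5
        System.out.println(negative & (length - 1)); // prints 11
    }
}
```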
Besides the fact that bit operations are more efficient than modulo operations, I think a more important reason is that when the length is a power of 2, HashMap can redistribute elements more evenly and cheaply during expansion. For example, suppose the capacity is expanded from 8 to 16, so length - 1 changes from 0111 to 1111 in binary:
At this time, when the original elements in HashMap recompute their new array positions using hash & (length - 1), the result depends only on the fourth binary bit of the hash value (counting from the right, the bit whose value equals the old capacity, 8), and two situations may occur: if that bit is 0, the element stays at its original index; if it is 1, the element moves to its original index plus the old capacity.
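This split can be sketched as follows, assuming an expansion from a capacity of 8 to 16 (the hash values are made up for illustration):

```java
public class ResizeRedistributionDemo {
    public static void main(String[] args) {
        int oldCap = 8;
        int newCap = oldCap * 2; // 16

        // Two hashes that collide in the old table (same low 3 bits)...
        int h1 = 0b00101; // bit 3 (value 8) is 0
        int h2 = 0b01101; // bit 3 (value 8) is 1
        System.out.println("old indices: " + (h1 & (oldCap - 1)) + ", " + (h2 & (oldCap - 1))); // 5, 5

        // ...are split apart by the new bit after resizing: h1 stays at 5,
        // h2 moves to 5 + oldCap = 13. JDK 1.8 tests (hash & oldCap) to
        // decide between the two cases without recomputing the full index.
        System.out.println("new indices: " + (h1 & (newCap - 1)) + ", " + (h2 & (newCap - 1))); // 5, 13
        System.out.println((h1 & oldCap) == 0); // true  → stays put
        System.out.println((h2 & oldCap) == 0); // false → moves to old index + oldCap
    }
}
```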
Finally, here is a simple summary of why the length of HashMap is a power of 2:

1. The bucket index can be computed with the cheap bitwise operation hash & (length - 1) instead of the slower % operator, and the result is always a valid non-negative index.
2. During expansion, each element either stays at its original index or moves to its original index plus the old capacity, depending on a single bit of its hash, so entries are redistributed evenly without recomputing every position from scratch.