45

I am looking at the source code for HashMap in Java 7, and I see that the put method will check if any entry is already present and if it is present then it will replace the old value with the new value.

    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

So, basically it means that there would always be only one entry for the given key, I have seen this by debugging as well, but if I am wrong then please correct me.

Now, since there is only one entry for a given key, why does the get method have a FOR loop, since it could have simply returned the value directly?

    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
            return e.value;
    }

I feel the above loop is unnecessary. Please help me understand if I am wrong.

9
  • 39
    One entry per key, yes, but not one entry per hash.
    – Bergi
    Commented Feb 26, 2018 at 12:34
  • 6
    The wikipedia article on hash tables, especially the part on separate chaining collision resolution has a good illustration about how the values are stored internally in a linked list. You need to iterate through the list to reach the correct value because the list (the bucket) is only accessible through the hash code. Commented Feb 26, 2018 at 12:59
  • 4
    Your analysis is incomplete: you overlooked that the put method also has a for loop. That should go some way towards answering your question. Commented Feb 26, 2018 at 13:58
  • 2
    @DanielPryden agreed. The best way is to try and implement one yourself - or a simpler implementation of the same as a learning tool. Sooner or later, you'll run into the exact same problem that the for loop fixes, and you'll know why.
    – Baldrickk
    Commented Feb 26, 2018 at 16:58
  • 1
    Possible duplicate of How does a hash table work? Commented Feb 27, 2018 at 11:22

7 Answers 7

62

table[indexFor(hash, table.length)] is the bucket of the HashMap that may contain the key we are looking for (if it is present in the Map).

However, each bucket may contain multiple entries (either different keys having the same hashCode(), or different keys with different hashCode() that still got mapped to the same bucket), so you must iterate over these entries until you find the key you are looking for.

Since the expected number of entries in each bucket should be very small, this loop is still executed in expected O(1) time.

10
  • My point is that in the put method, logic hashing and index logic of get is already done, and in the end old value is replaced with new value, if instead a new entry was added then all this makes sense, but since it doesn't so I am feeling that there is really no point of that FOR loop in get. I agree that loop will give O(1)
    – pjj
    Commented Feb 26, 2018 at 12:34
  • @pjj For a given key, there can be at most one entry in that bucket, but there may be entries with different keys in the same bucket. Therefore the loop is required until you find a key equal to the key you are looking for.
    – Eran
    Commented Feb 26, 2018 at 12:36
  • 7
    @pjj different keys can have the same hashcode value, thus end-up in the same bucket (buckets are chosen based on the hash of the key)
    – Eugene
    Commented Feb 26, 2018 at 12:42
  • 1
    @pjj basically the case you are talking about is hashcode collision with different keys thats set
    – Dhiraj
    Commented Feb 26, 2018 at 13:20
  • 8
    @pjj I never said there can be more than one entry for a given key. I said there can be more than one entry for a given bucket.
    – Eran
    Commented Feb 26, 2018 at 14:32
18

If you see the internal working of get method of HashMap.

public V get(Object key)  {
        if (key == null)
           return getForNullKey();
         int hash = hash(key.hashCode());
         for (Entry<K,V> e = table[indexFor(hash, table.length)];e != null;e = e.next) 
         {
             Object k;
             if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
                 return e.value;
         }
             return null;
}
  • First, it gets the hash code of the key object, which is passed, and finds the bucket location.
  • If the correct bucket is found, it returns the value (e.value)
  • If no match is found, it returns null.

Some times there may be chances of Hashcode collision and for solving this collision Hashmap uses equals() and then store that element into LinkedList in same bucket.

Lets take example:enter image description here

Fetch the data for key vaibahv: map.get(new Key("vaibhav"));

Steps:

  1. Calculate hash code of Key {“vaibhav”}.It will be generated as 118.

  2. Calculate index by using index method it will be 6.

  3. Go to index 6 of array and compare first element’s key with given key. If both are equals then return the value, otherwise check for next element if it exists.

  4. In our case it is not found as first element and next of node object is not null.

  5. If next of node is null then return null.

  6. If next of node is not null traverse to the second element and repeat the process 3 until key is not found or next is not null.

For this retrieval process for loop will be used. For more reference you can refer this

7
  • 9
    Please don't use JPG for non-photographic images. Commented Feb 26, 2018 at 13:24
  • Can you give an example where there could be more than one entry for a given key?
    – pjj
    Commented Feb 26, 2018 at 14:31
  • 2
    @pjj Not the same key, but the same hash (or the same array index, even if not the same hash). The point of the answers here is that HashMap does not (cannot) magically contain a unique place to put every entry that will be put in the future. Keys which are put into the HashMap can have hash-conflicts, in which they want to be put in the same place in the structure even though they are not really the same key. There is a special way to resolve that when the entry is put into the map. On get, you have to account for that.
    – Loduwijk
    Commented Feb 26, 2018 at 17:22
  • @pjj consider the case where hascode collision with different keys occure
    – Dhiraj
    Commented Feb 27, 2018 at 4:59
  • @pjj in that case the second key-value pair will go to same bucket where already other key-value pair are present.But the case you are talking about will old entry replaced by new entry will occur when both keys are same.If keys are different and hashcode collision occur both entries will go to same bucket.That time hashmap will meintain link list to keep track of both entries
    – Dhiraj
    Commented Feb 27, 2018 at 5:01
5

For the record, in java-8, this is present also (sort of, since there are TreeNodes also):

if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }

Basically (for the case when the bin is not a Tree), iterate the entire bin, until you find the entry we are looking for.

Looking at this implementation you might understand why providing a good hash is good - so that not all entries end up in the same bucket, thus a bigger time to search for it.

4

I think @Eran has already answered your query well and @Prashant has also made a good attempt along with other people who have answered, so let me explain it using an example so that concept becomes very clear.

Concepts

Basically what @Eran is trying to say that in a given bucket (basically at given index of the array) it is possible that there is more than one entry (nothing but Entry object) and this is possible when 2 or more keys give different hash but give the same index/bucket location.

Now, in order to put the entry in the hashmap, this is what happens at a high level (read carefully because I have gone the extra mile to explain some good things which are otherwise not part of your question):

  • Get the hash: what happens here is that first hash is calculated for a given key (notice that this is not hashCode, a hash is calculated using the hashCode and it is done as-as to mitigate the risk of poorly written hash function).
  • Get the index: This is basically the index of the array or in other words bucket. Now, why this index is calculated instead of directly using the hash as the index is because to mitigate the risk that hash could more than the size of the hashmap, so this index calculation step ensures that index will always be less than the size of the hashmap.

And when a situation occurs when 2 keys give different hash but the same index, then both those will go in the same bucket, and that is the reason that FOR loop is important.

Example

Below is a simple example I have created to demonstrate the concept to you:

public class Person {
    private int id;

    Person(int _id){
        id = _id;
    }

    public int getId() {
        return id;
    }
    public void setId(int id) {
        this.id = id;
    }

    @Override
    public int hashCode() {
        return id;
    }
}

Test class:

import java.util.Map;

public class HashMapHashingTest {
    public static void main(String[] args) {
        Person p1 = new Person(129);
        Person p2 = new Person(133);

        Map<Person, String> hashMap = new MyHashMap<>(2);
        hashMap.put(p1, "p1");
        hashMap.put(p2, "p2");
        System.out.println(hashMap);
    }
}

Debug screenshot (please click and zoom because it is looking small):

enter image description here

Notice, that in above example, both Person object gives different hash value (136 and 140 respectively) but gives the same index of 0, so both objects go in the same bucket. In the screenshot, you can see that both objects are at index 0 and there you have a next also populated which basically points to the second object.


Update: Another easiest way to see that more than one key is going into the same bucket is by creating a class and overriding the hashCode method to always return the same int value, now what would happen is that all the objects of that class would give the same index/bucket location but since you have not overridden the equals method so they would not be considered same and hence will form a list at that index/bucket location.

Another twist in this would suppose you override the equals method as well and compare all objects equal then only one object will be present at the index/bucket location because all objects are equal.

2

While the other answers explain what is going on, OP's comments on those answers leads me to think a different angle of explanation is required.

Simplified example

Let's say you are going to toss 10 strings into a hash map: "A", "B", "C", "Hi", "Bye", "Yo", "Yo-yo", "Z", "1", "2"

You are using HashMap as your hash map instead of making your own hash map (good choice). Some of the stuff below will not use HashMap implementation directly but will approach it from a more theoretical and abstract point of view.

HashMap does not magically know that you are going to add 10 strings to it, nor does it know what strings you will be putting into it later. It has to provide places to put whatever you might give to it... for all it knows you are going to put 100,000 strings in it - perhaps every word in the dictionary.

Let's say that, because of the constructor argument you chose when making your new HashMap(n) that your hash map has 20 buckets. We'll call them bucket[0] through bucket[19].

  1. map.put("A", value); Let's say that the hash value for "A" is 5. The hash map can now do bucket[5] = new Entry("A", value);

  2. map.put("B", value); Assume hash("B") = 3. So, bucket[3] = new Entry("B", value);

  3. map.put("C"), value); - hash("C") = 19 - bucket[19] = new Entry("C", value);

  4. map.put("Hi", value); Now here's where it gets interesting. Let's say that your hash function is such that hash("Hi") = 3. So now hash map wants to do bucket[3] = new Entry("Hi", value); We have a problem! bucket[3] is where we put the key "B", and "Hi" is definitely a different key than "B"... but they have the same hash value. We have a collision!

Because of this possibility, the HashMap is not actually implemented this way. A hash map needs to have buckets that can hold more than 1 entry in them. NOTE: I did not say more than 1 entry with the same key, as we cannot have that, but it needs to have buckets that can hold more than 1 entry of different keys. We need a bucket that can hold both "B" and "Hi".

So let's not do bucket[n] = new Entry(key, value);, but instead let's have bucket be of type Bucket[] instead of Entry[]. So now we do bucket[n].add( new Entry(key, value) );

So let's change to...

bucket[3].add("B", value);

and

bucket[3].add("Hi", value);

As you can see, we now have the entries for "B" and "Hi" in the same bucket. Now, when we want to get them back out, we need to loop through everything in the bucket, for example, with a for loop.

So the looping is present because of the collisions. Not collisions of key, but collisions of hash(key).

Why do we use such a crazy data structure?

You might be asking at this point, "Wait, WHAT!?! Why would we do such a weird thing like that??? Why are we using such a contrived and convoluted data structure???" The answer to that question would be...

A hash map works like this because of the properties that such a peculiar setup provides to us due to the way the math works out. If you use a good hash function which minimizes conflicts, and if you size your HashMap to have more buckets than the number of entries that you guess will be in it, then you have an optimized hash map which will be the fastest data structure for inserts and queries of complex data.

Your HashMap may be too small

Since you say you are often seeing this for-loop being iterated over with multiple elements in your debugging, that means that your HashMap might be too small. If you have a reasonable guess as to how many things you might put into it, try to set the size to be larger than that. Notice in my example above that I was inserting 10 strings but had a hash map with 20 buckets. With a good hash function, this will yield very few collisions.

Note:

Note: the above example is a simplification of the matter and does take some shortcuts for brevity. A full explanation is even slightly more complicated, but everything you need to know to answer the question as asked is here.

1

Hash tables has buckets because hashes of objects do not have to be unique. If hashes of objects are equal, means, objects, probably, are equal. If hashes of objects are different, then objects are exactly different. Therefore, objects with the same hashes are grouped into buckets. The for loop is used to iterate objects contained in such a bucket.

In fact, this means that the algorithmic complexity of finding an object in such a hash table is not constant (although very close to it), but something between logarithmic and linear.

1
  • 1
    Hash tables have constant complexity, not logarithmic. See question and answer here.
    – anatolyg
    Commented Feb 26, 2018 at 15:14
0

I would like to put it in simple words. the put method have a FOR loop to iterate over the list of keys which falls under the same bucket of hashCode.

What happens when you do put the key-value pair into the hashmap:

  1. So for every key you pass to the HashMap, it will calculate the hashCode for it.
  2. So many keys can fall under the same hashCode bucket. Now HashMap will check if the same key is already present or not in the same bucket.
  3. In Java 7, HashMap maintains all the keys of the same bucket in a list. So before inserting the key it will traverse through the list to check if the same key is present or not. That's why there is a FOR loop.

So in average case its time complexity: O(1) and in worst case its time complexity is O(N).

Not the answer you're looking for? Browse other questions tagged or ask your own question.