2

I understand that some hash tables use "buckets", which is a linked list of "entries".

HashTable
  -size    //total possible buckets to use
  -count   // total buckets in use
  -buckets //linked list of entries

Entry
  -key   //key identifier
  -value // the object you are storing for reference
  -next  //the next entry

In order to get the bucket by index, you have to call:

myBucket = someHashTable[hashIntValue]

Then, you could iterate the linked list of entries until you find the one you are looking for or null.

Does the hash function always return a NUMBER % HashTable.size? That way, you stay within the limit? Is that how the hash function should work?

4 Answers 4

10

Mathematically speaking, a hash function is usually defined as a mapping from the universe of elements you want to store in the hash table to the range {0, 1, 2, .., numBuckets - 1}. This means that in theory, there's no requirement whatsoever that you use the mod operator to map some integer hash code into the range of valid bucket indices.

However, in practice, almost universally programmers will use a generic hash code that produces a uniformly-distributed integer value and then mod it down so that it fits in the range of the buckets. This allows hash codes to be developed independently of the number of buckets used in the hash table.

EDIT: Your description of a hash table is called a chained hash table and uses a technique called closed addressing. There are many other implementations of hash tables besides the one you've described. If you're curious - and I hope you are! :-) - you might want to check out the Wikipedia page on the subject.

1

what is hash table?

It is also known as hash map is a data structure used to implement an associative array.It is a structure that can map keys to values.

How it works?

A hash table uses a hash function to compute an index into an array of buckets or slots, from which the correct value can be found.

See the below diagram it clearly explains.

enter image description here

Advantages:

In a well-dimensioned hash table, the average cost for each lookup is independent of the number of elements stored in the table.

Many hash table designs also allow arbitrary insertions and deletions of key-value pairs.

In many situations, hash tables turn out to be more efficient than search trees or any other table lookup structure.

Disadvantages:

The hash tables are not effective when the number of entries is very small. (However, in some cases the high cost of computing the hash function can be mitigated by saving the hash value together with the key.)

Uses:

They are widely used in many kinds of computer software, particularly for associative arrays, database indexing, caches and sets.

0
0

There is no predefined rule for how a hash function should behave. You can have all of your values map to index 0 - a perfectly valid hash function (performs poorly, but works).

Of course, if your hash function returns a value outside of the range of indices in your associated array, it won't work correctly. Thats not to say however, that you need to use the formula (number % TABLE_SIZE)

1
  • Can the person who down voted this answer provide an explanation? I'm going to guess they won't. Commented Jan 12, 2011 at 1:56
0

No, the table is typically an array of entries. You don't iterate it until you found the same hash, you use the hash result (or usually hash modulo numBuckets) to directly index into the array of entries. That gives you the O(1) behaviour (iterating would be O(n)).

When you try to store two different objects with the same hash result (called a 'hash collision'), you have to find some way to make space. Different implementations vary in how they handle collisions. You can create a linked list of all the objects with same hash, or use some rehashing to store in a different entry of the table.

Not the answer you're looking for? Browse other questions tagged or ask your own question.