36

What does "bucket entries" mean in the context of a hashtable?

2

4 Answers 4

50

A bucket is simply a fast-access location (like an array index) that is the the result of the hash function.

The idea with hashing is to turn a complex input value into a different value which can be used to rapidly extract or store data.

Consider the following hash function for mapping people's names into street addresses.

First take the initials from the first and last name and turn them both into numeric values (0 through 25, from A through Z). Multiply the first by 26 and add the second, and this gives you a value from 0 to 675 (26 * 26 distinct values, or bucket IDs). This bucket ID is then to be used to store or retrieve the information.


Now you can have a perfect hash (where each allowable input value maps to a distinct bucket ID) so that a simple array will suffice for the buckets. In that case, you can just maintain an array of 676 street addresses and use the bucket ID to find the one you want:

+-------------------+
| George Washington | -> hash(GW)
+-------------------+      |
                           +-> GwBucket[George's address]
+-------------------+
|  Abraham Lincoln  | -> hash(AL)
+-------------------+      |
                           +-> AlBucket[Abe's address]

However, this means that George Wendt and Allan Langer are going to cause problems in the future.


Or you can have an imperfect hash (such as one where John Smith and Jane Seymour would end up with the same bucket ID).

In that case, you need a more complex backing data structure than a simple array, to maintain a collection of addresses. This could be as simple as a linked list, or as complex as yet another hash:

+------------+       +--------------+
| John Smith |       | Jane Seymour |
+------------+       +--------------+
      |                     |
      V                     V
   hash(JS)              hash(JS)
      |                     |
      +-----> JsBucket <----+
                 \/
+-----------------------------------+
| John Smith   ->  [John's address] |
| Jane Seymour ->  [Jane's address] |
+-----------------------------------+

Then, as well as the initial hash lookup, an extra level of searching needs to be carried out within the bucket itself, to find the specific information.

17

From Wikipedia:

hash table or hash map is a data structure that uses a hash function to map identifying values, known as keys (e.g., a person's name), to their associated values (e.g., their telephone number). Thus, a hash table implements an associative array. The hash function is used to transform the key into the index (the hash) of an array element (the slot or bucket) where the corresponding value is to be sought.

enter image description here

Each entry in the array/vector is called as an Bucket.

6

I think Bucket is a structure that at least contain hash value, which works as indexes, (hash values are generated by hash functions), but the structure itself might contain entries (data) or not.

illustration:

[hash value][points to actual data] ---> [actual data]
|<------------bucket structure------>|

[hash value][actual data]
|-----bucket structure--->|

It is the [hash value] part works as indexes.


I found these photos from hash_table Wikipedia are pretty straightforward.

The photos below indicates that entries (data) can be stored within buckets or it can be stored with its own data structure, while bucket simply points to the data.

enter image description here enter image description here enter image description here

0

Both rehashing and coalesced hashing assume fixed table sizes determined in advance. If the number of records grow beyond the number of table positions, it is impossible to insert them without allocating a larger table and recomputing hash.

Another method of resolving hash clashes is separate chaining. Term Bucket is generally used with separate chaining. Separate chaining involves keeping a distinct linked list for all records whose keys hash into a particular value.

Suppose that hash function produces values between 0 and tablesize - 1. Then an array bucket of header nodes of size tablesize is declared. This array is called the hash table.

Bucket[i], bucket entry, points to the list of all records who keys hash into i. To insert a record, the list head bucket[i] is accessed and record is inserted at the tail end.

Not the answer you're looking for? Browse other questions tagged or ask your own question.