
What are the cases when using a hash table can improve performance, and when does it not? And in what cases are hash tables not applicable?


2 Answers


What are the cases when using a hash table can improve performance, and when does it not?

If you have reason to care, implement it using a hash table and whatever else you're considering, put your actual data through, and measure which performs better.

That said, if the hash table has the operations you need (i.e. you're not expecting to iterate it in sorted order, or compare it quickly to another hash table) and holds millions or more (billions, trillions...) of elements, then it'll probably be your best choice. But a lot depends on the hash table implementation (especially the choice of closed vs. open hashing), object size, hash function quality and calculation cost, comparison cost, and the oddities of your computer's memory performance at different cache levels... in short: there are too many factors to make even an educated guess a better choice than measuring, when it matters.
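
For illustration, a minimal benchmarking sketch in C++ (the synthetic int keys and million-element size are arbitrary assumptions; substitute your real data and access pattern, since results vary widely with key type, hash quality, table size and cache behaviour):

    #include <chrono>
    #include <iostream>
    #include <map>
    #include <random>
    #include <unordered_map>
    #include <vector>

    // Time lookups in either container; works for std::map and std::unordered_map.
    template <typename Map>
    long long time_lookups(const Map& m, const std::vector<int>& keys) {
        auto start = std::chrono::steady_clock::now();
        long long hits = 0;
        for (int k : keys) hits += m.count(k);   // the work being timed
        auto stop = std::chrono::steady_clock::now();
        std::cout << std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count()
                  << " us\n";
        return hits;                             // returned so the loop isn't optimised away
    }

    int main() {
        std::mt19937 rng(42);
        std::vector<int> keys(1'000'000);
        for (int& k : keys) k = static_cast<int>(rng());

        std::unordered_map<int, int> hash_table;
        std::map<int, int> tree;
        for (int k : keys) { hash_table[k] = k; tree[k] = k; }

        time_lookups(hash_table, keys);          // hash table
        time_lookups(tree, keys);                // balanced binary tree, for comparison
    }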

And in what cases are hash tables not applicable?

Mainly when:

  • The input can't be hashed (e.g. you're given binary blobs and don't know which bits in there are significant, but you do have an int cmp(const T&, const T&) function you could use for a std::map; see the ordered-map sketch after this list), or

  • the available/possible hash functions are very collision-prone, or

  • you want to avoid worst-case performance hits for:

    • handling lots of hash-colliding elements (perhaps "engineered" by someone trying to crash or slow down your software)

    • resizing the hash table: unless presized to be large enough (which can be wasteful and slow if excessive memory is used), most implementations will outgrow the array they're using for the hash table every now and then, allocate a bigger array, and copy the content across. This can make the specific insertions that trigger rehashing much slower than the normal O(1) behaviour, even though the average is still O(1); if you need more consistent behaviour in all cases, something like a balanced binary tree may serve (see the reserve() sketch after this list for how to presize)

  • your access patterns are quite specialised (e.g. you frequently operate on elements with keys that are "nearby" in some specific sort order), such that cache efficiency is better for other storage models that keep them nearby in memory (e.g. bucket-sorted elements), even if you're not exactly relying on the sort order for, say, iteration
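
To illustrate the first bullet, here's a minimal sketch of falling back to an ordered std::map when all you have is a comparison function; the Blob type and its cmp() below are hypothetical stand-ins:

    #include <cstring>
    #include <map>
    #include <vector>

    // Hypothetical opaque blob: we don't know which bits are significant
    // enough to hash, but we can impose a total order with a three-way compare.
    struct Blob { std::vector<unsigned char> bytes; };

    int cmp(const Blob& a, const Blob& b) {
        if (a.bytes.size() != b.bytes.size())
            return a.bytes.size() < b.bytes.size() ? -1 : 1;
        if (a.bytes.empty()) return 0;
        return std::memcmp(a.bytes.data(), b.bytes.data(), a.bytes.size());
    }

    // Adapt cmp() to the strict-weak-ordering predicate std::map expects.
    struct BlobLess {
        bool operator()(const Blob& a, const Blob& b) const { return cmp(a, b) < 0; }
    };

    int main() {
        std::map<Blob, int, BlobLess> counts;   // ordered container: no hash needed
        ++counts[Blob{{0x01, 0x02}}];
        ++counts[Blob{{0x01, 0x02}}];           // same key, found via cmp(): count is now 2
    }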
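
And for the rehashing bullet: if you know the element count up front, most implementations let you presize so the resize spikes never happen; with std::unordered_map that's reserve() (the one-million element count here is an arbitrary assumption):

    #include <unordered_map>

    int main() {
        std::unordered_map<int, int> m;
        m.reserve(1'000'000);        // presize the bucket array for 1M elements
        for (int i = 0; i < 1'000'000; ++i)
            m[i] = i;                // no mid-loop rehashing, so insert cost stays even
    }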


We use hash tables to get O(1) access time. Imagine a dictionary. When you are looking for a word, e.g. "happy", you jump straight to 'H'. Here the hash function is determined by the first letter. Then you look for "happy" within the H bucket (actually the H bucket, then the HA bucket, then the HAP bucket, and so on).
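
As a toy sketch of that analogy in C++ (first-letter bucketing is deliberately simplistic; real hash functions mix all the bits of the key):

    #include <algorithm>
    #include <array>
    #include <cctype>
    #include <iostream>
    #include <string>
    #include <vector>

    // Toy "dictionary" hash: bucket words by first letter (assumes non-empty,
    // ASCII-letter input).
    std::size_t first_letter_bucket(const std::string& word) {
        return static_cast<std::size_t>(std::tolower(word[0]) - 'a');   // 0..25
    }

    int main() {
        std::array<std::vector<std::string>, 26> buckets;
        for (const std::string& w : {"happy", "hash", "apple", "zebra"})
            buckets[first_letter_bucket(w)].push_back(w);

        // Looking up "happy" jumps straight to the 'h' bucket, then scans it.
        const auto& h = buckets[first_letter_bucket("happy")];
        std::cout << std::boolalpha
                  << (std::find(h.begin(), h.end(), "happy") != h.end()) << '\n';   // true
    }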

It doesn't make sense to use hash tables when your data is ordered or needs ordering, like sorted numbers. (The alphabet is ordered A, B, C ... X, Y, Z, but it wouldn't matter if you switched A and Z, provided you knew it was switched in your dictionary.)
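
For example, std::map gives you sorted iteration for free, while a hash table like std::unordered_map visits elements in an unspecified order:

    #include <iostream>
    #include <map>
    #include <unordered_map>

    int main() {
        // std::map keeps keys sorted, so in-order traversal is guaranteed...
        std::map<int, char> ordered{{3, 'c'}, {1, 'a'}, {2, 'b'}};
        for (const auto& [k, v] : ordered) std::cout << k << v << ' ';   // 1a 2b 3c

        // ...while std::unordered_map iterates in no particular order.
        std::unordered_map<int, char> hashed{{3, 'c'}, {1, 'a'}, {2, 'b'}};
        for (const auto& [k, v] : hashed) std::cout << k << v << ' ';    // any order
        std::cout << '\n';
    }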
