3
$\begingroup$

The documentation on C++'s unordered_map::erase states:

Removes specified elements from the container. The order of the remaining elements is preserved. (This makes it possible to erase individual elements while iterating through the container.)

This is basically what I would expect from your average hash map with closed hashing: Entries aren't truly "deleted", space is not reclaimed; entries are merely marked as deleted, and we have to assume that their contents may now be garbage.

Given this, I would have assumed the following to be supported:

std::unordered_map<int, std::string> m = {
    {-1, "one"}, {-2, "two"}, {-3, "three"},
    {-4, "four"}, {-5, "five"}, {-6, "six"}
};
for (auto it = m.begin(); it != m.end(); ++it)
    if (it.first % 2 == 0)
        m.erase(it);

but it isn't. Contrast this with Lua, where the following works just fine:

-- negative keys to ensure that these don't end up in the "list" part internally
local m = {
    [-1] = "one", [-2] = "two", [-3] = "three",
    [-4] = "four", [-5] = "five", [-6] = "six",
}
-- next to be explicit (this is what pairs uses under the hood)
for k, v in next, m do
    if k % 2 == 0 then
        m[k] = nil
    end
end

that is, deleting a key k still lets you call next(m, k) to get the next key.

Now of course it is properly documented that calling unordered_map::erase invalidates the iterator:

References and iterators to the erased elements are invalidated. Other iterators and references are not invalidated.

My question is: Why does C++ not support advancing an unordered map iterator after erasing it, as Lua for example does?

Is this for simplicity of specification, so that there only needs to be one simple notion of "iterator invalidation", which can be applied here as well?

Edit: Perhaps for consistency with other containers?

Or are there some tangible (for example performance) benefits to not supporting this? Are there concrete examples of these benefits being leveraged in present day C++ stdlib implementations?


2nd Edit: I am aware that you can still iterate an unordered map while deleting elements by using the return value of erase as outlined in the docs. I am asking why this clunkier and more error prone interface is preferred over simply allowing to use erased entries for advancing iterators.


3rd Edit:

"An iterator is a reference to an element of the container" is oversimplified: There is the notion of a "non-dereferencable" iterator (unordered_map::end is non-dereferencable, for example).

You can test the above code (which just unconditionally advances the iterator) on GCC (11.4.0) and Clang (14.0.0), with different optimization levels, with and without asan and ubsan, and it seems to work as expected. (Edit: It seems to only work sometimes?)

I would be interested to know whether there is a C++ hash table implementation that takes advantage of this / how current implementations take advantage of this - that is, is there a (C++) hash table where ++it for an iterator "invalidated" by erase is not the same as the iterator returned by erase, and what is the benefit of it?

It is still not quite clear to me why C++ would opt to have the standard deem the iterator invalidated rather than non-dereferencable.

It seems most plausible to me that this is indeed about consistency with other containers - for vectors, lists, and (sorted) maps, returning a new iterator is important - rather than about any technical benefit for the hash map implementation.

$\endgroup$
2
  • $\begingroup$ @EldritchConundrum from the perspective of the user, the entries are indistinguishable from deleted entries, except that the keys can still be used for iteration. If you do for k, v in next, m do print(k, v) end afterwards, it will print only the odd keys & associated values. $\endgroup$
    – Luatic
    Commented Mar 6 at 14:05
  • $\begingroup$ An iterator is a reference to an element of the container. When an element is erased, any reference to it is "dangling" and trying to do anything with it is nonsensical. While in theory you could have an interator-like thing that can either refer to an element of the container, or a special marker for a removed element, this would add a bunch of extra overhead to the iterator and container class. $\endgroup$
    – Chris Dodd
    Commented Mar 7 at 2:27

3 Answers 3

5
$\begingroup$

Check the return value of std::unordered_map::erase. It returns an iterator. In fact it is the iterator pointing to the element after the removed element.

what you should do is:

for (auto it = m.begin(); it != m.end(); )
    if (it.first % 2 == 0)
        it = m.erase(it);
    else
        ++it;

So it does support continuing iteration after an erase, but the way to do it is rather clunky.

C++ iterators are based on a pointer going through an array, an std::vector is just a pointer. The associated notation (operator *() and operator++()) got coopted into other STL containers.

To erase an element based on a pointer in the array you can keep the pointer as is, which now points to the next element. However for iterators in a node based container you need another way of getting the new iterator, so in the STL they return it.

If the iterator was passed by reference and the loop structure remains the same then it would also need to account for the increment at the end of the loop. But that would be different from the pointer case it was based on.

Other languages tend to have more bookkeeping in their iterators, for one to be able to have an integrated end-of-iteration test. Java for example has a remove integrated with the iterator interface, and the iterator is responsible for keeping its own state consistent so the rest of the loop can remain unchanged.

$\endgroup$
3
  • $\begingroup$ Yes, I am aware of this (this is the example from the docs), but I wonder why it is designed to be like this. Why can't ++it "just work" for a semi-invalidated iterator? When and why does erase return something else? Or in other words: What is the benefit of having this be clunky (and easy to mess up)? $\endgroup$
    – Luatic
    Commented Mar 6 at 13:51
  • $\begingroup$ So you'd say it is historically that way for consistency with std::vector, where erase does something different and the resulting iterator should not be advanced to get the next element? $\endgroup$
    – Luatic
    Commented Mar 6 at 15:57
  • 3
    $\begingroup$ @Luatic Another way to think of it is like linked lists. After you remove a node (which frees its memory), you can't use its node.next pointer. So the removal operation has to return something you can use .next on to go to the next node. $\endgroup$
    – Barmar
    Commented Mar 6 at 21:07
1
$\begingroup$

The C++ STL takes the approach of trying to minimize how many implementation details it specifies. Permitting a "semi-invalidated" iterator would require all implementations to keep enough information around to implement operator++. The developers of Lua balance convenient syntax's more heavily than they do avoiding restrictions on implementations. C++ strikes a different balance.

Also remember that C++ is not a garbage collected language. Lots of things that are super-easy with garbage collection behind you become tricky without it.

$\endgroup$
1
$\begingroup$

This is basically what I would expect from your average hash map with closed hashing: Entries aren't truly "deleted", space is not reclaimed; entries are merely marked as deleted, and we have to assume that their contents may now be garbage.

Your assumption is faulty.

std::unordered_map uses closed addressing (not closed hashing). The hash-map is typically an array of buckets, and the bucket typically a linked-list, empty to start with.

Each element is placed in a separately allocated node which is linked in the linked-list. In fact, this node-based implementation detail leaks into the API: check the std::unordered_map::extract function!

When the element is erased from the hash-map, its node is freed.

The technique of using a "deleted" marker is instead used in open hashing.

Why does C++'s unordered_map::erase fully invalidate iterators, not even supporting advancing?

Simply put, because the iterator is just a pointer to the (internal) node of the hash-map, and therefore after erasing the element, since the node is deleted, the iterator points into the nether.

It can't "advance", because that would require reading the next pointer from the freed node.

$\endgroup$
4
  • $\begingroup$ Closed hashing does not use explicit linked lists. Buckets are implicitly determined by probing (see Wikipedia). Closed hashing can very well use a "deleted" marker ("another technique for removal is simply to mark the slot as deleted"). Might you be confusing addressing and hashing here? The terminology is reversed (open addressing is closed hashing). $\endgroup$
    – Luatic
    Commented Mar 13 at 12:28
  • $\begingroup$ @Luatic: I may be indeed. I was thinking of addressing. Then your confusion is different. std::unordered_map uses closed addressing. $\endgroup$ Commented Mar 13 at 12:48
  • $\begingroup$ Ah thanks, I see! I'm used to hash maps using closed hashing (for example Lua uses closed hashing, to my knowledge). Do you know why C++ prefers open hashing? $\endgroup$
    – Luatic
    Commented Mar 13 at 13:51
  • $\begingroup$ @Luatic: The goal of the standard committee was to make std::unordered_map as close to possible in terms of interface and guarantees as std::map, so that people could upgrade as easily as possible. std::map has the "memory stability" guarantee: an element, once inserted, will have the same address no matter what operation is performed on the map until the element is erased from the map. Not having this guarantee would have been a footgun on the upgrade path, and therefore it required a node-based design for std::unordered_map. $\endgroup$ Commented Mar 13 at 13:56

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .