The documentation on C++'s unordered_map::erase
states:
Removes specified elements from the container. The order of the remaining elements is preserved. (This makes it possible to erase individual elements while iterating through the container.)
This is basically what I would expect from your average hash map with closed hashing: Entries aren't truly "deleted", space is not reclaimed; entries are merely marked as deleted, and we have to assume that their contents may now be garbage.
Given this, I would have assumed the following to be supported:
std::unordered_map<int, std::string> m = {
{-1, "one"}, {-2, "two"}, {-3, "three"},
{-4, "four"}, {-5, "five"}, {-6, "six"}
};
for (auto it = m.begin(); it != m.end(); ++it)
if (it.first % 2 == 0)
m.erase(it);
but it isn't. Contrast this with Lua, where the following works just fine:
-- negative keys to ensure that these don't end up in the "list" part internally
local m = {
[-1] = "one", [-2] = "two", [-3] = "three",
[-4] = "four", [-5] = "five", [-6] = "six",
}
-- next to be explicit (this is what pairs uses under the hood)
for k, v in next, m do
if k % 2 == 0 then
m[k] = nil
end
end
that is, deleting a key k
still lets you call next(m, k)
to get the next key.
Now of course it is properly documented that calling unordered_map::erase
invalidates the iterator:
References and iterators to the erased elements are invalidated. Other iterators and references are not invalidated.
My question is: Why does C++ not support advancing an unordered map iterator after erasing it, as Lua for example does?
Is this for simplicity of specification, so that there only needs to be one simple notion of "iterator invalidation", which can be applied here as well?
Edit: Perhaps for consistency with other containers?
Or are there some tangible (for example performance) benefits to not supporting this? Are there concrete examples of these benefits being leveraged in present day C++ stdlib implementations?
2nd Edit: I am aware that you can still iterate an unordered map while deleting elements by using the return value of erase
as outlined in the docs. I am asking why this clunkier and more error prone interface is preferred over simply allowing to use erased entries for advancing iterators.
3rd Edit:
"An iterator is a reference to an element of the container" is oversimplified: There is the notion of a "non-dereferencable" iterator (unordered_map::end
is non-dereferencable, for example).
You can test the above code (which just unconditionally advances the iterator) on GCC (11.4.0) and Clang (14.0.0), with different optimization levels, with and without asan and ubsan, and it seems to work as expected. (Edit: It seems to only work sometimes?)
I would be interested to know whether there is a C++ hash table implementation that takes advantage of this / how current implementations take advantage of this - that is, is there a (C++) hash table where ++it
for an iterator "invalidated" by erase
is not the same as the iterator returned by erase
, and what is the benefit of it?
It is still not quite clear to me why C++ would opt to have the standard deem the iterator invalidated rather than non-dereferencable.
It seems most plausible to me that this is indeed about consistency with other containers - for vectors, lists, and (sorted) maps, returning a new iterator is important - rather than about any technical benefit for the hash map implementation.
for k, v in next, m do print(k, v) end
afterwards, it will print only the odd keys & associated values. $\endgroup$