5
\$\begingroup\$

I implemented lazy_map - https://github.com/tinytrashbin/lazy_map.

Can someone help with reviews ?

lazy_map is an implementation of unordered_map that has O(1) cost of copying irrespective of the current size of map. The optimizations done for O(1) copy operation don't have any visiable side-effect on the map interface no matter how lazy_map is used. The value semantics of the map are preserved while copying. i.e. any write operation on the copied map are totally independent of copied-from map. For any two different lazy_map object, write operation on one have no impact on the other one. The map operations like insertion, deletion, loopup etc. continues to costs O(1) all the time except in a special case of detachment.

Client of lazy_map Must Know That

  1. The iterator can get invalidated on any standard write operation. (unlike std::unordered_map, which guarantees iterator stability on erase). lazy_map offers two non standard methods move_value and move to move out value of a key. lazy_map guarantees iterator stability on 'move_value' and 'move' operation.

  2. If the cost of copy for value-type is large, it's good idea to wrap the value type in a cow_wrapper. (eg: lazy_map<int, cow_wrapper<V>> because the internal implementation of lazy_map might have to copy the value a lot of time. The time complexity analysis here assumes that cost of copying key/value is O(1).

  3. Standard map methods, which expose mutable internal references, are NOT supported. eg: non-const operator[], non-const iterator etc.

  4. In addition to standard find method, contains method is supported that takes a key and return a boolean.

  5. Non standard method move_value and move takes a @key and return a unique_ptr by moving the value at @key. 'move_value' returns nullptr if the value is shared by other objects. 'move' copies the value if the value is shared by other objects. Both of these method return nullptr if the key doesn't exist. ToDo(Mohit): Revisit this point.

Implementation Overview:

The implementation of lazy_map stores the data of a map in a chain of fragments. Each fragment stores the modification to be applied on the map state. These fragments are shared across different map objects. If a map is copied, the new object becomes another share holder of the old fragment. (think std::shared_ptr). The map updates (insertion, deletion) on the copied object are stored on a new fragment on top of the current fragment. Fragments can have a parent fragment. These fragment create a tree like structure. (similar to git commits). The absolute value of a fragment is computed by applying the modification in current fragment on the absolute value of parent fragment.

If a fragment have a long chain of parents (i.e. length of chain > @max_depth), the absolute value of a fragment is computed and updated inplace and the fragment is detached from it's parent. This operation is called detachment. It's is the most expensive operation in lazy_map. Note: default value of @max_depth is 3.

The iteration on lazy_map consts O(number_of_keys * depth_of_parents_chain). In fact for most of the lazy_map APIs, a factor of @depth_of_parents_chain comes into time complexity. Which is not a big deal if we use a small @max_depth. Note that lazy_map is not optimized for deeply nested fragments. but it is highly optimized for fragment trees with very high breadth/branches. i.e. 1000s of copy of a very big map for making small modifications can save cost of copying with lazy_map.

Implementation Details:

shared fragments in lazy_map are both value-immutable as well as structure-immutable. Checkout messy_lazy_map.md as well, whose shared fragments are value-immutable but not structure-immutable.

A fragment can be detached only if it has exactly one owner. Hence lazy_map doesn't need to protect their fragments by a read-write lock.

Note that lazy_map don't track the depth of parents chain. They needs to be detached manually when required. Generally it's good idea to detach them manually if the fragment chain is going to be very large.

Fragment:

struct Fragment {
  std::shared_ptr<Fragment> parent;
  std::unordered_map<K, V> key_values;
  std::unordered_set<K> deleted_keys;
}

A fragment records the deleted keys as well as updated key-value pairs.

The absolute values of a fragment is computed as follows:

AbsoluteValue(fragment) {
  if (fragment.parent == nullptr) return fragment.key_values;
  else {
    let m = AbsoluteValue(*fragment.parent);
    m.erase_keys(fragment.deleted_keys);
    m.override_key_values(fragment.key_values);
    return m;
  }
}

Following invariants are guaranteed in every fragment.

  1. fragment.key_values and fragment.deleted_keys should be disjoint.

  2. if x ∈ fragment.deleted_keys then (fragment.parent must not be nullptr && x ∈ AbsoluteValue(fragment.parent))

eg:

F1 = (key_values={1: 10, 2: 20, 3:30}, deleted_keys={}, parent=nullptr)

F2 = (key_values={2: 30, 4: 40}, deleted_keys={1}, parent=F1)

F3 = (key_values={1: 10, 7: 70, 6:60}, deleted_keys={2, 4}, parent=F2)

AbsoluteValue(F1) = {1: 10, 2: 20, 3:30}

AbsoluteValue(F2) = {2: 30, 3:30, 4: 40}

AbsoluteValue(F3) = {1: 10, 3: 30, 6: 60, 7:70}

Detached(F3): = (key_values={1: 10, 3: 30, 6: 60, 7:70}, deleted_keys={}, parent=nullptr)

AbsoluteValue doesn't change by detachment.


lazy_map.cpp


// Author: Mohit Saini ([email protected])

// The documentation (ts/lazy_map.md) MUST be read twice
// before using this utility.

#ifndef TS_LAZY_MAP_HPP_
#define TS_LAZY_MAP_HPP_

#include <assert.h>

#include <memory>
#include <utility>
#include <vector>
#include <unordered_map>
#include <unordered_set>
#include <string>

namespace ts {
namespace lazy_map_impl {

constexpr const char* key_error = "[lazy_map]: Key not found";

template<typename C, typename K>
inline bool contains_key(const C& c, const K& k) {
  return (c.find(k) != c.end());
}

template<typename K, typename V>
class lazy_map {
  class const_iter_impl;
  using underlying_map = typename std::unordered_map<K, V>;
  using underlying_const_iter = typename underlying_map::const_iterator;

 public:
  using key_type = K;
  using mapped_type = V;
  using value_type = std::pair<const K, V>;
  using const_iterator = const_iter_impl;
  using iterator = const_iterator;
  using reference = value_type&;
  using const_reference = const value_type&;
  lazy_map() : node_(std::make_shared<Node>()) { }
  lazy_map(std::initializer_list<value_type> values)
    : node_(std::make_shared<Node>(values)) { }
  template<typename InputIt>
  lazy_map(InputIt first, InputIt last)
    : node_(std::make_shared<Node>(first, last)) { }

  bool detach() {
    prepare_for_edit();
    return detach_internal();
  }

  bool is_detached() const {
    return (node_->parent_ == nullptr);
  }

  bool contains(const K& k) const {
    return contains_internal(k);
  }

  size_t get_depth() const {
    size_t depth = 0;
    for (auto* p = &node_->parent_; (*p) != nullptr; p = &((*p)->parent_)) {
      depth++;
    }
    return depth;
  }

  const V& at(const K& k) const {
    for (auto* p = &node_; (*p) != nullptr; p = &((*p)->parent_)) {
      if (contains_key((*p)->values_, k)) {
        return (*p)->values_.at(k);
      }
      if (contains_key((*p)->deleted_keys_, k)) {
        throw std::runtime_error(key_error);
      }
    }
    throw std::runtime_error(key_error);
  }

  const V& operator[](const K& k) const {
    return at(k);
  }

  size_t size() const {
    return node_->size_;
  }

  bool empty() const {
    return (node_->size_ == 0);
  }

  void insert_or_assign(const K& k, const V& v) {
    prepare_for_edit();
    node_->size_ += contains_internal(k) ? 0: 1;
    node_->deleted_keys_.erase(k);
    if (contains_key(node_->values_, k)) {
      node_->values_.at(k) = v;
    } else {
      node_->values_.emplace(k, v);
    }
  }

  void insert_or_assign(const K& k, V&& v) {
    prepare_for_edit();
    node_->size_ += contains_internal(k) ? 0: 1;
    node_->deleted_keys_.erase(k);
    if (contains_key(node_->values_, k)) {
      node_->values_.at(k) = std::move(v);
    } else {
      node_->values_.emplace(k, std::move(v));
    }
  }

  void insert_or_assign(const value_type& kv) {
    insert_or_assign(kv.first, kv.second);
  }

  void insert_or_assign(value_type&& kv) {
    insert_or_assign(kv.first, std::move(kv.second));
  }

  // Non standard method. insert_or_assign is the standard name for it (C++17).
  inline void put(const K& k, const V& v) {
    insert_or_assign(k, v);
  }

  // Non standard method. insert_or_assign is the standard name for it (C++17).
  inline void put(const K& k, V&& v) {
    insert_or_assign(k, std::move(v));
  }

  bool insert(const K& k, const V& v) {
    if (contains_internal(k)) return false;
    prepare_for_edit();
    node_->deleted_keys_.erase(k);
    node_->values_.emplace(k, v);
    node_->size_++;
    return true;
  }

  bool insert(const K& k, V&& v) {
    if (contains_internal(k)) return false;
    prepare_for_edit();
    node_->deleted_keys_.erase(k);
    node_->values_.emplace(k, std::move(v));
    node_->size_++;
    return true;
  }

  bool insert(const value_type& kv) {
    return insert(kv.first, kv.second);
  }

  bool insert(value_type&& kv) {
    return insert(kv.first, std::move(kv.second));
  }

  template<typename... Args>
  bool emplace(const K& k, Args&&... args) {
    if (contains_internal(k)) return false;
    prepare_for_edit();
    node_->deleted_keys_.erase(k);
    node_->values_.emplace(std::piecewise_construct,
                           std::forward_as_tuple(k),
                           std::tuple<Args&&...>(std::forward<Args>(args)...));
    node_->size_++;
    return true;
  }

  void clear() {
    // No need to prepare_for_edit.
    node_ = std::make_shared<Node>();
  }

  bool erase(const K& k) {
    if (not contains_internal(k)) return false;
    prepare_for_edit();
    node_->values_.erase(k);
    if (contains_internal(k)) {
      node_->deleted_keys_.insert(k);
    }
    node_->size_--;
    return true;
  }

  // - Erase the given key and return the erased value. If the key-value pair
  //   is not shared by other objects, move the value in output. If the
  //   key-value pair is shared by other objects, copy the value for returning.
  // - It is useful when we need to update a value efficiently.
  // - Returns nullptr only if the key doesn't exists.
  // - Non-standard map method.
  std::unique_ptr<V> move_and_erase(const K& k) {
    if (not contains_internal(k)) return nullptr;
    prepare_for_edit();
    std::unique_ptr<V> output;
    if (contains_key(node_->values_, k)) {
      output = std::make_unique<V>(std::move(node_->values_[k]));
    } else {
      output = std::make_unique<V>(at(k));
    }
    node_->values_.erase(k);
    if (contains_internal(k)) {
      node_->deleted_keys_.insert(k);
    }
    node_->size_--;
    return output;
  }

  // - Move out the value of a key and return. Return nullptr if the key
  //   doesn't exists. If the value is shared by other objects, it will be
  //   copied.
  // - It is useful when we need to update the value efficiently.
  // - Equivalent of std::make_unique<V>(std::move(my_map[key])) in standard
  //   map.
  // - Since lazy_map don't expose mutable iternal reference,
  //   only way to update a value of key is: copy/move it in a variable, update
  //   it and then insert_or_assign it again.
  // - Return nullptr if either the key doesn't exists.
  // - Non-standard map method.
  std::unique_ptr<V> move(const K& k) {
    if (not contains_internal(k)) return nullptr;
    if (node_.unique() and contains_key(node_->values_, k)) {
      return std::make_unique<V>(std::move(node_->values_[k]));
    } else {
      return std::make_unique<V>(at(k));
    }
  }

  // Similar to move(k) method. In adding, it return the nullptr if the value
  // is shared instead of copying value.
  std::unique_ptr<V> move_only(const K& k) {
    if (not contains_internal(k)) return nullptr;
    if (node_.unique() and contains_key(node_->values_, k)) {
      return std::make_unique<V>(std::move(node_->values_[k]));
    }
    return nullptr;
  }

  const_iter_impl begin() const {
    return const_iter_impl(node_.get());
  }

  const_iter_impl end() const {
    return const_iter_impl(nullptr);
  }

  const_iterator find(const K& k) const {
    for (Node* p = node_.get(); p != nullptr; p = p->parent_.get()) {
      auto it = p->values_.find(k);
      if (it != p->values_.end()) {
        return const_iterator(node_.get(), p, std::move(it));
      }
      if (contains_key(p->deleted_keys_, k)) {
        return const_iterator();
      }
    }
    return const_iterator();
  }

 private:
  bool insert_internal(const K& k, const V& v) {
    if (contains_internal(k)) return false;
    node_->deleted_keys_.erase(k);
    node_->values_.emplace(k, v);
    node_->size_++;
    return true;
  }

  bool contains_internal(const K& k) const {
    for (Node* p = node_.get(); p != nullptr; p = p->parent_.get()) {
      if (contains_key(p->values_, k)) {
        return true;
      }
      if (contains_key(p->deleted_keys_, k)) {
        return false;
      }
    }
    return false;
  }

  void prepare_for_edit() {
    if (not node_.unique()) {
      auto new_node = std::make_shared<Node>(std::move(node_));
      node_ = std::move(new_node);
    }
  }

  bool detach_internal() {
    if (node_->parent_ == nullptr) return false;
    for (auto* p = &node_->parent_; (*p) != nullptr; p = &((*p)->parent_)) {
      for (auto& v : (*p)->values_) {
        if (not contains_key(node_->deleted_keys_, v.first)) {
          node_->values_.emplace(v.first, v.second);
        }
      }
      const auto& d = (*p)->deleted_keys_;
      node_->deleted_keys_.insert(d.begin(), d.end());
    }
    node_->deleted_keys_.clear();
    node_->parent_ = nullptr;
    return true;
  }

  struct Node {
    Node() = default;
    explicit Node(std::shared_ptr<Node>&& parent)
      : parent_(std::move(parent)), size_(parent_->size_) { }
    explicit Node(std::initializer_list<value_type> values)
      : values_(values), size_(values_.size()) { }
    explicit Node(const std::unordered_map<K, V>& other_map)
      : values_(other_map), size_(values_.size()) { }
    explicit Node(std::unordered_map<K, V>&& other_map)
      : values_(std::move(other_map)), size_(values_.size()) { }
    template<typename InputIt>
    Node(InputIt first, InputIt last)
      : values_(first, last), size_(values_.size()) { }
    std::shared_ptr<Node> parent_;
    std::unordered_map<K, V> values_;
    std::unordered_set<K> deleted_keys_;
    size_t size_ = 0;
  };
  // The implementation of this iterator relies on the C++ standard's sayings,
  // that comparison of two iterators from different container is undefined
  // behavior. Hence iterator of a lazy_map object should not
  // be compared with other lazy_map object.
  // In this implementation we are also ensuring that we don't compare iterator
  // of one unordered_map with another.
  class const_iter_impl {
   public:
    // Default constructed iterator is the end() iterator.
    const_iter_impl() = default;
    const_iter_impl(const Node* head,
                    const Node* current,
                    underlying_const_iter&& it)
      : head_(head), current_(current), it_(std::move(it)) {}

    const_iter_impl(const Node* head): head_(head), current_(head) {
      if (current_) {
        it_ = current_->values_.begin();
        if (not move_forward_to_closest_non_deleted_valid_position()) {
          current_ = nullptr;
        }
      }
    }
    bool operator==(const const_iter_impl& o) const {
      return (current_ == o.current_ && (current_ == nullptr || it_ == o.it_));
    }
    bool operator!=(const const_iter_impl& o) const {
      return not (*this == o);
    }
    // Precondition(@current_ != nullptr)
    const_iter_impl& operator++() {
      assert(current_ != nullptr);
      ++it_;
      if (not move_forward_to_closest_non_deleted_valid_position()) {
        current_ = nullptr;
        return *this;
      }
      return *this;
    }
    const_iter_impl& operator++(int) {
      auto old = *this;
      ++(*this);
      return old;
    }
    auto& operator*() const {
      return *it_;
    }
    auto* operator->() const {
      return it_.operator->();
    }
   private:
    // - Precondition(@current_ != nullptr)
    // - Postcondition(@current_ != nullptr)
    // - The closest non-deleted valid position might be at 0 distance apart.
    //    (if we are already there).
    // - Return false if reached end of stream.
    bool move_forward_to_closest_non_deleted_valid_position() {
      while(move_forward_to_closest_valid_position()) {
        if (should_ignore_key(it_->first)) {
          ++it_;
          continue;
        } else {
          return true;
        }
      }
      return false;
    }
    // - Precondition(@current_ != nullptr)
    // - Postcondition(@current_ != nullptr)
    // - The closest valid position might be at 0 distance apart. (if we are
    //   already on a valid position).
    // - Return false if we failed to move forward to a valid position.
    bool move_forward_to_closest_valid_position() {
      while (it_ == current_->values_.end()) {
        if (current_->parent_ == nullptr) {
          return false;
        }
        current_ = current_->parent_.get();
        it_ = current_->values_.begin();
      }
      return true;
    }
    // Precondition(@current_ != nullptr)
    bool should_ignore_key(const K& k) const {
      for (auto c = head_; c != current_; c = c->parent_.get()) {
        if (contains_key(c->values_, k)
             or contains_key(c->deleted_keys_, k)) {
          return true;
        }
      }
      return false;
    }
    // Invariant(head_ != nullptr || current_ == nullptr)
    const Node* head_ = nullptr;
    // current_ == nullptr means that this iterator is the `end()`
    const Node* current_ = nullptr;
    // Belongs to the containr current_->values_ if current_ is not nullptr.
    underlying_const_iter it_;
    friend class lazy_map;
  };
  friend class lazy_map_test_internals;
  std::shared_ptr<Node> node_;
};


Thanks !

\$\endgroup\$
2
  • \$\begingroup\$ It's not clear in the question, but does this satisfy the same complexity constrains as std::unordered_set for the other operations? In particular, are insert() and erase() no worse than linear in the size? \$\endgroup\$ Commented Mar 16, 2021 at 8:44
  • 1
    \$\begingroup\$ Actually amortized complexity is still O(1) (assuming depth is bounded by a small constant max_depth ~ 3) .. but in the worst case it can take more than O(size)... but this worst case is extremely rare to happen if we have a good hash function in std::unordered_map. \$\endgroup\$ Commented Mar 16, 2021 at 9:20

1 Answer 1

1
\$\begingroup\$
// The documentation (ts/lazy_map.md) MUST be read twice
// before using this utility.

Scary warning! I wonder whether that will be obeyed? It might be an idea to specifically mention pitfalls where we differ from standard unordered map.

#include <assert.h>

Prefer to use C++ library headers (i.e. <cassert>) rather than the deprecated C-compatibility headers.

  using value_type = std::pair<const K, V>;

Consider declaring this as = typename underlying_map::value_type (which will be the same type, but a different way of thinking of it). I'm not sure which of the two is a better choice here.

The operator[] is surprising in two ways: first, it performs all the same checks as at(), unlike the standard collections. Secondly, there's no non-const version to create a new key. Now I understand the insistence on reading the documentation file!

 size_t size() const {

Typo: std::size_t

In insert_or_assign(), there's an inefficiency:

if (contains_key(node_->values_, k)) {
  node_->values_.at(k) = std::move(v);

contains_key() finds the iterator for the key, but we don't retain that - it just returns a bool, so we end up having to find it again in at().


(partial review - Real Life just called)

\$\endgroup\$
0

Not the answer you're looking for? Browse other questions tagged or ask your own question.