3
\$\begingroup\$

This is a Trie data structure for mapping strings to integers or pointers in O(n) time. Based on the idea that we never delete anything from the container, we can perform concurrent read/write operations without any locks.

Note that TCMap is initialized separately, its role is to map relevant characters to indices. (We only need a subset of the ASCII table; letters, numbers, and a few symbols.)

static const int TrieNChild = 40;

class TrieCharMap {
    int m[256];
public:
    TrieCharMap();
    TrieCharMap(const TrieCharMap& o) = delete;
    void operator = (const TrieCharMap& o) = delete;

    inline int operator [] (char x) { return m[(unsigned char)(x)]; }
};

extern TrieCharMap TCMap;

template <class T>
class Trie {
    struct node {
        std::atomic<node*> p[TrieNChild] {};
        std::atomic<T> z{};
        void clear() {
            for (int i = 0; i < TrieNChild; i++)
                if (p[i]) ((node*)(p[i]))->clear();
        }
    };

    node root;
    std::mutex mtx;

public:
    Trie() {  }
    ~Trie() { root.clear(); }

private:
    Trie(const Trie& o) {}
    void operator = (const Trie& o) {}

public:
    T add(const char* s, T value) { //return old value 
        node* x = &root;
        for (; *s; s++) {
            int i = TCMap[*s];
                
            node* next = x->p[i];
            if (!next) {
                next = new node;
                node* val = 0;
                if (!x->p[i].compare_exchange_strong(val, next, std::memory_order_relaxed, std::memory_order_relaxed)) {
                    //someone else wrote the pointer, just use that
                    delete next;
                    next = val;
                }
            }
            x = next;
        }
        return x->z.exchange(value, std::memory_order_relaxed); 
    }

    T search(const char* s) {
        node* x = &root;
        for (; x && *s; s++) 
            x = x->p[TCMap[*s]];
        if (x) return x->z;
        return T{};
    }   
};

\$\endgroup\$

1 Answer 1

3
\$\begingroup\$

Make it more generic

Your Trie class is templated on the type of values stored in the trie, but it's hardcoded what the arity is. Now you can only use it for tries in which each node has exactly 40 children and for which the mapping between chars and indices is defined by TCMap. Consider making the arity a template parameter, and the mapping function a member variable:

template <class T, std::size_t K>
class Trie {
    struct node {
        std::atomic<node*> p[K] {};
        ...
    }

    ...
    std::function<std::size_t(char)> charmap;

public:
    Trie(std::function<std::size_t(char)> charmap): charmap(charmap) {}
    ...

    T search(const char* s) {
        ...
        x = x->p[charmap(*s)];
        ...
    }
};

The std::function looks heavy-weight, but the compiler might be able to optimize it away if it can see exactly what function is going to be called. Alternatively, you could just pass a reference to a std::array<int, 256> and move the mapping function into Trie itself.

Consider using std::string_view if possible

You tagged the post C++11, and std::string_view was only introduced in C++17, but if you can use it then it would be the preferred way of passing strings to add() and search(), as it will accept both const char * and std::strings then, and will allow you to use range-for loops:

T search(std::string_view s) {
    ...
    for (auto c: s)
        x = x->p[TCMap[c]];
    ...
}   

If you want to stay compatible with C++11, it might be worth adding overloads that accept const std::string &s as parameters for add() and search().

Remove mtx

The mutex is not used, so it can be removed.

Assert that the input is within bounds

It's possible to supply an incorrect character map that returns indices larger than the number of children in a node. Consider adding an assert() statement before trying to index into the array:

assert(TCMap[c] >= 0 && TCMap[c] < TrieNChild);
x = x->p[TCMap[c]];

This will be able to catch bad behavior in debug builds, and should not have any overhead in release builds that define NDEBUG.

\$\endgroup\$
2
  • \$\begingroup\$ Thank you very much for your excellent comments -- much appreciated. I noticed no comments on the atomic compare_exchange and the exchange operations -- do they seem correct to you? (This is the part that is newest to me) Great suggestion about string_view, that's very convenient. Passing in the map is also a significant improvement, thanks. \$\endgroup\$ Commented Apr 13, 2022 at 7:38
  • 1
    \$\begingroup\$ Yes, they do seem correct to me. You only add items to the Trie, and that significantly simplifies things. \$\endgroup\$
    – G. Sliepen
    Commented Apr 13, 2022 at 14:27

Not the answer you're looking for? Browse other questions tagged or ask your own question.