14
\$\begingroup\$

I just wrote a simple B-Tree insertion implementation, and was wondering if anyone can comment / critique on code style, readability, maintainability etc. I tested it with a few small test cases and as far as I know, there are no obvious bugs in this.

I'm making extensive use of shared_ptr. I contemplated using unique_ptr, but since I'm maintaining parent pointers in every node, I settled with shared_ptr. I'm deriving the Node class from enable_shared_from_this because I want to be able to construct a shared_ptr on this pointer from within a function of the Node class. Is this a good pattern?

These are some ideas I got while writing this code (this may help you with the review):

  1. A leaf node is always split into leaf nodes, similarly internal nodes that are split give more internal nodes.
  2. A node with parent pointer null can be treated as root.
  3. The SplitNode() function returns the sibling (a new node) and the median. Whenever a node is split, the median moves up. But all the children stay at the same level and get divided between the two nodes.
  4. PropagateUp() receives the median key from one of its children and points to the split children.
  5. Finally, I'm always inserting the new key into the node, and then splitting the Node if it exceeds Max. I found this to be easier to implement than to detect if the node is going to be full because of the incoming key, and then calculate the median without inserting - which could either be the new incoming value, or from the existing keys.

Since there are so many shared_ptrs, the performance must suck. Are there any suggestions on that front?

#include <iostream>
#include <memory>
#include <vector>
#include <cassert>
#include <algorithm>

int MAXKEYS = 5; // we split at max keys. 
struct Node : std::enable_shared_from_this<Node>
{
    Node(bool isLeaf = false)
    : _isLeaf(isLeaf),
      _parent(nullptr)
    {   
    }

    void splitNode(std::shared_ptr<Node>& siblingNode /*outParam*/, int& median /*outParam*/)
    {
        size_t medianOffset = MAXKEYS / 2;
        median = _keys[medianOffset];

        std::cout << "Splitting Node, New Median: " << median << " medianOffset: " << medianOffset << " isLeaf: " << (_isLeaf == true ? "true" : "false") << std::endl;
        siblingNode.reset(new Node(_isLeaf));

        // set sibling node's parent pointer
        siblingNode->_parent = _parent;

        if (!_isLeaf)
        {
            siblingNode->_children.assign(_children.begin() + medianOffset + 1, _children.end());
            _children.erase(_children.begin() + medianOffset + 1, _children.end());
        }

        // assign the keys after median to sibling.
        (siblingNode->_keys).assign(_keys.begin() + medianOffset + 1, _keys.end());
        // erase the keys median onwards
        _keys.erase(_keys.begin() + medianOffset, _keys.end());
    }

    // called when the node is full.
    Node* propagateUp(int median, std::shared_ptr<Node>& siblingNode, bool& isHeightIncreased)
    {
        // check if tree height to be increased.
        if (_parent == nullptr)
        {    
            // this is the root, and height of tree needs to increase by 1.
            Node* newRoot = new Node(false);

            // set children for new root
            (newRoot->_children).push_back(shared_from_this());
            (newRoot->_children).push_back(siblingNode);   
            (newRoot->_keys).push_back(median);

            // set parent pointers
            _parent = newRoot;
            siblingNode->_parent = newRoot;

            isHeightIncreased = true;
            return newRoot;
        }
        else
        {
            return _parent->insert(median, this, siblingNode, isHeightIncreased);
        }     
    }

    // invoked by a child of this node.
    Node* insert(int median, Node* oldChild, std::shared_ptr<Node> rightSiblingOfChildAfterSplit, bool& isHeightIncreased)
    {
        assert(!_isLeaf);
        assert(_keys.size() > 0);

        auto iter = std::upper_bound(_keys.begin(), _keys.end(), median); 
        auto position = std::distance(_keys.begin(), iter);

        _keys.insert(iter, median);

        _children.insert(_children.begin() + position + 1, rightSiblingOfChildAfterSplit);
        assert(_children[position].get() == oldChild);  
        // the +1 is important because the original child node is already present, and the new child will come after the existing child.     

        // Now check if the node size was max, then split, and insert recursively.
        // check if node is full
        if (_keys.size() == MAXKEYS)
        {
            std::shared_ptr<Node> siblingNode;
            int median;
            splitNode(siblingNode, median);
            return propagateUp(median, siblingNode, isHeightIncreased);         
        }
        else
        {
            return this;
        }
    }

    Node* find(int val)
    {
        if (_isLeaf)
        {
            return this;
        }
        else
        {
            // call find on a correct child.
            auto position = std::distance(_keys.begin(), std::upper_bound(_keys.begin(), _keys.end(), val));
            return _children[position]->find(val);
        }
    }

    // this function is called only on the leaf node.
    // returns the highest node into which an element was inserted in the tree.
    Node* insert(int val, bool& isHeightIncreased)
    {
        std::cout << "Inserting element " << val << std::endl;
        assert(_isLeaf);
        if (_keys.size() == 0)
        {
            _keys.push_back(val);
            return this;        
        }

        assert(_keys.size() < MAXKEYS);

        // insert at proper position.
        auto iter = std::upper_bound(_keys.begin(), _keys.end(), val);
        _keys.insert(iter, val);

        // check if node is full
        if (_keys.size() == MAXKEYS)
        {
            std::shared_ptr<Node> siblingNode;
            int median;
            splitNode(siblingNode, median);
            return propagateUp(median, siblingNode, isHeightIncreased); 
        }
        else
        {
            return this;
        }
    }

    void printNode(bool recursive = false)
    {
        std::cout << "--------Printing Node:------" << std::endl;
        std::cout << "Number of Keys: " << _keys.size() << "    " << "Number of Children " << _children.size() << std::endl;

        std::cout << "Node Keys: ";

        if (_isLeaf)
        {
            std::cout << "LEAF NODE" << std::endl;            
            for (const auto& key : _keys) { std::cout << key << "   " << std::endl; }
            std::cout << "---------End Printing Node ----------" << std::endl;
            return;
        }
        else
        {
            std::cout << "INTERNAL NODE" << std::endl;
            for (const auto& key : _keys) { std::cout << key << "   " << std::endl; }
            std::cout << "---------End Printing Node ----------" << std::endl;
        }

        if (recursive)
        {        
            for(const auto& child : _children)
            {
                child->printNode();
            }
        }
    }    

    bool _isLeaf;
    Node* _parent;
    std::vector<int> _keys;
    std::vector<std::shared_ptr<Node>> _children;
};

struct Tree
{
    Tree()
    {
        _root.reset(new Node(true));
    }

    void print()
    {
        _root->printNode();
        // print in depth first order
        for(const auto& child : _root->_children)
        {
            child->printNode(true /*printRecursive*/);
        }
    }

    void insert(int val)
    {
        Node* leafNode = _root->find(val);
        bool isHeightIncreased = false;
        Node* insertNode = leafNode->insert(val, isHeightIncreased);

        if (isHeightIncreased)
        {
            _root.reset(insertNode);
        }

        std::cout << std::endl;
        std::cout << "--------------printing tree--------------" << std::endl;
        print();
        std::cout << "---------------end print tree-----------" << std::endl;
        std::cout << std::endl;
    }

    std::shared_ptr<Node> _root;
};

int main()
{

    std::shared_ptr<Tree> tree(new Tree());
    tree->insert(20);
    tree->insert(30);
    tree->insert(40);
    tree->insert(50);
    tree->insert(60);

    tree->insert(10);
    tree->insert(12);
    tree->insert(14);
    tree->insert(16);

    tree->insert(22);
    tree->insert(24);
    tree->insert(26);
    tree->insert(28);
    tree->insert(29);
    tree->insert(4);
    tree->insert(6);
    tree->insert(8);
    std::cout << "Hello, world!\n";
}
\$\endgroup\$
3
  • \$\begingroup\$ Just as a note: FYI: std::map implements has the same characteristics as a tree! Thus most implementations use a red/black tree to implement std::map. \$\endgroup\$ Commented Oct 1, 2015 at 6:34
  • \$\begingroup\$ This is just stunningly wrong std::shared_ptr<Tree> tree(new Tree()); But I will do a full review in the morning. Why not just Tree tree; \$\endgroup\$ Commented Oct 1, 2015 at 6:37
  • \$\begingroup\$ thanks for the tip @LokiAstari yes, the Tree object could be allocated on the stack - but apart from that semantic, is this a correctness issue or anti-c++? \$\endgroup\$ Commented Oct 1, 2015 at 18:36

2 Answers 2

3
\$\begingroup\$

I'm just going to look at the code, not the correctness of the algorithm.

On the plus side:

  1. You have used a consistent system for labelling functions and variables.
  2. Your code is clean and well laid out, its very easy to read.
  3. You have been consistent with your use of braces.

And now the negative:

  1. The comments need improvements, saying when a function is called is OK, but why don't you say what it does and why too.
  2. Why are you using structs instead of classes.
  3. MAXKEYS - This looks like a constant, but it isn't coded as such.
    • Its a signed value being assigned to an unsigned value.
    • Its an odd number being divided by 2 and then integer rounded.
    • Why divide by two when you could bitshift?
  4. If parameters to functions are not being modified then the intention of the author is clearer when you make them const.
  5. If functions don't modify class data then make them const.
  6. medianOffset should be const. It makes it clearer that you don't intend it to change.
  7. Use typedef to define the structure that holds the data (std::shared_ptr<>) that way you can change it by just editing one place.
  8. You need to program more defensively, you need to check the parameters to public functions are acceptable before you start processing.
  9. Have you considered what happens if '_children' is empty when you call '_children.begin()' in splitNode.

If there is anything I've not explained enough please shout. Hope that helps.

\$\endgroup\$
2
  • \$\begingroup\$ Thanks, this is awesome! For 9), whenever I call _children.begin(), the node is guaranteed to have at least one child (because it is not a leaf). How do I make this invariant clear in code? I will wait for any other responses before accepting your answer. \$\endgroup\$ Commented Oct 1, 2015 at 18:37
  • \$\begingroup\$ My pleasure. :) For 9) I would just add a comment that says that. Comments are much easier to write and read than code. \$\endgroup\$ Commented Oct 2, 2015 at 10:42
1
\$\begingroup\$

I would like to add something.

Comments

I am always against commenting code. If you need comments that might suggest bad naming. Additionally commenting stuff that is obvious is meaningless.

Prefer constexpr

You have a value which should be at least const but I would make this constexpr

constexpr int MAXKEYS = 5

C.48: Prefer in-class initializers to member initializers in constructors for constant initializers

Node(bool isLeaf = false)
: _isLeaf(isLeaf),
  _parent(nullptr)
{   
}

It could be just

Node(bool isLeaf): _isLeaf(isLeaf) {}

bool _isLeaf {false};
Node* _parent {nullptr};

Naming convention

You are using somehow popular naming convention with _. _isLeaf Personally, I do not like this kind of ornamentation. (but it's my opinion)

Output parameters

You are passing a lot of parameters by reference as output parameters. If it possible I would avoid this. I find it much more easier to understand return values than output parameters.

Using new

In modern C++ code you probably will never see new

std::shared_ptr<Tree> tree(new Tree());

Should be done this way

std::shared_ptr<Tree> tree = std::make_shared<Tree>();

Matter of taste

I personally use and, or instead of &&, || but it is a matter of taste.

if (!_isLeaf)

becomes

if (not _isLeaf)
\$\endgroup\$

Not the answer you're looking for? Browse other questions tagged or ask your own question.