added 30 characters in body; edited title; deleted 48 characters in body; edited tags
Mysticial

Why does a preassigned function pointer perform worse than a branch?

I have a class with an enum member variable. One of the member functions bases its behavior on this enum, so as a "possible" optimization, I have the two different behaviors as two different functions and I give the class a function pointer member which is set at construction. I simulated this situation like this:

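#include <stdint.h>  // uint64_t

// funcA and funcB are defined elsewhere; their bodies are not shown here.
// Signatures are assumed to match the func_ pointer type used in cat2.
uint64_t funcA();
uint64_t funcB();

enum catMode {MODE_A, MODE_B};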

struct cat
{
    cat(catMode mode) : stamp_(0), mode_(mode) {}

    void
    update()
    {
        stamp_ = (mode_ == MODE_A) ? funcA() : funcB();
    }

    uint64_t stamp_;
    catMode  mode_;
};

struct cat2
{
    cat2(catMode mode) : stamp_(0), mode_(mode)
    {
        if (mode_ == MODE_A)  // compare (==), not assign (=)
            func_ = funcA;
        else
            func_ = funcB;
    }

    void
    update()
    {
        stamp_ = func_();
    }

    uint64_t stamp_;
    catMode  mode_;
    uint64_t (*func_)(void);
};
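Before timing anything, it may be worth checking that the two structs above really dispatch to the same function for a given mode. The following is a minimal sketch of such a check. The bodies of `funcA` and `funcB` are hypothetical stand-ins (the post does not show them), and `same_dispatch` and `check_equivalence` are names invented for this illustration.

```cpp
#include <cassert>
#include <cstdint>

enum catMode { MODE_A, MODE_B };

// Hypothetical stand-ins: the real funcA/funcB are not shown in the post.
uint64_t funcA() { return 1; }
uint64_t funcB() { return 2; }

// Variant 1: branch on the enum at every call.
struct cat {
    explicit cat(catMode mode) : stamp_(0), mode_(mode) {}
    void update() { stamp_ = (mode_ == MODE_A) ? funcA() : funcB(); }
    uint64_t stamp_;
    catMode  mode_;
};

// Variant 2: pick the function once, at construction.
struct cat2 {
    explicit cat2(catMode mode)
        : stamp_(0), mode_(mode), func_(mode == MODE_A ? &funcA : &funcB) {}
    void update() { stamp_ = func_(); }
    uint64_t stamp_;
    catMode  mode_;
    uint64_t (*func_)();
};

// Returns true when both dispatch styles produce the same stamp for `mode`.
bool same_dispatch(catMode mode) {
    cat  branchy(mode);
    cat2 pointered(mode);
    branchy.update();
    pointered.update();
    return branchy.stamp_ == pointered.stamp_;
}

// Asserts that both variants agree for both modes.
void check_equivalence() {
    assert(same_dispatch(MODE_A));
    assert(same_dispatch(MODE_B));
}
```

Calling `check_equivalence()` once at program start is enough; if it passes, any timing difference between the two variants comes from how the call is dispatched rather than from which function ends up being called.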

Then I create a cat object and an array of length 32. I traverse the array to bring it into cache, then I call the cat's update method 32 times and store the latency of each call, measured with rdtsc, in the array.

Then I call a function which loops several hundred times using rand(), usleep(), and some arbitrary strcmp() calls, come back, and do the 32-call measurement again.

The result is that the method with the branch always seems to take around 44 +/- 10 cycles, whereas the one with the function pointer tends to take around 130. Why would this be the case?

If anything, I would have expected similar performance. Also, templating is hardly an option because full specialization of the real cat class for that one function would be overkill.
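The measurement procedure described above could look roughly like the sketch below. It is not the original benchmark: it assumes GCC or Clang on x86 for `__rdtsc()` (falling back to `std::chrono::steady_clock` elsewhere), the bodies of `funcA` and `funcB` are hypothetical stand-ins, and `now_ticks`, `disturb`, `measure`, `run`, and `kSamples` are names invented for this illustration. `usleep()` is left out of `disturb` so the sketch runs quickly.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>  // __rdtsc() on GCC/Clang
#endif

enum catMode { MODE_A, MODE_B };

// Hypothetical stand-ins: the real funcA/funcB are not shown in the post.
uint64_t funcA() { return 1; }
uint64_t funcB() { return 2; }

struct cat {
    explicit cat(catMode mode) : stamp_(0), mode_(mode) {}
    void update() { stamp_ = (mode_ == MODE_A) ? funcA() : funcB(); }
    uint64_t stamp_;
    catMode  mode_;
};

struct cat2 {
    explicit cat2(catMode mode)
        : stamp_(0), mode_(mode), func_(mode == MODE_A ? &funcA : &funcB) {
        assert(func_ != nullptr);  // update() relies on func_ being set
    }
    void update() { stamp_ = func_(); }
    uint64_t stamp_;
    catMode  mode_;
    uint64_t (*func_)();
};

// Timestamp source: the TSC on x86 (GCC/Clang), a steady clock elsewhere.
inline uint64_t now_ticks() {
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    return static_cast<uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

const std::size_t kSamples = 32;

// Unrelated work between the two passes (rand + strcmp, a few hundred times).
void disturb() {
    volatile int sink = 0;
    for (int i = 0; i < 300; ++i) {
        char a[16] = "abc";
        char b[16] = "abd";
        a[2] = static_cast<char>('a' + std::rand() % 26);
        sink = sink + std::strcmp(a, b);
    }
}

// Touch the array to pull it into cache, then time kSamples calls to update().
template <typename Cat>
void measure(Cat& c, uint64_t (&latency)[kSamples]) {
    for (std::size_t i = 0; i < kSamples; ++i)
        latency[i] = 0;
    for (std::size_t i = 0; i < kSamples; ++i) {
        const uint64_t start = now_ticks();
        c.update();
        latency[i] = now_ticks() - start;
    }
}

// Two passes separated by disturb(); returns the summed second-pass latency.
template <typename Cat>
uint64_t run(Cat& c) {
    uint64_t latency[kSamples];
    measure(c, latency);
    disturb();
    measure(c, latency);
    uint64_t total = 0;
    for (std::size_t i = 0; i < kSamples; ++i)
        total += latency[i];
    return total;
}
```

A caller would construct one `cat` and one `cat2` with the same mode and compare `run(a) / kSamples` against `run(b) / kSamples`. Because everything here lives in one translation unit, an optimizing compiler may inline the stand-in functions, so the numbers from this sketch are not comparable to the figures quoted in the question.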

Palace Chan

Curious why a preassigned function pointer seems to perform worse than a branch?

I have a class with an enum member variable. One of the member functions bases its behavior on this enum, so as a "possible" optimization, I have the two different behaviors as two different functions and I give the class a function pointer member which is set at construction. I simulated this situation like this:

enum catMode {MODE_A, MODE_B};

struct cat
{
    cat(catMode mode) : stamp_(0), mode_(mode) {}

    void
    update()
        {
            stamp_ = (mode_ == MODE_A) ? funcA() : funcB();
        }

    uint64_t stamp_;
    catMode  mode_;
};

struct cat2
{
    cat2(catMode mode) : stamp_(0), mode_(mode)
        {
            if (mode_ == MODE_A)  // compare (==), not assign (=)
                func_ = funcA;
            else
                func_ = funcB;
        }

    void
    update()
        {
            stamp_ = func_();
        }

    uint64_t stamp_;
    catMode  mode_;
    uint64_t (*func_)(void);
};

And then I create a cat object and an array of length 32. I traverse the array to bring it into cache, then I call the cat's update method 32 times and store the latency of each call, measured with rdtsc, in the array. Then I call a function which loops several hundred times using rand(), usleep() and some arbitrary strcmp calls, come back, and do the 32-call measurement again. The result is that the method with the branch always seems to take around 44 plus or minus 10 cycles, whereas the one with the function pointer tends to take around 130. Why would this be the case? If anything, I would have expected similar performance. Also, templating is hardly an option because full specialization of the real cat class for that one function would be overkill.