82

Today i was reading about pure function, got confused with its use:

A function is said to be pure if it returns same set of values for same set of inputs and does not have any observable side effects.

e.g. strlen() is a pure function while rand() is an impure one.

__attribute__ ((pure)) int fun(int i)
{
    return i*i;
}

int main()
{
    int i=10;
    printf("%d",fun(i));//outputs 100
    return 0;
}

http://ideone.com/33XJU

The above program behaves in the same way as in the absence of pure declaration.

What are the benefits of declaring a function as pure[if there is no change in output]?

20
  • 7
    Yes - look at the generated assembly. Commented Jun 22, 2012 at 9:47
  • 4
    I don't think this definition of purity is correct - printf, for example, would qualify (calling it twice with the same arguments yields the same return value), but it is not pure.
    – tdammers
    Commented Jun 22, 2012 at 11:51
  • 14
    @tdammers: Indeed, it lacks the ...and no side-effects... part. Commented Jun 22, 2012 at 12:15
  • 2
    @Ben: where does the entropy come from? We're dealing with (theoretically) deterministic machines here, the only way of getting true entropy into them is from external sources, which means side effects. Of course, we could allow programming languages to define nondeterministic functions, pretending the technical side effects aren't there and the functions really are nondeterministic; but if we do that, most of the practical benefits of tracking purity are lost.
    – tdammers
    Commented Jun 22, 2012 at 13:50
  • 3
    tdammers is correct - the definition of pure given above is incorrect. Pure means that the output depends only on the inputs to the function; also, there must be no observable side-effects. "Same output for same input" is a very inaccurate summary of those requirements. en.wikipedia.org/wiki/Pure_function
    – Dancrumb
    Commented Jun 22, 2012 at 14:04

6 Answers 6

146

pure lets the compiler know that it can make certain optimisations about the function: imagine a bit of code like

for (int i = 0; i < 1000; i++)
{
    printf("%d", fun(10));
}

With a pure function, the compiler can know that it needs to evaluate fun(10) once and once only, rather than 1000 times. For a complex function, that's a big win.

5
  • Ie, you can safely use memoization Commented Jun 22, 2012 at 17:48
  • @mob What do you mean? Why not? Commented Jun 22, 2012 at 18:17
  • 15
    Because you can modify the string (sequence of chars starting from some address) without modifying the input (the pointer to the address where the string starts), i.e., you can't memoize it. It would only be a pure function in a language with immutable strings (Java, say).
    – mob
    Commented Jun 22, 2012 at 19:35
  • 5
    @KonradRudolph: Imagine a length 1000 string. Call strlen on it. Then again. Same thing yes? Now modify the second character to be \0. Does strlen still return 1000 now? The starting address is the same (== input is same) but the function now returns a different value. Commented Jun 23, 2012 at 16:10
  • 5
    @mob That’s a good objection, obviously you are right. I was misled by the fact that even books claim that strlen (in GCC / glibc) is in fact pure. But a look at the glibc implementation showed this to be wrong. Commented Jun 23, 2012 at 16:44
34

When you say a function is 'pure' you are guaranteeing that it has no externally visible side-effects (and as a comment says, if you lie, bad things can happen). Knowing that a function is 'pure' has benefits for the compiler, which can use this knowledge to do certain optimizations.

Here is what the GCC documentation says about the pure attribute:

pure

Many functions have no effects except the return value and their return value depends only on the parameters and/or global variables. Such a function can be subject to common subexpression elimination and loop optimization just as an arithmetic operator would be. These functions should be declared with the attribute pure. For example,

          int square (int) __attribute__ ((pure));

Philip's answer already shows how knowing a function is 'pure' can help with loop optimizations.

Here is one for common sub-expression elimination (given foo is pure):

a = foo (99) * x + y;
b = foo (99) * x + z;

Can become:

_tmp = foo (99) * x;
a = _tmp + y;
b = _tmp + z;
5
  • 3
    I am not sure if any do this, but pure functions also allow the compiler to reorder when the function gets called, should it deem reordering would be beneficial. When there is the possibility for side effects, the compiler needs to be more conservative.
    – mpdonadio
    Commented Jun 22, 2012 at 17:12
  • @MPD - Yes, that sounds reasonable. And since a call instruction is a bottleneck for superscalar CPUs, some help from the compiler can help. Commented Jun 22, 2012 at 17:50
  • I vaguely recall using a DSP compiler a number of years ago that would use this technique to get return values sooner/later. This allowed it to minimize pipeline stalls.
    – mpdonadio
    Commented Jun 22, 2012 at 18:53
  • 1
    Could "foo(99)" be precalculated since 99 is a const and foo will always return the same result? Maybe in some sort of two-stage compile?
    – markwatson
    Commented Jun 25, 2012 at 20:41
  • 1
    @markwatson - I am not sure. There can be cases when it is simply not possible. e.g. if foo is part of another compilation unit (another C file), or in a pre-compiled library. In both cases, the compiler wont know what foo does, and cannot pre-calculate. Commented Jun 25, 2012 at 23:08
29

In addition to possible run-time benefits, a pure function is much easier to reason about when reading code. Furthermore, it's much easier to test a pure function since you know that the return value only depends on the values of the parameters.

1
  • 3
    +1, your point about testing is an interesting one. No set-up and tear-down required. Commented Jun 22, 2012 at 9:59
15

A non-pure function

int foo(int x, int y) // possible side-effects

is like an extension of a pure function

int bar(int x, int y) // guaranteed no side-effects

in which you have, besides the explicit function arguments x, y, the rest of the universe (or anything your computer can communicate with) as an implicit potential input. Likewise, besides the explicit integer return value, anything your computer can write to is implicitly part of the return value.

It should be clear why it is much easier to reason about a pure function than a non-pure one.

2
  • 1
    +1: Using the universe as a potential input is a very nice way of explaining the difference between pure and not-pure. Commented Jun 22, 2012 at 16:52
  • indeed, this is the idea behind monads. Commented Oct 7, 2012 at 19:26
7

Just as an add-on, I would like to mention that C++11 codifies things somewhat using the constexpr keyword. Example:

#include <iostream>
#include <cstring>

constexpr unsigned static_strlen(const char * str, unsigned offset = 0) {
        return (*str == '\0') ? offset : static_strlen(str + 1, offset + 1);
}

constexpr const char * str = "asdfjkl;";

constexpr unsigned len = static_strlen(str); //MUST be evaluated at compile time
//so, for example, this: int arr[len]; is legal, as len is a constant.

int main() {
    std::cout << len << std::endl << std::strlen(str) << std::endl;
    return 0;
}

The restrictions on the usage of constexpr make it so that the function is provably pure. This way, the compiler can more aggressively optimize (just make sure you use tail recursion, please!) and evaluate the function at compile time instead of run time.

So, to answer your question, is that if you're using C++ (I know you said C, but they are related), writing a pure function in the correct style allows the compiler to do all sorts of cool things with the function :-)

0
4

In general, Pure functions has 3 advantages over impure functions that the compiler can take advantage of:

Caching

Lets say that you have pure function f that is being called 100000 times, since it is deterministic and depends only on its parameters, the compiler can calculate its value once and use it when necessary

Parallelism

Pure functions don't read or write to any shared memory, and therefore can run in separate threads without any unexpected consequence

Passing By Reference

A function f(struct t) gets its argument t by value, and on the other hand, the compiler can pass t by reference to f if it is declared as pure while guaranteeing that the value of t will not change and have performance gains


In addition to the compile time considerations, pure functions can be tested fairly easy: just call them.

No need to construct objects or mock connections to DBs / file system.

Not the answer you're looking for? Browse other questions tagged or ask your own question.