10

According to What is a "side effect?", I know the side effect means changing outside world. But what if a function that changes the outside state during execution, but reverts the state to original state after execution? For example, originally I have a function about physics simulation that doesn't modify the outer variable (copy a new data to do simulation):

int predictNumberOfStoppedMarbles(std::vector<Marble> marbles){
    //some physics simulation that modifies x,y,vx,vy of marbles but not add,remove marbles or change sequence of marbles
    int count=0;
    for(Marble& marble : marbles){
        count += marble.vx==0 && marble.vy==0?1:0;
    }
    return count;
}

However, I found this method is too slow, because it needs to copy all marbles when the function executes, so I modify it as follows, which mutates the exact income data directly:

int predictNumberOfStoppedMarbles(std::vector<Marble>& marbles){
    std::vector<std::vector<float> > originalDataArray;
    for(Marble& marble : marbles){ //backup original x,y,vx,vy
        originalDataArray.push_back({marble.x,marble,y,marble.vx,marble.vy});
    }
    //some physics simulation that modifies x,y,vx,vy of marbles but not add,remove marbles or change sequence of marbles
    int count=0;
    for(Marble& marble : marbles){
        count+= marble.vx==0 && marble.vy==0?1:0;
    }
    for(int i=0;i<marbles.size();i++){ //restore original x,y,vx,vy
        marbles[i].x=originalDataArray[i][0];
        marbles[i].y=originalDataArray[i][1];
        marbles[i].vx=originalDataArray[i][2];
        marbles[i].vy=originalDataArray[i][3];
    }
    return count;
}

now it modifies the outer data source (marbles from outer world) directly during simulation, but after execution, the function reverts the data back. Is the function still considered as "no side effect"?

Note: In real code, the physics engine needs to accept Marble type as parameter, it is not easy to copy or modify the physics logic code that operates from Marble type to float array type, so the solution that modifies the copied array is not suitable for me.

13
  • 15
    Doc Brown has answered this perfectly. As a rule of thumb I'd just add that it's usually best to treat "minor side effects" like "side effects", period. Nasty bugs are usually caused by minor errors (major errors would have been noticed much earlier). Commented Mar 11 at 7:22
  • 14
    I'm surprised that the second one isn't slower, because you're copying all the marble position data twice. However, this is a good example of why game systems sometimes choose "deconstructed" representations where you have "array of X coordinates" and "array of Y coordinates" rather than objects.
    – pjc50
    Commented Mar 11 at 10:38
  • 2
    "Side effect" does not only mean changing the outside world. That's closer to "visible side effect" or "observable behavior". The most appropriate answer to the title question depends to a large degree on why you want to know, because there is a whole cluster of related concepts here and different ones have different implications. So, why does it matter to you whether either version of your function has side effects? Commented Mar 11 at 18:22
  • 2
    Oh wait, this isn't a std::vector of structs, this is a std::vector of std::vector<float>, so each element of the outer std::vector is three pointers to dynamically allocated space for four floats. That's hilariously inefficient, like 24 bytes of pointers and a dynamic allocation for every 16 bytes of floats. Or instead of a struct, you could just have a flat std::vector<float> as the outer array, and use i*4 + 0..3 to access the components of marble i. Commented Mar 11 at 21:23
  • 2
    Or even better, have the physics code modify the copy of the x,y,vx,vy components and then discard them. (It can read marbles[i] for the unchanging parts of the marble data so you don't have to copy it all if there's stuff like colour, radius, mass or density, elasticity, spin, etc.) That would avoid the entire problem of a temporary side effect as well as reducing copying. Commented Mar 11 at 21:30

6 Answers 6

43

Well, at least it is a temporary side effect.

You may notice the difference to the fully side-effect free version when you run your program in a multi-threaded context. Since your case involves performance optimization, I think it is not unlikely you want to utilize multiple threads. Now imagine another thread trying to read data from the marbles vector in parallel while predictNumberOfStoppedMarbles is executing.

Good luck with debugging such a program!

Of course, from a practical point of view, in a single-threaded context, you can treat the optimized version of predictNumberOfStoppedMarbles as if it has no side-effect - when you are 100% sure all exceptions inside the physics simulation are caught (thanks to @GregBurkhardt pointing out that exceptions can cause trouble here).

As an alternative, why not use a combination of your two approaches, like

int predictNumberOfStoppedMarbles(const std::vector<Marble>& marbles){
    std::vector<std::vector<float> > workArray;
    for(Marble& marble : marbles){
       workArray.push_back({marble.x,marble,y,marble.vx,marble.vy});
    }
    // some physics simulation that modifies x,y,vx,vy inside workArray,
    // but leaves marbles unchanged !!!
    int count=0;
    for(size_t i=0;i<workArray.size();i++){
        count+= workArray[2]==0 && workArray[3]==0 ?1:0;
    }
    return count;
}

(For the sake of simplicity, I omitted introducing a struct for x,y,vx,vy, but I guess you get the idea).

10
  • 20
    Even a single-threaded context can get unpredictable. What if predictNumberOfStoppedMarbles throws an exception before it can revert its changes? What if, due to some defect, it exits before reverting data? What if count overflows? Oh boy, would I not want to debug that program. JavaScript is single-threaded yet asynchronous. I know the question references C++, but asynchronous callbacks and events can take even the most obvious-looking single-threaded code down some dark, dark corridors. Commented Mar 11 at 20:51
  • 4
    @GregBurghardt: indeed. Thanks for the hint, I took the freedom to include a remark nto my answer.
    – Doc Brown
    Commented Mar 11 at 21:05
  • 2
    Even when the process is single threaded, a temporary side effect on external data will bite you when the process happens to be killed unaware. By the OOM-killer, for instance. That's why it's generally good practice to never modify a file, but instead write the new state to a temporary file, and then overwrite the files' directory entry in an atomic syscall. Commented Mar 12 at 9:30
  • @GregBurghardt: "What if there's a defect" is not a reasonable approach in C++. Defective code can have Undefined Behavior, and then all bets are off. No amount of defensive programming can deal with that. Defensive C++ code protects against defective input, that's doable.
    – MSalters
    Commented Mar 12 at 10:10
  • 1
    What about if (foo) when I meant to type if (!foo)? I'm not trying to completely negate your point — it's a good one — but there are careless mistakes people can make for which behavior is well-defined. Tracking down those sorts of silly defects in code like this question can become a nightmare. That was my only point. Commented Mar 12 at 11:21
5

It is not side effect free anymore.

Even if you try to cleanup the "temporary modifications", it is a visible side effect. You may get away with it in some situations, of course, but it can cause hard to find concurrency bugs or security issues later.

In fact, if you watched the CPU space, you may have seen the "SPECTRE" vulnerability being discussed and lots of followups to it. That issue crept into the system exactly due to such a temporary modification of global CPU state that leaked out of the system. The designers thought speculative execution was basically side-effect free, but they were wrong.

So if there is any chance, that your "temporary" modifications may be visible to concurrent or later operations, you can cause trouble and bugs.

1
  • The SPECTRE vulnerability is a perfect "nightmare scenario" for this sort of question. Commented Mar 12 at 18:15
3

There are a number of ways the existence of "side effects" can be analysed.

The term is now cemented in the lexicon, and it probably originates from a similar meaning as the term "side channel" (cross-pollinated with the idea in functional programming that function arguments and results are the "main" channel of data flow in functional programs), but it was a poor choice of term because most people already know the word "side effects" from a medical context meaning something adverse and to be avoided, when that is definitely not the case in programming.

In fact, "side effects" in programming - meaning a flow of data occuring other than through function arguments and results - are often the main intended effect of a program, as much as intoxication is the main intended effect of consuming alcohol rather than a side effect of the consumption.

Only disorderly flows of data are undesirable, like alcohol consumption to excess.

That said, a function which changes "outside" data, and then changes it back before its own conclusion, may be interpreted as having "side effects". The question might be whether the effects could be observed - that is, whether any other part of the program (when working normally) could actually read the altered data in the meantime, and whether its execution or results could be affected by it.

There isn't a universal definition of "side effects" that would distinguish the two cases.

It's certainly possible for a function to use non-local storage - that is, storage whose allocation is neither controlled internally by the function nor passed directly as an argument - yet the overall program design could still make it intentionally impossible in practice for there to be any effect outside the function.

However, if you have enough storage and performance available to duplicate the contents of non-local storage, alter it's contents, then set everything back at the end, the real question might be why you don't just operate with the local duplicate, discarding it at the end, and leave the non-local original untouched throughout.

7
  • 1
    To your last paragraph - it can often be the case that the storage required for keeping a "backup" of the changed content is much smaller than would be required for taking a complete duplicate: in particular, if only a few fields in each record are modified, or if only a few records out of the whole dataset are modified, then storing only the original values of the modified fields/records is much cheaper than duplicating the whole lot :)
    – psmears
    Commented Mar 11 at 16:10
  • @psmears, that certainly makes sense.
    – Steve
    Commented Mar 11 at 16:26
  • 3
    "Side effect" is a general term, you know, not specific to medical or programming context. To the limited extent that it has programming-specific meaning, I see every reason to think that that arose as a simple specialization of the well-known term. Also, in C, at least, "side effects" are not very much about function calls or arguments. They are effects on the state of the program or its environment produced by evaluating expressions, other than the computation of the result of the expression. That does include function call expressions, but it's in no way specific to those. Commented Mar 11 at 17:56
  • @JohnBollinger, it's not just the unfortunate conflation with the medical meaning, its the fact that it casts what are often the main effects of normal programming techniques as "side effects". Even when talking about evaluating expressions, we are still basically back to dealing with the same mathematical mentality that arguments and results are the main effects, which is usually incorrect (with C's assignment operator itself being an example - the assignment is the main intended effect, the result is the side effect which is only occasionally useful).
    – Steve
    Commented Mar 11 at 19:38
  • @Steve Result values are not effects at all, main or otherwise. They are mathematically pure, and the mathematical mentality is very useful in reasoning about the results of evaluating an expression. If the evaluation has effects, we need to consider the order of the evaluation and have a more complex mental model for the stateful execution of imperative code. This can account for side effects of the expression evaluation, and we use that term regardless whether they are intended or not.
    – Bergi
    Commented Mar 12 at 2:28
2

Your question is a good one and it reminded me of an interesting feature of Clojure. The term used for this (at least in Clojure) is 'Transients':

If a tree falls in the woods, does it make a sound?

If a pure function mutates some local data in order to produce an immutable return value, is that ok?

The upshot: this is absolutely done in Clojure and I would consider it one of the more 'pure' functional languages in general use.

I am not a functional programming expert and I am wary of making claims related to it, but I'll hazard a claim that you can do this and still be 'pure' with the caveat that you never leak that state to anything else. Well, maybe for debugging but nothing else.

4
  • Modification of local data to produce an immutable return value need not be transient. Something like the string hashCode function in Java will permanently mutate one of two fields the first time it is invoked on any particular instance, but calling hashCode on a string simultaneously on multiple threads will have no ill effect beyond possibly doing needless work.
    – supercat
    Commented Mar 11 at 21:02
  • @supercat Are you talking about transient variables e.g. in Java? The term is 'transients' and I don't think it's related. Maybe check out the link?
    – JimmyJames
    Commented Mar 11 at 21:04
  • 1
    No, I'm talking about lazily computed values which will be written at most once during the lifetime of an object. A different concept from what the OP is asking about, but another example of the more general concept of internal mutation associated with an externally-immutable interface.
    – supercat
    Commented Mar 11 at 21:50
  • @supercat I'm not sure how that relates to my answer, but I think you are talking about a type of memoization.
    – JimmyJames
    Commented Mar 12 at 15:05
0

If there is any possibility that some other code acts on the temporary modified state then it is a side effect. If it is impossible that some other code acts on it then it is no side effect. For example if a resource is protected by a mutex, or if it is marked as invalid while temporary modified.

-1

Can you guarantee that your current function is the only thing touching the data right now? Multithreading exists. Something, someday, may be doing operations on very same data and you just changed the data.
Only if you can guarantee atomicity of the operation, you can consider it side effect free. (Atomicity = until all operations are complete, nothing else can touch the data).
Secondly, you do not guarantee the data to be recovered if exception happens. If something breaks during calculation, the data will be left scrambled.

Not the answer you're looking for? Browse other questions tagged or ask your own question.