110

There seems to be a recent trend in JavaScript towards treating data structures as immutable. For example, if you need to change a single property of an object, it's better to create a whole new object with the new property, copy over all the other properties from the old object, and let the old object be garbage collected. (That's my understanding anyway.)
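
Something like this, to make the pattern concrete (just a rough sketch of what I mean):

var oldState = { id: 1, name: "widget", count: 0 };

// instead of doing oldState.count++, build a whole new object,
// copy the other properties over, and let oldState become garbage
var newState = Object.assign({}, oldState, { count: oldState.count + 1 });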

My initial reaction is, that sounds like it would be bad for performance.

But then libraries like Immutable.js and Redux.js are written by smarter people than me, and seem to have a strong concern for performance, so it makes me wonder if my understanding of garbage (and its performance impact) is wrong.

Are there performance benefits to immutability I'm missing, and do they outweigh the downsides of creating so much garbage?

14
  • 11
    They have a strong concern for performance in part because immutability (sometimes) has a performance cost, and they want to minimize that performance cost as much as possible. Immutability, all by itself, only has performance benefits in the sense that it makes it easier to write multi-threaded code. Commented Dec 8, 2015 at 15:20
  • 10
In my experience, performance is only a valid concern in two scenarios: one, when an action is performed 30+ times in one second, and two, when its effects increase with each execution (Windows XP once had a bug where Windows Update took O(pow(n, 2)) time over the number of updates in its history). Most other code is an immediate response to an event (a click, API request, or similar), and as long as execution time is constant, the cleanup of any number of objects hardly matters.
    – Katana314
    Commented Dec 8, 2015 at 16:23
  • 4
    Also, consider that there exist efficient implementations of immutable data structures. Maybe these are not as efficient as mutable ones, but probably still more efficient than a naive implementation. See e.g. Purely Functional Data Structures by Chris Okasaki
    – Giorgio
    Commented Dec 8, 2015 at 16:32
  • 1
    @Katana314: 30+ times for me would still not be enough to justify worrying about performance. I ported a small CPU emulator I wrote to node.js and node executed the virtual CPU at around 20MHz (that's 20 million times per second). So I'd only worry about performance if I were doing something 1000+ times per second (even then, I wouldn't really worry until I do 1000000 operations per second because I know I can comfortably do more than 10 of them at once).
    – slebetman
    Commented Dec 9, 2015 at 3:00
  • 2
    @RobertHarvey "Immutability, all by itself, only has performance benefits in the sense that it makes it easier to write multi-threaded code." That is not entirely true, immutability allows for very pervasive sharing with no real consequences. Which is very unsafe in a mutable environment. This gives you thinks like O(1) array slicing and O(log n) inserting into a binary tree while still being able to use the old one freely, and another example is tails which takes all the tails of a list tails [1, 2] = [[1, 2], [2], []] only takes O(n) time and space, but is O(n^2) in element count
    – semicolon
    Commented Jan 2, 2017 at 22:22

6 Answers

62

For example, if you need to change a single property of an object, better to just create a whole new object with the new property, and just copy over all the other properties from the old object, and let the old object be garbage collected.

Without immutability, you might have to pass an object around between different scopes, and you do not know beforehand if and when the object will be changed. So to avoid unwanted side effects, you start creating a full copy of the object "just in case" and pass that copy around, even if it turns out no property has to be changed at all. That will leave a lot more garbage than in your case.
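
To make that "just in case" copying concrete, here is a small sketch (doSomethingWith is a hypothetical callee, and the JSON round-trip is just one crude way to deep-copy):

function handleRequest(settings) {
    // defensive deep copy, because we cannot know whether the callee mutates it
    var copy = JSON.parse(JSON.stringify(settings));
    doSomethingWith(copy);
}

// with an agreed-upon immutable settings object, the same reference
// could be handed around freely, and no copy (or garbage) is created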

What this demonstrates is that if you create the right hypothetical scenario, you can prove anything, especially when it comes to performance. My example, however, is not as hypothetical as it might sound. Last month I worked on a program where we stumbled over exactly that problem because we initially decided against using an immutable data structure, and later hesitated to refactor because it did not seem worth the hassle.

So when you look at cases like this one from an older SO post, the answer to your question probably becomes clear: it depends. For some cases immutability will hurt performance, for some the opposite may be true, for lots of cases it will depend on how smart your implementation is, and for even more cases the difference will be negligible.

A final note: a real-world problem you might encounter is that you need to decide early for or against immutability for some basic data structures. You then build a lot of code upon that, and only several weeks or months later will you see whether the decision was a good or a bad one.

My personal rule of thumb for this situation is:

  • If you design a data structure with only a few attributes based on primitive or other immutable types, try immutability first.
  • If you are designing a data type that involves arrays of large (or undefined) size, random access, and changing contents, use mutability.

For situations between these two extremes, use your judgement. But YMMV.
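
As a rough sketch of both rules in plain JavaScript (Object.freeze standing in for "immutable by design", and a typed array as the mutable buffer):

// few attributes, primitive types: try immutability first
function makePoint(x, y) {
    return Object.freeze({ x: x, y: y });
}
function translate(p, dx, dy) {
    return makePoint(p.x + dx, p.y + dy);   // a new value; the old point is untouched
}

// large array, random access, changing contents: stay mutable
var samples = new Float64Array(1024 * 1024);
samples[42] = 3.14;                          // modified in place, no new allocation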

1
  • 8
    "That will leave a lot more garbage than in your case." And to make matters worse, your runtime will probably be unable to detect the pointless duplication, and thus (unlike an expired immutable object no one is using) it will not even be eligible for collection. Commented Dec 8, 2015 at 16:27
42

First of all, your characterization of immutable data structures is imprecise. In general, most of a data structure is not copied, but shared, and only the changed portions are copied. It is called a persistent data structure. Most implementations are able to take advantage of persistent data structures most of the time. The performance is close enough to mutable data structures that functional programmers generally consider it negligible.
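
A minimal sketch of that sharing, using a hand-rolled persistent linked list rather than any particular library:

// each node is frozen; "adding" to the front shares the entire old list
function cons(head, tail) {
    return Object.freeze({ head: head, tail: tail });
}

var xs = cons(3, cons(2, cons(1, null)));
var ys = cons(4, xs);          // only one new node is allocated

console.log(ys.tail === xs);   // true: the old structure is shared, not copied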

Second, I find a lot of people have a fairly inaccurate idea of the typical lifetime of objects in typical imperative programs. Perhaps this is due to the popularity of memory-managed languages. Sit down sometime and really look at how many temporary objects and defensive copies you create compared to truly long-lived data structures. I think you'll be surprised at the ratio.

I've had people remark in the functional programming classes I teach about how much garbage an algorithm creates, then I show them the typical imperative version of the same algorithm that creates just as much. For some reason, people just don't notice it there.

By encouraging sharing and discouraging creating variables until you have a valid value to put in them, immutability tends to encourage cleaner coding practices and longer lived data structures. This often leads to comparable if not lower levels of garbage, depending on your algorithm.

2
  • 10
    "... then I show the typical imperative version of the same algorithm that creates just as much." This. Also, people who are new to this style, and especially if they are new to functional style in general, may initially produce sub-optimal functional implementations.
    – wberry
    Commented Dec 8, 2015 at 22:31
  • 1
    "discouraging creating variables" Isn't that only valid for languages where the default behavior is copy on assignment/implicit construction? In JavaScript, a variable is just an identifier; it's not an object in its own right. It still occupies space somewhere, but that is negligible (especially as most implementations of JavaScript, to my knowledge, still use a stack for function calls, meaning unless you have lots of recursion you'll just end up reusing the same stack space for most temporary variables). Immutability has no relation to that aspect.
    – JAB
    Commented Dec 9, 2015 at 15:48
39

Late-comer to this Q&A with already great answers, but I wanted to intrude as a foreigner used to looking at things from the lower-level standpoint of bits and bytes in memory.

I'm very excited by immutable designs, even coming from a C perspective, and from the perspective of finding new ways to effectively program this beastly hardware we have these days.

Slower/Faster

As to the question of whether it makes things slower, a robotic answer would be yes. At this kind of very technical conceptual level, immutability can only make things slower. Hardware does best when it's not sporadically allocating memory and can just modify existing memory instead (which is why we have concepts like temporal locality).

Yet a practical answer is maybe. Performance is still largely a productivity metric in any non-trivial codebase. We typically don't find horrific-to-maintain codebases tripping over race conditions as being the most efficient, even if we disregard the bugs. Efficiency is often a function of elegance and simplicity. Peak micro-optimizations can conflict with that somewhat, but they are usually reserved for the smallest and most critical sections of code.

Transforming Immutable Bits and Bytes

Coming from the low-level standpoint, if we x-ray concepts like objects and strings and so forth, at the heart of it all are just bits and bytes in various forms of memory with different speed/size characteristics (the speed and size of memory hardware typically being at odds with each other).

[Diagram: repeatedly accessing the same chunk of memory kept in fast cache]

The memory hierarchy of the computer likes it when we repeatedly access the same chunk of memory, such as in the above diagram, since it'll keep that frequently-accessed chunk of memory in the fastest form of memory (L1 cache, e.g., which is almost as fast as a register). We might repeatedly access the exact same memory (reusing it multiple times) or repeatedly access different sections of the chunk (ex: looping through the elements in a contiguous chunk which repeatedly accesses various sections of that chunk of memory).

We end up throwing a wrench in that process if modifying this memory ends up wanting to create a whole new memory block on the side, like so:

[Diagram: a modification allocating a whole new memory block off to the side]

... in this case, accessing the new memory block could require compulsory page faults and cache misses to move it back into the fastest forms of memory (all the way into a register). That can be a real performance killer.

There are ways to mitigate this, however, using a reserve pool of preallocated memory, already touched.

Big Aggregates

Another conceptual issue that arises from a slightly higher-level view is simply doing unnecessary copies of really big aggregates in bulk.

To avoid an overly complex diagram, let's imagine this simple memory block is somehow expensive (maybe UTF-32 characters on some unbelievably limited hardware).

[Diagram: a small immutable memory block holding the string's characters]

In this case, if we wanted to replace "HELP" with "KILL" and this memory block was immutable, we would have to create a whole new block in its entirety to make a unique new object, even though only portions of it have changed:

[Diagram: the entire block copied into a new block just to change "HELP" to "KILL"]

Stretching our imagination quite a bit, this kind of deep copy of everything else just to make one little part unique can be quite expensive (in real-world cases, this memory block would have to be much, much bigger to pose a problem).
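
JavaScript strings already behave this way, so a rough sketch of the cost (sizes made up for illustration, and ignoring rope-style optimizations an engine might apply):

var big = "HELP" + " the rest of a very large immutable block".repeat(100000);

// changing just the first four characters forces an entirely new string;
// conceptually, everything after them is copied along for the ride
var changed = "KILL" + big.slice(4);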

However, in spite of such an expense, this kind of design will tend to be far less prone to human error. Anyone who has worked in a functional language with pure functions can probably appreciate this, and especially in multithreaded cases where we can multithread such code without a care in the world. In general, human programmers tend to trip over state changes, especially ones that cause external side effects to states outside of a current function's scope. Even recovering from an external error (exception) in such a case can be incredibly difficult with mutable external state changes in the mix.

One way to mitigate this redundant copying work is to make these memory blocks into a collection of pointers (or references) to characters, like so:

Apologies, I failed to realize we don't need to make L unique while making the diagram.

Blue indicates shallow copied data.

[Diagram: per-character pointers into the character data, with the shallow-copied (shared) characters shown in blue]

... unfortunately, this would get incredibly expensive to pay a pointer/reference cost per character. Furthermore, we might scatter the contents of the characters all over the address space and end up paying for it in the form of a boatload of page faults and cache misses, easily making this solution even worse than copying the entire thing in its entirety.

Even if we were careful to allocate these characters contiguously, say the machine can only load 8 characters, or 8 pointers to characters, into a cache line. We end up loading memory like this to traverse the new string:

[Diagram: the cache lines that must be loaded to traverse the string through per-character pointers]

In this case, we end up requiring 7 different cache lines worth of contiguous memory to be loaded to traverse this string, when ideally we only need 3.

Chunk Up The Data

To mitigate the issue above, we can apply the same basic strategy but at a coarser level of 8 characters, e.g.

[Diagram: the string split into 8-character chunks referenced by a handful of pointers]

The result requires 4 cache lines worth of data (1 for the 3 pointers, and 3 for the characters) to be loaded to traverse this string which is only 1 short of the theoretical optimum.

So that's not bad at all. There's some memory waste, but memory is plentiful, and using more of it doesn't slow things down if the extra memory is just cold data that isn't frequently accessed. Reduced memory use and speed go hand-in-hand only for the hot, contiguous data, where we want to fit more of it into a single page or cache line and access it all prior to eviction. This representation is pretty cache-friendly.
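
A hedged sketch of the chunked idea in plain JavaScript (the chunk size of 8 and the helper names are made up for illustration); changing one character copies only its chunk plus the small spine of references:

var CHUNK = 8;

function toChunks(str) {
    var chunks = [];
    for (var i = 0; i < str.length; i += CHUNK) {
        chunks.push(str.slice(i, i + CHUNK));
    }
    return chunks;
}

function setChar(chunks, index, ch) {
    var next = chunks.slice();                 // copy only the spine of chunk references
    var c = Math.floor(index / CHUNK);
    var o = index % CHUNK;
    next[c] = chunks[c].slice(0, o) + ch + chunks[c].slice(o + 1);  // one new chunk
    return next;                               // every other chunk is shared as-is
}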

Speed

So utilizing a representation like the above can give quite a decent balance of performance. Probably the most performance-critical uses of immutable data structures will take on this nature of modifying chunky pieces of data and making them unique in the process, while shallow copying unmodified pieces. It does also imply some overhead of atomic operations to reference the shallow copied pieces safely in a multithreaded context (possibly with some atomic reference-counting going on).

Yet as long as these chunky pieces of data are represented at a coarse enough level, a lot of this overhead diminishes and is possibly even trivialized, while still giving us the safety and ease of coding and multithreading more functions in a pure form without external side effects.

Keeping New and Old Data

Where I see immutability as potentially most helpful from a performance standpoint (in a practical sense) is when the goal is to produce something new from something that already exists, in a way where we want to keep both the new and the old. In a mutable context we can be tempted to make whole copies of large data just to make it unique, when a careful immutable design could make only little bits and pieces of it unique.

Example: Undo System

An example of this is an undo system. We might change a small portion of a data structure and want to keep both the original form we can undo towards and the new form. With this kind of immutable design, which only makes the small, modified sections of the data structure unique, we can simply store the old version in an undo entry while only paying the memory cost of the newly unique portions of the data. This provides a very effective balance of productivity (making the implementation of an undo system a piece of cake) and performance.
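
A minimal sketch of that undo idea (a shallow merge stands in for a real persistent structure here; unchanged nested objects are shared by reference, so each undo entry costs almost nothing extra):

var undoStack = [];

function applyChange(state, change) {
    undoStack.push(state);                        // keep the old version around
    return Object.assign({}, state, change);      // only the top level + changed keys are new
}

function undo(currentState) {
    return undoStack.length ? undoStack.pop() : currentState;
}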

High-Level Interfaces

Yet something awkward arises with the above case. In a local function context, mutable data is often the easiest and most straightforward to modify. After all, the easiest way to modify an array is often to just loop through it and modify one element at a time. We can end up increasing the intellectual overhead if we have a large number of high-level algorithms to choose from to transform an array and have to pick the appropriate one to ensure that all these chunky shallow copies are made while the modified parts are made unique.

Probably the easiest way in those cases is to use mutable buffers locally inside the context of a function (where they typically don't trip us up) which commit changes atomically to the data structure to get a new immutable copy (I believe some languages call these "transients")...

... or we might simply model higher and higher-level transform functions over the data so that we can hide the process of modifying a mutable buffer and committing it to the structure, with no mutable logic exposed. In any case, this is not widely-explored territory yet, and we have our work cut out for us, if we embrace immutable designs more, to come up with meaningful interfaces for transforming these data structures.
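
A small sketch of the "mutable buffer inside, immutable value outside" approach mentioned above, in plain JavaScript (Immutable.js offers a similar facility with withMutations, if I recall correctly):

function mapList(frozenArray, fn) {
    var buffer = frozenArray.slice();     // private, mutable working copy
    for (var i = 0; i < buffer.length; i++) {
        buffer[i] = fn(buffer[i]);        // cheap in-place edits, invisible to callers
    }
    return Object.freeze(buffer);         // "commit" the result as a new immutable value
}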

Data Structures

Another thing that arises here is that immutability used in a performance-critical context will probably want data structures to break down to chunky data where the chunks aren't too small in size but also not too big.

Linked lists might want to change quite a bit to accommodate this and turn into unrolled lists. Large, contiguous arrays might turn into an array of pointers into contiguous chunks with modulo indexing for random access.

It potentially changes the way we look at data structures in an interesting way, while pushing the functions that modify these data structures toward a bulkier nature, to hide the extra complexity of shallow copying some bits here and making other bits unique there.

Performance

Anyway, this is my little lower-level view on the topic. Theoretically, immutability can have a cost ranging from very large to smaller. But a very theoretical approach doesn't always make applications go fast. It might make them scalable, but real-world speed often requires embracing the more practical mindset.

From a practical perspective, qualities like performance, maintainability and safety tend to blur into one, especially for a very large codebase. While performance in some absolute sense is degraded by immutability, it's hard to argue against the benefits it has for productivity and safety (including thread-safety). An increase in those often brings an increase in practical performance, if only because the developers have more time to tune and optimize their code without being swarmed by bugs.

So I think from this practical sense, immutable data structures might actually aid performance in a lot of cases, as odd as it sounds. An ideal world might seek a mixture of these two: immutable data structures and mutable ones, with the mutable ones typically being very safe to use in a very local scope (ex: local to a function), while the immutable ones can avoid external side effects outright and turn all changes to a data structure into an atomic operation producing a new version with no risk of race conditions.

12

ImmutableJS is actually quite efficient. If we take an example:

var x = {
    Foo: 1,
    Bar: { Baz: 2 },
    Qux: { AnotherVal: 3 }
};

If the above object is made immutable (for example with Immutable.fromJS) and you then modify the value of the Baz property, what you get is:

var ix = Immutable.fromJS(x);          // deep-convert to an immutable Map
var y = ix.setIn(['Bar', 'Baz'], 3);   // setIn takes an array key path

y !== ix;                        // different object instance
y.get('Bar') !== ix.get('Bar');  // as the Baz property was changed, Bar is a different instance
y.get('Qux') === ix.get('Qux');  // Qux is the same shared instance

This creates some really cool performance improvements for deep object models, where you only need to copy the objects on the path from the changed value up to the root. The larger the object model and the smaller the changes you make, the better the memory and CPU performance of the immutable data structure, as the versions end up sharing lots of objects.

As the other answers have said, if you contrast this to trying to provide the same guarantees by defensively copying x before passing it into a function that could manipulate it then the performance is significantly better.

5

To add on to this (already excellently answered) question:

The short answer is yes; it will hurt performance because you're only ever creating objects instead of mutating existing ones, resulting in more object creation overhead.


However, the long answer is not really.

From an actual runtime point of view, in JavaScript you already create quite a lot of runtime objects - functions and object literals are everywhere in JavaScript and no-one seems to think twice about using those. I would argue that object creation is actually quite cheap, though I have no citations for this so I wouldn't use it as a stand-alone argument.

For me, the biggest 'performance' increase isn't in runtime performance but in developer performance. One of the first things I learned while working on Real World (tm) applications is that mutability is really dangerous and confusing. I've lost many hours chasing down a thread (not the concurrency type) of execution trying to work out what is causing an obscure bug when it turns out to be a mutation from the other side of the damn application!

Using immutability makes things a lot easier to reason about. You know immediately that object X is not going to change during its lifetime, and the only way to 'change' it is to clone it with the changes applied. I value this a lot more (especially in team environments) than any micro-optimisations that mutability might bring.

There are exceptions, most notably data structures as noted above. I've rarely come across a scenario where I have wanted to alter a map after creation (although admittedly I am talking about pseudo-object-literal maps rather than ES6 maps), same for arrays. When you're dealing with larger data structures, mutability might pay off. Remember that every object in JavaScript is passed as a reference rather than a value.
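
For those pseudo-object-literal maps, freezing at creation is one cheap way to make the "never altered after creation" intent explicit (a small sketch):

var STATUS_TEXT = Object.freeze({
    200: "OK",
    404: "Not Found",
    500: "Internal Server Error"
});

STATUS_TEXT[404] = "Gone";   // silently ignored (throws a TypeError in strict mode)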


That said, one point brought up above was the GC and its inability to detect duplications. This is a legitimate concern, but in my opinion it is only a concern when memory is a concern, and there are far easier ways to code yourself into a corner - for example, circular references in closures.


Ultimately, I would prefer to have an immutable codebase with very few (if any) mutable sections and be slightly less performant than have mutability everywhere. You can always optimise later on if immutability, for some reason, becomes a concern for performance.

4

In a straight-line comparison, immutable code has the overhead of object creation, which is slower. However, there are a lot of situations where mutable code becomes very difficult to manage efficiently (resulting in a lot of defensive copying, which is expensive too), and there are a lot of clever strategies for mitigating the cost of 'copying' an object, as mentioned by others.

If you have an object such as a counter, and it's getting incremented many times a second, having that counter be immutable might not be worth the performance penalty. If you have an object that's being read by many different parts of your application, and each of them want to have their own slightly different clone of the object, you'll have a far easier time orchestrating that in a performant manner by using a good immutable object implementation.
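
A small sketch of those two ends of the trade-off (the names are only illustrative):

// hot counter, bumped many times a second: mutation is the obvious fit
var frameCounter = { value: 0 };
function tick() { frameCounter.value++; }

// widely shared config that readers tweak for themselves: immutability pays off
function withTheme(config, theme) {
    return Object.freeze(Object.assign({}, config, { theme: theme }));
}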
