
Add fast path to deepcopy() for empty list/tuple/dict/set #121192

Open
lgeiger opened this issue Jun 30, 2024 · 9 comments
Labels
performance Performance or resource usage stdlib Python modules in the Lib dir

Comments

@lgeiger

lgeiger commented Jun 30, 2024

deepcopy() can be surprisingly slow when called with empty containers like lists, tuples, dicts, sets or frozensets.

Adding a fast path for this case, similar to #114266, would speed up these cases by roughly 4x - 28x, while having little impact on the default path and adding little complexity. With such a patch, the following benchmark script shows a significant speedup compared to main:

import pyperf
runner = pyperf.Runner()

setup = """
import copy

a = {"list": [1, 2, 3, 4], "t": (1, 2, 3), "str": "hello", "subdict": {"a": True}}

t = ()
fs = frozenset()
l = []
s = set()
d = {}
deep = [[], (), {}, set(), frozenset()]
"""

runner.timeit(name="deepcopy dict", stmt="b = copy.deepcopy(a)", setup=setup)
runner.timeit(name="deepcopy empty tuple", stmt="b = copy.deepcopy(t)", setup=setup)
runner.timeit(name="deepcopy empty frozenset", stmt="b = copy.deepcopy(fs)", setup=setup)
runner.timeit(name="deepcopy empty list", stmt="b = copy.deepcopy(l)", setup=setup)
runner.timeit(name="deepcopy empty set", stmt="b = copy.deepcopy(s)", setup=setup)
runner.timeit(name="deepcopy empty dict", stmt="b = copy.deepcopy(d)", setup=setup)
runner.timeit(name="deepcopy multiple empty containers", stmt="b = copy.deepcopy(deep)", setup=setup)
deepcopy dict: Mean +- std dev: [baseline] 1.86 us +- 0.06 us -> [optimize-empty-copy] 2.02 us +- 0.02 us: 1.09x slower
deepcopy empty tuple: Mean +- std dev: [baseline] 285 ns +- 2 ns -> [optimize-empty-copy] 48.4 ns +- 0.9 ns: 5.89x faster
deepcopy empty frozenset: Mean +- std dev: [baseline] 1.47 us +- 0.11 us -> [optimize-empty-copy] 49.9 ns +- 1.5 ns: 29.44x faster
deepcopy empty list: Mean +- std dev: [baseline] 323 ns +- 2 ns -> [optimize-empty-copy] 82.7 ns +- 2.5 ns: 3.91x faster
deepcopy empty set: Mean +- std dev: [baseline] 1.46 us +- 0.10 us -> [optimize-empty-copy] 85.4 ns +- 4.9 ns: 17.04x faster
deepcopy empty dict: Mean +- std dev: [baseline] 326 ns +- 4 ns -> [optimize-empty-copy] 83.3 ns +- 2.6 ns: 3.91x faster
deepcopy multiple empty containers: Mean +- std dev: [baseline] 4.13 us +- 0.04 us -> [optimize-empty-copy] 1.16 us +- 0.02 us: 3.56x faster

Geometric mean: 5.48x faster
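For illustration, here is a minimal sketch (not the actual patch, which lives inside copy.deepcopy itself) of the kind of fast path being proposed, wrapping the stdlib function from the outside:

```python
# Hypothetical sketch of a fast path for empty built-in containers.
# Empty immutables (tuple, frozenset) can be returned as-is, since sharing
# them is safe; empty mutables (list, dict, set) can be recreated directly
# without walking deepcopy's dispatch machinery.
import copy

_ORIG_DEEPCOPY = copy.deepcopy

def deepcopy_with_fast_path(x, memo=None):
    cls = type(x)  # exact type check: subclasses must take the slow path
    if cls in (list, dict, set) and not x:
        return cls()   # fresh empty mutable container
    if cls in (tuple, frozenset) and not x:
        return x       # empty immutables are safe to share
    return _ORIG_DEEPCOPY(x, memo)
```

The exact-type check matters: a list or dict subclass may carry extra state, so only the plain built-in types can safely skip the general machinery.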

This might conflict with @eendebakpt's efforts in #91610, or could be something that should also be added to the proposed C version.

For context, I noticed this when using pydantic models with mutable default values where pydantic would deep copy the default value upon class instantiation. E.g.:

class Foo(pydantic.BaseModel):
    bar: list[int] = []

To be fair, the proper fix in this case is to avoid the mutable default value in pydantic and switch to pydantic.Field(default_factory=list), similar to dataclasses, which is much faster. But there may be other scenarios where deepcopying empty containers is common.
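For comparison, this is the dataclasses analogue of that fix: field(default_factory=list) builds a fresh list per instance, so no shared default ever needs to be deep-copied.

```python
# dataclasses analogue of pydantic's Field(default_factory=list): each
# instance gets its own fresh list, so no deepcopy of a shared default
# value is needed (dataclasses actually reject a bare mutable default).
from dataclasses import dataclass, field

@dataclass
class Foo:
    bar: list[int] = field(default_factory=list)

a, b = Foo(), Foo()
a.bar.append(1)  # mutating one instance does not affect the other
```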

I'm happy to make a PR unless it conflicts with the efforts going on in #91610.

Linked PRs

@eendebakpt
Contributor

@lgeiger No worries about conflicts with my PR, I can rebase if needed. I will add benchmarks for empty containers as well (I never imagined they would actually matter).

More important is perhaps whether your patch slows down the other cases. Do you have numbers for that? And perhaps also a benchmark for the pydantic models.

@lgeiger
Author

lgeiger commented Jun 30, 2024

More important is perhaps whether your patch slows down the other cases. Do you have numbers for that?

I extended the above benchmarks with a case that doesn't include any empty iterables. It makes that case about 1.09x slower. I'm not sure how much noise to expect from such a microbenchmark though, since I'm currently running it on an M3 MacBook on battery power.

And perhaps also a benchmark for the pydantic models.

Unfortunately I can't add a proper benchmark for the pydantic case, since pydantic doesn't build against 3.14 yet. But here's a comparison of the example code on 3.12, which illustrates the problem:

import pyperf
runner = pyperf.Runner()

setup = """
from pydantic import BaseModel, Field

class Foo(BaseModel):
    bar: list[int] = []
    baz: dict[str, int] = {}

class FooField(BaseModel):
    bar: list[int] = Field(default_factory=list)
    baz: dict[str, int] = Field(default_factory=dict)
"""

runner.timeit(name="init pydantic defaults", stmt="Foo()", setup=setup)
runner.timeit(name="init pydantic default factory", stmt="FooField()", setup=setup)
init pydantic defaults: Mean +- std dev: 1.19 us +- 0.01 us
init pydantic default factory: Mean +- std dev: 351 ns +- 12 ns

Profiling only the Foo() calls yields the flamegraph below, whereas FooField() never needs to call deepcopy():
[Screenshot: flamegraph of Foo() instantiation, dominated by copy.deepcopy]

@lgeiger
Author

lgeiger commented Jun 30, 2024

I reduced the slowdown of the default case to 1.06x with a slightly simpler version of the patch, which performs even better for empty lists, sets, and dicts, but is a bit slower for tuples and frozensets (though still much faster than the baseline). Let me know if you'd like me to open a PR; it may be easier to discuss with concrete code.

+------------------------------------+----------+------------------------+--------------------------+
| Benchmark                          | baseline | optimize-empty-copy    | less-optimize-empty-copy |
+====================================+==========+========================+==========================+
| deepcopy dict                      | 1.86 us  | 2.01 us: 1.09x slower  | 1.97 us: 1.06x slower    |
+------------------------------------+----------+------------------------+--------------------------+
| deepcopy empty tuple               | 285 ns   | 48.7 ns: 5.85x faster  | 55.0 ns: 5.18x faster    |
+------------------------------------+----------+------------------------+--------------------------+
| deepcopy empty frozenset           | 1.47 us  | 49.7 ns: 29.55x faster | 73.8 ns: 19.92x faster   |
+------------------------------------+----------+------------------------+--------------------------+
| deepcopy empty list                | 323 ns   | 81.8 ns: 3.95x faster  | 75.1 ns: 4.30x faster    |
+------------------------------------+----------+------------------------+--------------------------+
| deepcopy empty set                 | 1.46 us  | 84.7 ns: 17.18x faster | 77.6 ns: 18.75x faster   |
+------------------------------------+----------+------------------------+--------------------------+
| deepcopy empty dict                | 326 ns   | 84.8 ns: 3.84x faster  | 78.0 ns: 4.18x faster    |
+------------------------------------+----------+------------------------+--------------------------+
| deepcopy multiple empty containers | 4.13 us  | 1.17 us: 3.51x faster  | 1.32 us: 3.14x faster    |
+------------------------------------+----------+------------------------+--------------------------+
| Geometric mean                     | (ref)    | 5.47x faster           | 5.20x faster             |
+------------------------------------+----------+------------------------+--------------------------+
@pochmann3
Contributor

Why check len(x) == 0 instead of not x?

@lgeiger
Author

lgeiger commented Jul 1, 2024

Why check len(x) == 0 instead of not x?

Good idea, changed in ba7298c. This makes it slightly faster as well. In my quick benchmark the PR now makes the default case only 1.03x slower (see updated numbers in the PR).

@hugovk hugovk added performance Performance or resource usage stdlib Python modules in the Lib dir labels Jul 1, 2024
@sobolevn
Member

sobolevn commented Jul 1, 2024

I think that a 4x to 20x speedup for this quite common case is a good enough reason to accept a 1.09x slowdown in the slow path.

I am +1 on this.

@corona10
Member

corona10 commented Jul 1, 2024

@sobolevn Out of curiosity, what is the typical case for deepcopy: non-empty containers or empty containers? I think the slowdown is not cheap if deepcopy of non-empty containers is more frequent in the real world.

@lgeiger
Author

lgeiger commented Jul 2, 2024

I think the slowdown is not cheap if deepcopy of non-empty containers is more frequent in the real world.

@corona10 I updated #121193 and I don't see any measurable performance regression in the common case anymore (for more info see the updated numbers in the PR description).
