My task is to read millions of NumPy images and extract a region dynamically. The application dictates that the images are stored in files in batches of about 3000 to 6000; each file contains a pickled dict of NumPy arrays. On Windows, reading gets dramatically slower with each call, see the logs below. The logs were produced on two laptops with completely identical hardware, including 40 GB of RAM. Windows is much slower than Linux to begin with, but should it really get slower with each call?
import os
import pickle
import time
import platform
import sys

import numpy as np
import psutil

filePath = r'C:\images.pkl'
print(f"Versions: {platform.system()=}, {platform.release()=}, {platform.version()=}, {sys.version=}, {np.__version__=}")

# Write one batch of 4000 random 300x300 uint8 images.
imagesDict = {i: np.random.randint(0, 255, (300, 300), dtype=np.uint8) for i in range(4000)}
with open(filePath, 'wb') as file:
    pickle.dump(imagesDict, file, pickle.HIGHEST_PROTOCOL)

thumbs = []
num_image_sets = 0
durations_s_sum = 0.
for i in range(500):
    start_s = time.perf_counter()
    with open(filePath, 'rb') as file:
        imagesDict: dict[int, np.ndarray] = pickle.load(file)
    for key in imagesDict.keys():
        image = imagesDict[key]
        thumb = image[:50, :50].copy()  # copy the region so the full image can be freed
        thumbs.append(thumb)
    durations_s_sum += (time.perf_counter() - start_s)
    num_image_sets += 1
    if 50 <= num_image_sets:
        memory_info = psutil.Process(os.getpid()).memory_info()
        print(f"{durations_s_sum:4.1f}s for 50 pickle files, rss={memory_info.rss/1024/1024:6,.0f}MB, vms={memory_info.vms/1024/1024:6,.0f}MB")
        durations_s_sum = 0.
        num_image_sets = 0
Windows 11
# Versions: platform.system()='Windows', platform.release()='10', platform.version()='10.0.22631', sys.version='3.11.5 | packaged by Anaconda, Inc. | (main, Sep 11 2023, 13:26:23) [MSC v.1916 64 bit (AMD64)]', np.__version__='1.26.3'
# 11.7s for 50 pickle files, rss= 1,211MB, vms= 1,215MB
# 11.8s for 50 pickle files, rss= 1,492MB, vms= 1,499MB
# 13.8s for 50 pickle files, rss= 2,272MB, vms= 2,302MB
# 15.7s for 50 pickle files, rss= 2,802MB, vms= 2,845MB
# 18.3s for 50 pickle files, rss= 3,328MB, vms= 3,383MB
# 21.0s for 50 pickle files, rss= 3,837MB, vms= 3,905MB
# 25.6s for 50 pickle files, rss= 4,369MB, vms= 4,448MB
# 28.0s for 50 pickle files, rss= 4,898MB, vms= 4,989MB
# 32.3s for 50 pickle files, rss= 5,427MB, vms= 5,530MB
# 36.7s for 50 pickle files, rss= 5,966MB, vms= 6,081MB
It looks like it's due to having so many objects on the heap. The most common functions in the profile all involve heap allocation/free (and possibly heap security/validity checks).
If we clear thumbs each time through the loop, the time is stable and we get a far more plausible profile (the ? is going to be all of Python - I don't have symbols for this Anaconda build). It's still memory heavy.
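One way to keep the heap from filling with millions of tiny blocks, without dropping the thumbnails, is to periodically pack them into one contiguous array (a sketch of the idea only, not code from this report; the batch size and names are made up):

```python
import numpy as np

thumbs = []   # small 50x50 copies: one ~2.5 KB heap block each
packed = []   # a few large contiguous blocks instead

def pack_thumbs(min_batch=4000):
    """Consolidate accumulated thumbnails into one contiguous array."""
    if len(thumbs) >= min_batch:
        packed.append(np.stack(thumbs))  # one large allocation
        thumbs.clear()                   # frees the small blocks

for i in range(10_000):
    image = np.random.randint(0, 255, (300, 300), dtype=np.uint8)
    thumbs.append(image[:50, :50].copy())
    pack_thumbs()

pack_thumbs(min_batch=1)  # flush whatever is left
```

The heap then manages a handful of multi-megabyte blocks rather than millions of small ones, which is the pattern the profiles above suggest the Windows allocator handles poorly.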
Having a few million images in memory helps speed up machine learning for the GPU-RAM-poor. To circumvent the issue, I used h5py instead of pickle. I posted the h5py code and output here [1]. h5py is slower than pickle at the beginning, but its speed stays roughly constant. As we typically go beyond 2e6 images, that helps quite a bit.
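For reference, a minimal sketch of that kind of h5py workaround (my reconstruction under assumed file and dataset names, not the actual code behind [1]): each batch becomes one HDF5 dataset, and slicing the dataset reads only the requested region from disk instead of unpickling the whole batch into thousands of separate arrays:

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "images.h5")
images = np.random.randint(0, 255, (4000, 300, 300), dtype=np.uint8)

# Write one batch as a single 3D dataset.
with h5py.File(path, "w") as f:
    f.create_dataset("batch0", data=images)

# Read back only the 50x50 corner of every image; h5py materializes
# just the selected hyperslab, not the full 360 MB batch.
with h5py.File(path, "r") as f:
    thumbs = f["batch0"][:, :50, :50]
```

Each read then produces one contiguous array per call instead of thousands of pickled objects, which would explain the flat per-call times reported with h5py.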
CPython versions tested on:
3.11
Operating systems tested on:
Windows