
Dramatic slowdown with random.random on free-threading build. #118392

Open
corona10 opened this issue Apr 29, 2024 · 5 comments
Labels
performance (Performance or resource usage), topic-free-threading

Comments

@corona10
Member

corona10 commented Apr 29, 2024


This slowdown was found with one of my favorite benchmarks, which calculates the value of pi with the Monte Carlo method.

import os
import random
import time
from threading import Thread

def monte_carlo_pi_part(n: int, idx: int, results: list[int]) -> None:
    count = 0
    for i in range(n):
        x = random.random()
        y = random.random()

        if x*x + y*y <= 1:
            count += 1
    results[idx] = count


n = 10000
threads = []
num_threads = 100
results = [0] * num_threads
a = time.time()
for i in range(num_threads):
    t = Thread(target=monte_carlo_pi_part, args=(n, i, results))
    t.start()
    threads.append(t)

while threads:
    t = threads.pop()
    t.join()

b = time.time()
print(sum(results) / (n * num_threads) * 4)
print(b-a)

Acquiring critical sections for the random methods causes this slowdown.
Removing @critical_section from the methods that call genrand_uint32, and updating genrand_uint32 to use atomic operations, makes the performance acceptable.

Build                                                 | Elapsed (s)         | PI estimate
------------------------------------------------------|---------------------|------------
Default (with specialization)                         | 0.16528010368347168 | 3.144508
Free-threading (with no specialization)               | 0.548654317855835   | 3.1421
Free-threading with my patch (with no specialization) | 0.2606849670410156  | 3.141108
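The gap in the table can be isolated with a small comparison harness that times the shared module-level API against per-thread Random instances. This is a sketch: the worker names and thread counts are illustrative, and only a free-threaded build is expected to show a dramatic difference between the two workers.

```python
import random
import time
from threading import Thread

def shared_worker(n: int) -> None:
    # Every call goes through the module-level Random instance,
    # so all threads contend on its internal state.
    for _ in range(n):
        random.random()

def local_worker(n: int) -> None:
    # A private Random per thread: no shared state to fight over.
    r = random.Random()
    for _ in range(n):
        r.random()

def bench(worker, n: int = 100_000, num_threads: int = 8) -> float:
    threads = [Thread(target=worker, args=(n,)) for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

print(f"shared: {bench(shared_worker):.3f}s  local: {bench(local_worker):.3f}s")
```

On a default (GIL) build the two timings are usually close; the shared-state cost only dominates once threads actually run in parallel.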

Linked PRs

@corona10
Member Author

@colesbury
Contributor

It'd be great to have the default random number generator be safe and efficient when called from multiple threads, but I think that's a bigger change that requires more discussion and maybe a PEP.

In the meantime, I think the advice should be to use individual Random objects per-thread. In other words, the program should be written as:

def monte_carlo_pi_part(n: int, idx: int, results: list[int]) -> None:
    r = random.Random()  # per-thread generator: avoids contention on the shared instance
    count = 0
    for i in range(n):
        x = r.random()
        y = r.random()

        if x*x + y*y <= 1:
            count += 1
    results[idx] = count

I think that you also would need #118112 (or deferred reference counting) for the program to be efficient, or otherwise the reference count contention is going to be a bottleneck.

@corona10
Member Author

In the meantime, I think the advice should be to use individual Random objects per-thread. In other words, the program should be written as:

Yeah, it works, so it should be included somewhere in the migration guide.

@corona10
Member Author

@colesbury
Contributor

We could also consider adopting a strategy like Go does, at least for the free-threaded build:

  • If seed() has not been called, use a fast per-thread RNG for the module-level calls.
  • Once seed() is called, switch to the shared-state RNG to preserve the old behavior.