How does cuRAND use a GPU to accelerate random number generation? Don't those require a state?

Question

My understanding is that every PRNG or QRNG requires a state to prevent the next item in its sequence from being too predictable; which is sensible, as they're all running on deterministic hardware.

GPUs are, by design, non-Von Neumann architectures which are incapable of remembering past operations, predicting future ones, or maintaining any knowledge of their own application on other parameters. Effectively they can't maintain a state between one shader call and another.

Yet, cuRAND (from Nvidia's CUDA) purports that it generates random numbers exceedingly quickly using "hundreds of GPU processor cores" and achieves results "8x faster" (documentation's words). Having produced random numbers with LCGs in shaders before (just as an experiment) and knowing how difficult it can be without maintaining a state between calls, this is very difficult for me to believe.

So, how is it that CUDA can accelerate a stateful system? Is it actually maintaining a state somehow? Is there limited CPU involvement, or some creative use of GDDR? I've got a lot of ideas, but GPUs generally run dozens of times slower than CPUs on a single core, and Nvidia seems to be playing this very "close to its chest". The documentation claims just aren't enough for me. There must be something I don't know.

Krupip · Accepted Answer · 2021-06-18 11:53:06Z

First, cuRAND does have state, in fact, there's so much state that the initialization of cuRAND state can often overtake runtime performance of your actual application depending on what you're doing if you run it everytime.

But it is actually possible to do pseudo stateless PRNG (pseudo random number generation) as well on the GPU (though I don't believe Nvidia provides this). One way to do this is with avalanche mixers, basically just functions that mix numbers up a lot. Using these functions, how close a number is to another can be made to be uncorrelated to the output of these avalanche mixers.

This property, of adjacent numbers being uncorrelated, is important, because instead of using some seeded state you can use the index of the given thread as the source of the PRNG. Now this will only get you a single random number, which may be enough if you're, say using a separate PRNG that needs to be independently seeded per thread on top of that, and a Linear Congruential Generator takes such little state that this can be stored per thread and never touch global memory. But this still isn't stateless in the sense that state still needs to be kept per thread from previous iterations (for LCGs, it's the last value).

The next thing you can do with your avalanche mixer is mix an iterative index/time into your value. So say you had an avalanche mixer function, am(), which takes as arguments am(x,y,t). X and Y are the indexes of your threads. T is time, or the iteration count of your work. You can then use am(x,y,t) to generate a PRNG with out state. This also has the property of arbitrarily being able to go back "in time" to any previous state and generate PRNG values exactly from x,y, and t. This property makes avalanche mixers excellent "hash" functions for noise functions like Perlin Noise or Simplex Noise, and other coherent noise generators.

The downside of these kinds of PRNG functions is they often are not cryptographically secure, though in virtually any context that does not involve cryptography (physics, normal RNG) this is unimportant.

Thank you. I've been doing this for long enough to know when there must be a catch. Your additional notes on avalanche mixers are also useful, and probably warrant a little study from me. — Michael Macha, Commented Jun 18, 2021 at 1:57
Could you point me to an avalanche mixer algorithm that doesn't use a seed? Looking here at mumur3 it seems to use a seed. — JimmyJames, Commented Jun 18, 2021 at 15:30
@JimmyJames Mumur hash 3 is not an avalanche mixer, mumurhash3 uses an avalanche mixer internally, see my earlier post stackoverflow.com/questions/45218741/… for examples of murmur hash avalanche mixers that you can use — Krupip, Commented Jun 18, 2021 at 15:35
Oh thanks, that was actually the question that led me to the wiki page but I didn't notice you were the author or look closely at your question. — JimmyJames, Commented Jun 18, 2021 at 15:47
I think your last sentence might be a little strongly worded. There are many scenarios that aren't really considered cryptography where a secure PNRG is needed. A slot machine, or video poker machine comes to mind. v4 UUIDs are another scenario where this can be important. — JimmyJames, Commented Jun 18, 2021 at 15:55

LSerni · Accepted Answer · 2021-06-17 18:35:43Z

3

I'm possibly missing a lot here, but cuRAND does have a state somehow.

From the documentation of cuRAND, 3.1.3 (emphasis mine):

_device__ void curand_init (
unsigned long long seed, unsigned long long subsequence,
unsigned long long offset, curandState_t *state)
The curand_init() function sets up an initial state allocated by the caller using the given seed, subsequence and offset. Different seed is guaranteed to produce different starting states and different sequences.

answered Jun 17, 2021 at 18:35

LSerni

2,44116 silver badges21 bronze badges

The term "device memory" also appears on that page 11 times and the introduction seems to indicate 'device' refers to the GPU: "a library on the host (CPU) side and a device (GPU) header file"
– JimmyJames
Commented Jun 17, 2021 at 20:01

Add a comment |

JimmyJames · Accepted Answer · 2021-06-17 19:56:18Z

My understanding is that every PRNG or QRNG requires a state to prevent the next item in its sequence from being too predictable; which is sensible, as they're all running on deterministic hardware.

I'm not sure this is an answer to your specific question about state but the way you are stating this isn't quite right. PNRGs are essentially an algorithm that works like a chaotic function which meets specific criteria. The prior state is what determines the next output. The prior state is what generates the next state. It's not that the state is necessary to prevent them from being predictable, it's that there's no other way they can work: they are purely deterministic based on their initial state. The lack of predictability is based on the quality of the algorithm i.e. the prior output cannot be used to guess at or determine the initial or current state.

It might help to look at the algorithm for the mersenne twister to get a better sense of this.

Stack Exchange Network

How does cuRAND use a GPU to accelerate random number generation? Don't those require a state?

3 Answers 3

Not the answer you're looking for? Browse other questions tagged
architecture
hardware
state
random
gpu
or ask your own question.

Hot Network Questions

How does cuRAND use a GPU to accelerate random number generation? Don't those require a state?

3 Answers 3

Not the answer you're looking for? Browse other questions tagged architecturehardwarestaterandomgpu or ask your own question.

Related

Hot Network Questions

Not the answer you're looking for? Browse other questions tagged
architecture
hardware
state
random
gpu
or ask your own question.