0

Solution at the bottom.

I am trying to implement a Brownian bridge and want to speed up the path construction via numpy matrix multiplication, once the bridge is initialised. I am drawing data for 3 underlyings and 256 time steps for 2^16 different paths, which is achieved by drawing 2^16 Sobol' numbers in 3*256 = 768 dimensions. I then reshape the drawn 2D array into the 3D array of shape (3, 256, 2^16) and apply the brownian bridge matrix of shape (256, 256) via numpy matrix multiplication. To use the first dimensions for the largest variance contributions in the brownian bridge, the array is reshaped using the order='F' keyword.

I have noticed that the numpy matrix product "@" is almost 100 times slower when it is applied to the 'F' order reshaped array instead of the default 'C' order reshaped array. This kills any performance gain I get from the numpy matrix multiplication.

Does anyone know why this happens? Is there a way around this?

The following snippet reproduces the issue. Running on my local machine, I get around 0.14 seconds for 'C' and 12.9 seconds for 'F'. I am running Python 3.9.13, scipy version 1.10.0, numpy version 1.23.4.

from timeit import default_timer
from scipy.stats.qmc import Sobol
import numpy as np


np_gen = np.random.default_rng(seed=42)
matrix = np_gen.random((256, 256))
sobol_gen = Sobol(3*256, bits=64, seed=42)
sobol_nbrs = sobol_gen.random_base2(16).T

sobol_c_reshape = sobol_nbrs.reshape((3, 256, 2 ** 16), order='C')
sobol_f_reshape = sobol_nbrs.reshape((3, 256, 2 ** 16), order='F')

def perf_measure(input_nbrs):
    start = default_timer()
    res = matrix @ input_nbrs
    end = default_timer()
    print(end-start)

perf_measure(sobol_c_reshape)
perf_measure(sobol_f_reshape)

One possible solution, thanks to the user hpaulj, is to copy the reshaped array into 'C' order:

matrix@(sobol_f_reshape).copy(order='C')

This leads to dramatic speedup where the calculation on the 'F' reshaped array is only 2 times slower than the 'C' reshaped array. I do not understand why this works. Also, the problem does not exist for numpy.random generated numbers. I can only reproduce it using the scipy.stats.qmc.Sobol sequence. As the user Nick ODell pointed out, the error is not specific to the Sobol' sequence but can be easily replicated via:

sobol_nbrs = np.random.rand(65536, 768).T
5
  • matmul uses a fast BLAS (or related) library where possible. Since that is optimized C, it makes sense that 'C' order is faster. I don't know the details, it appears that it is taking a slower, 'pure' numpy route with the order 'F'. matrix@(sobol_f_reshape).copy(order='C') corrects that.
    – hpaulj
    Commented May 13 at 16:29
  • Due to the use of strides, 'order' often doesn't make a big, or defjnable, difference in numpy calculation speeds. But matmul is a complex calculation, and can use highly optimized library routines - if the continuity assumptions are met.
    – hpaulj
    Commented May 14 at 0:30
  • A similar discussion from 2020, stackoverflow.com/questions/60290733/…
    – hpaulj
    Commented May 14 at 0:43
  • Thanks, copying the reshaped array to C order as you suggested in your first comment increases the performance substantially. Interestingly, there is no performance penalty when using a numpy random array instead of the scipy.stats.qmc.Sobol. Any idea why this problem does not happen for numpy.random?
    – Severin
    Commented May 14 at 8:23
  • "I can only reproduce it using the scipy.stats.qmc.Sobol sequence." I can reproduce it without Sobol with sobol_nbrs = np.random.rand(65536, 768).T.
    – Nick ODell
    Commented May 14 at 16:46

0