Solution at the bottom.
I am trying to implement a Brownian bridge and want to speed up the path construction via numpy matrix multiplication, once the bridge is initialised. I am drawing data for 3 underlyings and 256 time steps for 2^16 different paths, which is achieved by drawing 2^16 Sobol' numbers in 3*256 = 768 dimensions. I then reshape the drawn 2D array into the 3D array of shape (3, 256, 2^16) and apply the brownian bridge matrix of shape (256, 256) via numpy matrix multiplication. To use the first dimensions for the largest variance contributions in the brownian bridge, the array is reshaped using the order='F' keyword.
I have noticed that the numpy matrix product "@" is almost 100 times slower when it is applied to the 'F' order reshaped array instead of the default 'C' order reshaped array. This kills any performance gain I get from the numpy matrix multiplication.
Does anyone know why this happens? Is there a way around this?
The following snippet reproduces the issue. Running on my local machine, I get around 0.14 seconds for 'C' and 12.9 seconds for 'F'. I am running Python 3.9.13, scipy version 1.10.0, numpy version 1.23.4.
from timeit import default_timer
from scipy.stats.qmc import Sobol
import numpy as np
np_gen = np.random.default_rng(seed=42)
matrix = np_gen.random((256, 256))
sobol_gen = Sobol(3*256, bits=64, seed=42)
sobol_nbrs = sobol_gen.random_base2(16).T
sobol_c_reshape = sobol_nbrs.reshape((3, 256, 2 ** 16), order='C')
sobol_f_reshape = sobol_nbrs.reshape((3, 256, 2 ** 16), order='F')
def perf_measure(input_nbrs):
start = default_timer()
res = matrix @ input_nbrs
end = default_timer()
print(end-start)
perf_measure(sobol_c_reshape)
perf_measure(sobol_f_reshape)
One possible solution, thanks to the user hpaulj, is to copy the reshaped array into 'C' order:
matrix@(sobol_f_reshape).copy(order='C')
This leads to dramatic speedup where the calculation on the 'F' reshaped array is only 2 times slower than the 'C' reshaped array.
I do not understand why this works.
Also, the problem does not exist for numpy.random generated numbers. I can only reproduce it using the scipy.stats.qmc.Sobol sequence. As the user Nick ODell pointed out, the error is not specific to the Sobol' sequence but can be easily replicated via:
sobol_nbrs = np.random.rand(65536, 768).T
matmul
uses a fast BLAS (or related) library where possible. Since that is optimized C, it makes sense that 'C' order is faster. I don't know the details, it appears that it is taking a slower, 'pure' numpy route with the order 'F'.matrix@(sobol_f_reshape).copy(order='C')
corrects that.strides
, 'order' often doesn't make a big, or defjnable, difference in numpy calculation speeds. Butmatmul
is a complex calculation, and can use highly optimized library routines - if the continuity assumptions are met.sobol_nbrs = np.random.rand(65536, 768).T
.