The code is extremely slow ...

Using numba helps a lot. Using math helps a tiny bit. I ran this variant:
```python
from pathlib import Path
from time import time
from typing import Any

from numba import njit
import matplotlib as mp
import matplotlib.pyplot as plt
import numpy as np

NaN = np.nan


@njit
def trough(
    N: np.ndarray[Any, np.dtype[np.float64]]
) -> np.ndarray[Any, np.dtype[np.float64]]:
    # Surround N with a one-cell border of NaNs; particles that land there vanish.
    w, h = np.shape(N)
    Nt = np.concatenate((np.ones((w, 1)) * NaN, N, np.ones((w, 1)) * NaN), axis=1)
    Nt = np.concatenate(
        (np.ones((1, h + 2)) * NaN, Nt, np.ones((1, h + 2)) * NaN), axis=0
    )
    return Nt


@njit
def topple(
    N: np.ndarray[Any, np.dtype[np.float64]]
) -> np.ndarray[Any, np.dtype[np.float64]]:
    P = trough(N)
    sP = np.shape(P)
    # Relax until every interior cell holds fewer than four particles.
    while np.nanmax(P) >= 4:
        for i, j in np.ndindex(sP):
            # if P[i, j] >= 8 and not np.isnan(P[i, j]):  # propagate big numbers faster!
            #     P[i, j] -= 8
            #     P[i + 1, j] += 2
            #     P[i - 1, j] += 2
            #     P[i, j + 1] += 2
            #     P[i, j - 1] += 2
            if P[i, j] >= 4 and not np.isnan(P[i, j]):
                P[i, j] -= 4
                P[i + 1, j] += 1
                P[i - 1, j] += 1
                P[i, j + 1] += 1
                P[i, j - 1] += 1
    return P[1:-1, 1:-1]


def picard(m: int, n: int) -> np.ndarray[Any, np.dtype[np.float64]]:
    # Identity of the sandpile group: topple the all-sixes grid,
    # subtract the result from all sixes, and topple again.
    P1 = 6 * np.ones((m, n))
    P1 = topple(P1)
    P2 = 6 * np.ones((m, n))
    Pi = P2 - P1
    Pi = topple(Pi)
    return Pi


def main(m: int = 300, n: int = 300) -> None:
    picard(4, 4)  # warmup, so JIT compilation isn't included in the timing
    t0 = time()
    Pi = picard(m, n)
    print(f"elapsed: {time() - t0:.3f} sec")

    plt.figure()
    plt.axes(aspect="equal")
    plt.axis("off")
    plt.pcolormesh(Pi, cmap=mp.colormaps["viridis_r"])
    dim = f"{n}x{m}"
    file_name = Path("/tmp") / f"Picard_Identity-{dim}.png"
    plt.savefig(file_name)


if __name__ == "__main__":
    main()
```
On CPython 3.11.4 my Mac laptop takes ~73.4 seconds. With numba that becomes 220 msec, a more than 300× speedup.
The initial conditions give us a uniform level of six.
Uncommenting the "propagate big numbers faster!" clause shaves a bit more than 10% off the elapsed time. Nothing to write home about.
Some example problems put thousands of particles on the origin rather than beginning with a uniform level. Those problems may benefit more from using arithmetic to propagate many particles at once; the EDIT below bears this out.
Scaling up to 300 × 300, numba completes the task in ~16 seconds. A 400 × 400 grid takes 50 seconds, and 500 × 500 takes 122 seconds.
EDIT
Yup, sure enough: if I deposit a quarter million particles on an otherwise zero 100 × 100 surface, this code gives a 4× speedup thanks to arithmetic.
```python
    # Replaces the while-loop in topple().
    k = 12
    while np.nanmax(P) >= 4:
        for i, j in np.ndindex(sP):
            # Move big piles in chunks of k particles per neighbor.
            if P[i, j] >= 4 * k and not np.isnan(P[i, j]):
                P[i, j] -= 4 * k
                P[i + 1, j] += k
                P[i - 1, j] += k
                P[i, j + 1] += k
                P[i, j - 1] += k
            if P[i, j] >= 4 and not np.isnan(P[i, j]):
                P[i, j] -= 4
                P[i + 1, j] += 1
                P[i - 1, j] += 1
                P[i, j + 1] += 1
                P[i, j - 1] += 1
```
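For reference, the deposit experiment can be set up along these lines (a minimal sketch; the post says "origin", which I'm taking to mean the central cell):

```python
N = np.zeros((100, 100))
N[50, 50] = 250_000  # a quarter million particles, everything else zero
P = topple(N)        # topple() being the k-accelerated variant above
```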
Maintaining a priority queue of "big" cells (≥ 4) might help the algorithm focus on dispersal in the central cells, at least initially.
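Here is a plain-CPython sketch of that idea using heapq (the name `topple_pq` and the stale-entry bookkeeping are my own illustration, not code from the post, and I haven't tried it under numba). Since the sandpile model is abelian, the toppling order doesn't change the final configuration, only how quickly we reach it:

```python
import heapq

import numpy as np


def topple_pq(N: np.ndarray) -> np.ndarray:
    # Same NaN-border trick as trough(): particles pushed onto the
    # border fall into the abyss.
    P = np.full((N.shape[0] + 2, N.shape[1] + 2), np.nan)
    P[1:-1, 1:-1] = N

    # Max-heap (heights negated) of cells that may need toppling.
    heap = [(-P[i, j], i, j) for i, j in np.ndindex(P.shape) if P[i, j] >= 4]
    heapq.heapify(heap)

    while heap:
        neg_h, i, j = heapq.heappop(heap)
        if P[i, j] != -neg_h or P[i, j] < 4:
            continue  # stale entry: the cell changed after being queued
        P[i, j] -= 4
        if P[i, j] >= 4:  # still overfull, re-queue at its new height
            heapq.heappush(heap, (-P[i, j], i, j))
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            v = P[i + di, j + dj] + 1  # NaN + 1 stays NaN on the border
            P[i + di, j + dj] = v
            if v >= 4:  # False for NaN, so the border is never queued
                heapq.heappush(heap, (-v, i + di, j + dj))
    return P[1:-1, 1:-1]
```

Whether the heap overhead beats numba's brute-force sweep is an open question, but it would at least avoid rescanning cells that are already quiescent.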
The ⊔ trough function could perhaps be better named. Putting particles on top, which then fall into the abyss of NaNs at the sides, seems more like a ⊓ table function, maybe? Not sure what the best name would be.