Multidimensional NumPy Arrays
Adding to Owen's and Prashant Kumar's answers, here is a version that uses multidimensional NumPy arrays (i.e. a shape argument), which speeds up the NumPy-based solutions. This is especially helpful if you need to access the data (via finalize()) often.
| Version | Prashant Kumar | row_length=1 | row_length=5 |
|---|---|---|---|
| Class A - np.append | 2.873 s | 2.776 s | 0.682 s |
| Class B - python list | 6.693 s | 80.868 s | 22.012 s |
| Class C - arraylist | 0.095 s | 0.180 s | 0.043 s |
The column Prashant Kumar is his example executed on my machine, for comparison. row_length=5 corresponds to the example in the original question. The dramatic increase for the python list version comes from {built-in method numpy.array}: NumPy needs much more time to convert a list of lists to an array than to convert a flat 1D list with the same number of entries and reshape it, e.g. np.array([[1,2,3]]*5) vs. np.array([1]*15).reshape((-1,3)).
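To make that comparison concrete, here is a minimal sketch that builds the same array both ways and times the two conversions with timeit. The sizes (n_rows, n_cols) and the repeat count are assumptions chosen for illustration, not the values behind the table above, and absolute timings will differ from machine to machine.

```python
import timeit

import numpy as np

# Illustrative sizes (assumptions, not the benchmark's values).
n_rows, n_cols = 20_000, 5

nested = [[1.0] * n_cols for _ in range(n_rows)]  # list of lists
flat = [1.0] * (n_rows * n_cols)                  # same entries, one flat list


def from_nested():
    # Convert the list of lists directly.
    return np.array(nested, dtype=float)


def from_flat():
    # Convert the flat list, then reshape to the same 2-D shape.
    return np.array(flat, dtype=float).reshape((-1, n_cols))


# Both routes produce the same array.
same_result = np.array_equal(from_nested(), from_flat())

t_nested = timeit.timeit(from_nested, number=20)
t_flat = timeit.timeit(from_flat, number=20)
print(f"nested list of lists: {t_nested:.3f} s")
print(f"flat list + reshape:  {t_flat:.3f} s")
```

If the flat route is faster on your machine as well, extending one flat list in update() and reshaping once in finalize() may be worth trying for the python list version.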
And this is the code:
import cProfile

import numpy as np


class A:
    def __init__(self, shape=(0,), dtype=float):
        """First item of shape is ignored, the rest defines the shape"""
        self.data = np.array([], dtype=dtype).reshape((0, *shape[1:]))

    def update(self, row):
        # Note: without an axis argument np.append flattens its inputs,
        # so self.data becomes 1-D after the first update.
        self.data = np.append(self.data, row)

    def finalize(self):
        return self.data


class B:
    def __init__(self, shape=(0,), dtype=float):
        """First item of shape is ignored, the rest defines the shape"""
        self.shape = shape
        self.dtype = dtype
        self.data = []

    def update(self, row):
        self.data.append(row)

    def finalize(self):
        # Convert the list of rows on every call.
        return np.array(self.data, dtype=self.dtype).reshape((-1, *self.shape[1:]))


class C:
    def __init__(self, shape=(0,), dtype=float):
        """First item of shape is ignored, the rest defines the shape"""
        self.shape = shape
        self.data = np.zeros((100, *shape[1:]), dtype=dtype)
        self.capacity = 100
        self.size = 0

    def update(self, x):
        if self.size == self.capacity:
            # Buffer is full: allocate a 4x larger one and copy the rows over.
            self.capacity *= 4
            # Keep the dtype of the existing buffer when growing.
            newdata = np.zeros((self.capacity, *self.data.shape[1:]), dtype=self.data.dtype)
            newdata[:self.size] = self.data
            self.data = newdata
        self.data[self.size] = x
        self.size += 1

    def finalize(self):
        # Return only the rows that have been filled so far.
        return self.data[:self.size]


def test_class(f):
    row_length = 5
    x = f(shape=(0, row_length))
    for i in range(100000 // row_length):
        x.update([i] * row_length)
    for i in range(1000):
        x.finalize()


for name in 'ABC':
    cProfile.run('test_class(%s)' % name)
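One reason the preallocated version stays cheap when finalize() is called often is that basic slicing of a NumPy array returns a view rather than a copy. The following is a minimal sketch of that point, not the benchmark code itself: GrowableArray is a hypothetical stand-in for class C, and its starting capacity and doubling growth factor are arbitrary choices made for illustration.

```python
import numpy as np


class GrowableArray:
    """Hypothetical stand-in for class C: preallocate, grow geometrically."""

    def __init__(self, row_length, dtype=float, capacity=4):
        self.data = np.zeros((capacity, row_length), dtype=dtype)
        self.size = 0

    def update(self, row):
        if self.size == self.data.shape[0]:
            # Buffer is full: double it (arbitrary factor), keeping the dtype.
            new_shape = (2 * self.data.shape[0], self.data.shape[1])
            new = np.zeros(new_shape, dtype=self.data.dtype)
            new[:self.size] = self.data
            self.data = new
        self.data[self.size] = row
        self.size += 1

    def finalize(self):
        # Basic slicing returns a view: no data is copied here.
        return self.data[:self.size]


buf = GrowableArray(row_length=3, dtype=int)
for i in range(10):
    buf.update([i] * 3)

result = buf.finalize()
print(result.shape)                         # (10, 3)
print(np.shares_memory(result, buf.data))   # True
```

Because the result shares memory with the internal buffer, writing into it also writes into the buffer; call .copy() on the result if you need an independent snapshot.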
Here is another option to add to Luca Fiaschi's post above:
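# a, N and nruns are taken from Luca Fiaschi's post; this snippet also
# needs "import time" and "import numpy as np".
b = []
for i in range(nruns):
    s = time.time()
    c1 = np.array(a, dtype=int).reshape((N, 1000))
    b.append(time.time() - s)
print("Timing version array.reshape ", np.mean(b))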
The timing result for me is:
Timing version vstack 0.6863266944885253
Timing version reshape 0.505419111251831
Timing version array.reshape 0.5052066326141358
Timing version concatenate 0.5339600563049316