itertools product speed up

Question

I use itertools.product to generate all possible variations of 4 elements of length 13. The 4 and 13 can be arbitrary, but as it is, I get 4^13 results, which is a lot. I need the result as a Numpy array and currently do the following:

  c = it.product([1,-1,np.complex(0,1), np.complex(0,-1)], repeat=length)
  sendbuf = np.array(list(c))

With some simple profiling code shoved in between, it looks like the first line is pretty much instantaneous, whereas the conversion to a list and then Numpy array takes about 3 hours. Is there a way to make this quicker? It's probably something really obvious that I am overlooking.

Thanks!

You probably would gain a lot by implementing product for Numpy arrays... isn't there such a thing? Edit: see stackoverflow.com/questions/1208118 for a start. — TryPyPy, Commented Jan 17, 2011 at 2:29
How about using fromiter()? sendbuf = np.fromiter(c, np.complex) — Jeff Mercado, Commented Jan 17, 2011 at 2:32
the first line is instantaneous because it's just creating an iterator and doesn't actually generate the sequences. The sequences are created in the second line and generating 4^13 element will be slow. — kefeizhou, Commented Jan 17, 2011 at 2:36
What is the actual problem you're trying to solve? There's most likely solutions that won't require you to generating all 4^13 sequences. — kefeizhou, Commented Jan 17, 2011 at 2:43
Do you require a 13 by 4^13 2D array, or a 1D array of python tuple objects? — Paul, Commented Jan 17, 2011 at 5:35

Sven Marnach · Accepted Answer · 2011-01-17 17:04:26Z

21

The NumPy equivalent of itertools.product() is numpy.indices(), but it will only get you the product of ranges of the form 0,...,k-1:

numpy.rollaxis(numpy.indices((2, 3, 3)), 0, 4)
array([[[[0, 0, 0],
         [0, 0, 1],
         [0, 0, 2]],

        [[0, 1, 0],
         [0, 1, 1],
         [0, 1, 2]],

        [[0, 2, 0],
         [0, 2, 1],
         [0, 2, 2]]],


       [[[1, 0, 0],
         [1, 0, 1],
         [1, 0, 2]],

        [[1, 1, 0],
         [1, 1, 1],
         [1, 1, 2]],

        [[1, 2, 0],
         [1, 2, 1],
         [1, 2, 2]]]])

For your special case, you can use

a = numpy.indices((4,)*13)
b = 1j ** numpy.rollaxis(a, 0, 14)

(This won't run on a 32 bit system, because the array is to large. Extrapolating from the size I can test, it should run in less than a minute though.)

EIDT: Just to mention it: the call to numpy.rollaxis() is more or less cosmetical, to get the same output as itertools.product(). If you don't care about the order of the indices, you can just omit it (but it is cheap anyway as long as you don't have any follow-up operations that would transform your array into a contiguous array.)

EDIT2: To get the exact analogue of

numpy.array(list(itertools.product(some_list, repeat=some_length)))

you can use

numpy.array(some_list)[numpy.rollaxis(
    numpy.indices((len(some_list),) * some_length), 0, some_length + 1)
    .reshape(-1, some_length)]

This got completely unreadable -- just tell me whether I should explain it any further :)

edited Jan 17, 2011 at 17:04

answered Jan 17, 2011 at 15:29

Sven Marnach

593k122 gold badges960 silver badges858 bronze badges

Hmm. I'd have to reshape though. But that's probably still faster. Thanks! Let me play with that a bit.
– Christoph
Commented Jan 17, 2011 at 15:50
2

@Christoph: Reshaping is cheap, as it doesn't change the data at all.
– Sven Marnach
Commented Jan 17, 2011 at 15:52
1

@Christoph: No, the call is not strictly necessary, it only brings the array in exactly the same shape as your original version. I don't know what you are going to do with the permutations, so I can't give advice how to do it. But I personally wouldn't bother using rollaxis() or reshape() at all, but rather use the indices() output as is. For example, you can iterate over all permutations using for perm in izip(*(a.ravel() for a in numpy.indices(...))). Most probably you can also make do without reshaping for your application.
– Sven Marnach
Commented Jan 17, 2011 at 16:42
1

@Christoph - They are exactly the same results, even in the same order, for what it's worth.
– Joe Kington
Commented Jan 17, 2011 at 16:43
1

@Christoph: If a = numpy.indices((4,)*4), then a[0] will contain all the values for the first "index", a[1] will be the values for the second "index", and so on. For every n in the range 0 to 4^4-1, a[0].flat[n], a[1].flat[n], a[2].flat[n], a[3].flat[n] will be a valid permutation, with no duplicates. If you get confused by the ordering of the results, just use rollaxis(), but you don't have to.
– Sven Marnach
Commented Jan 17, 2011 at 17:01

| Show 8 more comments

Community · Accepted Answer · 2017-05-23 12:00:07Z

6

The first line seems instantaneous because no actual operation is taking place. A generator object is just constructed and only when you iterate through it as the operating taking place. As you said, you get 4^13 = 67108864 numbers, all these are computed and made available during your list call. I see that np.array takes only list or a tuple, so you could try creating a tuple out of your iterator and pass it to np.array to see if there is any performance difference and it does not affect the overall performance of your program. This can be determined only by trying for your usecase though there are some points which say tuple is slightly faster.

To try with a tuple, instead of list just do

sendbuf = np.array(tuple(c))

edited May 23, 2017 at 12:00

CommunityBot

11 silver badge

answered Jan 17, 2011 at 2:37

Senthil Kumaran

56k15 gold badges96 silver badges135 bronze badges

Thanks Senthil. If nothing else works, I will try with tuples.
– Christoph
Commented Jan 17, 2011 at 16:35

Add a comment |

Eric O. Lebigot · Accepted Answer · 2011-01-26 22:06:24Z

6

You could speed things up by skipping the conversion to a list:

numpy.fromiter(c, count=…)  # Using count also speeds things up, but it's optional

With this function, the NumPy array is first allocated and then initialized element by element, without having to go through the additional step of a list construction.

PS: fromiter() does not handle the tuples returned by product(), so this might not be a solution, for now. If fromiter() did handle dtype=object, this should work, though.

PPS: As Joe Kington pointed out, this can be made to work by putting the tuples in a structured array. However, this does not appear to always give a speed up.

edited Jan 26, 2011 at 22:06

answered Jan 17, 2011 at 13:36

Eric O. Lebigot

93.6k49 gold badges220 silver badges263 bronze badges

Haven't tested it yet, but that was precisely what I was envisioning, thanks!
– Christoph
Commented Jan 17, 2011 at 14:41
1

@Christoph: I think that NumPy complains because fromiter() tries to create a 1D array of tuples (this function only creates 1D arrays). So, a good choice of type would seem to be dtype=object… but NumPy refuses to do this. I'm not sure how to solve this problem (other than lobbying for having the possibility of using dtype=object :).
– Eric O. Lebigot
Commented Jan 17, 2011 at 15:19
& @Christoph - You can work around this by using a structured array and then viewing it as a "regular" array. Here's how: pastebin.com/tBLkzL64 Unfortunately, it's not any faster than just using a list or a tuple to do it, as you originally were.
– Joe Kington
Commented Jan 17, 2011 at 15:59

Add a comment |

Alleo · Accepted Answer · 2015-12-05 00:00:20Z

3

Let numpy.meshgrid do all the job:

length = 13
x = [1, -1, 1j, -1j]
mesh = numpy.meshgrid(*([x] * length))
result = numpy.vstack([y.flat for y in mesh]).T

on my notebook it takes ~2 minutes

answered Dec 5, 2015 at 0:00

Alleo

8,2982 gold badges43 silver badges32 bronze badges

Add a comment |

Eric O. Lebigot · Accepted Answer · 2011-01-17 15:43:10Z

2

You might want to try a completely different approach: first create an empty array of the desired size:

result = np.empty((4**length, length), dtype=complex)

then use NumPy's slicing abilities to fill out the array yourself:

# Set up of the last "digit":
result[::4, length-1] = 1
result[1::4, length-1] = -1
result[2::4, length-1] = 1j
result[3::4, length-1] = -1j

You can do similar things for the other "digits" (i.e. the elements of result[:, 2], result[:, 1], and result[:, 0]). The whole thing could certainly be put in a loop that iterates over each digit.

Transposing the whole operation (np.empty((length, 4**length)…)) is worth trying, as it might bring a speed gain (through a better use of the memory cache).

answered Jan 17, 2011 at 15:43

Eric O. Lebigot

93.6k49 gold badges220 silver badges263 bronze badges

This doesn't appear to give the correct answer (even when you use a loop to do it for each "digit")... If it did, it would be the fastest version so far... Did I understand what you meant correctly? Here's how I implemented this version: pastebin.com/VpwXVkQs
– Joe Kington
Commented Jan 17, 2011 at 16:36
@Joe: I meant something different from your implementation, namely that the before-last digits result[:, length-1] should be set to [1, 1, 1, 1, -1, -1, -1, -1, 1j, 1j, etc.]. And similarly for the subsequent digits (with repeated sequences of increasing size). Is this clearer? :)
– Eric O. Lebigot
Commented Jan 17, 2011 at 17:25
@Joe: What I was suggesting is essentially what numpy.indices() does; the difference is that I was setting coordinates manually, but directly to the values requested in the question.
– Eric O. Lebigot
Commented Jan 21, 2011 at 9:48

Add a comment |

Paul · Accepted Answer · 2011-01-17 06:53:40Z

1

Probably not optimized but much less reliant on python type conversions:

ints = [1,2,3,4]
repeat = 3

def prod(ints, repeat):
    w = repeat
    l = len(ints)
    h = l**repeat
    ints = np.array(ints)
    A = np.empty((h,w), dtype=int)
    rng = np.arange(h)
    for i in range(w):
        x = l**i
        idx = np.mod(rng,l*x)/x
        A[:,i] = ints[idx]
    return A

answered Jan 17, 2011 at 6:53

Paul

43.2k16 gold badges110 silver badges125 bronze badges

Add a comment |

Collectives™ on Stack Overflow

itertools product speed up

6 Answers 6

Let numpy.meshgrid do all the job:

Not the answer you're looking for? Browse other questions tagged
python
numpy
python-itertools
or ask your own question.

Linked

Hot Network Questions

Collectives™ on Stack Overflow

6 Answers 6

Let numpy.meshgrid do all the job:

Not the answer you're looking for? Browse other questions tagged pythonnumpypython-itertools or ask your own question.

Linked

Related

Not the answer you're looking for? Browse other questions tagged
python
numpy
python-itertools
or ask your own question.