
I would like to compare all elements in my iterable object combinatorically with each other. The following reproducible example just mimics the functionality of a plain list, but demonstrates my problem. In this example with a list of ["A","B","C","D"], I would like to get the following 16 lines of output, every combination of each item with each other. A list of 100 items should generate 100*100=10,000 lines.

A A True
A B False
A C False
... 10 more lines ...
D B False
D C False
D D True

The following code seemed like it should do the job.

class C():
    def __init__(self):
        self.stuff = ["A","B","C","D"]
    def __iter__(self):
        self.idx = 0
        return self
    def __next__(self):
        self.idx += 1
        if self.idx > len(self.stuff):
            raise StopIteration
        else:
            return self.stuff[self.idx - 1]

thing = C()
for x in thing:
    for y in thing:
        print(x, y, x==y)

But after finishing the y-loop, the x-loop seems done, too, even though it's only used the first item in the iterable.

A A True
A B False
A C False
A D False

After much searching, I eventually tried the following code, hoping that itertools.tee would allow me two independent iterators over the same data:

import itertools
thing = C()
thing_one, thing_two = itertools.tee(thing)
for x in thing_one:
    for y in thing_two:
        print(x, y, x==y)

But I got the same output as before.

The real-world object this represents is a model of a directory and file structure with varying numbers of files and subdirectories, at varying depths into the tree. It has nested links to thousands of members and iterates correctly over them once, just like this example. But it also does expensive processing within its many internal objects on-the-fly as needed for comparisons, which would end up doubling the workload if I had to make a complete copy of it prior to iterating. I would really like to use multiple iterators, pointing into a single object with all the data, if possible.


Edit on answers: The critical flaw in the question code, pointed out in all answers, is that the single internal self.idx variable cannot serve multiple callers independently. The accepted answer is the best fit for my real class (oversimplified in this reproducible example); the other answer presents a simple, elegant solution for simpler data structures like the list shown here.

  • Is your object indexable? Does it have a __len__ method? Commented Oct 25, 2017 at 20:49
  • It's essentially a representation of a nested directory and file structure, so with the multiple levels, I don't have a single index into it all. I do have an overall count of nodes, however, so I could write a len easily.
    – mightypile
    Commented Oct 25, 2017 at 20:51
  • How deeply is it nested? Are there always only two levels? Commented Oct 25, 2017 at 20:52
  • Is the number of children the same for each parent, or does it vary? Or can that number be found dynamically? Commented Oct 25, 2017 at 20:54
  • To sum up my previous questions, is there any way to use nested for loops over ranges and use the resulting numbers as an index? Something like for x in range(len(obj)): for y in range(len(obj)): print(obj[x] == obj[y]) Commented Oct 25, 2017 at 21:00

2 Answers


A container class cannot correctly be its own iterator if it has to support more than one iteration at the same time. The container shouldn't know about the state of the iterator, and the iterator doesn't need to know the contents of the container; it only needs to know which container it belongs to and "where" in that container it currently is. If you merge iterator and container, all iterations share the same state (in your case self.idx): they read and modify the same variable, which gives incorrect results.
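To make the shared state concrete, here is a minimal sketch. The class name SelfIterating is just an illustrative stand-in for the question's C; a plain list is shown next to it for contrast.

```python
class SelfIterating:
    """Same shape as the question's C: the container is its own iterator."""
    def __init__(self):
        self.stuff = ["A", "B", "C", "D"]
    def __iter__(self):
        self.idx = 0          # every call resets the one shared position
        return self
    def __next__(self):
        self.idx += 1
        if self.idx > len(self.stuff):
            raise StopIteration
        return self.stuff[self.idx - 1]

thing = SelfIterating()
outer = iter(thing)
first = next(outer)                     # "A"
inner = iter(thing)                     # resets the position the outer loop relies on
same_object = outer is inner            # True: there is only one iterator
inner_items = list(inner)               # ["A", "B", "C", "D"], leaves idx past the end
leftover = next(outer, "exhausted")     # "exhausted": the outer loop stops here

# A list hands out a new, independent iterator on every iter() call.
numbers = [1, 2, 3]
a, b = iter(numbers), iter(numbers)
next(a)
next(a)
independent = next(b)                   # 1: b is unaffected by advancing a
```

This is exactly what the nested for loops do behind the scenes: the inner loop calls iter(thing), runs to the end, and the outer loop then finds the shared index already past the last item.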

That's why all built-in types have a separate iterator class (and some even have a reverse-iterator class):

>>> l = [1, 2, 3]
>>> iter(l)
<list_iterator at 0x15e360c86d8>
>>> reversed(l)
<list_reverseiterator at 0x15e360a5940>

>>> t = (1, 2, 3)
>>> iter(t)
<tuple_iterator at 0x15e363fb320>

>>> s = '123'
>>> iter(s)
<str_iterator at 0x15e363fb438>

So, basically, you could just return iter(self.stuff) from __iter__ and drop __next__ altogether, because list_iterator already knows how to iterate over the list:

class C:
    def __init__(self):
        self.stuff = ["A","B","C","D"]
    def __iter__(self):
        return iter(self.stuff)

thing = C()
for x in thing:
    for y in thing:
        print(x, y, x==y)

This prints 16 lines, as expected.
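Once __iter__ returns a fresh iterator, the nested loops can also be written with itertools.product from the standard library. This is only an alternative spelling, not something the question asked for. Note that product reads the input to the end once and keeps references to the items in a tuple before it yields the first pair, so it may not suit every real-world object.

```python
import itertools

class C:
    def __init__(self):
        self.stuff = ["A", "B", "C", "D"]
    def __iter__(self):
        # Delegate to the list's own iterator; a new one is created per call.
        return iter(self.stuff)

thing = C()

# repeat=2 pairs every item with every item, including itself.
pairs = list(itertools.product(thing, repeat=2))

for x, y in pairs:
    print(x, y, x == y)
```

For the four-item list this yields the same 16 pairs, in the same order, as the nested for loops.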

If your goal is to write your own iterator class, you need two classes (or three if you want to implement the reversed iterator yourself):

class C:
    def __init__(self):
        self.stuff = ["A","B","C","D"]
    def __iter__(self):
        return C_iterator(self)
    def __reversed__(self):
        return C_reversed_iterator(self)

class C_iterator:
    def __init__(self, parent):
        self.idx = 0
        self.parent = parent
    def __iter__(self):
        return self
    def __next__(self):
        self.idx += 1
        if self.idx > len(self.parent.stuff):
            raise StopIteration
        else:
            return self.parent.stuff[self.idx - 1]

thing = C()
for x in thing:
    for y in thing:
        print(x, y, x==y)

works as well.

For completeness, here's one possible implementation of the reversed-iterator:

class C_reversed_iterator:
    def __init__(self, parent):
        self.parent = parent
        self.idx = len(parent.stuff) + 1
    def __iter__(self):
        return self
    def __next__(self):
        self.idx -= 1
        if self.idx <= 0:
            raise StopIteration
        else:
            return self.parent.stuff[self.idx - 1]

thing = C()
for x in reversed(thing):
    for y in reversed(thing):
        print(x, y, x==y)

Instead of defining your own iterator classes, you could use generators. One way was already shown in the other answer; here is a variant using yield from:

class C:
    def __init__(self):
        self.stuff = ["A","B","C","D"]
    def __iter__(self):
        yield from self.stuff
    def __reversed__(self):
        yield from self.stuff[::-1]

Or explicitly delegate to a generator function (this is equivalent to the above, but makes it clearer that a new object is produced on each call):

def C_iterator(obj):
    for item in obj.stuff:
        yield item

def C_reverse_iterator(obj):
    for item in obj.stuff[::-1]:
        yield item

class C:
    def __init__(self):
        self.stuff = ["A","B","C","D"]
    def __iter__(self):
        return C_iterator(self)
    def __reversed__(self):
        return C_reverse_iterator(self)

Note: You don't have to implement __reversed__. It was only meant as an additional "feature" of this answer.

  • I'm almost certain I'll need to implement my own iterator, as my real-world class needs to iterate over multiple lists one after the other, opaque to the caller. I'm still learning, but I'll be surprised if a generator like yield will handle my case. This answer covers a lot of ground in getting me where I need to be. Cheers!
    – mightypile
    Commented Oct 26, 2017 at 13:02
  • @mightypile: Iterate over multiple lists one after the other you say? Hmm... Commented Oct 28, 2017 at 1:10

Your __iter__ is completely broken. Instead of actually making a fresh iterator on every call, it just resets some state on self and returns self. That means you can't actually have more than one iterator at a time over your object, and any call to __iter__ while another loop over the object is active will interfere with the existing loop.

You need to actually make a new object. The simplest way to do that is to use yield syntax to write a generator function. The generator function will automatically return a new iterator object every time:

class C(object):
    def __init__(self):
        self.stuff = ['A', 'B', 'C', 'D']
    def __iter__(self):
        for thing in self.stuff:
            yield thing
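The same idea extends to nested data. The following is only a sketch for a structure like the one described in the question: the Node class and its files and subdirs attributes are hypothetical names, not the asker's real model. Because __iter__ is a generator function, every call returns a fresh generator with its own position, and yield from recurses into the children.

```python
class Node:
    """Hypothetical directory node: a name, some files, and child directories."""
    def __init__(self, name, files=(), subdirs=()):
        self.name = name
        self.files = list(files)
        self.subdirs = list(subdirs)

    def __iter__(self):
        # First this directory's own files...
        yield from self.files
        # ...then everything below it, depth-first (calls sub.__iter__()).
        for sub in self.subdirs:
            yield from sub

tree = Node("root", ["a.txt"], [
    Node("docs", ["b.txt", "c.txt"]),
    Node("src", ["d.py"], [Node("pkg", ["e.py"])]),
])

names = list(tree)

# Nested loops work because each loop gets its own generator.
pairs = [(x, y) for x in tree for y in tree]
```

No copy of the tree is made; each generator only tracks where it is. The inner loop does re-walk the tree once per outer item, so any expensive per-item work would need to be cached on the objects themselves to avoid repeating it.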

  • General rule: If you define __next__, then __iter__ must be the identity function (doing nothing but return self). If it isn't, the code is wrong. And usually, you don't really want to hand-implement an iterator class, so you'd just make __iter__ a generator function as demonstrated here and avoid implementing __next__ entirely. The generator function approach is faster (letting Python manage generator state lets it do it much more efficiently) and simpler than defining a separate iterator class for your type. Commented Oct 25, 2017 at 21:17
  • The yield direction is helpful, as is the general rule. I need to figure out if they'll work for my use-case of iterating through multiple levels of a hierarchy within the object. Thanks!
    – mightypile
    Commented Oct 25, 2017 at 21:21
