How to use an iterator multiple times in Python

Question

I am working on a task that requires me to use an iterator multiple times. For example:

   #data
   fruit= ("grape", "banana", "apple")
   #iterator
   myit = iter(fruit)

   #the function I have
   def printIter(its):
     for x in its:
        print(x)

   def printIter2(its):
     for x in its:
        print(x)

I have to consume the iterator twice, with functions like these that are meant to do completely different things. But an iterator can only be consumed once. I don't have control over the data source fruit or the iterator myit. I only have control over the consuming functions.

How can I best achieve this while using as little memory as possible?

What I currently have:

   import itertools

   it1, it2 = itertools.tee(myit)
   printIter(it1)
   printIter(it2)
   del it1, it2

Is this good practice? Is there any other way?

From the documentation of itertools.tee: "This itertool may require significant auxiliary storage (depending on how much temporary data needs to be stored). In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use list() instead of tee()." — Matthias, Commented Mar 30, 2020 at 17:06
Maybe this is just to set the question up, but why are you calling iter in the first place, instead of using fruit directly? The for loop is implicitly calling iter on its, even if it is already an iterator. — chepner, Commented Mar 30, 2020 at 17:13
@chepner, I don't have control over iter but from the structure of the code, that is what is happening. I am to just consume the iterator. — Jesujoba Oluwadara ALABI, Commented Mar 30, 2020 at 17:18
As an aside, why del it1, it2? It's pointless. Anyway, tee is good if you use part of the results and then need to start consuming again; if you are going to consume the whole thing once and then need to do it again, you might as well just use list. — juanpa.arrivillaga, Commented Mar 30, 2020 at 19:13

Blckknght · Accepted Answer · 2020-03-30 19:21:13Z

If all you have is an iterator and you need to do two kinds of processing on it without consuming too much memory, your best bet is to design the processing so that it works in parallel. That is, you need to be able to do both parts of your processing on one item at a time. In your example, both of your iterator-consuming functions just print the items, which doesn't lend itself well to parallelization (you'd get the printout in a different order, e.g. 1, 1, 2, 2, 3, 3, ...). But for other kinds of problems, it's easy to do part of the work and then wait for more data.
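As a minimal sketch of that idea without tee at all, assuming each kind of processing can be written as a plain function that takes one item (process_once, record and add are hypothetical names, not from the question), a single loop can hand every item to each function in turn:

```python
def process_once(iterator, *handlers):
    """Feed every item to each handler in turn, in a single pass."""
    for item in iterator:
        for handler in handlers:
            handler(item)

seen = []
totals = {"sum": 0}

def record(value):
    # First kind of processing: remember each value.
    seen.append(value)

def add(value):
    # Second kind of processing: keep a running total.
    totals["sum"] += value

process_once(iter((1, 2, 3, 4)), record, add)
print(seen)           # [1, 2, 3, 4]
print(totals["sum"])  # 10
```

Only one item is held at a time here, so nothing is buffered, but it only works when both kinds of processing can be expressed per item.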

Here's an example where I use two generator functions to consume a tee'd iterator in parallel (using the builtin zip). One adds up the values it gets and prints only the final sum, and the other prints them individually.

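import itertools

def consume1(it):
    # Adds up the values and prints the total once the input runs out.
    total = 0
    for value in it:
        total += value
        yield  # hand control back so the other consumer can run
    print(total)

def consume2(it):
    # Prints each value as it arrives.
    for value in it:
        print(value)
        yield

opaque_iterator = iter((1, 2, 3, 4))
it1, it2 = itertools.tee(opaque_iterator)

# zip advances each generator by one step per round, interleaving the two.
for _ in zip(consume1(it1), consume2(it2)):
    pass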

Output:

1
2
3
4
10

There are a bunch of subtleties to this kind of code, so don't be surprised if you don't get it working in your first attempt. My code above is pretty fragile, as zip isn't really designed for managing separate generators like this.
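One way to sidestep zip, sketched here under the assumption that the consumers can be rewritten as coroutine-style generators that receive values through send() (broadcast, summing and collecting are hypothetical names used only for illustration), is to push each item to every consumer explicitly and close them at the end:

```python
def summing(results):
    """Receives values via send() and stores the total when closed."""
    total = 0
    try:
        while True:
            value = yield
            total += value
    finally:
        # Runs when close() is called on this generator.
        results["total"] = total

def collecting(results):
    """Receives values via send() and appends each one to a list."""
    results["items"] = []
    while True:
        value = yield
        results["items"].append(value)

def broadcast(iterator, *coroutines):
    """Send every item from a single pass over iterator to each coroutine."""
    for co in coroutines:
        next(co)  # prime: advance to the first yield
    for item in iterator:
        for co in coroutines:
            co.send(item)
    for co in coroutines:
        co.close()  # lets each coroutine run its cleanup

results = {}
broadcast(iter((1, 2, 3, 4)), summing(results), collecting(results))
print(results["items"])  # [1, 2, 3, 4]
print(results["total"])  # 10
```

Because every consumer is closed explicitly, the result does not depend on the order in which the consumers are listed, and no tee buffer is needed.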

pastaleg · Accepted Answer · 2020-03-30 18:35:52Z

Since iterators are stateful and are used up as they are read, I am not sure what the goal of using the same iterator twice would be.

However, if you don't want the memory overhead of two copies of the iterator existing at the same time, as happens with tee(), you can just recreate the iterator after the first one has been consumed and deleted:

import itertools

#the function I have
def printIter(its):
  for x in its:
    print(x)

def printIter2(its):
  for x in its:
    print(x)


#data
fruit= ("grape", "banana", "apple")

#iterator
myit = iter(fruit)

#it1, it2 = itertools.tee(myit)
printIter(myit)
del myit
myit = iter(fruit)
printIter2(myit)
del myit

Since you indicated that you have no access to the original data, tee() is probably the best you can do with iterators. However, you can consider converting the single iterator to a list and then doing the repeated operations on that:

import itertools
#data
fruit= ("grape", "banana", "apple")
#iterator
myit = iter(fruit)

def printIter(its):
  for x in its:
    print(x)

mylist = list(myit)
del myit
printIter(mylist)
printIter(mylist)

edited Mar 30, 2020 at 18:35

answered Mar 30, 2020 at 17:04


There's no point in calling iter between calls to printIter; you can simply pass fruit directly. – chepner, Commented Mar 30, 2020 at 17:15
Definitely, and since OP has control over printIter() it is probably best to scrap the idea of iterators. – pastaleg, Commented Mar 30, 2020 at 17:18
Hi @chepner and @pastaleg, I don't have access to iter and fruit directly due to abstraction. I can only call a function that returns the iterator, then use the iterator. – Jesujoba Oluwadara ALABI, Commented Mar 30, 2020 at 18:04
Hey @JesujobaALABI, I updated the answer to suggest a conversion to a list. – pastaleg, Commented Mar 30, 2020 at 18:37
@juanpa.arrivillaga It's a heavy-handed indication of intent. If I try to use that variable later, I will get a better error message. – pastaleg, Commented Mar 30, 2020 at 19:22
