2

I am working on a task that requires me to use an iterator multiple times. For example:

   # data
   fruit = ("grape", "banana", "apple")
   # iterator
   myit = iter(fruit)

   # the functions I have
   def printIter(its):
       for x in its:
           print(x)

   def printIter2(its):
       for x in its:
           print(x)

I have to process the iterator twice, and each pass performs a completely different function. But an iterator can only be consumed once. I don't have control over the data source fruit or the iterator myit. I only have control over the printIter functions.

How can I best achieve this while using as little memory as possible?

What I currently have:

   it1, it2 = itertools.tee(its)
   printIter(it1)
   printIter(it2)
   del it1, it2

Is this good practice, or is there a better way?

4
  • 1
    From the documentation of itertools.tee: "This itertool may require significant auxiliary storage (depending on how much temporary data needs to be stored). In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use list() instead of tee()."
    – Matthias
    Commented Mar 30, 2020 at 17:06
  • 1
    Maybe this is just to set the question up, but why are you calling iter in the first place, instead of using fruit directly? The for loop is implicitly calling iter on its, even if it is already an iterator.
    – chepner
    Commented Mar 30, 2020 at 17:13
  • @chepner, I don't have control over iter but from the structure of the code, that is what is happening. I am to just consume the iterator. Commented Mar 30, 2020 at 17:18
  • As an aside, why del it1, it2? It's pointless. Anyway, tee is good if you use part of the results and then need to start consuming again. If you are going to consume the whole thing once and then need to do it again, you might as well just use list. Commented Mar 30, 2020 at 19:13

2 Answers

2

If all you have is an iterator and you need to do two kinds of processing on it without consuming too much memory, your best bet is to design the processing so that both kinds can run in parallel. That is, you need to be able to do both parts of your processing on one item at a time. In your example, both of your iterator-consuming functions just print the items, which doesn't lend itself well to parallelization (you'd get the printout in a different order, e.g. 1, 1, 2, 2, 3, 3, ...). But for other kinds of problems, it's easy to do part of the work and then wait for more data.

Here's an example where I use two generator functions to consume a tee'd iterator in parallel (using the builtin zip). One adds up the values it gets and prints only the final sum, and the other prints them individually.

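import itertools

def consume1(it):
    # Sum the values; yield after each one so the other consumer gets a turn.
    total = 0
    for value in it:
        total += value
        yield
    # Runs once the iterator is exhausted.
    print(total)

def consume2(it):
    # Print each value, yielding after each one.
    for value in it:
        print(value)
        yield

opaque_iterator = iter((1, 2, 3, 4))
it1, it2 = itertools.tee(opaque_iterator)

# zip advances both generators in lockstep, so tee only has to hold
# one pending item at a time.
for _ in zip(consume1(it1), consume2(it2)):
    pass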

Output:

1
2
3
4
10

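There are a bunch of subtleties to this kind of code, so don't be surprised if you don't get it working on your first attempt. My code above is pretty fragile, as zip isn't really designed for managing separate generators like this. For instance, the final total is printed only because consume1 comes first in the zip call: zip stops as soon as one generator finishes, so with the arguments swapped consume1 would never reach its print.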
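
A less fragile variation on the same idea, offered as a sketch rather than a drop-in replacement: if each kind of processing can be written as a per-item step, a single loop can hand every item to both steps, so nothing is buffered and tee is not needed at all. The names Summer, Largest and run_all below are made up for illustration.

```python
class Summer:
    """Hypothetical consumer: keeps a running total."""

    def __init__(self):
        self.total = 0

    def feed(self, value):
        self.total += value


class Largest:
    """Hypothetical consumer: remembers the largest value seen so far."""

    def __init__(self):
        self.value = None

    def feed(self, value):
        if self.value is None or value > self.value:
            self.value = value


def run_all(iterator, consumers):
    """Pass every item to every consumer in one pass over the iterator."""
    for item in iterator:
        for consumer in consumers:
            consumer.feed(item)


opaque_iterator = iter((1, 2, 3, 4))
summer, largest = Summer(), Largest()
run_all(opaque_iterator, [summer, largest])
print(summer.total)
print(largest.value)
```

Each item is handled once and then dropped, so memory use does not grow with the length of the iterator. The trade-off is that both kinds of processing have to be expressible as per-item steps.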

0

Since iterators are stateful and their items are consumed as you go, I am not sure what the goal of using the same iterator twice would be.

However, if you don't want the memory overhead of holding two copies of the iterator at the same time, as happens with tee(), you can just recreate the iterator after the first one is consumed and deleted:

import itertools

#the function I have
def printIter(its):
  for x in its:
    print(x)

def printIter2(its):
  for x in its:
    print(x)


#data
fruit= ("grape", "banana", "apple")

#iterator
myit = iter(fruit)

#it1, it2 = itertools.tee(myit)
printIter(myit)
del myit
myit = iter(fruit)
printIter2(myit)
del myit

Since you indicated that you have no access to the original data, tee() is probably the best you can do with iterators. However, you can consider converting the single iterator to a list and then performing the repeated operations on that:

import itertools
#data
fruit= ("grape", "banana", "apple")
#iterator
myit = iter(fruit)

def printIter(its):
  for x in its:
    print(x)

mylist = list(myit)
del myit
printIter(mylist)
printIter(mylist)
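
If the function that hands you the iterator can be called more than once and returns a fresh iterator each time (an assumption; the question does not confirm this), another option is to ask for a new iterator for each pass instead of storing the items. In this sketch, get_iterator is a hypothetical stand-in for that function, and count_items and longest_item are illustrative examples of two different passes.

```python
def get_iterator():
    """Hypothetical stand-in for the function that returns the iterator.

    Assumed to build a brand-new iterator on every call.
    """
    return iter(("grape", "banana", "apple"))


def count_items(its):
    """First pass: count the items without storing them."""
    return sum(1 for _ in its)


def longest_item(its):
    """Second pass: find the longest item."""
    return max(its, key=len)


count = count_items(get_iterator())      # fresh iterator for pass one
longest = longest_item(get_iterator())   # fresh iterator for pass two
print(count, longest)
```

This keeps memory use low because no copy of the data is held between passes, but it only works when producing the iterator again is possible and cheap enough to do twice.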
5
  • 1
    There's no point in calling iter between calls to printIter; you can simply pass fruit directly.
    – chepner
    Commented Mar 30, 2020 at 17:15
  • definitely, and since OP has control over printIter() it is probably best to scrap the idea of iterators
    – pastaleg
    Commented Mar 30, 2020 at 17:18
  • Hi @chepner and @pastaleg, I don't have access to iter and fruit directly due to abstraction, I can only call a function to return iter, then use the iterator. Commented Mar 30, 2020 at 18:04
  • Hey @JesujobaALABI, I updated the answer to suggest a conversion to a list.
    – pastaleg
    Commented Mar 30, 2020 at 18:37
  • @juanpa.arrivillaga It's a heavy-handed indication of intent. If I try to use that variable later, I will get a better error message.
    – pastaleg
    Commented Mar 30, 2020 at 19:22
