
I have a function that does something like this:

from typing import Iterator

def function_a(x_iter: Iterator[dict]):
    y = {}
    for x in x_iter:
        x = other_func_1(x)
        y = other_func_2(x)
        yield x, y

Downstream in the process, I want to use x and y separately, e.g. I want to pass x as an iterator to another function and I want to save y to a JSON file. I know we can't call it like this:

x, y = function_a(x_iter)

because x and y will be in the same iterator. How should I separate them? I don't think I can do this:

result = function_a(x_iter)
for x, y in result:
    <do something with x>
    <do something with y>

since x needs to be passed to another function downstream as an iterator.

Thank you

  • So, I'm confused. Have you tried your code or not? If you tried some code, then please update your question with this code and the results and a comment about whether it works for you.
    – quamrana
    Commented Aug 18, 2022 at 11:17
  • Maybe it's related - stackoverflow.com/questions/46941719
    – Daniel Hao
    Commented Aug 18, 2022 at 11:20
  • So, I've tried your code (suitably modified) and it seems to work fine. However, without some concrete code from you it's impossible to tell exactly what is not working.
    – quamrana
    Commented Aug 18, 2022 at 11:23
  • @quamrana: I don't know what you tried, but x, y = function_a(x_iter) definitely doesn't work. As for the for loop, it's impossible to write the code like that because the iterators need to be processed by downstream functions that take iterators; the questioner cannot write an element-by-element loop. Commented Aug 18, 2022 at 11:27
  • itertools.tee only pays off if the tee iterators stay positioned close to each other in the data stream. It doesn't help in use cases where one iterator will be fully consumed before the other. Commented Aug 18, 2022 at 12:11

2 Answers


If you can't rewrite the consumer functions, you're going to have to do at least one of the following three things:

  • Run the downstream functions simultaneously in two separate threads.
  • Fully materialize at least one of the data streams involved - maybe save the y elements to a list while you iterate over the x elements, or vice versa, or materialize the underlying x_iter to a list so you can make two passes to generate the x and y elements.
  • Generate the input iterator twice.

(If you're really going to save the y elements to a JSON file, you're probably materializing that data to a list anyway, unless you're using a streaming JSON serializer, but it sounds like the JSON thing is just an example.)


Say you iterate over all the x elements somehow. The whole time you're doing that, function_a is producing y elements. You could try to use those elements as function_a produces them, but if you want to do that while the downstream function that consumes the x elements is running, you'll have to run the y consumer at the same time, and the only way to do that without rewriting the consumer functions is to run them in two separate threads. That's the threads option.
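For illustration, here's a minimal sketch of the threads option, with consume_xs and consume_ys standing in for hypothetical downstream functions that each take an iterator (those names aren't from the question):

import queue
import threading

_SENTINEL = object()  # marks the end of a stream

def _drain(q):
    # Expose a queue as an iterator that a consumer function can loop over.
    while True:
        item = q.get()
        if item is _SENTINEL:
            return
        yield item

def run_both(x_iter, consume_xs, consume_ys):
    x_q = queue.Queue(maxsize=100)
    y_q = queue.Queue(maxsize=100)
    t_x = threading.Thread(target=consume_xs, args=(_drain(x_q),))
    t_y = threading.Thread(target=consume_ys, args=(_drain(y_q),))
    t_x.start()
    t_y.start()
    for x, y in function_a(x_iter):  # the generator from the question
        x_q.put(x)
        y_q.put(y)
    x_q.put(_SENTINEL)
    y_q.put(_SENTINEL)
    t_x.join()
    t_y.join()

The bounded queues keep the producer from running far ahead of a slow consumer, so memory stays bounded.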

If you don't use the y elements immediately, you can store them, but if you're not using separate threads, the y consumer will have to wait until the x consumer finishes. That means you'll have to store the entire y data stream, probably to a list.
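A minimal sketch of that option, again with a hypothetical consume_xs standing in for the downstream x consumer:

import json

def consume_xs_and_save_ys(x_iter, consume_xs, json_path):
    ys = []

    def xs_only():
        for x, y in function_a(x_iter):
            ys.append(y)  # store each y for later
            yield x       # stream x straight through to the consumer

    consume_xs(xs_only())  # ys holds a y for every pair the consumer pulled

    with open(json_path, "w") as f:
        json.dump(ys, f)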

If you don't use the y elements immediately, and you don't store them, then they're gone. You can't pull them out of nothing when you're done with the x elements. You have to generate them again, which means you'll need the elements of x_iter, but those elements are gone too. You'll need to either recreate x_iter, or store its contents up front (probably to a list). If you go that way, you probably wouldn't have function_a generate both x and y elements - you'd probably write one function that generates the x elements and one that generates the y elements, so you don't waste time doing work you don't need.
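A minimal sketch of that last variant, with function_x and function_y as hypothetical single-purpose replacements for function_a, and consume_xs/consume_ys again standing in for the downstream consumers:

def function_x(x_iter):
    for x in x_iter:
        yield other_func_1(x)

def function_y(x_iter):
    for x in x_iter:
        # As in the question's code, y is derived from other_func_1's output.
        yield other_func_2(other_func_1(x))

inputs = list(x_iter)           # store the input elements up front
consume_xs(function_x(inputs))  # first pass: feed the x consumer
consume_ys(function_y(inputs))  # second pass: feed the y consumer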


Note that itertools.tee doesn't let you get out of this. It has to store elements in memory too. itertools.tee only pays off in cases where the tee iterators will stay positioned close to each other in the data stream. It's worse than a list if you're going to iterate over one iterator fully before starting on the other.
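For reference, the tee-based split that caveat is about looks something like this; it works, but tee has to buffer every pair until the lagging iterator catches up, so fully consuming xs before touching ys ends up holding the whole stream in memory anyway:

from itertools import tee
from operator import itemgetter

pairs_for_x, pairs_for_y = tee(function_a(x_iter))
xs = map(itemgetter(0), pairs_for_x)  # iterator over just the x elements
ys = map(itemgetter(1), pairs_for_y)  # iterator over just the y elements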

  • Although it is impossible to tell without knowing what functions OP intends to use the iterators in, I think you could add generator coroutines as another way to solve this, i.e. OP's function_a sends (via gen.send) the x and y values into corresponding consumer coroutines. Commented Aug 18, 2022 at 12:24
  • @SayandipDutta: If you mean rewriting the consumer functions as generator-based coroutines, I don't think the questioner can rewrite the consumers. Commented Aug 18, 2022 at 12:27
  • I see. Nevermind then :) Commented Aug 18, 2022 at 12:28

You cannot, because your generator inherently produces a tuple of two values. What you can do is write a wrapper that ignores one of the values and yields only the one you want.

other_func_1 = lambda x: x*2
other_func_2 = lambda x: x*3

def function_a(x_iter):
    for x in x_iter:
        x1 = other_func_1(x)
        x2 = other_func_2(x)
        yield x1, x2

def take_ith(x_iter, i):
    # Yield only the i-th element of each tuple produced by x_iter.
    for x in x_iter:
        yield x[i]
        
print(list(function_a(range(10))))
print(list(take_ith(function_a(range(10)), 0)))

If you need to generate values of x and y separately, it probably means they shouldn't be grouped in the generator in the first place.

  • Do you have an idea of how not to group them? I was thinking of running x = other_func_1(x) and y = other_func_2(x) in separate functions, but that means I have to iterate over x_iter twice, and that is inefficient.
    – eng2019
    Commented Aug 18, 2022 at 11:37
  • @eng2019 It's hard to tell the best solution without seeing the whole code, but can't you save y further along in the code, at the point where the values of x are actually used?
    – matszwecja
    Commented Aug 18, 2022 at 11:44
