If you can't rewrite the consumer functions, you're going to have to do at least 1 of the following 3 things:
- Run the downstream functions simultaneously in 2 separate threads.
- Fully materialize at least one of the data streams involved - maybe save the `y` elements to a list while you iterate over the `x` elements, or vice versa, or materialize the underlying `x_iter` to a list so you can make two passes to generate the `x` and `y` elements.
- Generate the input iterator twice.
(If you're really going to save the `y` elements to a JSON file, you're probably materializing that data to a list anyway, unless you're using a streaming JSON serializer, but it sounds like the JSON thing is just an example.)
Say you iterate over all the `x` elements somehow. The whole time you're doing that, `function_a` is producing `y` elements. You could try to use those elements as `function_a` produces them, but if you want to do that while the downstream function that consumes the `x` elements is running, you'll have to run the `y` consumer at the same time, and the only way to do that without rewriting the consumer functions is to run them in two separate threads. That's the threads option.
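
A rough sketch of the threads option, under some loud assumptions: `function_a` here is a made-up stand-in that yields `(x, y)` pairs, and `run_both`, `drain`, and `_DONE` are hypothetical helper names, not anything from the question. It also assumes both consumers read their iterators to the end; a consumer that stops early or raises would leave the producer blocked on a full queue.

```python
import queue
import threading

_DONE = object()  # hypothetical sentinel marking the end of a stream


def function_a(x_iter):
    # Stand-in for the real function_a: assumed to yield (x, y) pairs.
    for x in x_iter:
        yield x, x * x


def drain(q):
    """Turn a queue into an iterator that stops at the sentinel."""
    while True:
        item = q.get()
        if item is _DONE:
            return
        yield item


def run_both(pairs, consume_x, consume_y, maxsize=16):
    """Feed two unmodified iterator consumers from one stream of pairs."""
    x_q = queue.Queue(maxsize)  # bounded, so neither stream is fully stored
    y_q = queue.Queue(maxsize)
    results = {}

    def worker(name, consumer, q):
        results[name] = consumer(drain(q))

    threads = [
        threading.Thread(target=worker, args=("x", consume_x, x_q)),
        threading.Thread(target=worker, args=("y", consume_y, y_q)),
    ]
    for t in threads:
        t.start()
    try:
        for x, y in pairs:
            x_q.put(x)  # blocks when the x consumer falls behind
            y_q.put(y)  # blocks when the y consumer falls behind
    finally:
        x_q.put(_DONE)
        y_q.put(_DONE)
    for t in threads:
        t.join()
    return results["x"], results["y"]
```

Something like `run_both(function_a(x_iter), function_b, function_c)` would then run both downstream functions side by side, with `function_b` and `function_c` standing in for whatever the real consumers are called.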
If you don't use the `y` elements immediately, you can store them, but if you're not using separate threads, the `y` consumer will have to wait until the `x` consumer finishes. That means you'll have to store the entire `y` data stream, probably to a list.
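
The materialize option might look something like this. Again, `function_a` is an invented stand-in that yields `(x, y)` pairs, and `run_sequentially` is a hypothetical helper. Note that `saved_y` is only complete if the `x` consumer reads its iterator to the end.

```python
def function_a(x_iter):
    # Stand-in for the real function_a: assumed to yield (x, y) pairs.
    for x in x_iter:
        yield x, x * x


def run_sequentially(pairs, consume_x, consume_y):
    """Run the x consumer first, stashing every y element for later."""
    saved_y = []

    def x_only():
        for x, y in pairs:
            saved_y.append(y)  # the whole y stream ends up in memory
            yield x

    x_result = consume_x(x_only())
    # The x consumer is done, so saved_y now holds every y that was produced.
    y_result = consume_y(iter(saved_y))
    return x_result, y_result
```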
If you don't use the `y` elements immediately, and you don't store them, then they're gone. You can't pull them out of nothing when you're done with the `x` elements. You have to generate them again, which means you'll need the elements of `x_iter`, but those elements are gone too. You'll need to either recreate `x_iter`, or store its contents up front (probably to a list). If you go that way, you probably wouldn't have `function_a` generate both `x` and `y` elements - you'd probably write one function that generates the `x` elements and one that generates the `y` elements, so you don't waste time doing work you don't need.
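
A sketch of the generate-twice option, assuming the input can be recreated on demand. `generate_x`, `generate_y`, `make_input`, and `run_two_passes` are all hypothetical names, and the arithmetic inside the generators is a placeholder for whatever `function_a` really computes.

```python
def generate_x(x_iter):
    # Placeholder for the x half of function_a's work.
    for item in x_iter:
        yield item + 1


def generate_y(x_iter):
    # Placeholder for the y half of function_a's work.
    for item in x_iter:
        yield item * 2


def run_two_passes(make_input, consume_x, consume_y):
    """make_input must return a fresh input iterator on every call."""
    x_result = consume_x(generate_x(make_input()))
    y_result = consume_y(generate_y(make_input()))
    return x_result, y_result
```

If the input can't be recreated, passing `lambda: iter(stored)` with `stored = list(x_iter)` gives the "store its contents up front" variant instead.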
Note that `itertools.tee` doesn't let you get out of this. It has to store elements in memory too. `itertools.tee` only pays off in cases where the tee iterators will stay positioned close to each other in the data stream. It's worse than `list` if you're going to iterate over one iterator fully before starting on the other.
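
One way to see the buffering for yourself is a small experiment like the following (`tee_after_full_pass` is a made-up helper name). The source generator is already exhausted before the second tee iterator is touched, yet that iterator can still replay every element, so `tee` must have been holding all of them.

```python
import itertools


def tee_after_full_pass(n):
    produced = []  # records every element the source has generated

    def source():
        for i in range(n):
            produced.append(i)
            yield i

    a, b = itertools.tee(source())
    total = sum(a)                      # consume the first iterator completely
    generated_before_b = len(produced)  # the source is already exhausted here
    replayed = list(b)                  # b still yields everything, from tee's buffer
    return total, generated_before_b, replayed
```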
`x, y = function_a(x_iter)` definitely doesn't work. As for the `for` loop, it's impossible to write the code like that because the iterators need to be processed by downstream functions that take iterators; the questioner cannot write an element-by-element loop.
`itertools.tee` only pays off if the tee iterators stay positioned close to each other in the data stream. It doesn't help in use cases where one iterator will be fully consumed before the other.