
I have an iterator that yields dictionaries, each with several data fields. Is there a way to split this stream into two downstream iterators, each of which yields only one particular field of the dictionaries coming from the original stream?

class Splitter(IteratorBase):
    def __init__(self, iterable):
        super().__init__(iterable)

    def __iter__(self):
        for pt in self.iterable:
            yield pt["field1"], pt["field2"]

This does not work, because it yields a single stream of tuples containing both fields rather than two separate iterators.

  • I had no issue running your code after removing IteratorBase, changing __init__ to set self.iterable = iterable, and passing data as [{"field1": 1, "field2": 2}, {"field1": 3, "field2": 4}]. A simple foo = Splitter(data) followed by for a, b in foo: works fine.
    – felipe
    Commented Jan 16, 2020 at 18:37
  • @FelipeFaria This code produces a single iterator of tuples, not a pair of separate iterators. (Which, to be fair, isn't that much different from what itertools.tee does, though the tuple iterator is only accessible between the tee iterators.)
    – chepner
    Commented Jan 16, 2020 at 18:48

1 Answer


You can use itertools.tee:

import itertools

# Make iterator for some data
data = [{'field1': 1, 'field2': 2}, {'field1': 3, 'field2': 4}]
it = iter(data)
# Make two iterators out of the first one
it1, it2 = itertools.tee(it)
# Use first iterator for field1
it1 = (elem['field1'] for elem in it1)
# Use second iterator for field2
it2 = (elem['field2'] for elem in it2)
# Print elements of each iterator
print(*it1)
# 1 3
print(*it2)
# 2 4
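
As a hedged sketch, the same idea can be wrapped in a small helper that keeps a distinct name for every iterator. The function name `split_fields` is hypothetical and not part of itertools; it assumes every element is a mapping that contains all requested keys.

```python
import itertools
from operator import itemgetter


def split_fields(iterable, *fields):
    """Return one lazy iterator per requested field (hypothetical helper)."""
    # tee creates len(fields) independent iterators over the same source
    copies = itertools.tee(iterable, len(fields))
    # itemgetter binds each field name immediately, which avoids the
    # late-binding surprises of generator expressions created in a loop
    return tuple(map(itemgetter(field), copy) for field, copy in zip(fields, copies))


data = [{'field1': 1, 'field2': 2}, {'field1': 3, 'field2': 4}]
field1_values, field2_values = split_fields(iter(data), 'field1', 'field2')
print(*field1_values)
# 1 3
print(*field2_values)
# 2 4
```

Note that the original iterator should not be advanced directly after it has been passed to tee, otherwise the derived iterators can miss elements.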
  • Confusing to reuse variable names like that.
    – Barmar
    Commented Jan 16, 2020 at 18:32
  • Thanks, but will this traverse the data collection twice or just once? I am asking for efficiency reasons.
    – CD86
    Commented Jan 16, 2020 at 18:34
  • One traversal, but elements pulled by one iterator have to be cached in memory until the other iterator has consumed them as well.
    – chepner
    Commented Jan 16, 2020 at 18:38
  • @CD86 The iterator is traversed just once (note this works for any iterator, and many cannot be traversed twice). Internally, tee reads elements as they are needed and stores them in an internal queue until all iterators have gone through it (see explanation in the docs).
    – javidcf
    Commented Jan 16, 2020 at 18:39
  • I forgot to mention that the downstream code expects the resulting iterators to be instances of IteratorBase or of a subclass of it.
    – CD86
    Commented Jan 16, 2020 at 18:44
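
The single-traversal behaviour described in the comments can be checked with a minimal sketch. The generator `source` and the list `reads` are illustrative names introduced here to record how often the underlying data is actually read.

```python
import itertools

reads = []  # records every element pulled from the underlying source


def source():
    # A one-shot generator: it cannot be traversed a second time
    for item in [{'field1': 1, 'field2': 2}, {'field1': 3, 'field2': 4}]:
        reads.append(item)
        yield item


first, second = itertools.tee(source())
# Exhausting `first` pulls every element from the source once;
# tee keeps them buffered because `second` has not consumed them yet
field1_values = [elem['field1'] for elem in first]
# `second` is then served from tee's internal buffer, not from the source
field2_values = [elem['field2'] for elem in second]

print(field1_values, field2_values)
# [1, 3] [2, 4]
print(len(reads))
# 2
```

If one iterator runs far ahead of the other, as it does here, the buffer grows to hold every element in between, so memory use depends on how far apart the two consumers get.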
