12
$\begingroup$

Many high-level languages have some kind of protocol or interface for iterators, allowing user-defined types to be used in "for-each" loops. A few examples from popular languages:

  • In Java, a class which implements the Iterator<E> interface must provide a hasNext() method to indicate whether the iteration should continue, and a next() method to return the next iterated element. Additionally, there is an optional remove() method to remove the element previously returned by next() from the underlying collection.
  • In Javascript, an object implements the iterator protocol by providing a next() method returning an object with a value property for the iterated element, and a done property indicating whether the iterator is exhausted. (When done is true, value is not required.)
  • In Python, a class can be iterable in two ways:
    • It can have a __len__ method and a __getitem__ method to represent an iterable sequence, where each element can be accessed by index;
    • It can have an __iter__ method returning an iterator object, which provides a __next__ method. When this method is called, it either returns the next iterated element, or it raises the special exception StopIteration to indicate that the iterator is exhausted.

These protocols differ in the methods that the object must provide, and the expected behaviour of those methods. What are the advantages and disadvantages of these designs, or of other iteration protocols from other languages?

$\endgroup$
4
  • 3
    $\begingroup$ Rust's is similar to Java's except it uses an Option instead of hasNext $\endgroup$
    – Seggan
    Commented May 30, 2023 at 16:00
  • 2
    $\begingroup$ @Seggan I'd say that makes it more similar to Javascript's ─ it returns a kind of wrapped value which holds both the iterated value and the termination condition. However, Javascript's protocol also allows for {value: 23, done: true} to indicate a return value from a generator. $\endgroup$
    – kaya3
    Commented May 30, 2023 at 16:02
  • 2
    $\begingroup$ Haskell's Foldable might be another approach worth noting--not that it actually supports imperative for-loops, but the general idea is certainly applicable: rather than provide hooks for a for-loop to iterate with, the type provides the entire loop construct itself. $\endgroup$ Commented May 30, 2023 at 16:06
  • 3
    $\begingroup$ C#’s IEnumerator<T> has bool MoveNext() and T Current, as well as an optional void Reset() to support restarting mid-iteration. $\endgroup$
    – Bbrk24
    Commented May 30, 2023 at 16:12

6 Answers 6

8
$\begingroup$

Java

  • a separate next and hasNext method is inconvenient to implement, but nice for users
    • In many cases it might be actually impossible to implement hasNext. Consider the case where you have a iterator representing some data downloaded over the internet. You never know when the other side is going to stop sending.
    • On the other hand, this gives users information that would be impossible to receive otherwise. While a peek adapter can tell you this info it would consume one extra step from the iterator and if that causes side effects it could be undesirable.
    • A hasNext method allows very egronomic looping in a for(;;) c-style for loop, if your language lacks a foreach equivalent this might be a important feature.
  • A remove method is quite useful since removing things from iterators while iterating is famously hard, but it's also only useful for some very specific types of iterators so it's weird to have it as part of a interface, even when optional.
  • There is no protection from advancing past the end of the iterator

Javascript

The main downside of Javascript iterators is the limited amount of functions and tooling that supports them, not the interface itself.

The downside of returning both next and done is that it's easy to keep iterating by accident after the end and a type checker won't be able to detect this. If your language has the equivalent of #[must_use] to warn about not using a value, that risk could be avoided. JS does not though.

Python

As pppery said the "old-style" python iterators are quite confusing and often lead to things unintentionally being iterators but then not actually working as iterators.

Raising a exception on error is very verbose to deal with compared to just checking a boolean. On the other hand it's impossible to forget to check which is nice.

Rust

In rust iterators impelement a trait, iter.next() returns a Option<T>. This is nice because you are forced to check if you have reached the end and can't cause bugs.

Iterator convenience method are implemented on the trait itself which leads to a nice chaining syntax in reading order, unlike python where you need to nest functions leading to a "in to out" execution order.

Iterables in rust are requied to implement into_iter() but in practice most implement iter() too which doesn't consume the reference so is usually preferred. Many implement other alternative methods too like .iter_mut(). The many ways to convert things into iterators is is confusing.

$\endgroup$
6
  • 1
    $\begingroup$ This is a good answer, although it focuses mostly on the consumer's side of the protocol (i.e. how to consume an iterable/iterator, if you aren't just using the language's built-in for loop that handles things). I suppose that is still important since for some use-cases you do have to consume iterators in user-written code. Nonetheless, I think the implementor's side of the protocol is probably more significant from a usability perspective, since it's more common to have to make a user-defined type iterable. $\endgroup$
    – kaya3
    Commented May 30, 2023 at 16:28
  • 1
    $\begingroup$ @kaya3 If you just use the built in iterators in a built in for each loop it doesn't really matter how the protocol is defined since the details will be handled for you $\endgroup$
    – mousetail
    Commented May 30, 2023 at 16:29
  • 1
    $\begingroup$ My point is not about built-in iterators, it's about implementing the iteration protocol on a user-defined type, and that being a more common task than having to manually consume an iterator. That said, even when using only built-in iterators for built-in types in a for loop, there may be performance considerations between the different protocol designs. $\endgroup$
    – kaya3
    Commented May 30, 2023 at 16:40
  • $\begingroup$ Swift is similar to Rust except that its equivalent to into_iter doesn’t invalidate the original object — in practice that usually means it just bumps the refcount, since all the builtin collection types are CoW. $\endgroup$
    – Bbrk24
    Commented May 30, 2023 at 16:42
  • $\begingroup$ @kaya3 True, there is just not much to say about the difference in implementation for returning a special object or raising a exception. It only really matters for the users of the API. Only Java really has unique features that effect implementors $\endgroup$
    – mousetail
    Commented May 30, 2023 at 20:28
8
$\begingroup$

A commonly-cited argument against Python's approach, where the StopIteration exception indicates that an iterator is exhausted, is that raising and catching an exception is typically slower than returning a value and checking it with an if statement. However, this is not true in CPython, since the CPython interpreter handles StopIteration as a special case in order to make iteration fast. So this is more of an implementation concern than a design concern.


Nonetheless, this approach does have a specific pitfall, which is that it allows for non-local control flow effects. This usually only happens by accident, but it's hard to reason about so it can cause bugs which are difficult to diagnose. The issue occurs when StopIteration is thrown indirectly by something a __next__ method calls, rather than intentionally by the __next__ method itself. In this case, the StopIteration exception bubbles up and is caught by whatever consumes the iterator; the consumer then thinks the iterator is exhausted when it isn't necessarily. An example of this occurring "in the wild" is given in this Stack Overflow Q&A.

The protocol could be changed slightly to prevent this: each iterator could have its own sentinel exception, provided by the consumer, so that the consumer knows it can only be thrown by that specific iterator. However, it would make the protocol more complicated to implement, which doesn't seem worth it to address a pitfall that only occurs very rarely.

The other approaches, which use the return value of the next or hasNext method to signal termination, don't have this problem because unlike exceptions, return values cannot have non-local control flow effects that automatically bubble up the stack.

$\endgroup$
5
$\begingroup$
  1. The first Python syntax has the advantage that custom list-like objects become iterable automatically without the user thinking about it.
  2. Conversely, the first Python syntax has the disadvantage that it's a heruistic that sometimes isn't correct - if you are implementing a custom dictionary-like object then it will have a __getitem__ and a __len__, but synthesizing them won't produce a useful iterator.
  3. Obviously, the Java syntax supports a remove method that no other language does, which could be said to be an advantage, as removing elements from the list while iterating it is a common pain point. It also makes it clear that that operation may not be supported if it doesn't make sense.
  4. The Java syntax has the disadvantage that it requires the iterator to know whether there is a next element without producing it, which may not be possible for some iterators (think a Python generator - you never know whether it will yield another value or return before running the code). While you can of course work around this by calculating the next value in hasNext, that's counterintuitive and confusing.
$\endgroup$
3
$\begingroup$

The iteration protocols in the question all rely on mutable iterators: Each iteration has a single iterator object on which a .next method is called for each step; the iterator must mutate its state between .next calls to advance.

An example that also allows immutable iterators is the Julia iterator interface. This is realised by each iteration step providing a state in addition to a value:

  • iterate(iterable) starts iteration by producing a new state
  • iterate(iterable, state) resumes iteration from the previous state

By passing the state into each iteration step and receiving the next state, state may be newly created on each step and thus be immutable.

There are some advantages to this:

  • This protocol is still compatible with mutable iteration as iterate is free to mutate and pass back the state it got or to mutate iterable.
  • For immutable iterable and state, the tuple of (iterable, state) clearly defines a point in the iteration. This makes it simple and safe to split one iterator into two or more at any point, or to safe an earlier state to roll back to it.
  • The end-of-iteration can be clearly marked by returning the sentinel value nothing instead of a tuple. This makes it impossible to accidentally proceed with iteration past the end, yet allows the sentinel to still be a valid value as (nothing, state).

The added complexity of lugging around both iterable and state can be easily hidden by syntactic sugar. For example, a for item in iter loop desugars very similar to other languages:

next = iterate(iter)
while next !== nothing
    (item, state) = next
    # body
    next = iterate(iter, state)
end

Of course, instead of an imperative while loop it could also be desugarred in a functional style using recursion to go along with promoting immutability.

$\endgroup$
0
-1
$\begingroup$

The language*1 which has the best possible iterators is, in my opinion, C#. I will explain why.

  1. The collection to be iterated does not have to implement any particular interface. All standard collections implement IEnumerable, but that is not necessary: if the collection has a IEnumerator<T> GetEnumerator() function, it is eligible for participation in a foreach loop.

  2. If the IEnumerator returned by GetEnumerator() is disposable, then at the end of the foreach loop the disposable will be disposed. Contrast this with Java, where the programmer has to be aware of, and explicitly dispose, any Iterator that happens to be Closeable.

  3. The functions offered by IEnumerator are bool HasNext(), void MoveNext(), and T Current (a property.) Thus, if you want access to the IEnumerator you can refrain from using a foreach loop, and you can use a regular for loop instead, where the initialization clause of the for loop creates the enumerator, the condition clause invokes HasNext(), the step clause invokes MoveNext(), and the current value is at any moment available via the Current property, which you can invoke multiple times in the same iteration if you need to.

  4. This makes it very easy to create decorators of IEnumerator. (wikipedia) Contrast this to Java, where the functions offered by Iterator cannot easily be used in a regular for loop, and decorators are really hard to write. A filtering iterator cannot be implemented without cumbersome look-ahead logic, and then it is impossible to use it for removing items from the collection because looking ahead means that you are always past the item you want to delete.


*1 I only consider strongly typed languages. Weakly typed languages like Javascript, Python, etc. receive nothing but scorn from me.

$\endgroup$
1
  • $\begingroup$ You hate weakly typed languages yet praise duck typing features in C#, ironic $\endgroup$
    – mousetail
    Commented Dec 21, 2023 at 8:04
-1
$\begingroup$

Multiple return

If a language supports returning multiple values in functions or methods, a somewhat better protocol is the generator to return (live: bool, ref data: T?), where live indicates that the enumerator reader should examine the data or keep running, and data is an Optional<T> that indicates if any data is produced in this iteration.

This protocol would cover both direct and async usage, and would further permit filter dropping or accumulating usage in direct code.

Ref return

The ref data instead of simple data is designed to reduce allocation of temporaries drastically in enumerations. To zero, in special.

Other protocols may cause unnecessary allocations by preserving the next data in a backing field, after running/returning of hasMext, and would otherwise impossibilite the creation of generators of "stack only" data types.

The language may offer composite, possible allocating IteratorResult<T> that conforms to Java/C# protocol, that buffers the results outside the generator, instead of forcing the existence of storage by design.

Normal syntax

With or without multiple returns, I also argue that a big step in easing the creation and usage of generators and iterations is to make their syntax simple, instead of special.

A generator is basically an async function that returns multiple times. The same with iterators.

A simple syntax for this wuuld be, for example, and taking note from languages that names the Optional<T> cases with None and Some<T>, to mark generator and/or iterator functions with returning Many<T>

public async Many<T> Filter( Predicate pred )
{ }
$\endgroup$

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .