5
\$\begingroup\$

Java 8 streams are my new golden hammer that I try to use as often as possible with often enormous gains in brevity and readability. Getting multiple return values, such as a maximum value and its associated element, is a bit cumbersome though (and requires an additional "pair" class).

Should I go back to "normal Java" for this task or is this syntax preferable? Is there a more concise way?

List<String> names = Arrays.asList("John","Paul","Ringo"); 
Pair<String,Integer> longestName = names.stream()
 .map(n->new Pair<>(n,n.length())) // pretend n.length() to be a lengthy operation
 .max(Comparator.comparing(Pair::getB)).get();
System.out.println(longestName.getA()+" "+longestName.getB());

P.S.: With "lengthy operation" I mean long running, as in I don't want to calculate it twice.

\$\endgroup\$
1
  • \$\begingroup\$ It was not a good idea to use a PS-note to describe the crux of the problem. Instead, it should be reflected in the question title and expressed in the main part of the post explicitly, not as a metaphor. The title "Finding the longest string and its length using Java streams" outweighs by a lot the inline comment about String.length() and PS-note and many readers probably misinterpreted this question (buy the way, people often skip inline comments when they understand the code). \$\endgroup\$ Commented May 17 at 19:02

3 Answers 3

4
\$\begingroup\$

Your concept is fine. My only criticism is in code style, and perhaps you should have a specific/custom container instead of the Pair. getA() and getB() should be getName() and getLength(). This also allows you to use primitive values and not Integer, and removes the confusing generic types.

Also:

  • don't get from Optionals at the stream end.
  • use white-space, it's your friend.
  • use meaningful names for lambda parameters (name is better than n )

The code I would write would look like:

Optional<Operation> maxOp = names.stream()
                  .map (name -> new Operation(name, name.length()))
                  .max (Comparator.comparingInt(Operation::getLength()));
Operation longest = maxOp.get();

System.out.println(longest.getName() + " " + longest.getLength());
\$\endgroup\$
1
  • \$\begingroup\$ An alternative to unpacking the optional is to apply Optional.ifPresent(). \$\endgroup\$ Commented May 17 at 18:37
20
\$\begingroup\$

You don't need the Pair. You can get the String with the maximum length by doing

names.stream().max(Comparator.comparingInt(String::length))
\$\endgroup\$
3
  • 1
    \$\begingroup\$ Agreed. Unlike C strings, where strlen() is O(n), String::length should be fast. \$\endgroup\$ Commented Mar 25, 2016 at 17:44
  • 1
    \$\begingroup\$ Maybe I didn't explain the problem correctly. I want to return both the length and the string itself, so I want to have two return values. In this example, String::length is fast but this is just a stand in for a long running function. \$\endgroup\$ Commented Oct 1, 2019 at 14:12
  • 3
    \$\begingroup\$ @KonradHöffner this simple syntax gives you the longest string. Just get its length. \$\endgroup\$
    – Sxilderik
    Commented Oct 15, 2020 at 8:43
4
\$\begingroup\$

Cracking the Puzzle

String.length() a cheap constant time operation, and the problem of "finding the longest string" (as the question title says) doesn't require being cautious and avoiding invoking String.length() on each element more than once. The trick with wrapping strings into pairs is unlikely to bring any performance gains. In fact, it would rather be detrimental, because you're allocating additional O(n) space by creating a Pair instance for each stream element to avoid a very cheap call which has roughly the same cost as Pair.getB().

Hence, the proper way of "finding the longest string" is to operate with string elements directly without wrapping them, as suggested in Hank D's answer.

But this question is a Puzzle; apparently you are trying to solve a completely different problem.

Inconspicuous clues:

// pretend n.length() to be a lengthy operation

P.S.: With "lengthy operation" I mean long running, as in I don't want to calculate it twice.

Emphasis added

I.e. the problem "finding the longest string" is a metaphor for a task of finding the largest element in the Stream based on a key produced by a long-running operation which probably involves heavy data processing or IO.

So, OP uses String.length() as a substitution of this expensive operation.

Which is very confusing to say the least

The largest element by Key produced by a long-running operation

When you're dealing with some sort of IO, or CPU-intensive operation, it does make sense to ensure that it will be performed only once for each stream element with no redundant calls.

But as I said, the approach of using wrappers requires additional O(n) space.

How to avoid that? Imagine that you're using a plain for-loop. Then you can define two variables denoting the current largest element and the key associated with it. Constant space O(1), no wrappers.

To achieve the same with Stream API, we can create a custom Collector based on a mutable object holding a stream element and computed key.

The accumulation type (i.e. the mutable object that would be the core of the collector) requires a Function that maps a stream element to a key and a Comparator as a mean of comparing the keys (solution will be more flexible if we will not assume keys implement Comparable).

To define a collector, we can utilize static factory method Collector.of().

public static <T, K> Collector<T, ?, Optional<KeyValue<K, T>>> maxBy(Function<? super T, ? extends K> keyExtractor,
                                                                     Comparator<? super K> keyComparator) {
    
    return Collector.of(
        () -> new MaxKeyAccumulator<K, T>(keyExtractor, keyComparator),
        MaxKeyAccumulator::accept,
        MaxKeyAccumulator::merge,
        MaxKeyAccumulator::getResult
    );
}

Since a stream can be empty, this collector produces an Optional

Custom accumulation type:

public class MaxKeyAccumulator<K, T> implements Consumer<T> {
    private final Function<? super T, ? extends K> keyExtractor;
    private final Comparator<K> keyComparator;
    private T value;
    private K key;
    
    public MaxKeyAccumulator(Function<? super T, ? extends K> keyExtractor,
                             Comparator<? super K> initialComparator) {
        this.keyExtractor = keyExtractor;
        this.keyComparator = Comparator.nullsFirst(initialComparator);
    }
    
    @Override
    public void accept(T next) {
        K nextKey = keyExtractor.apply(Objects.requireNonNull(next));
        
        if (isGreaterThanThisKey(nextKey)) {
            value = next;
            key = nextKey;
        }
    }
    
    public MaxKeyAccumulator<K, T> merge(MaxKeyAccumulator<K, T> other) {
        return isGreaterThanThisKey(other.key) ? other : this;
    }
    
    private boolean isGreaterThanThisKey(K nextKey) {
        return Objects.compare(nextKey, key, keyComparator) > 0;
    }
    
    public Optional<KeyValue<K, T>> getResult() {
        return value != null
             ? Optional.of(new KeyValue<>(key, value))
             : Optional.empty();
    }
}

public record KeyValue<K, V>(K key, V value) {}

Notes:

  • This implementation assumes that null-values have no special meaning in your business logic (if it's not the case in your project, you have a way more serious problem to solve than finding max element with streams) and therefore prohibits them.
  • Comparator.nullsFirst() wraps the given comparator to simplify the logic of accumulating and merging (resulting comparator basically combines comparison with the check if the key was initialized).
  • You can make this Collector speak your domain language by returning something like EmployeeAnnualRevenue instead of KeyValue (without sacrificing collector's reusability). It can be done adding the third parameter of type BiFunction to maxBy() method (show above), and applying this function inside the finisher function of the Collector to transform the accumulation type to your domain type (leaving this improvement as a homework).

Test suite


@Test
@DisplayName("""
    [ maxBy ] Should produce an empty
     optional when the stream is empty""")
void maxKey_ShouldReturnEmptyOptional_WhenStreamIsEmpty() {
    var max = Stream.<LocalDate>empty()
        .collect(maxBy(
            LocalDate::getMonth,
            Comparator.comparing(Month::toString)
        ));
    
    assertThat(max).isEmpty();
}

@Test
@DisplayName("""
    [ maxBy ] Should produce an optional
     containing a value associated with the
     largest key when the stream is not empty""")
void maxKey_ShouldReturnMaxElement_WhenStreamIsNotEmpty() {
    var max = Stream.of(
            LocalDate.of(1919, 1, 21), 
            LocalDate.of(1776, 7, 4), 
            LocalDate.of(1710, 4, 5)
        )
        .collect(maxBy(
            LocalDate::getYear,
            Comparator.reverseOrder()
        ));
    
    assertThat(max.map(KeyValue::key))
        .contains(1710);
    
    assertThat(max.map(KeyValue::value))
        .contains(LocalDate.of(1710, 4, 5));
}

@Test
@DisplayName("""
    [ maxBy ] Should produce an optional
     containing a value associated with the
     largest key when the stream is parallel""")
void maxKey_ShouldReturnMaxElement_WhenStreamIsParallel() {
    var max = Stream.of(
            LocalDate.of(1919, 1, 21), 
            LocalDate.of(1776, 7, 4), 
            LocalDate.of(1710, 4, 5)
        )
        .parallel()
        .collect(maxBy(
            LocalDate::getMonth,
            Comparator.naturalOrder()
        ));
    
    assertThat(max.map(KeyValue::key))
        .contains(Month.JULY);
    
    assertThat(max.map(KeyValue::value))
        .contains(LocalDate.of(1776, 7, 4);
}

Test dependencies: JUnit, AssertJ

Avoid using Optional.get()

The problem with using Optional.get() is that it does not effectively communicate its intent to the reader of the code. The name get fails to indicate that it will blow up with a NoSuchElementException if the optional is empty.

To make this intent explicit, a better option would be to use orElseThrow().

Starting with Java 10, documentation advises using orElseThrow() instead of get(), as stated in the get() method's API note:

API Note:

The preferred alternative to this method is orElseThrow().

Yes, at the time this question was written, the parameterless version of orElseThrow() was not yet available in the JDK (the only alternative option was verbose orElseThrow(NoSuchElementException::new)), but it doesn't invalidate the point. Now we have a more expressive option.

Important note

There are very few use cases when we have on our hands an optional which by design is guaranteed to be not empty. Consider the following code:

var result = someStream
    // some operations
    .collect(groupingBy(
        classifierFunction, 
        collectingAndThen(maxBy( ... ), Optional::get)
    ));

In this example, Collector maxBy is used as a downstream of groupingBy and guaranteed to generate a non-empty optional (if the stream is empty, the groupingBy will produce an empty Map and maxBy would not be triggered). It's a rare case, when Optional.get() might still be appropriate.

\$\endgroup\$

Not the answer you're looking for? Browse other questions tagged or ask your own question.