Cracking the Puzzle
String.length()
a cheap constant time operation, and the problem of "finding the longest string" (as the question title says) doesn't require being cautious and avoiding invoking String.length()
on each element more than once. The trick with wrapping strings into pairs is unlikely to bring any performance gains. In fact, it would rather be detrimental, because you're allocating additional O(n) space by creating a Pair
instance for each stream element to avoid a very cheap call which has roughly the same cost as Pair.getB()
.
Hence, the proper way of "finding the longest string" is to operate with string elements directly without wrapping them, as suggested in Hank D's answer.
But this question is a Puzzle; apparently you are trying to solve a completely different problem.
Inconspicuous clues:
// pretend n.length()
to be a lengthy operation
P.S.: With "lengthy operation" I mean long running, as in I don't want
to calculate it twice.
Emphasis added
I.e. the problem "finding the longest string" is a metaphor for a task of finding the largest element in the Stream based on a key produced by a long-running operation which probably involves heavy data processing or IO.
So, OP uses String.length()
as a substitution of this expensive operation.
Which is very confusing to say the least
The largest element by Key produced by a long-running operation
When you're dealing with some sort of IO, or CPU-intensive operation, it does make sense to ensure that it will be performed only once for each stream element with no redundant calls.
But as I said, the approach of using wrappers requires additional O(n) space.
How to avoid that? Imagine that you're using a plain for
-loop. Then you can define two variables denoting the current largest element and the key associated with it. Constant space O(1), no wrappers.
To achieve the same with Stream API, we can create a custom Collector
based on a mutable object holding a stream element and computed key.
The accumulation type (i.e. the mutable object that would be the core of the collector) requires a Function
that maps a stream element to a key and a Comparator
as a mean of comparing the keys (solution will be more flexible if we will not assume keys implement Comparable
).
To define a collector, we can utilize static factory method Collector.of()
.
public static <T, K> Collector<T, ?, Optional<KeyValue<K, T>>> maxBy(Function<? super T, ? extends K> keyExtractor,
Comparator<? super K> keyComparator) {
return Collector.of(
() -> new MaxKeyAccumulator<K, T>(keyExtractor, keyComparator),
MaxKeyAccumulator::accept,
MaxKeyAccumulator::merge,
MaxKeyAccumulator::getResult
);
}
Since a stream can be empty, this collector produces an Optional
Custom accumulation type:
public class MaxKeyAccumulator<K, T> implements Consumer<T> {
private final Function<? super T, ? extends K> keyExtractor;
private final Comparator<K> keyComparator;
private T value;
private K key;
public MaxKeyAccumulator(Function<? super T, ? extends K> keyExtractor,
Comparator<? super K> initialComparator) {
this.keyExtractor = keyExtractor;
this.keyComparator = Comparator.nullsFirst(initialComparator);
}
@Override
public void accept(T next) {
K nextKey = keyExtractor.apply(Objects.requireNonNull(next));
if (isGreaterThanThisKey(nextKey)) {
value = next;
key = nextKey;
}
}
public MaxKeyAccumulator<K, T> merge(MaxKeyAccumulator<K, T> other) {
return isGreaterThanThisKey(other.key) ? other : this;
}
private boolean isGreaterThanThisKey(K nextKey) {
return Objects.compare(nextKey, key, keyComparator) > 0;
}
public Optional<KeyValue<K, T>> getResult() {
return value != null
? Optional.of(new KeyValue<>(key, value))
: Optional.empty();
}
}
public record KeyValue<K, V>(K key, V value) {}
Notes:
- This implementation assumes that
null
-values have no special meaning in your business logic (if it's not the case in your project, you have a way more serious problem to solve than finding max element with streams) and therefore prohibits them.
Comparator.nullsFirst()
wraps the given comparator to simplify the logic of accumulating and merging (resulting comparator basically combines comparison with the check if the key
was initialized).
- You can make this Collector speak your domain language by returning something like
EmployeeAnnualRevenue
instead of KeyValue
(without sacrificing collector's reusability). It can be done adding the third parameter of type BiFunction
to maxBy()
method (show above), and applying this function inside the finisher function of the Collector to transform the accumulation type to your domain type (leaving this improvement as a homework).
Test suite
@Test
@DisplayName("""
[ maxBy ] Should produce an empty
optional when the stream is empty""")
void maxKey_ShouldReturnEmptyOptional_WhenStreamIsEmpty() {
var max = Stream.<LocalDate>empty()
.collect(maxBy(
LocalDate::getMonth,
Comparator.comparing(Month::toString)
));
assertThat(max).isEmpty();
}
@Test
@DisplayName("""
[ maxBy ] Should produce an optional
containing a value associated with the
largest key when the stream is not empty""")
void maxKey_ShouldReturnMaxElement_WhenStreamIsNotEmpty() {
var max = Stream.of(
LocalDate.of(1919, 1, 21),
LocalDate.of(1776, 7, 4),
LocalDate.of(1710, 4, 5)
)
.collect(maxBy(
LocalDate::getYear,
Comparator.reverseOrder()
));
assertThat(max.map(KeyValue::key))
.contains(1710);
assertThat(max.map(KeyValue::value))
.contains(LocalDate.of(1710, 4, 5));
}
@Test
@DisplayName("""
[ maxBy ] Should produce an optional
containing a value associated with the
largest key when the stream is parallel""")
void maxKey_ShouldReturnMaxElement_WhenStreamIsParallel() {
var max = Stream.of(
LocalDate.of(1919, 1, 21),
LocalDate.of(1776, 7, 4),
LocalDate.of(1710, 4, 5)
)
.parallel()
.collect(maxBy(
LocalDate::getMonth,
Comparator.naturalOrder()
));
assertThat(max.map(KeyValue::key))
.contains(Month.JULY);
assertThat(max.map(KeyValue::value))
.contains(LocalDate.of(1776, 7, 4);
}
Test dependencies: JUnit, AssertJ
Avoid using Optional.get()
The problem with using Optional.get()
is that it does not effectively communicate its intent to the reader of the code. The name get
fails to indicate that it will blow up with a NoSuchElementException
if the optional is empty.
To make this intent explicit, a better option would be to use orElseThrow()
.
Starting with Java 10, documentation advises using orElseThrow()
instead of get()
, as stated in the get()
method's API note:
API Note:
The preferred alternative to this method is orElseThrow()
.
Yes, at the time this question was written, the parameterless version of orElseThrow()
was not yet available in the JDK (the only alternative option was verbose orElseThrow(NoSuchElementException::new)
), but it doesn't invalidate the point. Now we have a more expressive option.
Important note
There are very few use cases when we have on our hands an optional which by design is guaranteed to be not empty. Consider the following code:
var result = someStream
// some operations
.collect(groupingBy(
classifierFunction,
collectingAndThen(maxBy( ... ), Optional::get)
));
In this example, Collector maxBy
is used as a downstream of groupingBy
and guaranteed to generate a non-empty optional (if the stream is empty, the groupingBy
will produce an empty Map and maxBy
would not be triggered). It's a rare case, when Optional.get()
might still be appropriate.
String.length()
and PS-note and many readers probably misinterpreted this question (buy the way, people often skip inline comments when they understand the code). \$\endgroup\$