Java: Benchmark findFirst() and findAny() methods on non-parallel streams

Question

I would like to know how representative the following benchmark is? Can we infer from it that findFirst is much slower than findAny for non-parallel streams? Or is this particular use case too specific?

In addition, I would like to know whether the benchmark has been implemented correctly from a technical point of view.

import java.util.Random;

public class BMTest {
    public static void main(String[] args) {
        int findFirstFaster = 0;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10; i++) {
            int n = 1000 * (i + 1);
            long t1 = getTimeFindFirst(n);
            long t2 = getTimeFindAny(n);
            int faster = t1 < t2 ? 1 : 0;
            findFirstFaster += faster;
            sb.append(String.format("| %02d | %05d | %010d | %010d | %f | %b |%n", i + 1, n, t1, t2, (double) t1 / (t1 + t2), faster == 1));
        }
        System.out.println(sb);
        System.out.println("findFirstFaster = " + findFirstFaster);
    }

    public static long getTimeFindFirst(int n) {
        Random rand = new Random();
        long sum = 0;

        long time = System.nanoTime();
        for (int i = 0; i < n; i++) {
            sum += rand.ints(10_000).filter(x -> x % 7 == 0).findFirst().orElseThrow();
        }
        time = System.nanoTime() - time;
        System.out.println("sum = " + sum);
        return time;
    }

    public static long getTimeFindAny(int n) {
        Random rand = new Random();
        long sum = 0;

        long time = System.nanoTime();
        for (int i = 0; i < n; i++) {
            sum += rand.ints(10_000).filter(x -> x % 7 == 0).findAny().orElseThrow();
        }
        time = System.nanoTime() - time;
        System.out.println("sum = " + sum);
        return time;
    }
}

Example output:


01	01000	0010918901	0002353900	0,822652	false
02	02000	0001526700	0001113200	0,578317	false
03	03000	0001618900	0001715000	0,485587	true
04	04000	0002009100	0001740900	0,535760	false
05	05000	0002041899	0001900000	0,517999	false
06	06000	0002077800	0002103601	0,496915	true
07	07000	0002630900	0002274900	0,536284	false
08	08000	0002108900	0002100399	0,501010	false
09	09000	0002266100	0002248400	0,501960	false
10	10000	0004679400	0002199700	0,680234	false

No. Use JMH with warmups. You aren't comparing the methods acting on same input, and you shouldn't bother timing rand.ints(10_000). Pass same set of ints to both tests (calc upfront or use same random seed). — DuncG, Commented Jul 4 at 14:42

Alexander Ivanchenko · Accepted Answer · 2024-07-02 23:54:16Z

whether the benchmark has been implemented correctly

I'm afraid, no.

JVM does quite a bit of optimizations at runtime, compiling certain frequently called methods into highly optimized native code. But for it to happen JVM needs to profile the code running it in the interpreted mode, only then JIT compiler kicks in *. This process is called JVM warmup.

When writing benchmarks, it's important to take JVM warmup into account because performance is constantly changing during this phase.

These tests are nondeterministic since a newly created instance of Random obtained via no-args constructor serves as a stream source in each method (i.e. in each case stream elements will be different). If instead you were providing a predefined seed while constructing Random, then it'll guaranteed that the predicate of the filter() will be evaluated to true on the same element in both streams.

Don't reinvent the wheel, use trustworthy tools specifically designed for this purpose, such as Java Microbenchmark Harness.

* ^{That's a very simplified description. Modern JVMs are using tiered compilation, which employs two JIT compilers C1 and C2 to optimize hot code faster. And sometimes a compiled piece of code might even be deoptimized and start running in the interpreted mode again because a speculative assumption on which optimization was based proved to be incorrect.}

Can we infer from it that findFirst() is much slower than findAny() for non-parallel streams?

This might sound boring, but findFirst() and findAny() operations represent different use cases: 1) you need exactly the first element, 2) you're OK with any element.

It'll be a mistake to assume that if a stream is sequential, then you're free to employ findAny() when in fact findFirst() is needed. This premature optimization is not likely to generate any tangible performance gain, but might lead to incorrect results if you or your colleagues will switch the stream to run in parallel.

Premature optimization is evil

Code style

Long chains of method calls are hard to follow and comprehend. Consider breaking them into multiple lines to improve readability:

sum += rand.ints(10_000)
    .filter(x -> x % 7 == 0)
    .findFirst()
    .orElseThrow();

@TobiasGrothe If there is no requirement for the first matching element to be returned, I'd use findAny() even if it did not provide a performance gain, because it documents the lack of that requirement. This might help with understanding the code. But a comment about why this is not a requirement will always be better (unless it's obvious from the use case). — Feuermurmel, Commented Jul 3 at 9:47
@TobiasGrothe the nature of the Stream source has a far bigger impact on the performance than anything else. That’s especially true for a source as expensive as a random number generator. Given that the stream from Random is unordered to begin with, there’s obviously no difference between findFirst() and findAny(). There simply is no “first” and the documentation of findFirst() states: “If the stream has no encounter order, then any element may be returned.” — Holger, Commented Jul 3 at 10:54
@TobiasGrothe findAny() method is a better choice when there might be several suitable results in the stream, and there is no preference for which element to return. This is exactly what you described. On the other hand, findFirst() is used when it is necessary to leverage the ordering of the stream source to find the earliest element that meets the criteria. Example: elements in the source are chronologically ordered by timestamp, and we need the earliest event in a specific area dedicated to a particular topic, findFirst() is the appropriate tool for this job. — Alexander Ivanchenko, Commented Jul 4 at 14:31
@TobiasGrothe Holger brought up an interesting point. Some stream sources do not have an encounter order (HashSet, Random). As a result, the streams they produce are considered to be unordered. Such stream does not care about ordering related constraints while executing findFirst(), takeWhile(), dropWhile(), limit(), skip(), etc. You can verify that streams produced by an instance of Random are unordered by checking the characteristics of the stream's spliterator. — Alexander Ivanchenko, Commented Jul 4 at 14:36
@TobiasGrothe rand.ints(10_000).spliterator().characteristics() & Spliterator.ORDERED will be evaluated to 0 because bit flag ORDERED is not present. — Alexander Ivanchenko, Commented Jul 4 at 14:37

Stack Exchange Network

Java: Benchmark findFirst() and findAny() methods on non-parallel streams

1 Answer 1

Code style

Not the answer you're looking for? Browse other questions tagged
java
stream
benchmarking
or ask your own question.

Hot Network Questions

Java: Benchmark findFirst() and findAny() methods on non-parallel streams

1 Answer 1

Code style

Not the answer you're looking for? Browse other questions tagged javastreambenchmarking or ask your own question.

Related

Hot Network Questions

Not the answer you're looking for? Browse other questions tagged
java
stream
benchmarking
or ask your own question.