Java Stream API - Best way to transform a list: map or forEach?

Question

I have a list, myListToParse, where I want to filter the elements and apply a method on each element, and add the result in another list, myFinalList.

With the Stream API (added in Java 8), I noticed that I can do it in 2 different ways. I would like to know the more efficient way between them and understand why one way is better than the other one.

Method 1:

myFinalList = new ArrayList<>();
myListToParse.stream()
        .filter(elt -> elt != null)
        .forEach(elt -> myFinalList.add(doSomething(elt)));

Method 2:

myFinalList = myListToParse.stream()
        .filter(elt -> elt != null)
        .map(elt -> doSomething(elt))
        .collect(Collectors.toList());

I'm open for any suggestion about a third way.

The second one. A proper function should have no side effects, in your first implementation you are modifying the external world. — ThanksForAllTheFish, Commented Feb 4, 2015 at 10:32
just a matter of style, but elt -> elt != null can be replaced with Objects::nonNull — the8472, Commented Feb 4, 2015 at 10:40
@the8472 Even better would be to make sure there are no null values in the collection in the first place, and use Optional<T> instead in combination with flatMap. — herman, Commented Feb 4, 2015 at 10:47
@SzymonRoziewski, not quite. For something as trivial as this, the work needed to setup the parallelstream under the hood will make using this construct mute. — user3248346, Commented Feb 4, 2015 at 10:48
Note that you can write .map(this::doSomething) assuming that doSomething is a non-static method. If it's static you can replace this with the class name. — herman, Commented Feb 4, 2015 at 11:55

Naman · Accepted Answer · 2021-05-13 02:18:29Z

Don't worry about any performance differences, they're going to be minimal in this case normally.

Method 2 is preferable because

it doesn't require mutating a collection that exists outside the lambda expression.
it's more readable because the different steps that are performed in the collection pipeline are written sequentially: first a filter operation, then a map operation, then collecting the result (for more info on the benefits of collection pipelines, see Martin Fowler's excellent article.)
you can easily change the way values are collected by replacing the Collector that is used. In some cases you may need to write your own Collector, but then the benefit is that you can easily reuse that.

Giuseppe Bertone · Accepted Answer · 2016-01-23 19:02:15Z

59

I agree with the existing answers that the second form is better because it does not have any side effects and is easier to parallelise (just use a parallel stream).

Performance wise, it appears they are equivalent until you start using parallel streams. In that case, map will perform really much better. See below the micro benchmark results:

Benchmark                         Mode  Samples    Score   Error  Units
SO28319064.forEach                avgt      100  187.310 ± 1.768  ms/op
SO28319064.map                    avgt      100  189.180 ± 1.692  ms/op
SO28319064.mapWithParallelStream  avgt      100   55,577 ± 0,782  ms/op

You can't boost the first example in the same manner because forEach is a terminal method - it returns void - so you are forced to use a stateful lambda. But that is really a bad idea if you are using parallel streams.

Finally note that your second snippet can be written in a sligthly more concise way with method references and static imports:

myFinalList = myListToParse.stream()
    .filter(Objects::nonNull)
    .map(this::doSomething)
    .collect(toList());

edited Jan 23, 2016 at 19:02

Giuseppe Bertone

2,19215 silver badges17 bronze badges

answered Feb 4, 2015 at 12:09

assylias

326k83 gold badges671 silver badges791 bronze badges

1

About performance, in your case "map" really wins over "forEach" if you use parallelStreams. My benchmaks in milliseconds: SO28319064.forEach: 187,310 ± 1,768 ms/op -- SO28319064.map: 189,180 ± 1,692 ms/op --SO28319064.mapParallelStream: 55,577 ± 0,782 ms/op
– Giuseppe Bertone
Commented Jan 23, 2016 at 18:00
2

@GiuseppeBertone, it's up to assylias, but to my opinion your edit contradicts the original author's intent. If you want to add your own answer, it's better to add it instead of editing the existing one so much. Also now the link to the microbenchmark is not relevant to the results.
– Tagir Valeev
Commented Jan 24, 2016 at 2:28

Add a comment |

Craig P. Motlin · Accepted Answer · 2023-12-27 15:31:01Z

6

If you use Eclipse Collections, you can use the collectIf() method to filter and transform at the same time.

myFinalList = myListToParse.collectIf(
    Objects::nonNull,
    elt -> doSomething(elt));

It evaluates eagerly and should be a bit faster than using a Stream.

Note: I am a committer for Eclipse Collections.

edited Dec 27, 2023 at 15:31

answered Feb 5, 2015 at 22:21

Craig P. Motlin

26.7k18 gold badges101 silver badges126 bronze badges

While this seems correct and may be good advice, this does not answer the question asked, which is about transformation/map.
– Philippe Cloutier
Commented Aug 21, 2023 at 19:23
1

@PhilippeCloutier I edited my answer to use the doSomething() method that appears in the original question, to clarify that collectIf() does transform/map in addition to filtering.
– Craig P. Motlin
Commented Aug 25, 2023 at 1:23
Ah, I see now. I submitted an edit to clarify that this addresses the OP's specific problem.
– Philippe Cloutier
Commented Aug 25, 2023 at 17:52

Add a comment |

score 5 · Accepted Answer · 2015-02-04 14:19:47Z

One of the main benefits of using streams is that it gives the ability to process data in a declarative way, that is, using a functional style of programming. It also gives multi-threading capability for free meaning there is no need to write any extra multi-threaded code to make your stream concurrent.

Assuming the reason you are exploring this style of programming is that you want to exploit these benefits then your first code sample is potentially not functional since the foreach method is classed as being terminal (meaning that it can produce side-effects).

The second way is preferred from functional programming point of view since the map function can accept stateless lambda functions. More explicitly, the lambda passed to the map function should be

Non-interfering, meaning that the function should not alter the source of the stream if it is non-concurrent (e.g. ArrayList).
Stateless to avoid unexpected results when doing parallel processing (caused by thread scheduling differences).

Another benefit with the second approach is if the stream is parallel and the collector is concurrent and unordered then these characteristics can provide useful hints to the reduction operation to do the collecting concurrently.

Eran · Accepted Answer · 2015-02-04 10:34:55Z

1

I prefer the second way.

When you use the first way, if you decide to use a parallel stream to improve performance, you'll have no control over the order in which the elements will be added to the output list by forEach.

When you use toList, the Streams API will preserve the order even if you use a parallel stream.

answered Feb 4, 2015 at 10:34

Eran

392k56 gold badges713 silver badges780 bronze badges

I'm not sure this is correct advice: he could use forEachOrdered instead of forEach if he wanted to use a parallel stream but still preserve the order. But as the documentation for forEach states, preserving the encounter order sacrifices the benefit of parallelism. I suspect that is also the case with toList then.
– herman
Commented Feb 4, 2015 at 11:08
OK. According to docs.oracle.com/javase/tutorial/collections/streams/… the collect method is designed to perform the most common stream operations that have side effects in a parallel-safe manner. Operations like forEach and peek ... if you use one of these operations with a parallel stream, then the Java runtime may invoke the lambda expression that you specified as its parameter concurrently from multiple threads. So it carefully has some things atomic and some not. Weird. Assuming that is the case, option 1, if made parallel, might have contention issues...
– rogerdpack
Commented Jan 29, 2021 at 17:38

Add a comment |

Community · Accepted Answer · 2017-05-23 11:54:48Z

There is a third option - using stream().toArray() - see comments under why didn't stream have a toList method. It turns out to be slower than forEach() or collect(), and less expressive. It might be optimised in later JDK builds, so adding it here just in case.

assuming List<String>

    myFinalList = Arrays.asList(
            myListToParse.stream()
                    .filter(Objects::nonNull)
                    .map(this::doSomething)
                    .toArray(String[]::new)
    );

with a micro-micro benchmark, 1M entries, 20% nulls and simple transform in doSomething()

private LongSummaryStatistics benchmark(final String testName, final Runnable methodToTest, int samples) {
    long[] timing = new long[samples];
    for (int i = 0; i < samples; i++) {
        long start = System.currentTimeMillis();
        methodToTest.run();
        timing[i] = System.currentTimeMillis() - start;
    }
    final LongSummaryStatistics stats = Arrays.stream(timing).summaryStatistics();
    System.out.println(testName + ": " + stats);
    return stats;
}

the results are

parallel:

toArray: LongSummaryStatistics{count=10, sum=3721, min=321, average=372,100000, max=535}
forEach: LongSummaryStatistics{count=10, sum=3502, min=249, average=350,200000, max=389}
collect: LongSummaryStatistics{count=10, sum=3325, min=265, average=332,500000, max=368}

sequential:

toArray: LongSummaryStatistics{count=10, sum=5493, min=517, average=549,300000, max=569}
forEach: LongSummaryStatistics{count=10, sum=5316, min=427, average=531,600000, max=571}
collect: LongSummaryStatistics{count=10, sum=5380, min=444, average=538,000000, max=557}

parallel without nulls and filter (so the stream is SIZED): toArrays has the best performance in such case, and .forEach() fails with "indexOutOfBounds" on the recepient ArrayList, had to replace with .forEachOrdered()

toArray: LongSummaryStatistics{count=100, sum=75566, min=707, average=755,660000, max=1107}
forEach: LongSummaryStatistics{count=100, sum=115802, min=992, average=1158,020000, max=1254}
collect: LongSummaryStatistics{count=100, sum=88415, min=732, average=884,150000, max=1014}

John McClean · Accepted Answer · 2017-03-10 15:53:16Z

0

If using 3rd Pary Libaries is ok cyclops-react defines Lazy extended collections with this functionality built in. For example we could simply write

ListX myListToParse;

ListX myFinalList = myListToParse.filter(elt -> elt != null) .map(elt -> doSomething(elt));

myFinalList is not evaluated until first access (and there after the materialized list is cached and reused).

[Disclosure I am the lead developer of cyclops-react]

answered Mar 10, 2017 at 15:53

John McClean

5,3031 gold badge23 silver badges30 bronze badges

Add a comment |

Dávid Horváth · Accepted Answer · 2021-02-10 21:44:24Z

0

May be Method 3.

I always prefer to keep logic separate.

Predicate<Long> greaterThan100 = new Predicate<Long>() {
    @Override
    public boolean test(Long currentParameter) {
        return currentParameter > 100;
    }
};
        
List<Long> sourceLongList = Arrays.asList(1L, 10L, 50L, 80L, 100L, 120L, 133L, 333L);
List<Long> resultList = sourceLongList.parallelStream().filter(greaterThan100).collect(Collectors.toList());

edited Feb 10, 2021 at 21:44

Dávid Horváth

4,2201 gold badge23 silver badges36 bronze badges

answered Apr 5, 2016 at 13:56

Kumar Abhishek

3,09434 silver badges29 bronze badges

Add a comment |

Collectives™ on Stack Overflow

Java Stream API - Best way to transform a list: map or forEach?

8 Answers 8

Not the answer you're looking for? Browse other questions tagged
java
java-8
java-stream
or ask your own question.

Linked

Hot Network Questions

Collectives™ on Stack Overflow

8 Answers 8

Not the answer you're looking for? Browse other questions tagged javajava-8java-stream or ask your own question.

Linked

Related

Not the answer you're looking for? Browse other questions tagged
java
java-8
java-stream
or ask your own question.