JavaOne – Make your CPU cores sweat with Parallel Streams | Down Home Country Coding With Scott Selikoff and Jeanne Boyarsky

“Make your CPU cores sweat with Parallel Streams”

Speaker: Lukasz Pater

For more blog posts from JavaOne, see the table of contents

Started with the canonical Person/Car/age example to show what streams are [I think everyone at Javaone knows that]

Then showed code that finds all the prims under 10 milion that are palindromes when expressed in binary. Moores law is now relying on multi-core architecture. This example takes 1.6 seconds vs 8.6 seconds as parallel improvement. Five times faster.

Good analogy: If told to wash the fork, then the knife, then the… it is slow. If told to wash the dishes, you can optimize internally.

History of threads

JDK 1 – Threads – still good for small background task
JDK 5 – ExecutorService, concurrent objects
JDK 7 – fork/join framework – recursively decompose tasks into subtasks and then combine results
JDK 8 – parallel streams – use fork/join framework and Spliterator behind the scenes

Making a parallel stream

parallelStream() – for source
parallel() – anywhere in stream pipeline intermediate operation list

Fork Join Pool

Uses work stealing to balance tasks amongst workers in pool
All parallel streams use one common pool instance with # threads = # CPU cores – 1. That final thread is for the master to assign work.
Can change by setting system property java.util.concurrent.ForkJoinPool.common.parallelism. Must pass on commands line because first call to parallel stream resets it if set in code
If want custom fork join pool, create one and submit your stream to it. Does not recommend doing this. One reason you might want to is to add a timeout to the stream

Warnings

Avoid IO – burn CPU cycles waiting for IO/network
Use only for CPU intensive tasks
Be careful with nested paralle streams
Having many smaller tasks in the pool will better balance the workload
Don’t create your own fork join pool

Spliterator

splitable iterator
to traverse elements of a source in parallel
tryAdvance(Consumer) – do something if an element exists
trySplit() – partition off some elements to another spliterator leaving less elements in the original – fork so have tree of spliterators until run out of elements
characteristics()
estimateSize()
StreamSupport.stream(mySplierator, true) – creates parallel stream from spliterator – shouldn’t need to do this
ArrayList decomposes into equal sizes. LinkedList gives a smaller % of the elements because linear to get elements and want to minimize wait time
ArrayList and IntStream.range decompose well
LInkedLIst and Stream.iterate() decompose poorly – could even run out of memory
HashSet and TreeSet decompose in between

Other tips

Avoiding autoboxing also saves time. iterate() creates boxed objects where range() creates primitives
Parallel streams perform better where order doesn’t atter. findAny() or unordered().limit() [he missed the terminal operation in the limit example]
Avoid shared state
If have multiple calls to sequential() and parallel(), the last one wins and takes effect for the entire stream pipeline

My take:
Good discussion of performance and things to be beware of. My blog wasn’t live becase I couldn’t get internet in the room. I typed it live though! A couple typos like findFist() but nothing signficiant

Down Home Country Coding With Scott Selikoff and Jeanne Boyarsky

Java/J2EE Software Development and Technology Discussion Blog

JavaOne – Make your CPU cores sweat with Parallel Streams

Leave a Reply

Share this:

Leave a Reply