JavaOne – Make your CPU cores sweat with Parallel Streams

“Make your CPU cores sweat with Parallel Streams”

Speaker: Lukasz Pater

For more blog posts from JavaOne, see the table of contents

Started with the canonical Person/Car/age example to show what streams are [I think everyone at Javaone knows that]

Then showed code that finds all the prims under 10 milion that are palindromes when expressed in binary. Moores law is now relying on multi-core architecture. This example takes 1.6 seconds vs 8.6 seconds as parallel improvement. Five times faster.

Good analogy: If told to wash the fork, then the knife, then the… it is slow. If told to wash the dishes, you can optimize internally.

History of threads

  • JDK 1 – Threads – still good for small background task
  • JDK 5 – ExecutorService, concurrent objects
  • JDK 7 – fork/join framework – recursively decompose tasks into subtasks and then combine results
  • JDK 8 – parallel streams – use fork/join framework and Spliterator behind the scenes

Making a parallel stream

  • parallelStream() – for source
  • parallel() – anywhere in stream pipeline intermediate operation list

Fork Join Pool

  • Uses work stealing to balance tasks amongst workers in pool
  • All parallel streams use one common pool instance with # threads = # CPU cores – 1. That final thread is for the master to assign work.
  • Can change by setting system property java.util.concurrent.ForkJoinPool.common.parallelism. Must pass on commands line because first call to parallel stream resets it if set in code
  • If want custom fork join pool, create one and submit your stream to it. Does not recommend doing this. One reason you might want to is to add a timeout to the stream


  • Avoid IO – burn CPU cycles waiting for IO/network
  • Use only for CPU intensive tasks
  • Be careful with nested paralle streams
  • Having many smaller tasks in the pool will better balance the workload
  • Don’t create your own fork join pool


  • splitable iterator
  • to traverse elements of a source in parallel
  • tryAdvance(Consumer) – do something if an element exists
  • trySplit() – partition off some elements to another spliterator leaving less elements in the original – fork so have tree of spliterators until run out of elements
  • characteristics()
  • estimateSize()
  •, true) – creates parallel stream from spliterator – shouldn’t need to do this
  • ArrayList decomposes into equal sizes. LinkedList gives a smaller % of the elements because linear to get elements and want to minimize wait time
  • ArrayList and IntStream.range decompose well
  • LInkedLIst and Stream.iterate() decompose poorly – could even run out of memory
  • HashSet and TreeSet decompose in between

Other tips

  • Avoiding autoboxing also saves time. iterate() creates boxed objects where range() creates primitives
  • Parallel streams perform better where order doesn’t atter. findAny() or unordered().limit() [he missed the terminal operation in the limit example]
  • Avoid shared state
  • If have multiple calls to sequential() and parallel(), the last one wins and takes effect for the entire stream pipeline

My take:
Good discussion of performance and things to be beware of. My blog wasn’t live becase I couldn’t get internet in the room. I typed it live though! A couple typos like findFist() but nothing signficiant

Leave a Reply

Your email address will not be published.