java evolution of eclipse collections – live blogging from qcon

The Java Evolution of Eclipse Collections
Speaker: Kristen O’Leary
See the list of all blog posts from the conference

Eclipse Collections

  • was once GS (Goldman Sachs) Collections
  • Memory efficient collections framework
  • Open sourced in 2012

Java 8

  • 8.0 compatible with Java 8+
  • Extend Java 8 Functional Interfaces
  • New APIs – ex: reduceInPlace

Optional

  • RichIterable.detectWith() used to return null if no match
  • Pre Java-8 could use detectIfNone() to create if doesn’t exist
  • New method detectWithOptional() which returns Optional wrapper
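To make the null vs. Optional difference concrete, here is a minimal sketch using only the JDK. The helper names detectOrNull and detectOptional are hypothetical and made up for illustration; they are not Eclipse Collections methods, and the real library calls may look different.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Plain-JDK sketch. detectOrNull and detectOptional are hypothetical helpers,
// not part of Eclipse Collections.
final class DetectSketch {

    // Null-returning style: the caller has to remember the null check.
    static <T> T detectOrNull(List<T> items, Predicate<? super T> predicate) {
        for (T item : items) {
            if (predicate.test(item)) {
                return item;
            }
        }
        return null;
    }

    // Optional-returning style: "no match" is visible in the return type.
    static <T> Optional<T> detectOptional(List<T> items, Predicate<? super T> predicate) {
        return items.stream().filter(predicate).findFirst();
    }
}
```

With the second form a caller can write detectOptional(names, n -> n.startsWith("Z")).orElse("none") instead of checking for null by hand.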

Collectors

  • Collectors2 has collectors
  • Can collect into Bag, ImmutableSet, BiMap, Stack, etc
  • Have full primitive collections library
  • Have full set of multi-map (map with multiple values for same key ex: key to list/set)
  • Have extra APIs like chunk() and zip()

Default methods

  • RichIterable is common interface so default methods helpful.
  • Used to add reduceInPlace() so don’t need to stream and create new collection
  • Also useful for asLazy() or toImmutable() since Eclipse Collections doesn’t provide stream() there.

Primitive Collections

  • The primitive classes are code generated, so the APIs are symmetric
  • Showed impressive memory savings – Eclipse Collections and Trove equivalent. Much smaller than with autoboxing and faster too
  • LazyIterable available for all 8 primitive types. Just call asLazy(). A LazyIterable can be reused unlike a stream.
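The "unlike a stream" part can be seen with the JDK alone. This is a small sketch of my own (class and method names are hypothetical) showing that a second terminal operation on the same Stream instance throws IllegalStateException. It does not show the Eclipse Collections side.

```java
import java.util.List;
import java.util.stream.Stream;

final class StreamReuseSketch {

    // Returns true if running a second terminal operation on the same
    // Stream instance fails with IllegalStateException.
    static boolean secondTraversalFails(List<Integer> values) {
        Stream<Integer> stream = values.stream().filter(v -> v % 2 == 0);
        stream.count();                 // first terminal operation consumes the stream
        try {
            stream.count();             // second use of the same instance
            return false;
        } catch (IllegalStateException alreadyConsumed) {
            return true;
        }
    }
}
```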

Java 9

  • Module system
  • Internal API Encapsulation
  • Need to change APIs that use reflection in order to build. Can’t call setAccessible(true) any more.
  • There is a command line argument to ignore reflection errors. Don’t want to impose that on callers

I like that Kristen uses kata examples to show how the APIs work.

applying java 8 idioms to existing code – trisha gee – qcon

For more QCon posts, see my live blog table of contents. Last year, she talked about creating new apps in Java 8. This year is about migrating to Java 8 for “legacy” code. She showed specific JetBrains IDEA features. I just took notes on the concepts that are IDE agnostic. Presentation will be at bit.ly/refJ8

Usage from audience poll

  • About half compile with Java 7
  • About half compile with Java 8
  • About a quarter (half of Java 8 people) compile with Java 8, but apps still look like Java 7

Why use Java 8

  • Performance improvements in common data structures
  • Fork/Join speed improvements
  • Changes to support concurrency
  • Fewer lines of code – less boilerplate
  • New solutions to problems – more elegant/easier to write code
  • Minimize errors – ex: Optional avoids NullPointerException

Before refactoring

  • Make sure have high test coverage
  • Focus on method level changes – tests should still work
  • Have performance tests too to ensure not negatively impacting system
  • Decide on goals – performance? clearer code? easier to write new code? easier to read new code? use lambdas because team used to it? developer morale?
  • Limit the scope – small incremental commits. Don’t make change that will affect whole code base.

Straightforward refactoring of lambda expressions

  • Predicate, Comparator and Runnable are common places where might have used anonymous inner classes and can switch to lambdas
  • IDEs suggest where can replace code with lambdas (SonarLint does as well)
  • Quick win – reduce old boilerplate
  • [She didn’t talk about this, but it will make your code coverage go down. Make sure your management doesn’t worry about that!]
  • Lose explicit type information when use lambda. Might want to extract to a method to make it clearer and call the method from the lambda.
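A small sketch of what this refactoring can look like, using a Comparator. The class and method names are mine, not from the talk. The three methods sort the same way: the first uses an anonymous inner class, the second a lambda, and the third a lambda that calls a named method so the parameter types stay visible.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class LambdaRefactorSketch {

    // Before: anonymous inner class with all the boilerplate.
    static List<String> sortByLengthOld(List<String> words) {
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return Integer.compare(a.length(), b.length());
            }
        });
        return sorted;
    }

    // After: the same comparator as a lambda. The types of a and b are inferred.
    static List<String> sortByLengthLambda(List<String> words) {
        List<String> sorted = new ArrayList<>(words);
        sorted.sort((a, b) -> Integer.compare(a.length(), b.length()));
        return sorted;
    }

    // Extracted method: the explicit types are back, and the logic has a name.
    static int compareByLength(String a, String b) {
        return Integer.compare(a.length(), b.length());
    }

    // The lambda only delegates to the named method.
    static List<String> sortByLengthNamed(List<String> words) {
        List<String> sorted = new ArrayList<>(words);
        sorted.sort((a, b) -> compareByLength(a, b));
        return sorted;
    }
}
```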

Abstract classes – candidates for lambdas

  • See if can use an interface (if single abstract method) so can start using lambdas
  • Tag with @FunctionalInterface to make clearer
  • Then look at implementers (especially anonymous inner classes) and see if can convert to lambda expressions
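Roughly, the steps above could look like this. LegacyValidator and Validator are hypothetical types I made up for the sketch. The point is only that once the contract is an interface with a single abstract method, the anonymous inner class can become a lambda.

```java
// Before (hypothetical legacy type): an abstract class with one abstract method.
// Lambdas cannot target it because it is a class, not an interface.
abstract class LegacyValidator {
    abstract boolean isValid(String input);
}

// After: the same contract as an interface. The annotation makes the compiler
// reject the type if it ever gets a second abstract method.
@FunctionalInterface
interface Validator {
    boolean isValid(String input);
}

final class ValidatorSketch {

    // Old implementer: anonymous inner class.
    static final LegacyValidator OLD_NOT_EMPTY = new LegacyValidator() {
        @Override
        boolean isValid(String input) {
            return !input.isEmpty();
        }
    };

    // New implementer: lambda expression against the functional interface.
    static final Validator NOT_EMPTY = input -> !input.isEmpty();
}
```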

Conditional logging

  • if (debug) { log.debug() } pattern is a good candidate for lambdas
  • Often people forget to add that conditional, so there is an opportunity to defer evaluation for those too via search/replace. Also, protects against the conditional not matching the log level: if (debug) { log.severe() }. Uh oh!
  • Use a default method on the existing logging interface that takes a supplier instead of a String and has the if check
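Here is a minimal sketch of that idea. AppLogger is a hypothetical logging interface standing in for whatever wrapper the application already has, and MemoryLogger exists only to show the behaviour. The default method does the level check, so the message is only built when debug is on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical logging interface standing in for an application's own wrapper.
interface AppLogger {
    boolean isDebugEnabled();

    void debug(String message);

    // Default method: checks the level first and only then builds the message.
    default void debug(Supplier<String> message) {
        if (isDebugEnabled()) {
            debug(message.get());
        }
    }
}

final class ConditionalLoggingSketch {

    // Minimal in-memory implementation used only to demonstrate the behaviour.
    static final class MemoryLogger implements AppLogger {
        final boolean debugEnabled;
        final List<String> lines = new ArrayList<>();

        MemoryLogger(boolean debugEnabled) {
            this.debugEnabled = debugEnabled;
        }

        @Override
        public boolean isDebugEnabled() {
            return debugEnabled;
        }

        @Override
        public void debug(String message) {
            if (debugEnabled) {
                lines.add(message);
            }
        }
    }
}
```

A call site then becomes logger.debug(() -> "state: " + expensiveToString()), where expensiveToString() is whatever costly call the old code guarded with the if.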

Collections and Streams API

  • Turn for loops into collect() or forEach()
  • Remember you can call list.forEach() directly without going through a stream
  • If do something like get rid of initial list size, test performance before and after to make sure not a needed optimization
  • The automatic streams refactoring might not result in shorter code. Think about whether the surrounding code wants to be refactored as well. That is a more dramatic refactoring, so go back to it later. Also, avoid refactoring if and else code since it is not a simple filter.
  • Remember to include method references when converting code too
  • Watch for surrounding code. For example, if sorting a list, move the sort into the stream. Or if iterating over the new list, add a forEach instead of turning it into a list.
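As an illustration of the points above (my own example, not one from the talk), here is a loop followed by a separate sort, and the same logic as one stream pipeline with the sort moved inside and a method reference for the mapping step.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

final class LoopToStreamSketch {

    // Before: a for loop that filters and transforms, then a separate sort.
    static List<String> shortNamesLoop(List<String> names) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (name.length() <= 4) {
                result.add(name.toUpperCase());
            }
        }
        Collections.sort(result);
        return result;
    }

    // After: filter, map and sort inside one pipeline.
    static List<String> shortNamesStream(List<String> names) {
        return names.stream()
                .filter(name -> name.length() <= 4)
                .map(String::toUpperCase)      // method reference instead of a lambda
                .sorted()                      // the surrounding sort moved into the stream
                .collect(Collectors.toList());
    }
}
```

If the caller only prints the result, shortNamesStream(names).forEach(System.out::println) works directly on the list without opening another stream.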

Optional

  • Be careful. Refactor locally.
  • Easy to accidentally change something that requires changing a ton of code.
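One way to read "refactor locally" is to keep Optional inside a method body and leave signatures alone. This is a sketch under that assumption, with hypothetical method names: findEmail still returns null as before, and only emailOrDefault uses Optional internally, so no caller has to change.

```java
import java.util.Map;
import java.util.Optional;

final class LocalOptionalSketch {

    // Existing method, signature left unchanged: it may return null.
    static String findEmail(Map<String, String> emailsByUser, String user) {
        return emailsByUser.get(user);
    }

    // Local refactoring: Optional replaces the null check inside this method
    // only, so neither callers nor signatures are affected.
    static String emailOrDefault(Map<String, String> emailsByUser, String user) {
        return Optional.ofNullable(findEmail(emailsByUser, user))
                .map(String::trim)
                .orElse("unknown");
    }
}
```

Changing findEmail itself to return Optional<String> would be the kind of change that ripples through every caller.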

Performance

  • Lambda expressions don’t necessarily perform better than anonymous inner classes. Similarly for streams.
  • Using lambdas to add conditional logging is a big quick win performance improvement.
  • Streams didn’t need the “initial size of list” optimization
  • Adding parallel() to stream operations sometimes performs better and sometimes worse. Simple operation often performs worse in parallel because of overhead of parallelizing.
  • Introducing Optionals increases cost.

Java 7 refactorings

  • Use static imports
  • Replace redundant generic types with the diamond operator

java 8 stream performance – maurice naftalin – qcon

This is part of my live blogging from QCon 2015. See my QCon table of contents for other posts.

See http://www.lambdafaq.org

Background
He started with background on streams. (This is old news by now, but still taking some notes). The goals were to bring a functional style to Java and “explicit but unobtrusive” hardware parallelism. The former is more important than performance.

The intention is to replace loops with aggregate operations. [I like that he picked an example that required three operations and not an oversimplified example]. More concise/readable. Easy to change to parallelize.

Reduction == terminal operation == sink

Performance Notes
Free lunch is over. Chips don’t magically get faster over time. Instead, they add cores. The goal of parallel streams is to run the intermediate operations in parallel and then bring the results together in the reduction.
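To picture the "intermediate operations in parallel, then bring together" shape, here is a small sketch of my own (not the speaker's code) using the three-argument collect(). Each worker fills its own list through the accumulator, and the combiner merges the partial lists. Note that addAll copies every element of a partial result, so the merge step does work proportional to the size of what it merges.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

final class ParallelReductionSketch {

    // Squares of the even numbers in [0, upperExclusive), optionally in parallel.
    static List<Integer> squaresOfEvens(int upperExclusive, boolean parallel) {
        IntStream numbers = IntStream.range(0, upperExclusive);
        if (parallel) {
            numbers = numbers.parallel();
        }
        return numbers
                .filter(n -> n % 2 == 0)       // intermediate operation
                .map(n -> n * n)               // intermediate operation
                .boxed()
                .collect(ArrayList::new,       // supplier: one container per chunk
                         ArrayList::add,       // accumulator: add an element to a container
                         ArrayList::addAll);   // combiner: merge two partial containers
    }
}
```

Because the source is ordered, the parallel and sequential versions return the same list. Whether the parallel one is faster is exactly the kind of thing that has to be measured.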

What to measure?

  • We want to know how code changes affect system performance in prod. Not feasible though because would need to do a controlled experiment in prod conditions. Instead, we do a controlled experiment in lab conditions and hope we are not answering a simplified question.
  • Hard to microbenchmark because of inaccuracy, garbage collection, optimization over time, etc. There are benchmarking libraries – Caliper or JMH. [or better if don’t need to microbenchmark]
  • Don’t optimize code if don’t have a problem. What’s your performance requirement? [and is it the bottleneck]. Similarly, don’t optimize the OS if the problem lies somewhere else.

Case study
This was a live demo. First we saw that not using BufferedReader makes a file slow to read. [not about streams]. Then we watched as JMeter didn’t work on the first try. [the danger of a live demo]. Then he showed how messing with the GC size and making it too small is bad for performance as well [still not on streams]. He is trying to show the process of performance tuning overall. Which is valid info. Just not what I expected this session to be about.

Then [after I didn’t see the stream logic being a problem in the first place], he showed how to solve subproblems and merge them. [oddly not calling it map reduce]

8 minutes before the end of the talk, we finally see the non-parallel code for the case study. It’s interesting code because it uses two terminal operations and two streams. At least reading in the file is done normally. Finally, we see that the combiner is O(n) which prevents speeding it up.

Some rules

  • The workload of the intermediate operations must be great enough to outweigh the overheads. Often quoted as size of data set * processing cost per element
  • sorted() is worse
  • Collectors cost extra. For toMap(), merging maps is slow. toList() and toSet() are dominated by the accumulator.
  • In the real world, the fork/join pool doesn’t operate in isolation

My impressions: A large amount of this presentation wasn’t stream performance. Then the case study shows that reading without a BufferedReader is slow. [no kidding]. I feel like the example was contrived and we “learned” that poorly written code behaves poorly. I was hoping the talk would actually be about parallelization. When parallelStream() saves time and when it doesn’t for example. What I learned was for this particular scenario, parallelization wasn’t helpful. And then right at the end, the generic rules. Which felt rushed and thrown at us.