Java @ Speed: Making the Most of Modern Hardware – Live Blogging from QCon

Java @Speed – Making the most of modern hardware
Speaker: Gil Tene

Duct tape engineering should only be done when absolutely necessary.

We think of speed as a number, but it’s not a meaningful quality without context. Are you fast when you deploy? At peak load? When the market opens? When you actually trade? How long can you stay fast?

In Java, an app is slow when it starts and gets faster until it reaches a steady state, because the running code changes over time: it starts out purely interpreted and is then optimized after profiling. GC pauses also affect speed.
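The warmup effect can be observed with a rough sketch like the one below. It is not from the talk: the `WarmupDemo` class, the `work` method and the round sizes are all made up for illustration. It times identical batches of calls, and on a typical JVM the early rounds tend to be slower than the later ones, though the exact numbers depend on the JVM, flags and hardware.

```java
final class WarmupDemo {
    // Written after each round so the JIT cannot treat the work as dead code.
    static volatile long sink;

    // Hypothetical workload: sum of squares of the integers below n.
    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    // Runs `rounds` identical batches and returns the elapsed nanoseconds of each batch.
    static long[] timeRounds(int rounds, int callsPerRound, int n) {
        long[] nanos = new long[rounds];
        for (int r = 0; r < rounds; r++) {
            long acc = 0;
            long start = System.nanoTime();
            for (int c = 0; c < callsPerRound; c++) {
                acc += work(n);
            }
            nanos[r] = System.nanoTime() - start;
            sink = acc; // publish the result so it counts as used
        }
        return nanos;
    }

    public static void main(String[] args) {
        long[] nanos = timeRounds(10, 2_000, 1_000);
        for (int r = 0; r < nanos.length; r++) {
            System.out.printf("round %d: %d us%n", r, nanos[r] / 1_000);
        }
    }
}
```

Running it with the HotSpot flag `-XX:+PrintCompilation` should show `work` being compiled somewhere during the early rounds.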

Modern servers

  • Number of cores per chip has tripled
  • Instruction window keeps increasing
  • More parallelism each generation
  • Cache also increasing


What the JIT compiler can do

  • Can reorder code
  • Can remove dead code – nobody knows whether the code actually ran, so the compiler can say it did, just really fast.
  • Values can be propagated – removes temporary variables
  • Can remove redundant code
  • Reads can be cached – as if you had extracted a local variable. Use volatile if you need to prevent this.
  • Writes can be eliminated – can skip a calculation if the value doesn’t change
  • Can inline method calls
  • Also does clever tricks like checking for nulls only after a SEGV happens. If the code turns out to throw a lot of null pointer exceptions, it deoptimizes to add a guard clause.
  • Class Hierarchy Analysis (CHA) – looks at the whole code base for optimizations
  • Inlining works without final because the compiler knows there is no subclass. If a new subclass shows up, it deoptimizes at that time.
  • If it thinks there is only one subclass, it adds a guard clause and optimizes. When the guard clause fails, the code is deoptimized.
  • Deoptimizations create slowdown spikes even during the optimized phase. Warmup isn’t always enough because warmup code might not hit all scenarios. “The one thing you haven’t done is trade.” So the first real trade is slow because it triggers deoptimization.
  • Azul has a product that logs optimizations and reloads them on startup from prior runs.
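Two of the points above, cached reads and optimizations based on the class hierarchy, can be sketched in plain Java. This is an illustration under assumptions, not code from the talk: `JitAssumptionsDemo`, `Pricer` and its subclasses are hypothetical names, and whether a particular JVM actually inlines or deoptimizes here depends on its version and settings.

```java
final class JitAssumptionsDemo {
    // volatile forces a fresh read on every iteration; without it the JIT may
    // cache the value in a register and the loop may never see the update.
    static volatile boolean running = true;

    static long spinUntilStopped() {
        long spins = 0;
        while (running) {
            spins++;
        }
        return spins;
    }

    // Starts a spinner thread, clears the flag, and reports whether it exited in time.
    static boolean stopsWithin(long millis) {
        running = true;
        Thread worker = new Thread(JitAssumptionsDemo::spinUntilStopped);
        worker.setDaemon(true);
        worker.start();
        try {
            Thread.sleep(50); // give the loop a chance to get hot
            running = false;
            worker.join(millis);
        } catch (InterruptedException e) {
            running = false;
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    abstract static class Pricer {
        abstract long price(long qty);
    }

    static final class FlatPricer extends Pricer {
        long price(long qty) { return qty * 100; }
    }

    // Not loaded until first use; its arrival can invalidate a "single subclass" assumption.
    static final class DiscountPricer extends Pricer {
        long price(long qty) { return qty * 90; }
    }

    // While only FlatPricer is loaded, p.price(...) is a candidate for inlining.
    static long total(Pricer p, int calls) {
        long sum = 0;
        for (int i = 0; i < calls; i++) {
            sum += p.price(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("spinner stopped: " + stopsWithin(5_000));
        Pricer flat = new FlatPricer();
        long warm = 0;
        for (int i = 0; i < 20_000; i++) {
            warm += total(flat, 100);                      // warmup sees only FlatPricer
        }
        long first = total(new DiscountPricer(), 100);     // a new subclass shows up here
        System.out.println(warm + " " + first);
    }
}
```

The `DiscountPricer` call plays the role of the "first real trade": it takes a path the warmup loop never exercised, so it may run in freshly deoptimized code.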

Microbenchmarking is hard because some things are optimized away (like basic math). Use JMH from OpenJDK to microbenchmark, but still suspect everything.
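The "optimized away" problem can be illustrated with a hand-rolled sketch. It deliberately avoids the JMH dependency so it runs on a bare JDK, and the `DeadCodePitfall` class and its method names are made up for this example. In real measurements JMH handles this for you: values returned from a benchmark method, or passed to `Blackhole.consume`, count as used.

```java
final class DeadCodePitfall {
    // Publishing the result here makes the computation observable.
    static volatile double sink;

    // The result of Math.log is discarded, so the JIT is free to remove the
    // call and this may end up timing an almost empty loop.
    static long timeDiscarded(int iterations) {
        long start = System.nanoTime();
        for (int i = 1; i <= iterations; i++) {
            Math.log(i);
        }
        return System.nanoTime() - start;
    }

    // The results are accumulated and stored, so the work cannot be dropped.
    static long timeConsumed(int iterations) {
        long start = System.nanoTime();
        double acc = 0;
        for (int i = 1; i <= iterations; i++) {
            acc += Math.log(i);
        }
        sink = acc;
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        for (int round = 0; round < 5; round++) {
            System.out.printf("discarded: %d us, consumed: %d us%n",
                    timeDiscarded(5_000_000) / 1_000,
                    timeConsumed(5_000_000) / 1_000);
        }
    }
}
```

Even the "consumed" number deserves suspicion: it includes loop overhead and warmup effects, which is part of why a harness like JMH is the better tool.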

I like that he showed the assembly code and explained the relationship to a simple for loop.
