[QCon 2019] Maximizing Performance with GraalVM

Thomas Wuerthinger 

For other QCon blog posts, see QCon live blog table of contents

Tradeoff between what factors optimized

  • Startup time
  • Peak throughput
  • Memory footprint
  • Maximizing request latency
  • Packaging size (matters for mobile)
  • Can usually optimize a few (but not all) of these

GraalVM

  • Supports JVM languages, Rubby, Python, C, Rust, R etc
  • Can embed in node js, oracle database
  • Standalone binary
  • Community Edition and Enterprise Edition
  • Can run with Open JDK using Graal JIT compiler or AOT (ahead of time compiling)

AOT

  • To use, create new binary with pre-compiled code
  • Package classes from app, libraries used and part of the VM
  • Iterate adding things until know what need. Then create native executable.
  • Uses an order of magnitude less memory than JIT. Saving memory helps when running on AWS Lambda
  • CPU usage a lot less up front. Small peak at startup
  • JIT compiler has profiling feedback so can do better in the long run. AOT has PGO (profile guided optimizations) to deal with this
  • Working on improving – collecting profiles up front, low latency GC option and tracing agent to facilitate configuration

Performance

  • Startup time (from start until first request can be served). Two orders of magnitude faster with AOT
  • Starting up in less than 50 milliseconds allows spinning up new process upon request
  • Hard to measure. Can be lucky/unlucky when get data.
  • JIT has an advantage for peak performance. It has profiling data and can make optimistic assumptions. If the assumption not true, can de-optimize/bail out of optimization.

Benchmarks

  • Benchmarks are good. Should have more
  • Optimizing on too few benchmarks is like overfitting on machine learning
  • http://renaissance.dev/ – benchmark suite. Includes Scala and less commonly tested

Choosing

  • GraalVM JIT – when need peak throughput, max latency and no config
  • GraalVM AOT – use when need fast startup time, small memory footprint and small packaging size

Recommends reading top 10 things to do with GraalVM

Q&A

  • Have you considered using Epsilon in benchmark? Not yet. Makes sense since doesn’t do any GC
  • Why not use parallel GC? Not sure if it would make a difference. Kirk noted would avoid allocation hit over G1.
  • Does AOT make sense for large heaps? Can make sure don’t have disadvantage at least.

My impressions

I had heard about Graal and forgotten a lot. I re-learned much. I like the list of steps slides and the diagram. I feel like it will be more memorable this time. I also liked the comparison at the end on impact of the dimensions covered up front.

[QCon 2019] The Trouble With Memory

Kirk Pepperdine

For other QCon blog posts, see QCon live blog table of contents

General

  • Slow database queries, inefficient app code and too many database queries are most reported problems
  • Once drill down, over 70% of all Java apps are bottlenecked on memory churn. It’s not reported because hard to observe
  • Tend to put logging around past problems.
  • If apply instrument to a system, it will always tell you something. And then you act on it
  • Cheapar to predict than react

Common libraries

  • Logback
  • Marshalling Json, SQL
  • Caching products
  • Hibernate

Memory

  • Java heap has generations
  • Hopefully people have moved to G1GC
  • Everything happens in the free list

Problems

  • Large number of temporary objects quickly fills Eden
  • Causes frequent young cycles. Causes premature promotion which means will go to tenured too early
  • Heap becomes more fragment
  • Allocation is quick. No cost to collect if objects die quickly. However, still slow if you do something quick enough times.
  • Large live data set size. Data consistently live in your heap. Increases time to copy/compact. Likely have less space to copy to. Think about Windows defragmenter. [Do people still have to do that?]
  • Memory leak from unstable live data. JVM will terminate if you are lucky.
  • Out of memory – 98% of recent time spent in GC with less than 2% of heap recovered. If don’t meet that criteria, app is just really slow, but don’t get the out of memory error.

Escape analysis

  • Test applied to a piece of data. What is the visibility/scope.
  • If scoped locally, only thread that created it can see it.
  • If passed to method, partial escape.
  • If data scoped so multiple threads can see it (Ex: static), full escape.

Demo

  • Showed GC log. Want to see low pause times
  • Showed allocation rates. Problem if too high
  • In Visual VM, looked at profiler. Check filters to ensure not filtering the bottleneck out of your profile
  • Sort by # allocated objects to see frequency. It doesn’t take longer to allocate a large object than a small one.
  • Take a snapshot and look at trace
  • “Stop thinking” – explore what is shown without assuming
  • Time to look at the code from the stack trace that is creating all the objects
  • Escape analysis code
  • Run jitwatch to see allocations. Can see if direct/inline allocation. Can see when bytecode eliminates an allocation
  • Profiler is lying to you.
  • Performance differs in test vs prod environment

Q&A

  • How know the performance problem is the int[] in the demo? Went through profiler to show stack trace. Used BigInteger which uses up a lot more memory than a long
  • Absolute number for GC allocation rate? Sparc? Number seem to hold regardless of hardware. Should focus on the CPU going forward.
  • <missed question> – try to find mutable state that is not shared

My impression

This was great. I learned a lot and it kept my attention. I really liked the demo.

[QCon 2019] Java Futures

Brian Goetz

For other QCon blog posts, see QCon live blog table of contents

General

  • Java is approaching middle age. Almost 25 years old
  • Keep promises to users
  • Prime directive is compatibility
  • Backward compatibility matters ex: generics. Don’t need to recompile old code
  • Patterns ex: single method interfaces for lambdas rather than having to rewrite libraries
  • Languages features are forever. Interacts with others; even future ones
  • Waited 10 years for generics until had right story/timing. Knew copying C++ was the wrong choice
  • No language is ever finished
  • Languages are never good enough because hardware changes, new problems, developer expectations change
  • mid-2019 edition because things change so fast

Cadence

  • Used to release based on a feature rather than a data
  • Often didn’t feel worthwhile to do small feature because got stuck behind big ones
  • Now doing about two years of six month release schedule.
  • Release management overhead went down to almost zero
  • Same rate of innovation; just changed rate of release
  • Java 13 already in rampdown. Released in September.
  • Already working on Java 14

Preview feature

  • Risk of things happening too quickly since features are forever
  • Preview means feature is done but not finalized
  • Not experimental/beta.
  • Think of as provisional feature.
  • Expected outcome is that will be promoted to real feature in next version or two
  • Full IDE/Tooling support for preview features
  • Need to turn them on so not accidentally using in production -enable-preview flag

Current initiatives

  • Amber – right sizing language ceremony. Includes local variable type inference and future changes like pattern matching
  • Valhalla – adapt form modern hardware, value types, generic specialization
  • Loom – Fibers
  • Panama – Native code and data

Local Variable type inference

  • In Java 10. Future if on Java 8. Infinite past if you are Brian :).
  • var instead of type for local variable
  • Not syntactic sugar. Can expose “hidden” type (ex: capture types, intersection types and anonymous class types). So see more generics.
  • One of most commonly requested features
  • But also significant/vocal angst – can write bad code; giving into fashion
  • Not controversial once release. Fear of change in advance?
  • Will take time for good practices to emerge
  • Style guide https://openjdk.java.net/projects/amber/LVTIstyle.html

Switch enhancements

  • Preview feature in Java 12 and 13
  • Significant fraction of switch statements want to be expression. SO have to assign in each case.
  • Break is annoying. Irritating and error prone
  • Looked at how switch statements needs to evolve for pattern matching and then made more generally useful.
  • Boilerplate bad because a place for bugs to hide
  • Two changes – can use switch as expression or statement. Streamlined syntax label -> consequence
  • Will be in Java 14 unless feedback said made some horrible error

Multi-line String literals

  • Preview feature in Java 13
  • Require quotes and concatenation. Error prone for JSON, SQL and HTML
  • Manual mangling introduces errors
  • Was going to be a preview feature in Java 12. WIthdrawn because had better idea of how to do it.
  • Triple quotes to start/end
  • Dots for indentation that is just for IDE so don’t get extra whitespace when reformat in IDE.

Pattern matching

  • Intend to deliver in phases; hopefully starting in Java 14
  • Phase 1
    • Replaces if statement for instanceof and then the cast for the same
    • “Not only does the language make you cast explicitly, but it gives you the change to get it wrong”
    • if (a instanceof Integer intvalue) – checks type and gives new variable can use as type safely
    • Simplifies equals method implementation
  • Phase 2
    • Use in switch
    • case Integer i – matches if Integer, bind to i and allow use of variable inside the case.

Records

  • Lots of boilerplate – constructor, accessors, Object methods
  • A lot of code to read only to find out didn’t need to read any of it.
  • IDE generates boilerplate, but it doesn’t help you read it.
  • record Point(int x, int y) {}
  • Like enum, give up some extra features to get functionality.
  • Get sensible defaults in record
  • Declaratively states “I am simply a carrier for my data”
  • Programmer making a commitment
  • [does this discourage adding instance methods?]
  • Product type of algebraic data type

Sealed types

  • Sum type of algebraic data types
  • Shape = Circle + Rect
  • Compiler will know only those types
  • sealed interface Shape {}
  • record Circle(Point center, int radius) implements Shape {}
  • record Rect(Point x,….
  • Can use in if statement and pattern matching to get fields of record as local variables
  • Can use in switch statement. Compiler will complain if don’t address all possible implementations of sealed types
  • ex: case Circle(Var center, var r) -> PI * r * r;

Project Valhalla

  • Aims to reboot the layout of data in memory
  • Hardware change a lot in past 25 years.
  • Cost of memory fetch vs arithmetic has increased heavily in last 25 years
  • An array of objects will use data all over so many cache misses
  • Big change because goes down so far.
  • We want to be able to specify the class should be inlined
  • Giving up immutability and representation polymorphism.
  • Getting hardware friendly data layout
  • Code for the class works like an int

My Impressions

Great talk! It covered both things I had seen and things I hadn’t. Great perspective. Excited for some of the coming features.