[QCon 2019] Maximizing Performance with GraalVM

Thomas Wuerthinger

For other QCon blog posts, see QCon live blog table of contents

Tradeoffs between the factors you can optimize

Startup time
Peak throughput
Memory footprint
Max request latency
Packaging size (matters for mobile)
Can usually optimize a few (but not all) of these

GraalVM

Supports JVM languages, Ruby, Python, C, Rust, R, etc.
Can be embedded in Node.js and Oracle Database
Standalone binary
Community Edition and Enterprise Edition
Can run with OpenJDK using the Graal JIT compiler, or use AOT (ahead-of-time compilation)

AOT

To use, create new binary with pre-compiled code
Package classes from app, libraries used and part of the VM
Iterate, adding things until you know what is needed. Then create the native executable.
Uses an order of magnitude less memory than JIT. Saving memory helps when running on AWS Lambda
CPU usage is a lot lower up front, with only a small peak at startup
JIT compiler has profiling feedback so can do better in the long run. AOT has PGO (profile guided optimizations) to deal with this
Still being improved: collecting profiles up front, a low-latency GC option and a tracing agent to facilitate configuration
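To illustrate why AOT needs configuration at all, here is a minimal sketch of code that looks up a class by a name only known at run time. The class and method names (ReflectiveGreeter, EnglishGreeting, greetWith) are made up for this example and are not from the talk. The assumption is that an ahead-of-time build cannot always tell from the code alone which classes and methods are reached this way, so they have to be listed up front. Recording lookups like these is the kind of job the tracing agent is meant to help with.

```java
import java.lang.reflect.Method;

public class ReflectiveGreeter {

    // Hypothetical "plugin" class that the rest of the program only knows by name.
    public static class EnglishGreeting {
        public String greet(String name) {
            return "Hello, " + name;
        }
    }

    // The class name arrives as data, so which class gets loaded is a run-time decision.
    // On a JIT-based JVM this just works; an AOT build would need this class,
    // its constructor and its greet method registered for reflection (assumption).
    public static String greetWith(String className, String name) {
        try {
            Class<?> type = Class.forName(className);
            Object instance = type.getDeclaredConstructor().newInstance();
            Method greet = type.getMethod("greet", String.class);
            return (String) greet.invoke(instance, name);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not use " + className, e);
        }
    }

    public static void main(String[] args) {
        // Default to the bundled greeting; a different class name can be passed as an argument.
        String className = args.length > 0 ? args[0] : EnglishGreeting.class.getName();
        System.out.println(greetWith(className, "QCon"));
    }
}
```

Running this on a regular JVM needs no setup. The interesting part is that nothing in greetWith says which class will be used, which is why the "iterate until you know what you need" step exists.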

Performance

Startup time (from start until first request can be served). Two orders of magnitude faster with AOT
Starting up in less than 50 milliseconds allows spinning up new process upon request
Hard to measure. You can get lucky or unlucky depending on when you collect the data.
JIT has an advantage for peak performance. It has profiling data and can make optimistic assumptions. If an assumption turns out not to be true, it can de-optimize (bail out of the optimization).
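The optimistic assumptions in the last point can be pictured with a small sketch. The names here (SpeculationDemo, Shape, Square, Circle) are invented for illustration. In the first phase the call site only ever sees one implementation, so a JIT compiler may assume that is the only one and optimize for it. The second phase breaks that assumption. Whether and when a given JVM actually speculates and de-optimizes here is not guaranteed; the program produces the same results either way.

```java
public class SpeculationDemo {

    public interface Shape {
        double area();
    }

    public static final class Square implements Shape {
        private final double side;
        public Square(double side) { this.side = side; }
        @Override public double area() { return side * side; }
    }

    public static final class Circle implements Shape {
        private final double radius;
        public Circle(double radius) { this.radius = radius; }
        @Override public double area() { return Math.PI * radius * radius; }
    }

    // The s.area() call below is the call site a JIT compiler can profile.
    public static double totalArea(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) {
            sum += s.area();
        }
        return sum;
    }

    public static Shape[] squares(int count, double side) {
        Shape[] shapes = new Shape[count];
        for (int i = 0; i < count; i++) {
            shapes[i] = new Square(side);
        }
        return shapes;
    }

    public static void main(String[] args) {
        Shape[] onlySquares = squares(100, 2.0);

        // Phase 1: the call site only sees Square, so "it is always a Square"
        // is a tempting optimistic assumption for the compiler.
        double total = 0;
        for (int i = 0; i < 20_000; i++) {
            total += totalArea(onlySquares);
        }

        // Phase 2: a Circle shows up, so the assumption no longer holds and
        // any code compiled under it has to be abandoned.
        Shape[] mixed = squares(100, 2.0);
        mixed[50] = new Circle(1.0);
        for (int i = 0; i < 20_000; i++) {
            total += totalArea(mixed);
        }

        System.out.println("total area summed over all runs: " + total);
    }
}
```

On HotSpot-based JVMs, running with -XX:+PrintCompilation is one way to watch methods being compiled and later discarded, though the exact output varies by version.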

Benchmarks

Benchmarks are good. Should have more
Optimizing on too few benchmarks is like overfitting in machine learning
http://renaissance.dev/ – benchmark suite. Includes Scala and other less commonly tested workloads
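Since a single timing can be lucky or unlucky, a measurement usually needs warmup runs and several samples. The following is a minimal sketch of that idea, not a replacement for a real harness or for a suite like Renaissance. MiniBench, medianNanos and the summing workload are all made up for this example, and the warmup and sample counts are arbitrary.

```java
import java.util.Arrays;

public class MiniBench {

    // Written to by the workload so the JIT cannot simply drop the computation.
    static volatile long sink;

    // Median of the samples; for an even count, the mean of the two middle values.
    public static long median(long[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("need at least one sample");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2;
    }

    // Runs the task unmeasured first (so a JIT has a chance to compile it),
    // then times each measured run separately and reports the median.
    public static long medianNanos(Runnable task, int warmupRuns, int measuredRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            task.run();
        }
        long[] samples = new long[measuredRuns];
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        return median(samples);
    }

    public static void main(String[] args) {
        int[] data = new int[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        // Toy workload: sum an array and publish the result.
        Runnable task = () -> {
            long sum = 0;
            for (int value : data) {
                sum += value;
            }
            sink = sum;
        };
        long nanos = medianNanos(task, 20, 11);
        System.out.println("median time per run (ns): " + nanos);
    }
}
```

The median is used instead of the mean so that one unlucky run (a GC pause, a compilation kicking in) does not dominate the result. This only measures one toy workload, which is exactly the "too few benchmarks" trap described above.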

Choosing

GraalVM JIT – use when you need peak throughput, low max latency and no configuration
GraalVM AOT – use when you need fast startup time, a small memory footprint and a small packaging size

He recommends reading "Top 10 Things To Do With GraalVM"

Q&A

Have you considered using the Epsilon GC in the benchmarks? Not yet. It makes sense since it doesn’t do any GC.
Why not use the parallel GC? Not sure if it would make a difference. Kirk noted it would avoid the allocation hit of G1.
Does AOT make sense for large heaps? They can at least make sure it isn’t at a disadvantage.

My impressions

I had heard about Graal and forgotten a lot. I re-learned much. I like the list of steps slides and the diagram. I feel like it will be more memorable this time. I also liked the comparison at the end on impact of the dimensions covered up front.

Down Home Country Coding With Scott Selikoff and Jeanne Boyarsky

Java/J2EE Software Development and Technology Discussion Blog
