[QCon 2019] Are we really cloud native?

Bert Ertman

For other QCon blog posts, see QCon live blog table of contents

Cloud Computing

  • Not new
  • Market growing fast/analysts on rise
  • “Java EE is dead, long live the Cloud” – cloud is coming at the expense of Spring, etc.
  • “There is no cloud. It’s just someone else’s computer” – 5 years ago, it was just virtualization elsewhere. That description no longer does it justice
  • Evolution – IaaS -> PaaS -> Serverless
  • Serverless is the evolution of virtualization or compute
  • Re-imagine middleware or higher level services as managed services that can call via an API
  • Cloud native is the step after serverless

Mapping

  • Business agility – Microservices
  • Infrastructure = CI/CD + containers
  • Process = Agile + DevOps

Evolution and problems

  • 80-90% of IT budgets are spent on maintaining existing systems
  • Experimenting with new tech/process comes out of whatever time is left
  • Don’t save money by simply moving the app server to the cloud. Often costs more.
  • Then tried Spring Boot with a fat jar, which turned into an inverted app server
  • Adding Docker makes it more portable but doesn’t actually use benefits of cloud
  • Next tried microservices in Docker. Waste more resources because need more virtual machines. Introducing problems while solving other problems. Modularity is good and microservices are a modularity tool. However adding cost due to network/config/dependencies/versioning/etc
  • Next tried Kubernetes. Everyone shouldn’t have to run/manage in prod
  • Agile adoption took a few years because it needed business buy-in. DevOps isn’t just learning tools. DINO (DevOps in name only)
  • Cloud native is a DevOps journey. Continuous journey with new services and components. Services can be short lived. Think about managing a mix of software and infrastructure and scale
  • Get to a mix of serverless and non-serverless services.
  • Technologies or frameworks are not cloud-native, it is the way you use them

Other Benefits

  • Economic disruption – startup costs low. Don’t need datacenter staff
  • Easily experiment with new tech or new business ideas
  • Faster time to market

Tips or challenges

  • Use managed services where possible
  • IT is not just a cost center; need strategy
  • Business needs to trust IT

Java

  • GraalVM and compiling to native code facilitates writing serverless/lambda. Solves cold start problem
  • If Java is your only skill, you are in for a hard time
  • With DevOps, there are new problems you need to be knowledgeable about
  • Cloud Engineer needs to know more than just a programming language. Flowchart: https://github.com/kamranahmedse/developer-roadmap/blob/master/readme.md

Q&A

  • OSS advice? OSS Community bundling products to help with direction of cloud native. Try to use provider supplied services where possible.
  • Stats on whether spend less in serverless? Maybe. Definitely war stories from real enterprises

My impressions

Bert got a lot of laughs which is good. It means the audience is engaged. It’s a good perspective and I like the path/journey he took to get there.

[QCon 2019] Maximizing Performance with GraalVM

Thomas Wuerthinger 

Tradeoffs between which factors to optimize

  • Startup time
  • Peak throughput
  • Memory footprint
  • Max request latency
  • Packaging size (matters for mobile)
  • Can usually optimize a few (but not all) of these

GraalVM

  • Supports JVM languages, Ruby, Python, C, Rust, R etc
  • Can embed in Node.js, Oracle Database
  • Standalone binary
  • Community Edition and Enterprise Edition
  • Can run with OpenJDK using Graal JIT compiler or AOT (ahead of time compiling)

AOT

  • To use, create new binary with pre-compiled code
  • Package classes from app, libraries used and part of the VM
  • Iterate adding things until know what need. Then create native executable.
  • Uses an order of magnitude less memory than JIT. Saving memory helps when running on AWS Lambda
  • CPU usage a lot less up front. Small peak at startup
  • JIT compiler has profiling feedback so can do better in the long run. AOT has PGO (profile guided optimizations) to deal with this
  • Working on improving – collecting profiles up front, low latency GC option and tracing agent to facilitate configuration
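The point about JIT profiling feedback can be observed informally with a small timing loop. This is my own sketch, not something from the talk: `WarmupProbe` and its methods are hypothetical names, and the timings vary by machine and JVM. On a JIT-based JVM the later batches often run faster than the first ones as hot code gets compiled. An AOT-compiled binary would not be expected to show the same warm-up curve.

```java
// Hypothetical sketch: time repeated batches of the same work to observe JIT warm-up.
class WarmupProbe {
    // CPU-bound work: sum of squares modulo a prime, so the result is easy to check.
    static long work(int n) {
        long acc = 0;
        for (int i = 1; i <= n; i++) {
            acc = (acc + (long) i * i) % 1_000_003L;
        }
        return acc;
    }

    // Runs the same work `batches` times and returns the elapsed nanoseconds of each run.
    static long[] timeBatches(int batches, int n) {
        long[] nanos = new long[batches];
        long sink = 0;
        for (int b = 0; b < batches; b++) {
            long start = System.nanoTime();
            sink += work(n);
            nanos[b] = System.nanoTime() - start;
        }
        // Use the result so the work cannot be discarded as dead code.
        if (sink == -1) System.out.println(sink);
        return nanos;
    }

    public static void main(String[] args) {
        long[] nanos = timeBatches(10, 2_000_000);
        for (int b = 0; b < nanos.length; b++) {
            System.out.printf("batch %d: %d us%n", b, nanos[b] / 1_000);
        }
    }
}
```

A hand-rolled loop like this is only good for getting a feel for warm-up. As the talk notes, measuring is hard, so real comparisons belong in a proper benchmark suite.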

Performance

  • Startup time (from start until first request can be served). Two orders of magnitude faster with AOT
  • Starting up in less than 50 milliseconds allows spinning up new process upon request
  • Hard to measure. Can be lucky/unlucky when get data.
  • JIT has an advantage for peak performance. It has profiling data and can make optimistic assumptions. If the assumption not true, can de-optimize/bail out of optimization.
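To make the startup-time definition above concrete (from process start until the first request can be served), here is a minimal sketch of how one might log it. `StartupProbe` is a hypothetical name, and this assumes a regular JVM. Whether the management API is available inside a native image depends on the GraalVM version and configuration, so treat that part as untested.

```java
import java.lang.management.ManagementFactory;

// Hypothetical sketch: report how long the JVM took to reach application code.
class StartupProbe {
    // Milliseconds between JVM start (as recorded by the runtime) and now.
    static long millisSinceJvmStart() {
        long jvmStart = ManagementFactory.getRuntimeMXBean().getStartTime();
        return System.currentTimeMillis() - jvmStart;
    }

    public static void main(String[] args) {
        // In a real service, log this once the server is ready to accept its first request.
        System.out.println("ready after " + millisSinceJvmStart() + " ms");
    }
}
```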

Benchmarks

  • Benchmarks are good. Should have more
  • Optimizing on too few benchmarks is like overfitting on machine learning
  • http://renaissance.dev/ – benchmark suite. Includes Scala and less commonly tested

Choosing

  • GraalVM JIT – when need peak throughput, max latency and no config
  • GraalVM AOT – use when need fast startup time, small memory footprint and small packaging size

Recommends reading top 10 things to do with GraalVM

Q&A

  • Have you considered using Epsilon in benchmark? Not yet. Makes sense since doesn’t do any GC
  • Why not use parallel GC? Not sure if it would make a difference. Kirk noted would avoid allocation hit over G1.
  • Does AOT make sense for large heaps? Can make sure don’t have disadvantage at least.

My impressions

I had heard about Graal and forgotten a lot. I re-learned much. I like the list of steps slides and the diagram. I feel like it will be more memorable this time. I also liked the comparison at the end on impact of the dimensions covered up front.

[QCon 2019] The Trouble With Memory

Kirk Pepperdine

General

  • Slow database queries, inefficient app code and too many database queries are most reported problems
  • Once drill down, over 70% of all Java apps are bottlenecked on memory churn. It’s not reported because hard to observe
  • Tend to put logging around past problems.
  • If you apply instrumentation to a system, it will always tell you something. And then you act on it
  • Cheaper to predict than react

Common libraries

  • Logback
  • Marshalling JSON, SQL
  • Caching products
  • Hibernate

Memory

  • Java heap has generations
  • Hopefully people have moved to G1GC
  • Everything happens in the free list

Problems

  • Large number of temporary objects quickly fills Eden
  • Causes frequent young cycles. Causes premature promotion which means will go to tenured too early
  • Heap becomes more fragmented
  • Allocation is quick. No cost to collect if objects die quickly. However, still slow if you do something quick enough times.
  • Large live data set size. Data consistently live in your heap. Increases time to copy/compact. Likely have less space to copy to. Think about Windows defragmenter. [Do people still have to do that?]
  • Memory leak from unstable live data. JVM will terminate if you are lucky.
  • Out of memory – 98% of recent time spent in GC with less than 2% of heap recovered. If you don’t meet those criteria, app is just really slow, but you don’t get the out of memory error.
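To illustrate the temporary-object problem from the first bullets above, here is a small sketch of my own (not from the talk). `ChurnDemo` is a hypothetical name. Both methods compute the same sum, but the boxed version can create a short-lived `Long` on nearly every iteration, which is the kind of churn that fills Eden. The JIT may optimize some of this away, so take it as an illustration rather than a measurement.

```java
// Hypothetical sketch: the same sum computed with and without temporary objects.
class ChurnDemo {
    // Boxed accumulator: each += unboxes, adds, and boxes the result into a Long,
    // which can allocate a short-lived object per iteration.
    static long sumBoxed(int n) {
        Long total = 0L;
        for (int i = 1; i <= n; i++) {
            total += i;
        }
        return total;
    }

    // Primitive accumulator: no heap allocation inside the loop.
    static long sumPrimitive(int n) {
        long total = 0L;
        for (int i = 1; i <= n; i++) {
            total += i;
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 5_000_000;
        System.out.println("same result: " + (sumBoxed(n) == sumPrimitive(n)));
    }
}
```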

Escape analysis

  • Test applied to a piece of data. What is the visibility/scope.
  • If scoped locally, only thread that created it can see it.
  • If passed to method, partial escape.
  • If data scoped so multiple threads can see it (Ex: static), full escape.
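The three cases above can be sketched in code. This is my own illustration with hypothetical names (`EscapeDemo`, `Point`). Whether the JIT actually removes an allocation depends on the JVM and on inlining, so the comments describe only the visibility of the object, not a guaranteed optimization. HotSpot's own names for these states are NoEscape, ArgEscape and GlobalEscape.

```java
// Hypothetical sketch of the three visibility cases for an allocated object.
class EscapeDemo {
    static final class Point {
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Reachable from any thread: storing a reference here is a full escape.
    static Point shared;

    // Scoped locally: the Point never leaves this method.
    static int localOnly(int x, int y) {
        Point p = new Point(x, y);
        return p.x + p.y;
    }

    // Passed to another method: the partial escape case from the notes.
    static int passedToMethod(int x, int y) {
        Point p = new Point(x, y);
        return sum(p);
    }

    static int sum(Point p) {
        return p.x + p.y;
    }

    // Published through a static field: other threads can see it.
    static int published(int x, int y) {
        shared = new Point(x, y);
        return shared.x + shared.y;
    }

    public static void main(String[] args) {
        System.out.println(localOnly(3, 4));
        System.out.println(passedToMethod(3, 4));
        System.out.println(published(3, 4));
    }
}
```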

Demo

  • Showed GC log. Want to see low pause times
  • Showed allocation rates. Problem if too high
  • In VisualVM, looked at profiler. Check filters to ensure not filtering the bottleneck out of your profile
  • Sort by # allocated objects to see frequency. It doesn’t take longer to allocate a large object than a small one.
  • Take a snapshot and look at trace
  • “Stop thinking” – explore what is shown without assuming
  • Time to look at the code from the stack trace that is creating all the objects
  • Escape analysis code
  • Run JITWatch to see allocations. Can see if direct/inline allocation. Can see when bytecode eliminates an allocation
  • Profiler is lying to you.
  • Performance differs in test vs prod environment
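The demo read pause times from a GC log. As a rough complement, the standard management beans expose accumulated collection time, which gives a coarse "how much time goes to GC" signal from inside the app. This sketch is my own, with the hypothetical name `GcOverheadProbe`. The reported time is approximate and for concurrent collectors is not the same as pause time, so it does not replace reading the GC log.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical sketch: coarse GC overhead from the standard management beans.
class GcOverheadProbe {
    // Approximate accumulated collection time across all collectors, in milliseconds.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Accumulated GC time divided by JVM uptime; 0.0 if uptime is not yet positive.
    static double gcFraction() {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptime <= 0 ? 0.0 : (double) totalGcMillis() / uptime;
    }

    public static void main(String[] args) {
        System.out.printf("GC time: %d ms, fraction of uptime: %.4f%n",
                totalGcMillis(), gcFraction());
    }
}
```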

Q&A

  • How know the performance problem is the int[] in the demo? Went through profiler to show stack trace. Used BigInteger which uses up a lot more memory than a long
  • Absolute number for GC allocation rate? Sparc? Numbers seem to hold regardless of hardware. Should focus on the CPU going forward.
  • <missed question> – try to find mutable state that is not shared
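The first answer above ties the int[] allocations back to BigInteger, which stores its magnitude in an int[] internally. Here is a small sketch of my own showing why that matters in a hot loop. `BigIntegerVsLong` is a hypothetical name and this is not the demo code from the talk. BigInteger is immutable, so every `add` returns a new object, while the `long` version allocates nothing.

```java
import java.math.BigInteger;

// Hypothetical sketch: the same sum with BigInteger and with a primitive long.
class BigIntegerVsLong {
    // Each add() returns a new BigInteger, so the loop produces many temporaries.
    static BigInteger sumBig(int n) {
        BigInteger total = BigInteger.ZERO;
        for (int i = 1; i <= n; i++) {
            total = total.add(BigInteger.valueOf(i));
        }
        return total;
    }

    // Primitive version: fine as long as the result fits in 64 bits.
    static long sumLong(int n) {
        long total = 0L;
        for (int i = 1; i <= n; i++) {
            total += i;
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        System.out.println(sumBig(n).equals(BigInteger.valueOf(sumLong(n))));
    }
}
```

The trade-off is range: `long` overflows silently, so the swap only makes sense when the values are known to fit.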

My impressions

This was great. I learned a lot and it kept my attention. I really liked the demo.