[devnexus 2026] it’s up to java developers to fix enterprise ai

Speaker: Rod Johnson (@springrod)

See the DevNexus live blog table of contents for more posts


General

  • Personal assistant approaches don’t work in the enterprise
  • Hype can be distracting
  • The AI conversation is driven by people who aren’t interested in, or don’t understand, the enterprise
  • Some things work; some don’t.
  • Change is fast. ex: Claude over the last 6 months

Personal assistants

  • Personal assistant use cases work best
  • Coding assistants are a type of personal assistant
  • Valuable because you are at computer and say yes/no
  • This doesn’t work in the enterprise. A business process or public-facing chatbot can’t do take-backs
  • Broad, flexible, human in the loop, tolerance for error, chat oriented. (By contrast business processes need to be specific, predictable, automated, reliable, workflow oriented)

Claude Code Execution Process

  • Analyze Request
  • Create to do list
  • Work through tasks
  • Test each step
  • Ensure integration
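
The execution loop above can be sketched as a plain Java loop; the Task record and the pass/fail check are my own illustrative stand-ins, not Claude Code internals:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Illustrative sketch of the execute-and-verify loop described above.
public class PlanLoop {
    record Task(String name, Supplier<Boolean> run) {}

    static List<String> execute(List<Task> plan) {
        List<String> log = new ArrayList<>();
        for (Task t : plan) {              // work through tasks in order
            boolean ok = t.run().get();    // do the step
            log.add(t.name() + (ok ? ": ok" : ": failed"));
            if (!ok) break;                // test each step before moving on
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(execute(List.of(
            new Task("analyze request", () -> true),
            new Task("create to-do list", () -> true),
            new Task("implement change", () -> true))));
    }
}
```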

Unavoidable Challenges

  • Non deterministic
  • Hallucinations
  • Prompt engineering is alchemy. Throw things in vs engineering
  • Slow and expensive to run at scale
  • Difficult to test and validate

Avoidable Challenges

  • Top down mandates
  • “AI all the things”/AI for the sake of AI – should be doing incrementally
  • Wrong people controlling AI strategy. Data science group doesn’t always understand the business
  • Greenfield fallacy – business systems/workflows already exist. Domain context exists.
  • “This time is different” – no matter how shiny a new technology is; doesn’t change everything.

Instructive Open Claw Problems

  • Lack of structure – relies on markdown.
  • Token bloat and very high cost
  • Needs to compress context frequently which can change meaning/introduce risk of errors
  • Unpredictable especially as context grows
  • Lack of explainability
  • Exposed infrastructure risk article – egads. This is scary!

How to succeed

  • Attack non-determinism – make things as predictable as possible by breaking complex tasks into small steps: smaller prompts, fewer tools in context, mixing in code for some steps, creating guardrails. (This also saves money because some steps can use a cheaper LLM)
  • Integrate with what works – connect to existing system, leverage current domain expertise/coding skills, build on proven infrastructure, incremental
  • Bring structure to LLM interactions – don’t use free-form English if you can avoid it; include as much structure as possible. Ask for output as structured data in a specific format.
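
The “ask for structured data” advice can be sketched as a guardrail that validates the reply’s shape before the application trusts it; the one-line format and the validate() check are my own assumptions, not from the talk:

```java
import java.util.regex.Pattern;

// Sketch: request a rigid reply format, then validate before using it.
public class StructuredReply {
    // Require exactly: SENTIMENT=<POSITIVE|NEGATIVE|NEUTRAL>;SCORE=<0-100>
    static final Pattern SHAPE =
        Pattern.compile("SENTIMENT=(POSITIVE|NEGATIVE|NEUTRAL);SCORE=(\\d{1,3})");

    static String prompt(String review) {
        return "Classify this review. Reply with exactly one line in the form "
             + "SENTIMENT=<POSITIVE|NEGATIVE|NEUTRAL>;SCORE=<0-100>\n" + review;
    }

    // Guardrail: reject anything that does not match the requested shape.
    static boolean validate(String reply) {
        var m = SHAPE.matcher(reply.trim());
        return m.matches() && Integer.parseInt(m.group(2)) <= 100;
    }
}
```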

Testing

  • Unit testing can find that you sent the wrong prompt (or implemented it wrong)
  • Integration testing – test with a real LLM but fake data – e.g., Testcontainers
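
The unit-testing point can be sketched with a hand-rolled fake in place of the real LLM client, so a test can assert on the exact prompt that was sent; ChatClient and Summarizer are illustrative names, not a specific library’s API:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative LLM client abstraction so the caller can be tested offline.
interface ChatClient { String chat(String prompt); }

// Code under test: builds the prompt and delegates to the client.
class Summarizer {
    private final ChatClient client;
    Summarizer(ChatClient client) { this.client = client; }
    String summarize(String text) {
        return client.chat("Summarize in one sentence:\n" + text);
    }
}

// Fake that records every prompt it receives instead of calling an LLM.
class RecordingFake implements ChatClient {
    final List<String> prompts = new ArrayList<>();
    public String chat(String prompt) { prompts.add(prompt); return "stub"; }
}
```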

Domain Integrated Context Engineering (DICE)

  • Context engineering more broad than prompt engineering
  • Bridges LLM/business system
  • Helps structure input and output
  • Domain objects
  • Integrate with existing domain models
  • Structure is a continuum from Open Claw (autonomous/unstructured) to old-fashioned code. In between are Claude, MCP, agent frameworks and deterministic planning.
  • Embabel is the agent framework/deterministic planning level
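
A toy sketch of the DICE idea: bind the LLM’s output to an existing domain object instead of passing free text around. The Refund record and the line format are my own assumptions:

```java
// Sketch: LLM output flows into an existing domain model, not raw strings.
public class DomainBinding {
    record Refund(String orderId, int cents) {}   // existing domain object

    // Assumed LLM reply shape: "ORDER=<id>;CENTS=<amount>"
    static Refund parse(String reply) {
        String[] parts = reply.trim().split(";");
        String orderId = parts[0].substring("ORDER=".length());
        int cents = Integer.parseInt(parts[1].substring("CENTS=".length()));
        return new Refund(orderId, cents);
    }
}
```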

What to do as Java developers

  • Gen AI works best alongside existing systems
  • Your data/domain models/business rules
  • AI should extend your capabilities not replace them.
  • Think integration, not greenfield
  • Java skills undervalued to this point
  • Every Java developer should know both Java and Python [I do; yay]

Python vs Java

  • Don’t just imitate Python approaches
  • Build better – look at prior art (Python), leverage domain experience, apply architecture experience, bring strengths to Gen AI, create better frameworks, lead
  • Python – great for data science (data science != gen ai), scripting, prototyping
  • JVM – excels at enterprise grade applications

Embabel

  • Directly addresses key Gen AI failure points
  • Key innovation is deterministic planning [Python frameworks do not do this]
  • Goal Oriented Action Planning (GOAP)
  • Predictable/explainable execution
  • Actions and goals create extensible system
  • Includes a server; knows what it is up to.
  • Knows about all deployed capabilities and can extend
  • Builds up understanding of domain
  • Will become AI fabric of enterprise
  • Framework written in Kotlin; put a lot of effort into making sure easy to use from Java.
  • Most examples in Java and most of users/community are Java
  • Builds on existing stack.
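
GOAP can be sketched as actions with preconditions and effects that a planner chains deterministically toward a goal. This toy forward-chainer illustrates the concept only; it is not Embabel’s actual planner, and the action names are made up:

```java
import java.util.ArrayList;
import java.util.List;

// Toy Goal Oriented Action Planning: chain actions from state to goal.
public class GoapSketch {
    record Action(String name, String pre, String post) {}

    static List<String> plan(String start, String goal, List<Action> actions) {
        List<String> steps = new ArrayList<>();
        String state = start;
        while (!state.equals(goal)) {
            if (steps.size() > actions.size()) return List.of(); // cycle guard
            String current = state;
            Action next = actions.stream()
                .filter(a -> a.pre().equals(current))   // applicable action?
                .findFirst().orElse(null);
            if (next == null) return List.of();         // no plan exists
            steps.add(next.name());
            state = next.post();                        // apply its effect
        }
        return steps;                                   // deterministic plan
    }
}
```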

Unfolding Tools

  • While it’s better to have smaller steps with fewer tools, sometimes you need a lot of tools
  • Tools use a lot of context and can confuse the LLM
  • Unfolding saves tokens and improves accuracy
  • Exposes a single top-level tool. When invoked, it expands to show its children. Like Russian nesting dolls
  • Works by rewriting message history within agentic loop
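
Unfolding can be sketched as a tool list that is rewritten once the parent tool is invoked; the Tool record and the group names here are my own assumptions, not Embabel’s API:

```java
import java.util.List;
import java.util.Map;

// Sketch: one folded parent tool; invoking it exposes the children.
public class UnfoldingTools {
    record Tool(String name, String description) {}

    static final Map<String, List<Tool>> GROUPS = Map.of(
        "jira", List.of(new Tool("jira_search", "Search issues"),
                        new Tool("jira_update", "Update an issue")));

    // Initially the LLM sees just the single top-level tool (saves tokens).
    static List<Tool> initialTools() {
        return List.of(new Tool("jira", "Jira operations (expands when invoked)"));
    }

    // When the parent is invoked, the tool list offered to the LLM is
    // rewritten to expose its children, like opening a nesting doll.
    static List<Tool> unfold(String parent) {
        return GROUPS.getOrDefault(parent, List.of());
    }
}
```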

Agentic Tools

  • Like the supervisor pattern in Python frameworks, but more deterministic
  • Exposes a single top-level tool that coordinates lower-level tools
  • Advanced implementations allow controlling order
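
An agentic tool can be sketched as one exposed entry point that runs its sub-tools itself, in a fixed order, rather than leaving the sequencing to the LLM; the names and the pipeline here are illustrative:

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Sketch: the LLM sees only invoke(); sub-tool order is deterministic.
public class AgenticTool {
    record Step(String name, UnaryOperator<String> run) {}

    private final List<Step> steps;
    AgenticTool(List<Step> steps) { this.steps = steps; }

    String invoke(String input) {
        String result = input;
        for (Step s : steps) result = s.run().apply(result); // fixed order
        return result;
    }
}
```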

RAG

  • Currently pipeline RAG: run a query; there’s no feedback and it’s hard to adjust
  • Future is agentic RAG – context-aware, multi-step search with self-correction. The LLM has more autonomy and can do more searches: text, vector, expanding chunks, etc.
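
The agentic-RAG loop can be sketched as retrieve, self-check, reformulate; the Retriever interface and the naive query rewrite are my own stand-ins for what an LLM would do:

```java
import java.util.List;

// Sketch: the loop judges its results and can issue a corrected search.
public class AgenticRag {
    interface Retriever { List<String> search(String query); }

    static List<String> retrieve(Retriever r, String query, int maxRounds) {
        String q = query;
        List<String> hits = List.of();
        for (int round = 0; round < maxRounds; round++) {
            hits = r.search(q);           // text or vector search
            if (!hits.isEmpty()) break;   // self-check: good enough?
            q = q + " synonyms";          // stand-in for an LLM query rewrite
        }
        return hits;
    }
}
```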

Rod wrote blog post: You can build better AI Agents in Java than Python

My take

After hearing about one-shotting and exaggerations on social media, having a more balanced take was great. I especially appreciated the *whys*. I also liked the “what can you do” advice for using AI more safely.

PASSED! Jeanne’s Experience Taking the Oracle Cloud Infrastructure 2025 AI Foundations Associate

Today I took the Oracle Cloud Infrastructure AI Foundations Associate certification and passed with a score of 88%. Passing is 65%.

It’s a 60-minute exam with 40 questions. It took me way less than that (about 10 minutes). Each question asks you to pick one of four multiple choice options. In many questions, one or two were clear distractors.

Why I took this certification

Oracle is doing a race to certification, where you can take a number of free certifications between now and Halloween. Unlike the Vector cert, which I took solely because it was free, this one I took both because it was free and to learn something. And I did. Some was new to me and some I used to know and had forgotten.

What I did:

  • Watched videos and did skills checks from the free course. This was interesting. The skills check questions cover a good amount of the exam material. I watched at 2x speed. I also skipped most of the lab videos and a lot of the demos, focusing on the concepts.
  • Watched the video about preparing for the exam. It came with 7 practice questions which were useful for getting a feel for what to expect.
  • Did the practice exam. This was 40 questions in 60 minutes; it matched the real exam. The first time through I got a 55%. (I didn’t review my notes at all.) It was useful for knowing what I needed to remember. I took it again immediately and got a 93%, which was more reflective of my real exam score.

The exam

All the questions were single-answer multiple choice. Unlike the Vector exam, you didn’t have to sign up for a slot in advance. It didn’t seem proctored, although it’s hard to be sure. You also get 15 attempts at passing. I only used one, but it seems like the goal is for people to pass, not to have a stringent test.

You get two free exam attempts with the promo, and this exam didn’t use either of them. It looks like it is always free.

I wrote a separate blog post about the exam engine. The battery indicator wasn’t there so maybe that was about the proctor’s machine?

After the exam

You get a score report on the screen right after submitting and an email right away with the same score. It took a little under an hour to get the second email saying my CertView was updated.

What I found most interesting

The refresher on terms was good for me. The types of neural networks and when you use each was mostly new to me and interesting. There were good examples throughout. I also enjoyed seeing the capabilities of some of the Oracle products at a high level.

How to Study

The learning path is sufficient to take the exam if you go through it more carefully.

mutation testing from a five year old (according to chatgpt)

I asked ChatGPT to explain mutation testing to a five year old to see what it would say. I got this back:

Alright, imagine you have a big puzzle made of Lego blocks. Each block represents a piece of code that makes up a computer program. Now, let’s say we want to make sure that our puzzle is put together correctly so that our program works properly.

Mutation testing is like having a mischievous Lego wizard who comes and changes some of the pieces in our puzzle to see if we notice. He might swap a blue block for a red one, or turn a block upside down. After he’s done, we carefully check our puzzle again to see if it still looks right and if our program still works.

If we can still solve the puzzle and our program still works even with the changes the wizard made, then we know our puzzle – or our program – is really strong. But if we can’t solve the puzzle or our program stops working, then we know there’s a problem and we need to fix it. So, mutation testing helps us make sure our programs are really robust and can handle unexpected changes.

It’s a good example of how ChatGPT is an aide and not a solver. First paragraph is good. In the second part, I love the mischievous Lego wizard who comes and changes some of the pieces to see if we notice. It’s a good premise. We see if our program still works with the changes.

The third paragraph leads us astray. If we notice the change, that is good. If the program stops working, that is also good; it means we noticed the change. Mutation testing is about having good tests that can detect changes – not about having a robust program with unnecessary code that behaves the same way when changed.
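
To make the corrected idea concrete, here is a tiny hand-written “mutant” of the kind a tool like PIT generates automatically. A boundary test at exactly $50 kills the mutant; tests that only check $60 and $10 would let it survive. (The Discount class is my example, not ChatGPT’s.)

```java
// Original code plus one hand-made mutant, to show what a killing test needs.
public class Discount {
    // original rule: free shipping for orders of $50.00 or more
    static boolean freeShipping(int totalCents) {
        return totalCents >= 5000;
    }

    // a typical generated mutant: >= changed to >
    static boolean freeShippingMutant(int totalCents) {
        return totalCents > 5000;
    }
}
```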