Speaker: Rod Johnson (@springrod)
See the DevNexus live blog table of contents for more posts
General
- Personal assistant approaches don’t work in the enterprise
- Hype can be distracting
- AI conversation driven by people who aren’t interested in/don’t understand the enterprise
- Some things work; some don’t.
- Change is fast. Ex: Claude over the last 6 months
Personal assistants
- Personal assistant use cases work best
- Coding assistants are a type of personal assistant
- Valuable because you are at the computer and can say yes/no
- This doesn’t work in the enterprise. A business process/chatbot with the public can’t do take-backs
- Broad, flexible, human in the loop, tolerance for error, chat oriented. (By contrast business processes need to be specific, predictable, automated, reliable, workflow oriented)
Claude Code Execution Process
- Analyze Request
- Create to do list
- Work through tasks
- Test each step
- Ensure integration
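The steps above can be sketched in code. This is a hypothetical sketch of that plan-then-execute loop, not Claude Code's actual implementation; `plan`, `execute`, and `doTask` are illustrative names:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a Claude Code-style execution loop:
// plan first, then work each task and verify before moving on.
public class ExecutionLoop {

    record Task(String description) {}

    // Stand-in for "analyze request / create to-do list" (the real tool asks an LLM).
    static List<Task> plan(String request) {
        List<Task> todo = new ArrayList<>();
        todo.add(new Task("Analyze: " + request));
        todo.add(new Task("Implement change"));
        todo.add(new Task("Run tests"));
        return todo;
    }

    // Work through tasks, testing each step; stop early on failure
    // so a broken step never gets "integrated".
    static List<String> execute(String request) {
        List<String> completed = new ArrayList<>();
        for (Task t : plan(request)) {
            if (!doTask(t)) break;   // don't proceed past a failing step
            completed.add(t.description());
        }
        return completed;
    }

    static boolean doTask(Task t) {
        return true; // placeholder: a real implementation calls tools / runs tests
    }
}
```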
Unavoidable Challenges
- Non deterministic
- Hallucinations
- Prompt engineering is alchemy. Throw things in vs engineering
- Slow and expensive to run at scale
- Difficult to test and validate
Avoidable Challenges
- Top down mandates
- “AI all the things”/AI for the sake of AI – should adopt AI incrementally
- Wrong people controlling AI strategy. Data science group doesn’t always understand the business
- Greenfield fallacy – business systems/workflows already exist. Domain context exists.
- “This time is different” – no matter how shiny a new technology is, it doesn’t change everything.
Instructive OpenClaw Problems
- Lack of structure – relies on markdown.
- Token bloat and very high cost
- Needs to compress context frequently which can change meaning/introduce risk of errors
- Unpredictable especially as context grows
- Lack of explainability
- Exposed infrastructure risk article – egads. This is scary!
How to succeed
- Attack non-determinism – make things as predictable as possible by breaking complex tasks into small steps: smaller prompts, fewer tools in context, mix code in for some steps, create guardrails. (Also saves money because some steps can use a cheaper LLM)
- Integrate with what works – connect to existing systems, leverage current domain expertise/coding skills, build on proven infrastructure, go incremental
- Build structure into LLM interactions – don’t talk English if you can avoid it; include as much structure as possible. Ask for a specific structured data format.
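One way to picture "build structure into LLM interactions": demand a fixed JSON shape instead of free-form English, then validate the reply before trusting it. A minimal sketch; `RefundDecision` and the prompt wording are made-up illustrations, not any framework's API:

```java
// Hypothetical sketch: ask the LLM for structured data, then sanity-check
// that the reply actually matches the requested shape.
public class StructuredPrompt {

    // Domain object the LLM must fill in (illustrative names).
    record RefundDecision(boolean approved, String reason) {}

    static String buildPrompt(String caseSummary) {
        return """
            Decide the refund case below.
            Respond with ONLY this JSON, no prose:
            {"approved": <true|false>, "reason": "<one sentence>"}

            Case: %s""".formatted(caseSummary);
    }

    // Minimal validation: reject replies that drift back into English prose.
    static boolean looksStructured(String reply) {
        String r = reply.trim();
        return r.startsWith("{") && r.endsWith("}")
            && r.contains("\"approved\"") && r.contains("\"reason\"");
    }
}
```

In practice you would parse the JSON into the domain object with a real mapper and retry on validation failure; the point is that structure makes the interaction checkable.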
Testing
- Unit testing can find that you sent the wrong prompt (or implemented it wrong)
- Integration testing – test with real LLM but fake data – ex: test containers
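The two levels can be sketched like this (a hypothetical example, assuming a made-up `Llm` interface; the talk's actual example used Testcontainers for the integration level):

```java
// Hypothetical sketch of testing LLM-calling code: a fake LLM records the
// prompt it received, so a unit test can catch "sent the wrong prompt"
// without ever hitting a real, slow, non-deterministic model.
public class PromptTesting {

    interface Llm { String complete(String prompt); }

    // Code under test: builds a prompt and sends it.
    static String summarize(Llm llm, String document) {
        String prompt = "Summarize in one sentence:\n" + document;
        return llm.complete(prompt);
    }

    // Fake LLM for unit tests: canned output, captured input.
    static class FakeLlm implements Llm {
        String lastPrompt;
        public String complete(String prompt) {
            lastPrompt = prompt;
            return "canned summary";
        }
    }
}
```

An integration test would swap `FakeLlm` for a real model but keep the business data fake.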
Domain Integrated Context Engineering (DICE)
- Context engineering more broad than prompt engineering
- Bridges LLM/business system
- Helps structure input and output
- Domain objects
- Integrate with existing domain models
- Structure is a continuum from OpenClaw (autonomous/unstructured) to old-fashioned code. In between are Claude, MCP, agent frameworks, and deterministic planning.
- Embabel is the agent framework/deterministic planning level
What to do as Java developers
- Gen AI works best alongside existing systems
- Your data/domain models/business rules
- AI should extend your capabilities not replace them.
- Think integration, not greenfield
- Java skills undervalued to this point
- Every Java developer should know both Java and Python [I do; yay]
Python vs Java
- Don’t just imitate Python approaches
- Build better – look at prior art (Python), leverage domain experience, apply architecture experience, bring strengths to Gen AI, create better frameworks, lead
- Python – great for data science (data science != gen ai), scripting, prototyping
- JVM – excels at enterprise grade applications
Embabel
- Directly addresses key Gen AI failure points
- Key innovation is deterministic planning [Python frameworks do not do this]
- Goal Oriented Action Planning (GOAP)
- Predictable/explainable execution
- Actions and goals create extensible system
- Includes a server; knows what it’s up to.
- Knows about all deployed capabilities and can extend them
- Builds up understanding of the domain
- Will become the AI fabric of the enterprise
- Framework written in Kotlin; a lot of effort went into making it easy to use from Java.
- Most examples are in Java and most of the users/community are Java developers
- Builds on existing stack.
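To make the GOAP idea concrete: each action declares preconditions and effects, and a planner chains actions into a fixed sequence before anything runs. This is a toy sketch of the technique, not Embabel's API; all names here are illustrative:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of Goal Oriented Action Planning (GOAP): actions
// declare what they need and what they produce; the planner chains them
// until the goal fact is reachable. The resulting plan is deterministic
// and explainable, unlike an LLM improvising step by step.
public class GoapPlanner {

    record Action(String name, Set<String> preconditions, Set<String> effects) {}

    // Greedy forward planner: apply any runnable action that adds new facts,
    // until the goal fact is present or no progress is possible.
    static List<String> plan(Set<String> state, List<Action> actions, String goal) {
        Set<String> facts = new HashSet<>(state);
        List<String> steps = new ArrayList<>();
        boolean progressed = true;
        while (!facts.contains(goal) && progressed) {
            progressed = false;
            for (Action a : actions) {
                if (facts.containsAll(a.preconditions()) && !facts.containsAll(a.effects())) {
                    facts.addAll(a.effects());
                    steps.add(a.name());
                    progressed = true;
                    break;
                }
            }
        }
        return facts.contains(goal) ? steps : List.of();
    }
}
```

Real GOAP implementations search for an optimal plan rather than applying the first runnable action, but the shape of the idea is the same.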
Unfolding Tools
- While it’s better to have smaller steps with fewer tools, sometimes you need a lot of tools
- Tools use a lot of context and can confuse the LLM
- Unfolding saves tokens and improves accuracy
- Exposes a single top-level tool. When invoked, it expands to show its children. Like Russian nesting dolls
- Works by rewriting message history within agentic loop
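The nesting-doll idea can be sketched as follows (a hypothetical illustration of the visibility mechanics, not Embabel's implementation; the message-history rewriting is omitted):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of "unfolding" tools: only top-level tools are visible
// at first; a tool's children appear in the tool list only after the LLM
// invokes (unfolds) the parent, so unused tool descriptions cost no tokens.
public class UnfoldingTool {

    record Tool(String name, List<Tool> children) {
        Tool(String name) { this(name, List.of()); }
    }

    // Tools visible to the LLM right now: all top-level names, plus the
    // children of any tool the model has already unfolded.
    static List<String> visibleTools(List<Tool> topLevel, Set<String> unfolded) {
        List<String> visible = new ArrayList<>();
        for (Tool t : topLevel) {
            visible.add(t.name());
            if (unfolded.contains(t.name())) {
                for (Tool child : t.children()) visible.add(child.name());
            }
        }
        return visible;
    }
}
```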
Agentic Tools
- Like the supervisor pattern in Python frameworks, but more deterministic
- Exposes a single top-level tool that coordinates lower-level tools
- Advanced implementations allow controlling order
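A minimal sketch of that coordination, assuming a made-up `SubTool` interface: the top-level tool calls its sub-tools in an order fixed by code, which is what makes it more deterministic than a supervisor LLM deciding the order at runtime:

```java
import java.util.List;

// Hypothetical sketch of an "agentic tool": one top-level tool that pipes
// each sub-tool's output into the next, in an order controlled by code
// rather than by the LLM.
public class AgenticToolSketch {

    interface SubTool { String run(String input); }

    static String coordinate(List<SubTool> ordered, String input) {
        String current = input;
        for (SubTool t : ordered) {
            current = t.run(current);
        }
        return current;
    }
}
```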
RAG
- Currently pipeline RAG: do a query, no feedback, hard to adjust
- Future is agentic RAG – context-aware multi-step search with self-correction. The LLM has more autonomy and can do more searches: text, vector, expand chunks, etc.
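The pipeline-vs-agentic difference is essentially one search versus a loop that can issue follow-up searches. A toy sketch under made-up names (`Retriever`, `satisfied`); in a real system the "satisfied" judgment and the rephrased query would come from the LLM:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of agentic RAG: instead of a single fixed query,
// loop until the gathered context looks sufficient or a step budget runs out.
public class AgenticRag {

    interface Retriever { List<String> search(String query); }

    // Stand-in for the LLM's judgment; here "enough context" just means
    // a minimum number of retrieved passages.
    static boolean satisfied(List<String> context) {
        return context.size() >= 2;
    }

    static List<String> retrieveWithSelfCorrection(Retriever r, String query, int maxSteps) {
        List<String> context = new ArrayList<>();
        String q = query;
        for (int step = 0; step < maxSteps && !satisfied(context); step++) {
            context.addAll(r.search(q));
            q = q + " (rephrased)";  // stand-in for an LLM-generated follow-up query
        }
        return context;
    }
}
```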
Rod wrote a blog post: You can build better AI Agents in Java than Python
My take
After hearing about one-shotting and exaggerations on social media, having a more balanced take was great. I especially appreciated the *whys*. I also liked the “what can you do” guidance on using AI more safely.