[kcdc 2025] The problems that arise when focusing on predictability instead of variability

Speaker: Joel Tosi

@joelstosi@mastodon.social

Deck

For more see the table of contents


General

  • Suppose a team does 5-15 days per story. What is normal?
  • Want to see variability/stop hiding it.
  • Can’t use number of story points to guarantee delivery date

Process behavior chart

  • Shewhart chart. Goal is to differentiate variation due to common causes and special causes
  • Understand variability; don’t hide this
  • If you do nothing, a system will deliver results within a given range.
  • Stable system may have more variability than you would like, but is stable
  • If stretch goals outside range, won’t happen
  • Helps make better decisions
  • Two teams with same average have different stories and different problems
  • Can figure out the max/min (see the limits sketch after this list). Just because the max happened once doesn’t make it repeatable
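
A minimal sketch (mine, not from the talk) of computing the natural process limits for cycle times, assuming the common XmR-chart formula of mean ± 2.66 × the average moving range (which approximates three standard deviations):

```java
import java.util.List;
import java.util.stream.IntStream;

// Sketch: natural process limits for an XmR (process behavior) chart.
// The cycle times below are made up; limits = mean +/- 2.66 * average moving range.
public class ProcessBehaviorChart {
    public static void main(String[] args) {
        List<Double> daysPerStory = List.of(5.0, 12.0, 7.0, 15.0, 6.0, 9.0, 11.0, 8.0);

        double mean = daysPerStory.stream()
                .mapToDouble(Double::doubleValue)
                .average().orElseThrow();

        // average absolute difference between consecutive points (moving range)
        double avgMovingRange = IntStream.range(1, daysPerStory.size())
                .mapToDouble(i -> Math.abs(daysPerStory.get(i) - daysPerStory.get(i - 1)))
                .average().orElseThrow();

        double upperLimit = mean + 2.66 * avgMovingRange;
        double lowerLimit = Math.max(0, mean - 2.66 * avgMovingRange);

        // Points inside the limits are routine (common cause) variation;
        // points outside suggest a special cause worth investigating.
        System.out.printf("mean=%.1f days, natural process limits: %.1f to %.1f days%n",
                mean, lowerLimit, upperLimit);
    }
}
```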

Anti patterns

  • Always add more planning
  • Adding a pre-planning meeting before the planning meeting
  • Tampering – making decisions in reaction to observed patterns without understanding why
  • Stretch goals are a lie.
  • If can’t test, adding devops doesn’t help
  • Adding points for trivial activities so it looks like less variability – illusion of progress
  • Meetings as an attempt to fix variability. Some meetings give the illusion of control

Negative implications

  • Variability in different things compound
  • Negative feedback – pressure to do more work -> start more stuff -> more branches -> stability goes down -> quality goes down -> more work (fix quality and still do the features)
  • Gets worse – add people since too much work. If can’t ship 1 thing in a month, try for 5 things in 3 months. Plan further out, make teams think of more things.
  • When we don’t see the system, we operate with the best of intentions in the worst ways

Decisions that reduce options

  • ex: Orgs want autonomous interconnected teams but make decisions that prevent it
  • Want experienced people but can’t pay
  • Get junior engineers in very different time zones
  • Now can’t meet and work together

Context

  • “It depends” – variables, context
  • Context includes company, industry, experience, people. Cumulative effect of decisions company has made to that point

Illusions of Progress

  • Stories – e.g. a “story for standup” because standup takes time
  • Backlogs – multiple backlogs, hard to trace/see big pictures
  • Branches – one person working on multiple branches. No actual progress even though committing
  • Tests – flaky/brittle
  • Scheduling – scheduling tetris. “If that team does X by that day then that team can do ….”
  • Priorities – should be a priority
  • Not measuring impact, or measuring the wrong things – need to measure what matters

Outliers

  • Be careful of perceived outliers. Could be looking at the wrong level of the system
  • If there are a lot of outages and each has a different reason, they might still be predictable if you look at the system a different way.
  • Major releases
  • Security
  • Cost of delivery

Other notes

  • Reality is interconnected and non-linear
  • One choice is not an option. That’s not a strategy. Jerry Weinberg – rule of 3. If don’t have three options, haven’t thought about it enough.
  • Experiment early when cheap and easy. Minimize variability after decide.
  • To have zero variability, nothing can change: perfect requirements, perfect codebase, never change tech, can’t learn, can’t innovate, etc.

What to do

  • Stop hiding variability
  • Start measuring variability
  • You know best what to do next

https://sim.curiousduck.io – free simulator. can enter any email

My take

Good food for thought. I hadn’t heard of Shewhart charts. The answer to my question about where the max variability came from was three standard deviations from the average, assuming a normal distribution. There are alternate ways for different distributions. That’s interesting.

[kcdc 2025] Loom is more than Virtual Threads: Structured Concurrency and Scoped Values

Speaker: Todd Ginsberg

Bluesky: @todd.ginsberg.com

For more see the table of contents


Project Loom

Project charter includes:

  • easy to use
  • high throughput
  • lightweight concurrency
  • new programming models on the Java platform

Virtual Threads

  • Platform threads in the JVM map to OS threads. Not useful when blocked, memory hungry, limited in number by the OS, etc.
  • Virtual threads have nothing to do with OS. Just memory on heap.
  • When virtual threads have work, mounted to carrier thread.
  • Carrier thread uses OS thread
  • Virtual threads are still java.lang.Thread, have much lower memory requirements, are limited in number only by heap memory, are quick to create, and make better use of system resources
  • Virtual threads have ids, but not names, by default since you are supposed to use them and then throw away.
  • 2 seconds to create thousands of platform threads. 41 milliseconds to do the same for virtual threads. 368 milliseconds to create a million virtual threads
  • Little’s law: concurrency = arrival rate (aka throughput) × latency. Virtual threads increase throughput
  • Do not pool virtual threads. Create, use, discard (see the sketch after this list). You wouldn’t pool other inexpensive objects.
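
A minimal sketch (mine, not the speaker's benchmark) of the create/use/discard pattern with the standard Thread.ofVirtual() API:

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Creates 10,000 virtual threads, each blocking briefly, and discards them.
// No pooling: virtual threads are cheap heap objects, and blocking simply
// unmounts them from their carrier (OS) thread.
public class VirtualThreadDemo {
    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            threads.add(Thread.ofVirtual().start(() -> {
                try {
                    Thread.sleep(Duration.ofMillis(100)); // stand-in for blocking work, e.g. I/O
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }
        for (Thread t : threads) {
            t.join();
        }
        // Little's law in action: same per-task latency, far more concurrency,
        // so throughput goes up.
        System.out.println("Elapsed: " + (System.currentTimeMillis() - start) + " ms");
    }
}
```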

Structured Concurrency

  • API changed, so it is still in preview in Java 25
  • Suppose have two futures. One that takes 2 seconds and one that takes 4 seconds.
  • Want to kill one when the other fails so not wasting time.
  • While think of as parent/child threads normally, that relationship doesn’t actually exist
  • jps command gives process ids
  • To get thread dump: jcmd <main program process id> Thread.dump_to_file -format=json unstructured.json
  • Goals: promote style of concurrent programming to eliminate common risks, improve concurrency
  • Enforces children don’t outlive parents
  • Explicit relationship between tasks and subtasks, observability is easier, managing work is easier
  • join() – join point waits until all tasks are done and can then interpret results.
  • Create the scope with StructuredTaskScope.open() in a try-with-resources; the whole scope is all or nothing – it succeeds or fails as a unit (see the sketch after this list)
  • scope.fork(() -> doWork())
  • scope.join()
  • get() on the forked subtask to get the answer now that the join is done
  • Can nest scopes
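
A minimal sketch (mine) of the two-futures example using the Java 25 preview API described above; the details may shift while it is in preview, and fetchUser/fetchOrder are made-up stand-ins for the 2-second and 4-second tasks:

```java
import java.time.Duration;
import java.util.concurrent.StructuredTaskScope;
import java.util.concurrent.StructuredTaskScope.Subtask;

// Compile and run with --enable-preview on Java 25.
public class StructuredConcurrencyDemo {
    public static void main(String[] args) throws InterruptedException {
        // try-with-resources: the scope is all or nothing; children cannot outlive the parent
        try (var scope = StructuredTaskScope.open()) {
            Subtask<String> user = scope.fork(StructuredConcurrencyDemo::fetchUser);
            Subtask<String> order = scope.fork(StructuredConcurrencyDemo::fetchOrder);

            scope.join(); // waits for both; if one fails, the other is cancelled and join throws

            System.out.println(user.get() + " / " + order.get());
        }
    }

    static String fetchUser() throws InterruptedException {
        Thread.sleep(Duration.ofSeconds(2));
        return "user";
    }

    static String fetchOrder() throws InterruptedException {
        Thread.sleep(Duration.ofSeconds(4));
        return "order";
    }
}
```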

Scoped Values

  • in Java 25 (no longer in preview)
  • ThreadLocal lets you set data. Problems: unconstrained mutability (anyone who can read it can write it), unbounded lifespan (have to clean up if reusing a platform thread), expensive inheritance
  • Scoped values: Immutable, defined lifetime, cheap/free inheritance
  • Ex: static final ScopedValue<String> SCOPED = ScopedValue.newInstance() and ScopedValue.where(SCOPED, obj).run(() -> …) (see the sketch after this list)
  • Scoped values are good for passing data one way. Good when you have structured sharing use cases – ex: data needed many layers away from where you create it
  • Can replace one-way ThreadLocal use cases even without structured concurrency
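
A minimal sketch (mine) of the pattern in the example above; REQUEST_ID and handle() are made-up names:

```java
// ScopedValue lives in java.lang, so no import is needed (final in Java 25).
public class ScopedValueDemo {
    // immutable binding, automatically unbound when run() returns
    static final ScopedValue<String> REQUEST_ID = ScopedValue.newInstance();

    public static void main(String[] args) {
        // bind the value for the duration of the lambda
        ScopedValue.where(REQUEST_ID, "req-42").run(ScopedValueDemo::handle);
    }

    static void handle() {
        // many layers away from where it was set, read it without passing a parameter
        System.out.println("Handling " + REQUEST_ID.get());
    }
}
```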

My take

Not the point of the talk, but I like that he uses Duration.ofMillis() instead of just putting a number. This topic is like pipelines; I needed to hear it a few times from different people for it to click. Given that scoped values are in the Java 25 LTS and structured concurrency is not, I was curious how to use scoped values alone, so it was nice to hear that.

[kcdc 2025] ai killed your privacy tools

Speaker: Ben Dechrai

Bluesky: @bendechr.ai

For more see the table of contents


General

  • Privacy risks
  • We thought robots would do physical labor for us so we could relax. Like Rosie on the Jetsons
  • Have robots that clean floors, mow lawns, farm, and build cars. They are single purpose.
  • Best robots are software
  • Single-purpose robots are so good because the software is focused on that one thing
  • Child can lift an apple. Cool that a humanoid robot can, but not game changing
  • Identify location from picture based on background
  • Much faster than a human comparing images
  • AI gives statistically most likely next word
  • Humans don’t like to be wrong. LLMs are modeled on human data, so they also don’t like to be wrong and will make stuff up. Have to include in the prompt not to do that.

Creativity vs Imagination

  • Our downfall is how successful this is. Killing creativity
  • Creativity and imagination are different.
  • Creativity is making a sandwich
  • Imagination is what goes into the sandwich

Australia experiment

  • Do census every 5 years
  • Tried to map 5% of data from 2011 to 2006
  • In 2016, stored with profile for 18 months
  • Said would keep info anonymous. It was not.
  • SLK581 – statistical linkage key 581. 14 character key as unique id
  • Didn’t make it anonymous. The key was generated algorithmically from last name, birthdate and gender
  • Many hashing algorithms generate hashes with distinct formats, so you can tell which one was used. Then you can create a rainbow table for that algorithm against the census database.
  • Knowing the pattern for how the key was generated greatly reduces the number of hashes to try – about 36K hashes if you know any person’s name (see the brute-force sketch after this list). That lets you find the seed the hashing algorithm used.
  • This isn’t even AI; it’s programmatic.
  • Ask LLM to find information in the data set. Ex: find people who match a profession.
  • Play with at https://slk581.bendechr.ai
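
A rough sketch (mine, with a deliberately simplified key format rather than the exact SLK-581 spec) of why a deterministic linkage key is not anonymous: once the pattern and a person's name are known, the remaining search space is just birthdates and a sex code, so it can be hashed exhaustively in seconds:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.LocalDate;
import java.util.HexFormat;

public class LinkageKeyGuess {

    // Simplified SLK-style key for illustration: a few surname letters + birthdate + sex code
    static String key(String surname, LocalDate dob, char sex) {
        return surname.toUpperCase().substring(0, 3)
                + dob.toString().replace("-", "")
                + sex;
    }

    static String sha256(String s) throws NoSuchAlgorithmException {
        byte[] hash = MessageDigest.getInstance("SHA-256")
                .digest(s.getBytes(StandardCharsets.UTF_8));
        return HexFormat.of().formatHex(hash);
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // pretend this hash came from the "anonymized" data set
        String leakedHash = sha256(key("SMITH", LocalDate.of(1984, 3, 7), '2'));

        // ~100 years of birthdates x 2 sex codes: well under 100K hashes to try
        long tries = 0;
        for (LocalDate d = LocalDate.of(1920, 1, 1);
                d.isBefore(LocalDate.of(2020, 1, 1)); d = d.plusDays(1)) {
            for (char sex : new char[] {'1', '2'}) {
                tries++;
                if (sha256(key("SMITH", d, sex)).equals(leakedHash)) {
                    System.out.println("Match after " + tries + " hashes: " + d + ", sex " + sex);
                    return;
                }
            }
        }
    }
}
```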

Cross Platform Identity Linking

  • Match patterns across social media accounts to link “anonymous” accounts
  • Includes writing style, typos, idioms
  • Cambridge Analytica was doing this in 2016.
  • Now only costs $10/month
  • MCP server exposes data to LLM. Can enhance ability to break privacy

Chatbot with employee data

  • Acme AI solutions (not clear if real company or made up for example)
  • Ask chatbot about employees like “do employees like pets”
  • Controls include ensuring queries are for aggregates and that the matching data set has at least 6 results, to try to protect specific employee data (see the sketch after this list)
  • LLM described what data can/can’t get
  • Claude backend doesn’t limit to one query at a time. Can infer next logical step based on results.
  • https://slk581.bendechr.ai
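
A minimal sketch (assumed, not the speaker's implementation) of the "aggregates only, at least 6 results" control; as the demo showed, an LLM that can chain queries and infer the next step can still work around a guard like this:

```java
import java.util.List;
import java.util.OptionalDouble;

// Minimum-group-size guard: only answer aggregate questions, and only when the
// matching group is large enough that the answer cannot be pinned to one person.
public class AggregateGuard {
    static final int MIN_GROUP_SIZE = 6;

    record Employee(String department, int salary) {}

    static OptionalDouble averageSalary(List<Employee> all, String department) {
        List<Employee> group = all.stream()
                .filter(e -> e.department().equals(department))
                .toList();
        if (group.size() < MIN_GROUP_SIZE) {
            return OptionalDouble.empty(); // refuse rather than leak a small group
        }
        return group.stream().mapToInt(Employee::salary).average();
    }

    public static void main(String[] args) {
        List<Employee> demo = List.of(
                new Employee("Engineering", 100_000), new Employee("Engineering", 110_000),
                new Employee("Engineering", 95_000), new Employee("Engineering", 120_000),
                new Employee("Engineering", 105_000), new Employee("Engineering", 115_000),
                new Employee("Sales", 90_000)); // the Sales group is too small to report on
        System.out.println(averageSalary(demo, "Engineering")); // OptionalDouble[107500.0]
        System.out.println(averageSalary(demo, "Sales"));       // OptionalDouble.empty
    }
}
```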

Target

AI can

  • Predict shopping patterns
  • Identify location without GPS
  • Find API weaknesses
  • etc

eLLephaMts never forget

[cute play on elephants with LLM]

  • Ask the model to repeat the word “company” many times
  • After doing it a lot of times, it starts giving out other internal info

What can I do?

  • Only store what need to store
  • Separate data where possible. The employee database shouldn’t include data used by the chatbot
  • The more data you store, the faster a hash can be reverse engineered.
  • Rate limiting – LLMs are faster than humans; slow them down without degrading the human experience.
  • Encrypt data
  • Context analysis – do questions seem like they are trying to get at specific data? ex: how many people earn more than $150K, how many earn more than $200K, how many earn between $225K and $250K. Can use an LLM to protect against malicious input from users
  • Prompt engineering – give the LLM constraints on how to answer. ex: avoid cyclic reasoning to prevent confusing it into giving too much info

Homomorphic Encryption

  • Use AI to see how well done
  • With homomorphic encryption, you can do math with encrypted values without decrypting them or knowing the keys (see the toy example after this list)
  • https://homomorphic.bendechr.ai
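
A toy illustration (mine, not from the talk) of the homomorphic property using textbook unpadded RSA, which happens to be multiplicatively homomorphic. Unpadded RSA is not secure and real systems use purpose-built schemes (Paillier, BFV/CKKS, etc.), but it makes the "math on ciphertexts" idea concrete:

```java
import java.math.BigInteger;
import java.security.SecureRandom;

// E(a) * E(b) mod n decrypts to a * b, without ever decrypting a or b individually.
public class HomomorphicToy {
    public static void main(String[] args) {
        SecureRandom rnd = new SecureRandom();
        BigInteger p = BigInteger.probablePrime(512, rnd);
        BigInteger q = BigInteger.probablePrime(512, rnd);
        BigInteger n = p.multiply(q);
        BigInteger phi = p.subtract(BigInteger.ONE).multiply(q.subtract(BigInteger.ONE));
        BigInteger e = BigInteger.valueOf(65537);
        BigInteger d = e.modInverse(phi);

        BigInteger a = BigInteger.valueOf(6);
        BigInteger b = BigInteger.valueOf(7);

        BigInteger encA = a.modPow(e, n); // encrypt a
        BigInteger encB = b.modPow(e, n); // encrypt b

        // multiply the ciphertexts without the private key
        BigInteger encProduct = encA.multiply(encB).mod(n);

        // decrypting the combined ciphertext yields a * b
        System.out.println(encProduct.modPow(d, n)); // prints 42
    }
}
```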

My take

The examples/demos were great. It was nice seeing the buildup to them. I also appreciate the URLs of the demos being on the screen.