mutation testing from a five-year-old (according to chatgpt)

I asked ChatGPT to explain mutation testing to a five-year-old to see what it would say. I got this back:

Alright, imagine you have a big puzzle made of Lego blocks. Each block represents a piece of code that makes up a computer program. Now, let’s say we want to make sure that our puzzle is put together correctly so that our program works properly.

Mutation testing is like having a mischievous Lego wizard who comes and changes some of the pieces in our puzzle to see if we notice. He might swap a blue block for a red one, or turn a block upside down. After he’s done, we carefully check our puzzle again to see if it still looks right and if our program still works.

If we can still solve the puzzle and our program still works even with the changes the wizard made, then we know our puzzle – or our program – is really strong. But if we can’t solve the puzzle or our program stops working, then we know there’s a problem and we need to fix it. So, mutation testing helps us make sure our programs are really robust and can handle unexpected changes.

It’s a good example of how ChatGPT is an aide and not a solver. The first paragraph is good. In the second paragraph, I love the mischievous Lego wizard who comes and changes some of the pieces to see if we notice. It’s a good premise: we see if our program still works with the changes.

The third paragraph leads us astray. If we notice the change, that is good. If the program stops working, that is also good; it means our tests noticed the change. Mutation testing is about having good tests that can detect changes, not about having a robust program with unnecessary code that behaves the same way when changed.
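
To make the "did we notice?" idea concrete, here is a minimal sketch of what a mutation testing tool (PIT is a common one for Java) does. This is my own illustration, not ChatGPT's, and the class names are made up.

    // Discount.java -- the code under test
    public class Discount {
        // Orders of $100 or more get free shipping
        public static boolean freeShipping(double total) {
            return total >= 100.0;
        }
    }

    // The mutation tool flips >= to > (the wizard swapping a block) and
    // reruns the tests. This test "kills" that mutant: it passes on the
    // original code but fails on the mutated code, so the change was noticed.
    // DiscountTest.java
    import static org.junit.jupiter.api.Assertions.assertTrue;

    import org.junit.jupiter.api.Test;

    class DiscountTest {
        @Test
        void exactlyOneHundredGetsFreeShipping() {
            assertTrue(Discount.freeShipping(100.0));
        }
    }

If no test fails for a mutant, the mutant "survives," which signals that the tests are weak, not that the program is robust.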

[uberconf 2023] Practical AI Tools for Java Developers

Speaker: Ken Kousen

@kenkousen

For more, see the table of contents


Prompt Engineering

  • Tools are improving fast; prompt engineering might not be needed as a job
  • Suggest context (ex: “pretend you are”)
  • Give an example of what you want (see the sample prompt after this list)
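
For instance (my example, not one from the talk), a prompt that uses both tips might look like:

    Pretend you are a senior Java developer reviewing a pull request.
    List each problem on one line, like this example:
    "UserService.java line 42: the InputStream is never closed - use try-with-resources."
    Now review the following method: ...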

ChatGPT

  • Free version is GPT 3.5 Turbo (improved performance over original 3.5)
  • $20/month for GPT 4. Can make 25 requests in a three-hour block.
  • Ken hasn’t noticed any quality control over plugins.
  • Plugins change rapidly.
  • Apologizes when you correct it.
  • Warning about pasting your company’s code in
  • Trained through summer 2021
  • Can’t read files on the local file system (Bard can). Can read a link but doesn’t know it can
  • Often wrong if you ask it whether it can do something. Like talking to a toddler; it says what it thinks you want to know.
  • Temperature – tweaks creativity vs precision
  • REST API docs
  • REST API: cookbook has examples
  • Must give a credit card to call the REST APIs. Pricing is pennies per 1,000 tokens (about 750 words), and you are charged for both input and output tokens. There are also limits on context (the amount GPT remembers). Not expensive if you don’t use it much; Ken’s bill has been pennies, too low for a bill to even be sent.
  • The REST API JSON response says how many tokens were used. Can also see a graph when you log into your account. (See the sketch after this list for a minimal call.)
  • Had it make multiple choice questions on a topic
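
Since the REST API came up several times above, here is a minimal sketch of a chat completions call in Java. It is based on the OpenAI REST API docs rather than code from the talk; the OPENAI_API_KEY environment variable and class name are my assumptions.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class ChatCompletionSketch {
        public static void main(String[] args) throws Exception {
            String apiKey = System.getenv("OPENAI_API_KEY");

            // temperature is the creativity-vs-precision knob mentioned above
            String body = """
                {
                  "model": "gpt-3.5-turbo",
                  "temperature": 0.7,
                  "messages": [
                    {"role": "user", "content": "Write three multiple-choice questions about Java streams"}
                  ]
                }
                """;

            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("https://api.openai.com/v1/chat/completions"))
                    .header("Authorization", "Bearer " + apiKey)
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());

            // The JSON response includes a "usage" object with prompt_tokens,
            // completion_tokens, and total_tokens -- the numbers you are billed for.
            System.out.println(response.body());
        }
    }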

ChatGPT Code Interpreter

  • Code Interpreter beta feature.
  • Need to explicitly enable under settings.
  • From OpenAI, not third party
  • ex: can convert Groovy to Kotlin DSL for Gradle

DALL-E

  • First popular text to image generation tool
  • A generation behind text/GPT.
  • Stable Diffusion is free, but behind on quality
  • Ken prefers Midjourney; more realistic

Whisper

  • Audio to text
  • Takes audio or video and writes transcription.
  • Free (unless you use the REST API)
  • MacWhisper – $20 one-time fee for larger models. Good for transcribing videos of talks. Slow the first time; after that (including for other videos), fast. [caching?]
  • Creates an .srt file (subtitles)

Claude.AI

  • Free beta
  • Only available in US and UK
  • Can hold 100K tokens. ex: can summarize a novel
  • Quality comparable to ChatGPT 3.5, but not as good as 4.0
  • Can upload many file types
  • Harder to get back to previous conversations than ChatGPT. Need to click on “A” icon on top to see them
  • Doesn’t do images

Bard

  • Can upload answers to Google Docs on Ken’s personal account, but not his business account
  • Used to be able to answer who Venkat is but can’t anymore.

Llama 2

  • Meta announced today
  • Pretrained language model
  • Free unless you are a large company (i.e., competitors)

Descript

  • Transcribes and edits video
  • Can give instructions – ex: shorten gaps in video, remove filler words
  • If you don’t move around much, it will make it look like you are looking at the camera
  • Can give it text and select a voice. With a 30-minute sample, it can train on your voice

Canva

  • Can describe presentation want and Canva makes a draft
  • Can choose theme from list of choices
  • Magic Eraser – brush over part of an image you don’t want and it replaces it with the nearby background
  • Beat Sync – time slide transitions to the beats of the music
  • Magic Write – like GPT 3.5
  • Magic Design – give it your own image and it makes a presentation around that

GitHub Copilot

  • Virtual pair programmer
  • Plugins for VSCode and IntelliJ
  • If you hesitate, it suggests code
  • Can’t accept part of a suggestion; you need to accept it all or delete it
  • Guesses right a lot because it knows what Ken has done before in a training class
  • Always looks plausible because it’s trained on your own code. Need to look carefully
  • The next generation is GitHub Copilot X. Only available via a wait list. VS Code only at this point; can be used for pull requests.
  • GitHub Next – tools in a variety of states – https://githubnext.com. “Visualizing a Codebase” runs as a GitHub Action to see packages

IntelliJ AI Assistant

  • Not much documentation on how it works. Only one blog post
  • In Ultimate, not Community
  • In beta edition
  • Can highlight code and ask to explain it
  • If you don’t like a suggestion, you can ask it to suggest something else and get more choices
  • Can write commit messages for you
  • Finds issues with code when you know the language well
  • Helps in a language you know less well because it knows the API/syntax
  • Good for nuisance tasks that would take a lot of time

YouTube Summary

  • Get summary or transcript of video
  • Free
  • Up to 20 minute video

My take

I was doing my interview with the Build Propulsion Lab, so I was a few minutes late. It was a full room, so my seat was on the floor. Luckily, the room had a large aisle, so I could sit near the front instead of in the very back! And the carpet was comfy.

As far as Ken’s actual talk goes, it was great. I liked the overview of a bunch of tools and seeing the REST APIs for calling OpenAI. Great breadth of topics and fun examples! I learned a lot, including about some tools I hadn’t heard of. And some very cool functionality!

[2023 kcdc] chatgpt: don’t take my job, help me thrive in it

Speaker: Steve Odell

For more, see the table of contents.


Timeline

  • 1940 – Enigma
  • 1964 – first chatbot
  • mid-2022 – GitHub Copilot came out
  • Then ChatGPT 3.5
  • Panic about AI taking all our jobs

Survey

  • Most people in room used ChatGPT
  • A few used Bard
  • A good number use GitHub Copilot

1969

  • Had ChatGPT write a story about ATMs rendering bank tellers obsolete
  • It was well written
  • Talked about roles evolving
  • Also covered an analogy to ChatGPT, talking about enhancing capabilities

Takeaways

  • Not going to take our jobs
  • Can let you down just as much as it impresses you
  • Do not take at face value
  • Often apologizes when wrong, and it is wrong a lot

Examples

  • A lawyer used ChatGPT, which made up cases. It used real case numbers, but they were unrelated – https://simonwillison.net/2023/May/27/lawyer-chatgpt/
  • Asked for a C# function to calculate the points in a bridge hand. Gave it the rules in a prompt and a description of the notation. It quickly provided code that looked reasonable at first glance. When the code was tested, it got the wrong answer – 18 points for a 20-point hand. ChatGPT also wrote a bulleted list explaining the logic and got 20 points in the explanation, but not in the code. It realizes it messed up and explains why it was wrong in a way that conflicts with its own explanation. (See the sketch after this list for the standard calculation.)
  • Succeeded at code golf – rewriting code in fewer lines.
  • Tried to get it to write infrastructure as code. It first gave an approach to set up CloudFormation from a high-level description of what was wanted on AWS. It did a good job listing the AWS services needed, with a short description of each. Then it was asked to create the CloudFormation templates for the listed services. It gave a stub of the YAML, leaving out all the hard parts (ex: # VPC properties). Then tried one service at a time, and it didn’t tie them together.
  • On the next example, an OAuth workflow in MAUI, ChatGPT just said it can’t do it and provided a basic login page, which was nothing like what was asked for. Steve thinks there isn’t enough code as training data; MAUI is new and a lot of the code is internal to companies.
  • Repeated the example in React Native. Didn’t test it, but it looks much better; it includes the OAuth workflow and the expected parts.
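
For reference (the sketch mentioned in the bridge-hand bullet above), here is what the standard 4-3-2-1 high-card point count looks like. This is my own illustration in Java rather than C#, and the input notation is an assumption since the actual prompt wasn't shown.

    public class BridgePoints {
        // Standard high-card points: Ace = 4, King = 3, Queen = 2, Jack = 1
        public static int highCardPoints(String hand) {
            int points = 0;
            for (char rank : hand.toUpperCase().toCharArray()) {
                switch (rank) {
                    case 'A' -> points += 4;
                    case 'K' -> points += 3;
                    case 'Q' -> points += 2;
                    case 'J' -> points += 1;
                    default -> { } // spot cards (2-10/T) and separators score nothing
                }
            }
            return points;
        }

        public static void main(String[] args) {
            // A + K + Q + A + J + K = 4 + 3 + 2 + 4 + 1 + 3 = 17 points
            System.out.println(highCardPoints("AKQ73 AJ5 K42 983"));
        }
    }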

Prompt engineering

  • Some companies are hiring prompt engineers
  • Skill set we should all learn
  • Tried getting SQL for a recipe app. Asked for create table scripts, listing the fields wanted and more about each. It did a good job on keys and not-null constraints. Unit of measure was a varchar rather than numeric. It got it right when asked for a units-of-measure table.
  • Chaining prompts in the same discussion gets you to where you want to be.

My take

Standing-room-only crowd. I got there very early (because I needed to leave at the 30-minute mark) and was barely able to get an aisle seat. [I misread the calendar and have a work presentation at 11am Eastern.]

The first half of the presentation was excellent. The examples were clear and fun. It gave an excellent sense of the current state of AI. The beginning of the prompt engineering section was great as well. I wish I could have stayed for the rest.