[2023 kcdc] the elephant in your data set – avoid bias in machine learning

Speaker: Michelle Frost

For more, see the table of contents.


Notes

  • Intersectionality wheel of privilege. Many spokes, each ranging from power to erased to marginalized. Used the version posted here
  • Bias – inclination or prejudice for or against one person or group
  • ML bias – systematic error in the model itself due to assumptions
  • Sometimes bias is necessary – inductive bias is the set of assumptions a model combines with training examples to classify new inputs
  • Models with high bias oversimplify the problem
  • Each stage of the ML pipeline can introduce harmful bias
  • Bias feeds back into the model
  • In ML, when something looks too good to be true, it probably is
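To make "high bias oversimplifies" concrete, here is a minimal Python sketch (not from the talk) that fits a straight line to points that actually follow y = x². The data and the `fit_line` helper are hypothetical. A line cannot bend, so its errors are not random noise: they are positive at both ends and negative in the middle, which is what systematic error from a too-simple assumption looks like.

```python
# Toy data: the true relationship is quadratic (y = x^2)
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(xs, ys)
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
print(slope, intercept)  # 0.0 2.0 -> a flat line
print(residuals)         # [2.0, -1.0, -2.0, -1.0, 2.0] -> a clear pattern, not noise
```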

Points of bias

  • Historical – prejudice in the world as it exists today. Gave an example where ChatGPT assumed a nurse was female even when the pronouns were swapped. Full example here
  • Representation bias – the sample under-represents part of the population, so the model can’t make effective predictions for that group. An article describes a case that was “solved” by dropping gorillas as a label
  • Measurement bias – using a proxy to represent a construct. A problem if the proxy oversimplifies or its accuracy varies across groups. COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) example: the data measures policing, not just the offender.
  • Aggregation bias – a one-size-fits-all model assumes the mapping from inputs to labels is consistent across groups. For example, something could mean different things across cultures, such as LSD meaning Lake Shore Drive in Chicago rather than a drug. Or racial differences for HbA1c
  • Learning bias – a modeling choice may prioritize one objective in a way that damages another. For example, Amazon’s recruiting tool discriminated against women
  • Evaluation bias – benchmark data does not represent the population. Might make sense in some scenarios. The Gender Shades project analyzed how accuracy differed across demographic groups in several commercial tools.
  • Deployment bias – model intended to solve one problem, but used in a different way. Like making a hook for tuna and using it on a shark. A child abuse protection tool fails poor families.
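Evaluation bias is easy to miss when only one overall score is reported. A minimal sketch, assuming hypothetical labels, predictions and group names, of breaking accuracy out per group (the same idea as a disaggregated analysis like Gender Shades, though not their code):

```python
from collections import defaultdict

def accuracy_by_group(y_true, y_pred, groups):
    """Return {group: accuracy} so gaps hidden by the overall number become visible."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}

# Hypothetical benchmark where group "A" makes up 80% of the rows
y_true = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0, 0, 1]
groups = ["A"] * 8 + ["B"] * 2

overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(overall)                                    # 0.8 looks fine...
print(accuracy_by_group(y_true, y_pred, groups))  # {'A': 1.0, 'B': 0.0} ...but is not
```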

Simpson’s paradox

  • Other attributes can act as a proxy for the attribute being left out
  • An association disappears, reappears or reverses when you divide the population into subgroups
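A small numeric illustration of the reversal, using hypothetical approval counts for two groups split by a third attribute ("segment") that the aggregate view leaves out. G1 does better inside every segment, yet worse overall, because most of G1’s applicants are in the segment where everyone does worse:

```python
# Hypothetical counts: (approved, applicants) per group within each segment
data = {
    "small": {"G1": (81, 87), "G2": (234, 270)},
    "large": {"G1": (192, 263), "G2": (55, 80)},
}

def rate(counts):
    approved, applicants = counts
    return approved / applicants

# Within each segment, G1 has the higher approval rate...
for segment, by_group in data.items():
    print(segment, {g: round(rate(c), 3) for g, c in by_group.items()})

# ...but pooling the segments reverses the comparison
overall = {}
for g in ("G1", "G2"):
    approved = sum(data[s][g][0] for s in data)
    applicants = sum(data[s][g][1] for s in data)
    overall[g] = approved / applicants
print("overall", {g: round(r, 3) for g, r in overall.items()})
```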

Terms

  • Protected class – category where bias is relevant
  • Sensitive characteristics – attributes where bias could be a factor in algorithmic decisions
  • Disparate treatment
  • Disparate outcome/impact
  • Fairness – area of research to ensure biases and model inaccuracies do not lead to models that treat individuals unfavorably due to sensitive characteristics.
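Disparate impact is often quantified as a ratio of selection rates between groups. A minimal sketch with hypothetical decisions; the 0.8 cutoff is the US "four-fifths rule" rule of thumb, an assumption added here rather than something from the talk:

```python
def selection_rate(decisions):
    """Fraction of a group that received the favorable outcome (1)."""
    return sum(decisions) / len(decisions)

def disparate_impact_ratio(unprivileged, privileged):
    """Selection rate of the unprivileged group divided by that of the privileged group."""
    return selection_rate(unprivileged) / selection_rate(privileged)

# Hypothetical hiring decisions (1 = selected)
privileged = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]    # 7 of 10 selected
unprivileged = [1, 0, 0, 1, 0, 0, 1, 0, 0, 1]  # 4 of 10 selected

ratio = disparate_impact_ratio(unprivileged, privileged)
print(round(ratio, 3))  # 0.571
print("possible disparate impact" if ratio < 0.8 else "within four-fifths rule")
```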

Metrics

  • Demographic parity – decisions/outcomes are independent of the protected attribute. Does not protect against all unfairness
  • Equalized odds – decisions are independent of protected attributes given the true outcome. True positive and false positive rates must be equal across groups
  • Equal opportunity – like equalized odds, but only requires equal true positive rates
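These metrics can be computed directly from predictions. A minimal sketch with hypothetical data and helper names; it reports each metric as a gap between two groups (0 means the criterion is met) and assumes every group has at least one actual positive and one actual negative:

```python
def group_rates(y_true, y_pred):
    """Positive prediction rate, true positive rate and false positive rate."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "positive_rate": (tp + fp) / len(pairs),
        "tpr": tp / (tp + fn),
        "fpr": fp / (fp + tn),
    }

def fairness_gaps(y_true, y_pred, groups, a, b):
    """Differences between groups a and b for each fairness metric."""
    def rates_for(g):
        idx = [i for i, grp in enumerate(groups) if grp == g]
        return group_rates([y_true[i] for i in idx], [y_pred[i] for i in idx])
    ra, rb = rates_for(a), rates_for(b)
    return {
        # Demographic parity: same rate of positive decisions
        "demographic_parity": ra["positive_rate"] - rb["positive_rate"],
        # Equal opportunity: same true positive rate
        "equal_opportunity": ra["tpr"] - rb["tpr"],
        # Equalized odds: same TPR and same FPR, so report the larger gap
        "equalized_odds": max(abs(ra["tpr"] - rb["tpr"]), abs(ra["fpr"] - rb["fpr"])),
    }

# Hypothetical labels and predictions for two groups of four people
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 1, 0]
groups = ["A"] * 4 + ["B"] * 4
print(fairness_gaps(y_true, y_pred, groups, "A", "B"))
# {'demographic_parity': 0.25, 'equal_opportunity': 0.5, 'equalized_odds': 0.5}
```

Here the demographic parity gap is smaller than the true positive rate gap, one way a model can look closer to fair on one metric than on another.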

Demo

  • A popular (bad) data set is the “adult data set”. I think it is this one.
  • Not balanced by gender, race, country
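A quick way to spot this kind of imbalance before training is to look at each category’s share of the rows. A minimal sketch; the column values are hypothetical placeholders (not the actual adult data set distribution) and the 10% cutoff is an arbitrary assumption:

```python
from collections import Counter

def representation_report(values, min_share=0.10):
    """Share of rows per category, plus categories below min_share (a hypothetical cutoff)."""
    counts = Counter(values)
    total = sum(counts.values())
    shares = {category: n / total for category, n in counts.items()}
    underrepresented = sorted(c for c, s in shares.items() if s < min_share)
    return shares, underrepresented

# Hypothetical column of 20 rows; in practice this could be a race or country column
column = ["A"] * 16 + ["B"] * 2 + ["C"] + ["D"]
shares, under = representation_report(column)
print(shares)  # {'A': 0.8, 'B': 0.1, 'C': 0.05, 'D': 0.05}
print(under)   # ['C', 'D']
```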

Book recommendations

  • Weapons of Math Destruction
  • Biased
  • The Alignment Problem
  • Invisible Women
  • The Big Nine
  • Automating Inequality

My take

The types of bias and examples were interesting. Good end to the day. The demo graphs made the point about biased data nicely.

[2023 kcdc] With Great Power Comes Great Responsibility: The Ethics of AI

Speaker: Matthew Renze

Twitter: @matthewrenze



History

Tech has a tendency to be abused

  • land – slaves
  • mechanized warfare – expand influence
  • cyberware – mass surveillance

Alice and Bob

  • Need to decide whether to get a cat or a dog for the kids.
  • One researches cats and one researches dogs.
  • Each gets into an info bubble, thinking cat lovers hate dogs and vice versa, and they get mad at each other
  • Then talk to real people, learn people like both and get a cat and a dog.
  • A generation later they lose their jobs due to robots/AI. Their kids see lots of jobs because they are tech savvy.
  • Kids convince parents to upskill and get new jobs
  • Another generation later, the grandkids want biological augmentation and to marry an AI.
  • Feel lost in a world they no longer recognize
  • Learn about technology and see it is an evolution. Learn from grandchildren.

Today

  • When you search for something, you get more of it.
  • Then info bubbles/echo chambers
  • The goal is to maximize engagement. This results in more extreme content so people click
  • Lose privacy – ex: shopping data can predict pregnancy
  • Can deanonymize data with date of birth, sex and zip code
  • Little privacy now and soon a lot less
  • Algorithmic bias – ex: racially biased criminal risk scores, males preferred when screening resumes
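Why date of birth, sex and zip code are enough: even with names removed, a record is re-identifiable when its combination of these "quasi-identifiers" is unique in the data set. A minimal sketch of that check (the k-anonymity measure) on hypothetical records; the field names and helper functions are illustrative, not from the talk:

```python
from collections import Counter

QUASI_IDENTIFIERS = ("dob", "sex", "zip")

def combo(record):
    return tuple(record[k] for k in QUASI_IDENTIFIERS)

def k_anonymity(records):
    """Smallest number of records sharing the same quasi-identifiers; k = 1 means someone is unique."""
    return min(Counter(combo(r) for r in records).values())

def unique_records(records):
    """IDs of records that anyone knowing those three facts could single out."""
    counts = Counter(combo(r) for r in records)
    return [r["record_id"] for r in records if counts[combo(r)] == 1]

# Hypothetical "anonymized" purchase records: no names, but quasi-identifiers remain
records = [
    {"record_id": 1, "dob": "1980-01-02", "sex": "F", "zip": "64105", "purchase": "vitamins"},
    {"record_id": 2, "dob": "1980-01-02", "sex": "F", "zip": "64105", "purchase": "lotion"},
    {"record_id": 3, "dob": "1975-06-30", "sex": "M", "zip": "64105", "purchase": "cereal"},
    {"record_id": 4, "dob": "1991-11-11", "sex": "F", "zip": "66002", "purchase": "vitamins"},
]
print(k_anonymity(records))     # 1
print(unique_records(records))  # [3, 4]
```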

AI

  • Uncanny valley – we distrust things that are almost, but not quite, like us
  • Hallucination – making up believable but false info
  • Misinformation at scale
  • Lack of AI literacy

What can we do

  • Delete cookies
  • Incognito mode
  • Throwaway emails
  • Avoid “click holes” that pull you down rabbit holes
  • Opt out
  • Privacy regulations
  • Limit/stop using social media
  • Talk to other people

AI Developers

  • Eliminate bias in data – diverse datasets, exclude protected attributes, retrain algorithm over time
  • Be able to explain how the AI made a decision. Use a decision tree instead of a neural network where you can.
  • Let users choose how much error they allow
  • Don’t allow full autonomy
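Letting users choose how much error they allow and not allowing full autonomy can be combined into one pattern: act automatically only above a user-chosen confidence level and send everything else to a person. A minimal sketch; `decide`, `Decision` and the probability input are hypothetical, not from the talk:

```python
from dataclasses import dataclass

@dataclass
class Decision:
    label: str          # what the model would decide on its own
    confidence: float   # how sure the model is of that label
    needs_human: bool   # True = route to a person instead of acting automatically

def decide(approve_probability: float, min_confidence: float) -> Decision:
    """approve_probability comes from some model; min_confidence is chosen by the user."""
    label = "approve" if approve_probability >= 0.5 else "deny"
    confidence = max(approve_probability, 1 - approve_probability)
    return Decision(label, confidence, needs_human=confidence < min_confidence)

# A stricter user setting sends more borderline cases to a human
for p in (0.97, 0.70, 0.05):
    print(p, decide(p, min_confidence=0.9))
```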

Fight misinformation

  • Who is the author/publisher?
  • What are their sources?
  • How strong is the evidence?

Near Future

  • Significant unemployment – simple/repetitive/costly jobs. Expect 20%+ of jobs to go away by 2030 and be replaced by other, higher-tech jobs
  • Labor market unprepared for rapid change
  • Society is unprepared for change.
  • Many people left behind in poverty.
  • Synthetic media – indistinguishable from human-created content. Propaganda/misinformation at scale. Deep fakes. Deep nudes (removing clothes from images without permission), etc.
  • With 10 likes, AI knows you as well as a colleague does.
  • Surveillance capitalism – can’t detect being manipulated
  • Greater social stratification – income gap
  • Safety issues – does a self-driving car protect the driver or the pedestrian?
  • Autonomous weapons – currently a human is in the loop

Solutions

  • Educate everyone/AI literacy: basics of ML, DL (deep learning), RL (reinforcement learning)
  • Job retraining
  • Retirement options for those too old to reskill
  • Mandatory higher ed – mandatory high school was controversial
  • Universal basic income/negative income tax
  • Deep fake detection – arms race
  • Digital alibi – so you can prove what you were doing at all times and therefore that you weren’t in a fake video
  • Blockchain for everything so have complete audit trail
  • Default mode of skepticism
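The "complete audit trail" idea rests on hash chaining: each entry stores a hash of the previous one, so editing a past entry breaks the links after it. A minimal sketch of just that mechanism with hypothetical helper names (`append`, `verify`); a real blockchain adds distribution and consensus on top, which this does not attempt:

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def entry_hash(prev_hash: str, payload: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON encoding of the payload."""
    data = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(data.encode("utf-8")).hexdigest()

def append(log: list, payload: dict) -> None:
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev, "payload": payload, "hash": entry_hash(prev, payload)})

def verify(log: list) -> bool:
    """Recompute every hash; an edited or reordered entry makes this False."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True

log = []
append(log, {"time": "09:00", "event": "badge in", "where": "office"})
append(log, {"time": "12:30", "event": "card payment", "where": "cafe"})
print(verify(log))                    # True
log[0]["payload"]["where"] = "home"   # tamper with history
print(verify(log))                    # False
```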

Further Future – Speculative

  • AGI (artificial general intelligence) – at least as smart as average person
  • Improve health
  • Solve the biggest problems – climate change, politics, government
  • Humans could become obsolete – ex: horses became obsolete to farms. “Peak horse” was in 1915
  • Collapse of modern institutions – could break capitalism.
  • Changes already faster than society can adapt. What happens when new discoveries every day?
  • Dystopian future – authoritarianism, communism, fascism, AI religion, AI super bureaucracy
  • Or a better AI based government
  • ASI (artificial super intelligence) – if we create AGI, an intelligence explosion can happen fast. AGI can rewrite its own code.
  • Alignment problem – how do we align human and AI values. Reward hacking – find loopholes
  • AI run amok – what happens if robots mine asteroids? When does it stop?
  • Conflicts – are we pets, ants, raw materials, competition, a threat?

Positives

  • We evolved for short bursts of stress.
  • Modern society is chronic stress
  • Be mindful with tech
  • Respect AI
  • Don’t fear/fight change
  • Use tech when beneficial and skip when not
  • Reward AI goal states
  • Keep ability to intervene if decision doesn’t align

Long run

  • Peacefully coexist with AI
  • AI wins
  • AI and humanity merge – most likely option
  • Humanity ends itself

Merge

  • No “us vs them” problem.
  • Phones an extension of us
  • Younger generation willing to merge with mind
  • VR/AR glasses
  • Gene editing
  • Brain/computer interfaces
  • Next version of people likely to be very different

My take

The Alice and Bob stories are fun. There was a ton of information. It went very fast and I definitely need time to process it. I expected more discussion of ethics rather than covering “everything”, but I’m happy with how it turned out.

[2023 kcdc] rescuing your git repo using amend, reset, revert, rebase, bisect and cherry picking

Speaker: Brian Gorman

Twitter @blgorman

Repo with all commands



Note: The GitHub repo is excellent and has all the instructions/commands. I did not try to recreate them in my blog. Instead, I focused on the concepts.

Branching strategies

  • Git Flow – main > dev > feature > developer. Good if just starting out. Not doing a lot of rebasing
  • Trunk based – no long-running branches, frequent check-ins. More popular due to CI/CD
  • Forking – integration repo, lieutenants and dictators. Good in super large orgs. More advanced
  • The branching strategy itself doesn’t matter much, but whether you have a linear commit history does. (Some operations are trickier if the history is non-linear)

Rebase and Force Push

  • Rebase locally (based on remote or local branch)
  • Can have orphaned commits
  • Force pushing with a lease makes it safer
  • May have to deal with conflicts on a rebase
  • Use pull request; don’t create an extra merge commit
  • Important to delete old branches to avoid confusion
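`--force-with-lease` is safer because it behaves like a compare-and-swap: the push only overwrites the remote branch if it still points where you last saw it (by default, your remote-tracking branch). A toy model of that rule in Python; the `ToyRemote` class is a hypothetical stand-in for the server, not real git code:

```python
class ToyRemote:
    """Stand-in for a remote that only stores branch -> commit id."""

    def __init__(self, refs):
        self.refs = dict(refs)

    def force_push(self, branch, new_commit):
        # Plain --force: overwrite unconditionally, even if someone else pushed
        self.refs[branch] = new_commit

    def force_push_with_lease(self, branch, new_commit, expected_commit):
        # --force-with-lease: only overwrite if the branch is where we expected
        actual = self.refs.get(branch)
        if actual != expected_commit:
            raise RuntimeError(f"rejected: {branch} is at {actual}, expected {expected_commit}")
        self.refs[branch] = new_commit

remote = ToyRemote({"feature": "c1"})
last_seen = remote.refs["feature"]   # what we fetched before rebasing: "c1"
remote.refs["feature"] = "c2"        # a teammate pushes in the meantime

try:
    remote.force_push_with_lease("feature", "r1", expected_commit=last_seen)
except RuntimeError as err:
    print(err)                       # rejected, so the teammate's c2 survives

remote.force_push("feature", "r1")   # plain force silently discards c2
print(remote.refs["feature"])        # r1
```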

Finding lost commits

  • Can use GitViz (on Windows only?) to look at the history graphically – https://github.com/Readify/GitViz
  • git reflog --all
  • git checkout <id> – puts you in detached HEAD state so you can look at it. See double parens around commit id.

Clear local cache

  • Unlikely to need. Cleans up state
  • git reflog expire --expire-unreachable=now --all – expire reflog entries for unreachable commits now
  • git gc --prune – run garbage collection
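Why can commits be "lost" yet still recoverable? Git keeps a commit as long as something can reach it: a branch or tag, or a reflog entry. That is why the cleanup expires reflog entries first and only then lets garbage collection prune. A simplified Python model of the reachability check (real git also considers things like the index and stashes); the commit graph and names are hypothetical:

```python
def reachable(parents, tips):
    """All commits reachable from tips by following parent links (depth-first)."""
    seen, stack = set(), list(tips)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen

# main: a <- b <- c. The feature branch was rebased from (d, e) onto c as (d2, e2).
parents = {
    "a": [], "b": ["a"], "c": ["b"],
    "d": ["b"], "e": ["d"],      # pre-rebase commits, now orphaned
    "d2": ["c"], "e2": ["d2"],   # rebased copies
}
branches = {"main": "c", "feature": "e2"}
reflog = ["e"]  # the reflog still remembers the old feature tip

orphaned = set(parents) - reachable(parents, branches.values())
print(sorted(orphaned))  # ['d', 'e'] -> no branch shows them, but the reflog does
prunable = set(parents) - reachable(parents, list(branches.values()) + reflog)
print(sorted(prunable))  # [] -> nothing to prune until the reflog entry expires
```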

Removing feature

  • Not a problem if use feature flags
  • Create a branch to keep safe the parts not changing
  • Reset branch to last commit want to keep
  • Create new feature branch and pick commits want

Accidentally committed to main

  • Stop build as quickly as possible
  • Let team know not to change or pull from main
  • Create feature branch and cherry pick commits want
  • Reset main hard. git push --force-with-lease
  • Revert change to keep history
  • Change settings on repo so can’t commit to main again :).
  • (if can’t do this, can revert instead of changing history)

Someone committed a secret

  • If only a local commit, delete .git and start over. If already pushed…
  • If don’t need history, create new repo without history. If can’t….
  • Stop all dev as doing massive history update
  • Ensure all code checked in
  • Use git bisect to find the first commit containing the secret (git bisect start, mark a good id and a bad id, then keep saying whether each commit it checks out is good or bad). Alternatively, git log -S "secret" gives you the commit
  • Ensure no branches depend on commits after the last good commit
  • Amend that commit so it doesn’t have the secret, then cherry-pick the rest
  • Everyone has to get the repo again since commits have changed
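git bisect is a binary search over history: given one known good and one known bad commit, each answer halves the range, so finding the first commit with the secret takes about log2(n) checks instead of n. A minimal Python sketch of that search; the commit ids, file contents and `contains_secret` check are hypothetical, and like bisect it assumes that once the secret appears, every later commit still has it:

```python
from typing import Callable, Sequence

def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """commits are ordered oldest to newest; commits[0] is good, commits[-1] is bad."""
    good, bad = 0, len(commits) - 1   # invariant: commits[good] good, commits[bad] bad
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):      # like answering "git bisect bad"
            bad = mid
        else:                         # like answering "git bisect good"
            good = mid
    return commits[bad]

# Hypothetical history: the secret is committed in c5 and never removed
history = [f"c{i}" for i in range(10)]
config_at = {c: ("API_KEY=hunter2" if i >= 5 else "") for i, c in enumerate(history)}
checked = []

def contains_secret(commit: str) -> bool:
    checked.append(commit)
    return "hunter2" in config_at[commit]

print(first_bad(history, contains_secret))  # c5
print(checked)                              # ['c4', 'c6', 'c5']
```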

My take

I really like the mix of concepts, visualizations and videos of actually using the functionality. Great session.