[2023 kcdc] the elephant in your data set – avoid bias in machine learning

Speaker: Michelle Frost

For more, see the table of contents.


Notes

  • Intersectionality wheel of privilege. Many spokes, each ranging across power, erased and marginalized. Used the version posted here
  • Bias – inclination or prejudice for or against one person or group
  • ML Bias – systematic error in the model itself due to assumptions
  • Sometimes bias is necessary – inductive bias: the assumptions a model combines with training examples to classify new inputs
  • Models with high bias oversimplify the problem (underfitting)
  • Each stage of the ML pipeline can introduce harmful bias
  • Bias feeds back into the model
  • In ML, when something looks too good to be true, it probably is

Points of bias

  • Historical bias – prejudice in the world as it exists today. Gave an example from ChatGPT where it assumed a nurse was female even when the pronouns were swapped. Full example here
  • Representation bias – the sample under-represents part of the population, so the model can’t make effective predictions for that group. Article describing an example that was “solved” by dropping gorillas as a label
  • Measurement bias – using a proxy to represent a construct. A problem when the proxy oversimplifies or its accuracy varies across groups. COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) example: the data measures policing, not just the offender.
  • Aggregation bias – a one-size-fits-all model assumes the mapping from inputs to labels is consistent. For example, the same input could mean something different across cultures, such as LSD being Lake Shore Drive in Chicago and not a drug. Or racial differences in HbA1c
  • Learning bias – a modeling choice may prioritize one objective at the expense of another. Such as Amazon’s recruiting tool discriminating against women
  • Evaluation bias – benchmark data does not represent the population. Might make sense in some scenarios. The Gender Shades project analyzed accuracy differences across different tools.
  • Deployment bias – model intended to solve one problem, but used in a different way. Like making a hook for tuna and using it on a shark. Child abuse protection tool fails poor families.

Simpson’s paradox

  • Other attributes can act as a proxy for the thing left out
  • An association disappears, reappears or reverses when the population is divided into groups
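A made-up numeric example of the reversal: two groups (X and Y) applying to two departments (A and B). All counts are invented for illustration; exact fractions keep the comparisons free of rounding noise.

```python
from fractions import Fraction

# Made-up (admitted, applied) counts per group and department, chosen to show the paradox.
COUNTS = {
    "X": {"A": (80, 100), "B": (10, 50)},
    "Y": {"A": (45, 50), "B": (30, 100)},
}


def per_department_rates(by_dept):
    """Acceptance rate within each department, as exact fractions."""
    return {dept: Fraction(admitted, applied) for dept, (admitted, applied) in by_dept.items()}


def overall_rate(by_dept):
    """Acceptance rate after pooling all departments together."""
    admitted = sum(a for a, _ in by_dept.values())
    applied = sum(n for _, n in by_dept.values())
    return Fraction(admitted, applied)


if __name__ == "__main__":
    for group, by_dept in COUNTS.items():
        rates = {d: float(r) for d, r in per_department_rates(by_dept).items()}
        print(group, "overall:", float(overall_rate(by_dept)), "per department:", rates)
```

Group X has the higher overall acceptance rate (60% vs 50%), yet group Y has the higher rate in both departments (90% vs 80% in A, 30% vs 20% in B). Most of Y’s applications went to the harder department B, so department is the attribute the pooled view leaves out.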

Terms

  • Protected class – category where bias is relevant
  • Sensitive characteristics – characteristics where bias could be a factor in algorithmic decisions
  • Disparate treatment
  • Disparate outcome/impact
  • Fairness – area of research to ensure biases and model inaccuracies do not lead to models that treat individuals unfavorably due to sensitive characteristics.

Metrics

  • Demographic parity – decisions/outcomes independent of the protected attribute (equal positive rates across groups). Does not prevent all unfairness
  • Equal odds – decisions independent of protected attributes given the true outcome. True and false positive rates must be equal across groups
  • Equal opportunity – like equal odds, but only requires equal true positive rates
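The three metrics can be computed from per-group rates. A minimal Python sketch, assuming binary (0/1) labels and predictions and reporting each metric as the largest gap between groups (0 means the criterion holds exactly). Using the worse of the true and false positive rate gaps for equal odds is my own convention, and the toy data is made up.

```python
from dataclasses import dataclass


@dataclass
class GroupRates:
    selection_rate: float  # P(predicted positive | group)
    tpr: float             # true positive rate: P(predicted positive | actually positive, group)
    fpr: float             # false positive rate: P(predicted positive | actually negative, group)


def _mean(values, what):
    """Average of 0/1 predictions; fail loudly if a group has no rows of that kind."""
    if not values:
        raise ValueError(f"no rows available to compute {what}")
    return sum(values) / len(values)


def group_rates(y_true, y_pred, groups):
    """Per-group rates from binary labels and binary predictions (both 0/1)."""
    rates = {}
    for g in sorted(set(groups)):
        rows = [(t, p) for t, p, gi in zip(y_true, y_pred, groups) if gi == g]
        rates[g] = GroupRates(
            selection_rate=_mean([p for _, p in rows], f"selection rate for {g}"),
            tpr=_mean([p for t, p in rows if t == 1], f"TPR for {g}"),
            fpr=_mean([p for t, p in rows if t == 0], f"FPR for {g}"),
        )
    return rates


def fairness_gaps(rates):
    """Largest between-group difference per criterion; 0 means it holds exactly."""
    def gap(attr):
        values = [getattr(r, attr) for r in rates.values()]
        return max(values) - min(values)

    return {
        "demographic_parity": gap("selection_rate"),
        "equal_opportunity": gap("tpr"),
        "equal_odds": max(gap("tpr"), gap("fpr")),
    }


if __name__ == "__main__":
    # Made-up toy data: four people in group "a", four in group "b".
    y_true = [1, 1, 0, 0, 1, 1, 0, 0]
    y_pred = [1, 0, 0, 0, 1, 0, 1, 1]
    groups = ["a"] * 4 + ["b"] * 4
    print(fairness_gaps(group_rates(y_true, y_pred, groups)))
```

With those toy inputs both groups have a true positive rate of 0.5, so equal opportunity holds (gap 0.0), while the selection rates (0.25 vs 0.75) and false positive rates (0.0 vs 1.0) differ, giving a demographic parity gap of 0.5 and an equal odds gap of 1.0. Satisfying one criterion says little about the others.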

Demo

  • A popular (bad) data set is the “adult data set”. I think it is this one.
  • Not balanced by gender, race, country
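For the point about the adult data set not being balanced, a first check is simply the share of rows per group. A short Python sketch over plain dicts; the rows are made up (only the “sex”/“race” column names come from the adult data set’s attribute list), and the `min_share` cutoff is a judgment call, not a standard.

```python
from collections import Counter


def group_shares(records, column):
    """Fraction of rows for each value of `column`, most common first."""
    counts = Counter(row[column] for row in records)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.most_common()}


def underrepresented(records, column, min_share):
    """Values of `column` whose share of rows falls below `min_share`."""
    return [value for value, share in group_shares(records, column).items() if share < min_share]


if __name__ == "__main__":
    # Made-up rows; not taken from the real data set.
    rows = [
        {"sex": "Male", "race": "White"},
        {"sex": "Male", "race": "White"},
        {"sex": "Male", "race": "Black"},
        {"sex": "Female", "race": "White"},
    ]
    print(group_shares(rows, "sex"))
    print(underrepresented(rows, "race", 0.3))
```

A skewed share does not prove the model will be biased, but it flags the groups where predictions deserve a closer look.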

Book recommendations

  • Weapons of Math Destruction
  • Biased
  • The Alignment Problem
  • Invisible Women
  • The Big Nine
  • Automating Inequality

My take

The types of bias and examples were interesting. Good end to the day. The demo graphs made the point about biased data nicely.

[2023 kcdc] With Great Power Comes Great Responsibility: The Ethics of AI

Speaker: Matthew Renze

Twitter: @matthewrenze

For more, see the table of contents.


History

Tech has a tendency to be abused

  • land – slaves
  • mechanized warfare – expand influence
  • cyberwarfare – mass surveillance

Alice and Bob

  • Need to decide whether to get a cat or a dog for their kids.
  • One researches cats and the other dogs.
  • Each gets into an info bubble, thinking cat lovers hate dogs and vice versa, and they get mad at each other
  • Then they talk to real people, learn that people like both, and get a cat and a dog.
  • A generation later they lose their jobs due to robots/AI. Their kids see lots of jobs because they are tech savvy.
  • Kids convince parents to upskill and get new jobs
  • Another generation later, the grandkids want biological augmentation and to marry an AI.
  • Alice and Bob feel lost in a world they no longer recognize
  • They learn about the technology, see it is an evolution, and learn from their grandchildren.

Today

  • When you search for something, you get more of it.
  • This leads to info bubbles/echo chambers
  • Goal is to maximize engagement. This promotes more extreme content because that is what people click
  • Loss of privacy – ex: shopping data can predict pregnancy
  • Can deanonymize data with date of birth, sex and zip code
  • Little privacy now and soon a lot less
  • Algorithmic bias – ex: racially biased criminal risk scores, male candidates preferred when screening resumes
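Why date of birth, sex and zip code are enough: if a combination appears only once in an “anonymized” data set, anyone holding another data set with those three fields plus names can link the records back. A small Python sketch that measures how many rows are unique on those quasi-identifiers and computes the k from k-anonymity. The column names and records are hypothetical.

```python
from collections import Counter

# Hypothetical column names for the three quasi-identifiers from the talk.
QUASI_IDENTIFIERS = ("date_of_birth", "sex", "zip")


def _key(record, keys):
    return tuple(record[k] for k in keys)


def unique_fraction(records, keys=QUASI_IDENTIFIERS):
    """Share of records whose combination of `keys` appears exactly once (re-identifiable)."""
    combos = Counter(_key(r, keys) for r in records)
    return sum(1 for r in records if combos[_key(r, keys)] == 1) / len(records)


def k_anonymity(records, keys=QUASI_IDENTIFIERS):
    """Size of the smallest group sharing the same `keys` values (the k in k-anonymity)."""
    return min(Counter(_key(r, keys) for r in records).values())


if __name__ == "__main__":
    # Made-up "anonymized" records: no names, but the quasi-identifiers are still there.
    records = [
        {"date_of_birth": "1980-01-02", "sex": "F", "zip": "60601", "diagnosis": "flu"},
        {"date_of_birth": "1980-01-02", "sex": "F", "zip": "60601", "diagnosis": "asthma"},
        {"date_of_birth": "1975-06-30", "sex": "M", "zip": "60601", "diagnosis": "flu"},
        {"date_of_birth": "1990-11-11", "sex": "F", "zip": "60601", "diagnosis": "none"},
    ]
    print(unique_fraction(records))             # 0.5: two people are unique on the three fields
    print(k_anonymity(records))                 # 1
    print(k_anonymity(records, keys=("zip",)))  # 4: zip code alone puts everyone in a group of 4
```

Coarsening or dropping quasi-identifiers raises k, which is the basic idea behind k-anonymity as a defense.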

AI

  • Uncanny valley – we distrust things that are almost like us
  • Hallucination – making up believable, but false info
  • Misinformation at scale
  • Lack of AI literacy

What can we do

  • Delete cookies
  • Incognito mode
  • Throwaway emails
  • Stop following “click holes” that pull you down rabbit holes
  • Opt out
  • Privacy regulations
  • Limit/stop using social media
  • Talk to other people

AI Developers

  • Eliminate bias in data – diverse datasets, exclude protected attributes, retrain algorithm over time
  • Be able to explain how the AI made a decision. Use a decision tree rather than a neural network where you can.
  • Let users choose how much error they will accept
  • Don’t allow full autonomy
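To make “explain how the AI made a decision” concrete: with a decision tree, the explanation is just the path of comparisons taken. A hand-built Python sketch; the loan-style feature names, thresholds and decisions are hypothetical, and a real tree would be learned from data rather than written by hand.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class Leaf:
    decision: str


@dataclass
class Node:
    feature: str
    threshold: float
    left: "Union[Leaf, Node]"   # followed when value <= threshold
    right: "Union[Leaf, Node]"  # followed when value > threshold


def predict_with_explanation(tree, features: Dict[str, float]) -> Tuple[str, List[str]]:
    """Walk the tree and record every comparison made, so the decision can be explained."""
    path = []
    node = tree
    while isinstance(node, Node):
        value = features[node.feature]
        if value <= node.threshold:
            path.append(f"{node.feature}={value} <= {node.threshold}")
            node = node.left
        else:
            path.append(f"{node.feature}={value} > {node.threshold}")
            node = node.right
    return node.decision, path


# Hypothetical hand-built tree: decline high debt ratios, then check income.
TREE = Node(
    "debt_ratio", 0.4,
    left=Node("income", 30000, left=Leaf("review"), right=Leaf("approve")),
    right=Leaf("decline"),
)

if __name__ == "__main__":
    decision, path = predict_with_explanation(TREE, {"debt_ratio": 0.2, "income": 50000})
    print(decision, path)
```

The returned path (here “debt_ratio=0.2 <= 0.4”, then “income=50000 > 30000”) is something a person can read and challenge, which is much harder to get out of a neural network.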

Fight misinformation

  • Who is the author/publisher?
  • What are their sources?
  • How strong is the evidence?

Near Future

  • Significant unemployment – simple/repetitive/costly jobs. Expect 20%+ of jobs to go away by 2030 and be replaced by other, higher tech jobs
  • Labor market unprepared for rapid change
  • Society is unprepared for change.
  • Many people left behind in poverty.
  • Synthetic media – indistinguishable from human-created content. Propaganda/misinformation at scale. Deep fakes. Deep nude (removing clothes from images without permission), etc
  • With 10 likes, AI knows you as well as a colleague does.
  • Surveillance capitalism – can’t detect being manipulated
  • Greater social stratification – income gap
  • Safety issues – does a self-driving car protect the driver or the pedestrian?
  • Autonomous weapons – currently a human is in the loop

Solutions

  • Educate everyone/AI literacy, Basics of ML, DL (deep learning), RL (reinforcement learning)
  • Job retraining
  • Retirement options for those too old to reskill
  • Mandatory higher ed – mandatory high school was controversial
  • Universal basic income/negative income tax
  • Deep fake detection – arms race
  • Digital alibi – so you can prove what you were doing at all times and therefore that you are not in a fake video
  • Blockchain for everything so there is a complete audit trail
  • Default mode of skepticism
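The audit-trail idea behind “blockchain for everything” boils down to a hash chain: each entry stores the hash of the one before it, so editing any earlier entry breaks the link. A minimal Python sketch using SHA-256 from the standard library; the entry layout and function names are hypothetical, and this covers only the tamper-evidence part, with none of the distribution or consensus a real blockchain adds.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash, payload):
    """SHA-256 over the previous hash plus a canonical JSON encoding of the payload."""
    data = prev_hash + json.dumps(payload, sort_keys=True)
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def append_entry(log, payload):
    """Add an entry linked to the current last entry."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"payload": payload, "prev": prev, "hash": entry_hash(prev, payload)})


def verify(log):
    """True only if every entry links to its predecessor and its hash still matches."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


if __name__ == "__main__":
    log = []
    append_entry(log, {"event": "login", "user": "alice"})
    append_entry(log, {"event": "download", "user": "alice"})
    print(verify(log))                  # True
    log[0]["payload"]["user"] = "bob"   # tamper with history
    print(verify(log))                  # False
```

Note that a hash chain only makes tampering detectable; whoever controls the whole log could still rewrite it end to end, which is the gap distributed copies are meant to close.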

Further Future – Speculative

  • AGI (artificial general intelligence) – at least as smart as average person
  • Improve health
  • Solve biggest problems – climate change, politics, government
  • Humans could become obsolete – ex: horses became obsolete on farms. “Peak horse” was in 1915
  • Collapse of modern institutions – could break capitalism.
  • Changes already faster than society can adapt. What happens when new discoveries every day?
  • Dystopian future – authoritarianism, communism, fascism, AI religion, AI super bureaucracy
  • Or a better AI based government
  • ASI (artificial super intelligence) – if we create AGI, an intelligence explosion can happen fast. AGI can rewrite its own code.
  • Alignment problem – how do we align human and AI values. Reward hacking – find loopholes
  • AI run amok – what happens if robots mine asteroids? When do they stop?
  • Conflicts – are we pets, ants, raw materials, competition, a threat?

Positives

  • We evolved for short bursts of stress.
  • Modern society is chronic stress
  • Be mindful with tech
  • Respect AI
  • Don’t fear/fight change
  • Use tech when beneficial and skip when not
  • Reward AI goal states
  • Keep ability to intervene if decision doesn’t align

Long run

  • Peacefully coexist with AI
  • AI wins
  • AI and humanity merge – most likely option
  • Humanity ends itself

Merge

  • No “us vs them” problem.
  • Phones an extension of us
  • Younger generation willing to merge with mind
  • VR/AR glasses
  • Gene editing
  • Brain/computer interfaces
  • Next version of people likely to be very different

My take

The Alice and Bob stories are fun. There was a ton of information. It went very fast and I definitely need time to process it. I expected more discussion of ethics rather than covering “everything” but I’m happy with how it turned out.

[devnexus 2022] meta-modern software architecture

Speaker: Neal Ford from Thoughtworks

@neal4d

Link to table of contents

———————

Where architectures come from

  • Architecture is reactive
  • Someone starts doing something, then others do
  • Once a bunch of people are doing it, it gets named (after the fact)
  • A reflection of how we were doing software development at the time
  • Once in an architecture, can watch how it grows and changes

Eras

  • Victorian – 1801-1900 – science, classifying the natural world
  • Modernism – 1890-1945 – industrial revolution, explosive growth of cities, abstract art, radio.
  • Post-modernism – 1946-1990 – pushback against modernism, irony, questioning everything, television, Seinfeld’s “never hug, never learn”
  • Post-post-modernism or metamodernism – 1990-present – internet
  • Naming things is hard, and not just in software. Modernism is a bad choice of name because what comes next?

Metamodern

  • In 1989, to find out the Chicago weather you would need to watch the Weather Channel and wait for it to cycle around, or go to the library. Now pull it up online.
  • In 1989, could read a few books and know pretty much everything about wine. Now there is too much info and we keep generating more.
  • Holism – view various systems as a whole
  • Parks and Recreation is the first metamodern show
  • Breaking Bad – color driven – yellow is safe and purple is bad
  • Return to sentimentality. Can’t live on irony alone

Software architecture

  • microservices – one of most popular pages on Martin Fowler’s website. Say what it is and more importantly, what it isn’t.
  • First law of software architecture: “Everything in software architecture is a tradeoff”. If you haven’t encountered that yet, you will in the future
  • Reuse reduces complexity but comes with high coupling
  • Metamodern software architect needs to do tradeoff analysis. Ex: things that change slowly are good for reuse such as frameworks and OS
  • service mesh and sidecar pattern – orthogonal coupling

Books

  • “Fundamentals of Software Architecture”
  • “Software Architecture: The Hard Parts”
  • “Data Mesh”

Forces

  • Consistency – atomic, eventual
  • Communication – sync, async
  • Coordination – orchestrated, choreographed
  • 8 possibilities by choosing one of each. Ex: one is a monolith. All 8 can exist as a pattern or antipattern.
  • Named them transactional sagas: epic, fantasy fiction, fairy tale, parallel, phone tag, horror story, time travel, anthology
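The three forces multiply out to 2 × 2 × 2 = 8 combinations. A small Python sketch that enumerates them; the name attached to each combination is my recollection of the table in “Software Architecture: The Hard Parts”, so double-check it against the book before relying on it. If I recall correctly, the synchronous/atomic/orchestrated combination (the epic saga) is the one that behaves most like a monolith.

```python
from itertools import product

COMMUNICATION = ("synchronous", "asynchronous")
CONSISTENCY = ("atomic", "eventual")
COORDINATION = ("orchestrated", "choreographed")

# My recollection of the book's name for each (communication, consistency, coordination) combo.
SAGA_NAMES = {
    ("synchronous", "atomic", "orchestrated"): "Epic Saga",
    ("synchronous", "atomic", "choreographed"): "Phone Tag Saga",
    ("synchronous", "eventual", "orchestrated"): "Fairy Tale Saga",
    ("synchronous", "eventual", "choreographed"): "Time Travel Saga",
    ("asynchronous", "atomic", "orchestrated"): "Fantasy Fiction Saga",
    ("asynchronous", "atomic", "choreographed"): "Horror Story",
    ("asynchronous", "eventual", "orchestrated"): "Parallel Saga",
    ("asynchronous", "eventual", "choreographed"): "Anthology Saga",
}


def all_combinations():
    """Every way to pick one option per force: 2 * 2 * 2 = 8."""
    return list(product(COMMUNICATION, CONSISTENCY, COORDINATION))


if __name__ == "__main__":
    for combo in all_combinations():
        print(", ".join(combo), "->", SAGA_NAMES[combo])
```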

Richard Feynman

  • Computers used to be a room full of people (usually women) calculating things
  • Feynman added specialization and parallelization, since some people are better at some tasks than others. Also added recovery from problems
  • 1945 – atomic bomb blast is what shifted eras
  • Reconsider why you continue doing something. Revisit when the reasons change

Internet

  • Pushed us to the next era
  • Volkswagen used software to cheat on emissions tests. Some people knew they were actively working to break the law
  • Facebook keeps getting busted for doing bad things – data breaches, illegally tracking users, Cambridge Analytica, using two-factor authentication phone numbers for marketing.
  • Last week, Facebook made up a meme claiming TikTok had students slapping teachers. Then it became a self-fulfilling prophecy

Finance and ethics

  • Modernism – double-entry accounting
  • Post-modern – quants
  • Metamodern – humane corporations, ethics. Recognize we are all connected to each other
  • Don’t want to create something cool and then spend the rest of your career on an apology tour
  • Apple, Google employees pushed back

My take

Fun start to the day. I hadn’t heard of the “saga” approach before. Googling, at least some of them seem to be a real thing (and all are from the “Hard Parts” book). I also increased my book reading list. The end felt rushed. Maybe because it started late?