[2023 kcdc] the elephant in your data set – avoid bias in machine learning

Posted on June 22, 2023 by Jeanne Boyarsky

Speaker: Michelle Frost

For more, see the table of contents.

Notes

Intersectionality wheel of privilege. Many spokes, each ranging from power to erased to marginalized. Used the version posted here
Bias – inclination or prejudice for or against one person or group
ML Bias – systematic error in the model itself due to assumptions
Sometimes bias is necessary – inductive bias – assumptions combined with training examples to classify
Models with high bias oversimplify the problem (underfit)
Each stage has potential harmful bias
Bias feeds back into model
In ML, when something looks too good to be true, it probably is

Points of bias

Historical – prejudice in world as it exists today. Gave example from ChatGPT where it assumed a nurse was female even when the pronouns were replaced. Full example here
Representation bias – Sample under-represents part of population. Can’t make effective predictions for that group. Article describing. “Solved” by dropping gorillas as a label
Measurement bias – using a proxy to represent a construct. Problem if oversimplifying or accuracy varies across groups. COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) example. Data measures policing not just the offender.
Aggregation bias – one size fits all model assumes mapping inputs to labels is consistent. For example, could mean something different across cultures. Such as LSD being Lake Shore Drive in Chicago and not a drug. Or racial differences for HbA1c
Learning bias – modeling choice may prioritize one objective which damages another. Such as Amazon’s recruiting tool discriminating against women
Evaluation bias – benchmark data does not represent the population. Might make sense in some scenarios. Project Gender Shades analyzed differences in different tools.
Deployment bias – model intended to solve one problem, but used a different way. Make a hook for tuna and use it on a shark. Child abuse protection tool fails poor families.

Simpson’s paradox

Other attributes are a proxy for the thing being left out
Association disappears, reappears or reverses when the population is divided
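
Not from the talk: a minimal sketch of the reversal described above. The counts are hypothetical and chosen only so the effect shows up; `counts`, `subgroup_rates` and `overall_rates` are names made up for this illustration.

```python
# Hypothetical (successes, total) counts for two options, split by subgroup.
counts = {
    "A": {"group1": (18, 20), "group2": (48, 80)},
    "B": {"group1": (68, 80), "group2": (10, 20)},
}


def subgroup_rates(data):
    """Success rate for each option within each subgroup."""
    return {
        option: {group: s / n for group, (s, n) in groups.items()}
        for option, groups in data.items()
    }


def overall_rates(data):
    """Success rate for each option after pooling the subgroups."""
    rates = {}
    for option, groups in data.items():
        successes = sum(s for s, _ in groups.values())
        total = sum(n for _, n in groups.values())
        rates[option] = successes / total
    return rates


by_group = subgroup_rates(counts)
pooled = overall_rates(counts)

# A wins inside every subgroup, yet B wins once the subgroups are pooled.
for group in ("group1", "group2"):
    print(group, by_group["A"][group], by_group["B"][group])
print("overall", pooled["A"], pooled["B"])
```

The reversal comes from the uneven split: option A is mostly applied to the harder subgroup, so its pooled rate is dragged down even though it does better in both subgroups.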

Terms

Protected class – category where bias is relevant
Sensitive characteristics – algorithmic decisions where bias could be factor
Disparate treatment
Disparate outcome/impact
Fairness – area of research to ensure biases and model inaccuracies do not lead to models that treat individuals unfavorably due to sensitive characteristics.

Metrics

Demographic parity – decisions/outcomes independent of protected attribute. Does not protect against all unfairness
Equal odds (also called equalized odds) – decision independent of protected attributes given the true outcome. True and false positive rates must be equal across groups
Equal opportunity – like equal odds but only measures fairness for true positive rates
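
Not from the talk: a rough sketch of how the three metrics above could be computed from binary labels and predictions. The data is hypothetical, and `group_rates` and `gap` are names invented for this example, not a library API.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Per-group selection rate, true positive rate and false positive rate."""
    tallies = defaultdict(
        lambda: {"n": 0, "pred_pos": 0, "pos": 0, "tp": 0, "neg": 0, "fp": 0}
    )
    for truth, pred, group in zip(y_true, y_pred, groups):
        t = tallies[group]
        t["n"] += 1
        t["pred_pos"] += pred
        if truth == 1:
            t["pos"] += 1
            t["tp"] += pred
        else:
            t["neg"] += 1
            t["fp"] += pred
    return {
        group: {
            "selection_rate": t["pred_pos"] / t["n"],
            "tpr": t["tp"] / t["pos"] if t["pos"] else float("nan"),
            "fpr": t["fp"] / t["neg"] if t["neg"] else float("nan"),
        }
        for group, t in tallies.items()
    }


def gap(rates, key):
    """Largest difference between groups for one rate; 0 means parity."""
    values = [r[key] for r in rates.values()]
    return max(values) - min(values)


# Hypothetical labels, predictions and protected attribute values.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = group_rates(y_true, y_pred, groups)
demographic_parity_gap = gap(rates, "selection_rate")
equal_opportunity_gap = gap(rates, "tpr")
equalized_odds_gap = max(gap(rates, "tpr"), gap(rates, "fpr"))
print(demographic_parity_gap, equal_opportunity_gap, equalized_odds_gap)
```

A gap of zero means the metric is satisfied exactly. In practice a tolerance would be needed, and which metric matters depends on the decision being made.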

Demo

A popular (bad) data set is “adult data set”. I think it is this one.
Not balanced by gender, race, country

Book recommendations

Weapons of Math Destruction
Biased
The Alignment Problem
Invisible Women
The Big Nine
Automating Inequality

My take

The types of bias and examples were interesting. Good end to the day. The demo graphs made the point about biased data nicely.

[2023 kcdc] With Great Power Comes Great Responsibility: The Ethics of AI

Posted on June 22, 2023 by Jeanne Boyarsky

Speaker: Matthew Renze

Twitter: @matthewrenze

For more, see the table of contents.

History

Tech has a tendency to be abused

land – slaves
mechanized warfare – expand influence
cyberware – mass surveillance

Alice and Bob

Need to decide if want to get cat or dog for kids.
One researches cats and one dogs.
Get into info bubble thinking cat lovers hate dogs and vice versa and mad at each other
Then talk to real people, learn people like both and get a cat and a dog.
A generation later they lose their jobs due to robots/AI. Their kids see lots of jobs because tech savvy.
Kids convince parents to upskill and get new job
Another generation later grandkids want biological augmentation and to marry an AI.
Feel lost in world no longer recognize
Learn about technology and see it is an evolution. Learn from grandchildren.

Today

When search for something, get more of it.
Then info bubble/echo chambers
Goal is to maximize engagement. This results in more extreme content so people click
Lose privacy – ex: shopping data predict pregnancy
Can deanonymize data with date of birth, sex and zip code
Little privacy now and soon a lot less
Algorithmic bias – ex: racially biased criminal risk score, males preferred in resumes
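
Not from the talk: a small sketch of why date of birth, sex and zip code together can re-identify people. It measures what fraction of records are unique on a set of columns. The records are hypothetical and `unique_fraction` is a name invented here.

```python
from collections import Counter


def unique_fraction(records, quasi_identifiers):
    """Fraction of records that are the only one with their combination of values."""
    keys = [tuple(record[q] for q in quasi_identifiers) for record in records]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(records)


# Hypothetical "anonymized" records: no names, but three quasi-identifiers remain.
records = [
    {"birth_date": "1980-01-01", "sex": "F", "zip": "00001"},
    {"birth_date": "1980-01-01", "sex": "F", "zip": "00002"},
    {"birth_date": "1980-01-01", "sex": "M", "zip": "00001"},
    {"birth_date": "1975-05-05", "sex": "F", "zip": "00001"},
]

sex_only = unique_fraction(records, ["sex"])
zip_only = unique_fraction(records, ["zip"])
combined = unique_fraction(records, ["birth_date", "sex", "zip"])
print(sex_only, zip_only, combined)
```

Each column on its own singles out few records, but the combination singles out all of them in this toy data. Anyone holding another data set with the same three columns plus names could then join the two.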

AI

Uncanny valley – distrust things that are almost like us
Hallucination – making up believable, but false info
Misinformation at scale
Lack of AI literacy

What can we do

Delete cookies
Incognito mode
Throwaway emails
Stop using “click holes” to get pulled down rabbit holes
Opt out
Privacy regulations
Limit/stop using social media
Talk to other people

AI Developers

Eliminate bias in data – diverse datasets, exclude protected attributes, retrain algorithm over time
Be able to explain how AI made decision. Use decision tree vs neural network where can.
Let users choose how much error they allow
Don’t allow full autonomy

Fight misinformation

Who is the author/publisher?
What are their sources?
How strong is the evidence?

Near Future

Significant unemployment – simple/repetitive/costly jobs. Expect 20%+ of jobs to go away by 2030 and be replaced by other higher tech jobs
Labor market unprepared for rapid change
Society is unprepared for change.
Many people left behind in poverty.
Synthetic media – indistinguishable from human data. Propaganda/misinformation at scale. Deep fakes. Deep nude (remove clothes without permission), etc
With 10 likes, AI knows you as well as a colleague.
Surveillance capitalism – can’t detect being manipulated
Greater social stratification – income gap
Safety issues – does self driving car protect driver or pedestrian
Autonomous weapons – currently a human is in the loop

Solutions

Educate everyone/AI literacy, Basics of ML, DL (deep learning), RL (reinforcement learning)
Job retraining
Retirement options for those too old to reskill
Mandatory higher ed – mandatory high school was controversial
Universal basic income/negative income tax
Deep fake detection – arms race
Digital alibi – so can prove what doing at all times and therefore not in fake video
Blockchain for everything so have complete audit trail
Default mode of skepticism

Further Future – Speculative

AGI (artificial general intelligence) – at least as smart as average person
Improve health
Solve biggest problem – climate change, politics, government
Humans could become obsolete – ex: horses became obsolete to farms. “Peak horse” was in 1915
Collapse of modern institutions – could break capitalism.
Changes already faster than society can adapt. What happens when new discoveries every day?
Dystopian future – authoritarianism, communism, fascism, AI religion, AI super bureaucracy
Or a better AI based government
ASI (artificial super intelligence) – if create AGI, intelligence explosion can happen fast. AGI can rewrite its own code.
Alignment problem – how do we align human and AI values? Reward hacking – find loopholes
AI run amok – what happens if robots mine asteroids? When does it stop?
Conflicts – are we pets, ants, raw materials, competition, a threat?

Positives

We evolved for short bursts of stress.
Modern society is chronic stress
Be mindful with tech
Respect AI
Don’t fear/fight change
Use tech when beneficial and skip when not
Reward AI goal states
Keep ability to intervene if decision doesn’t align

Long run

Peacefully coexist with AI
AI wins
AI and humanity merge – most likely option
Humanity ends itself

Merge

No “us vs them” problem.
Phones an extension of us
Younger generation willing to merge with mind
VR/AR glasses
Gene editing
Brain/computer interfaces
Next version of people likely to be very different

My take

The Alice and Bob stories are fun. There was a ton of information. It went very fast and I definitely need time to process it. I expected more discussion of ethics rather than covering “everything” but I’m happy with how it turned out.

[2023 kcdc] rescuing your git repo using amend, reset, revert, rebase, bisect and cherry picking

Posted on June 22, 2023 by Jeanne Boyarsky

Speaker: Brian Gorman

Twitter @blgorman

Repo with all commands

For more, see the table of contents.

Note: The GitHub repo is excellent and has all the instructions/commands. I did not try to recreate them in my blog. Instead I focused on the concepts

Branching strategies

Git Flow – main > dev > feature > developer. Good if just starting out. Not doing a lot of rebasing
Trunk based – no long running branches, frequent checkins. More popular due to CI/CD
Forking – integration repo, lieutenants and dictators. Good in super large orgs. More advanced
While the branching strategy doesn’t matter, it does matter whether the commit history is linear. (Some operations are trickier if non-linear)

Rebase and Force Push

Rebase locally (based on remote or local branch)
Can have orphaned commits
Force pushing with a lease makes it safer
May have to deal with conflicts on a rebase
Use pull request; don’t create an extra merge commit
Important to delete old branches to avoid confusion

Finding lost commits

Can use GitViz (on Windows only?) to look at graphically – https://github.com/Readify/GitViz
git reflog --all
git checkout <id> – puts in detached HEAD state to look at it. See double parens around commit id.

Clear local cache

Unlikely to need. Cleans up state
git reflog expire --expire-unreachable=now --all – expire all unreachable reflog entries now
git gc --prune – run garbage collection

Removing feature

Not a problem if use feature flags
Create a branch to keep safe the parts not changing
Reset branch to last commit want to keep
Create new feature branch and pick commits want

Accidentally committed to main

Stop build as quickly as possible
Let team know not to change or pull from main
Create feature branch and cherry pick commits want
Reset main hard. git push --force-with-lease
Revert change to keep history
Change settings on repo so can’t commit to main again :).
(if can’t do this, can revert instead of changing history)

Someone committed a secret

If only a local commit, delete .git and start over. If already pushed…
If don’t need history, create new repo without history. If can’t….
Stop all dev as doing massive history update
Ensure all code checked in
Use git bisect to find the first commit containing the secret (start, good id, bad id, then you keep saying if a commit is good/bad). Alternatively git log -S "secret" gives you the commit
Ensure no branches are dependent on commit after the last good commit
Amend commit with one that doesn’t have the secret. Then cherry pick the rest
Everyone has to get the repo again since commits have changed
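
Not from the talk or the repo: a sketch of the binary search idea behind the bisect step above, so it is clearer why answering good/bad a few times is enough. This is not how git implements it (git also handles non-linear history). It assumes a linear list of commits, ordered oldest to newest, where every commit after the first bad one is also bad. `first_bad_commit` and the commit ids are made up.

```python
def first_bad_commit(commits, is_bad):
    """Return the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and
    that once a commit is bad every later commit is bad too.
    """
    good, bad = 0, len(commits) - 1
    checks = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        checks += 1
        if is_bad(commits[mid]):
            bad = mid   # the culprit is at mid or earlier
        else:
            good = mid  # the culprit is after mid
    return commits[bad], checks


# Hypothetical history: the secret was introduced in commit "c5".
commits = ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"]


def contains_secret(commit):
    # Stand-in for checking out the commit and looking for the secret.
    return int(commit[1:]) >= 5


culprit, checks = first_bad_commit(commits, contains_secret)
print(culprit, checks)
```

Each answer halves the remaining range, which is why bisecting stays practical even on a long history.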

My take

I really like the mix of concepts, visualizations and videos of actually using the functionality. Great session.


Down Home Country Coding With Scott Selikoff and Jeanne Boyarsky

Java/J2EE Software Development and Technology Discussion Blog

Daily Archives: June 22, 2023
