[2023 kcdc] rescuing your git repo using amend, reset, revert, rebase, bisect and cherry picking

Posted on June 22, 2023 by Jeanne Boyarsky

Speaker: Brian Gorman

Twitter @blgorman

Note: The GitHub repo is excellent and has all the instructions/commands. I did not try to recreate them in my blog. Instead I focused on the concepts

Branching strategies

Git Flow – main > dev > feature > developer. Good if just starting out. Not doing a lot of rebasing
Trunk based – no long running branches, frequent checkins. More popular due to CI/CD
Forking – integration repo, lieutenants and dictators. Good in super large orgs. More advanced
While the choice of branching strategy doesn’t matter, it does matter whether the commit history is linear. (Some operations are trickier if non-linear)

Rebase and Force Push

Rebase locally (based on remote or local branch)
Can have orphaned commits
Force pushing with a lease makes it safer
May have to deal with conflicts on a rebase
Use pull request; don’t create an extra merge commit
Important to delete old branches to avoid confusion

Finding lost commits

Can use GitViz (on Windows only?) to look at graphically – https://github.com/Readify/GitViz
git reflog --all
git checkout <id> – puts in detached HEAD state to look at it. See double parens around commit id.

Clear local cache

Unlikely to need. Cleans up state
git reflog expire --expire-unreachable=now --all – expire all unreachable commits now
git gc --prune=now – run garbage collection

Removing feature

Not a problem if use feature flags
Create a branch to keep safe the parts not changing
Reset branch to last commit want to keep
Create new feature branch and pick commits want

Accidentally committed to main

Stop build as quickly as possible
Let team know not to change or pull from main
Create feature branch and cherry pick commits want
Reset main hard. git push --force-with-lease
Revert change to keep history
Change settings on repo so can’t commit to main again :).
(if can’t do this, can revert instead of changing history)

Someone committed a secret

If only a local commit, delete .git and start over. If already pushed…
If don’t need history, create new repo without history. If can’t….
Stop all dev as doing massive history update
Ensure all code checked in
Use git bisect to find the first commit containing the secret (start, good id, bad id, then you keep saying if a commit is good/bad). Alternatively git log -S "secret" lists the commits that added or removed that string
Ensure no branches are dependent on commit after the last good commit
Amend commit with one that doesn’t have the secret. Then cherry pick the rest
Everyone has to get the repo again since commits have changed

My take

I really like the mix of concepts, visualizations and videos of actually using the functionality. Great session.

[2023 kcdc] chatgpt: don’t take my job, help me thrive in it

Posted on June 22, 2023 by Jeanne Boyarsky

Speaker: Steve Odell

For more, see the table of contents.

Timeline

1940 – Enigma
1964 – first chat bot
mid 2022 – GitHub Copilot came out
Then ChatGPT 3.5
Panic about AI taking all our jobs

Survey

Most people in room used ChatGPT
A few used Bard
A good number use GitHub Copilot

1969

Had ChatGPT write a story about ATMs rendering bank tellers obsolete
It was well written
Talked about roles evolving
Also covered analogy to ChatGPT and talking about enhancing capabilities

Takeaways

Not going to take our jobs
Can let you down just as much as it impresses you
Do not take at face value
Often apologizes when wrong, and is wrong a lot

Examples

Lawyer used ChatGPT which made up cases. Used real case numbers but unrelated – https://simonwillison.net/2023/May/27/lawyer-chatgpt/
Asked for a C# function to calculate the points in a bridge hand. Gave it the rules in a prompt and a description about the notation. Quickly provided code that looks reasonable on first glance. When tested code, got wrong answer – 18 points, for a 20 point hand. ChatGPT also wrote a bulleted list explaining logic and got 20 points in explanation, but not code. Realizes messed up and explains why wrong in a way that conflicts with the explanation.
Succeeded at codegolf – rewriting code in fewer lines.
Tried to get to write infrastructure as code. First gave approach to set up CloudFormation for a high level description of what want for AWS. Did good job listing AWS services need and short description of each. Then asked to create the CloudFormation templates listing services. Gave a stub of the YAML leaving out all the hard parts. Ex # VPC properties. Then tried one at a time and didn’t tie them together.
On the next example for an OAuth workflow in MAUI, ChatGPT just said can’t do it and provided a basic login page which was nothing like what asked for. Thinks not enough code as training data. New and lot of code is internal to companies.
Repeated example in React Native. Didn’t test, but looks much better; includes OAuth workflow and expected parts.

Prompt engineering

Some companies are hiring prompt engineers
Skill set we should all learn
Tried getting SQL for a recipe app. Asked for table with create table scripts listing fields want and more about each. Did good job on keys and not null constraints. Unit of measure was varchar rather than numeric. Did right when asked for a units of measure table.
Chaining prompts in the same discussion gets to where want.

My take

Standing room only crowd. I got there very early (because I needed to leave at the 30 minute mark) and was barely able to get an aisle seat. [I misread the calendar and have a work presentation at 11am eastern].

The first half of the presentation was excellent. The examples were clear and run. Gave an excellent sense of the current state of AI. The beginning of the prompt engineering section was great as well. I wish I could have stayed for the rest.

[2023 kcdc] data science: zero to hero

Posted on June 21, 2023 by Jeanne Boyarsky

Speaker: Gary Short

Twitter @garyshort

Repo for presentation/samples: https://github.com/garyshort/kcdc2023

For more, see the table of contents.

Data science overview/rules

Applied data science – solving business problems
Curiosity is most important
The universe does random stuff so you haven’t discovered anything until you prove you’ve discovered something
Only qualitative and quantitative data – people lie, so can’t trust what they say when you ask
Can only do math with numbers. Some things will pretend to be numbers when they are not. Also, can’t add different things (dollars vs kilograms)
If you can’t explain it to a six year old, you don’t really understand it
Only have to be more than 51% accurate to do better than guessing
True random data has some clusters. The cluster will not last forever. Gambler’s fallacy. 27 blacks doesn’t mean due a red.
If it’s not in production, it doesn’t exist. Can’t just be on your laptop. Most data scientists need to give to someone else to get it to prod. Cultural difference between data scientist and person who is building/deploying.
% chance of hypothesis being right or wrong doesn’t have to sum up to 100%. ex: grass is wet. Could be rain or a dog peeing or something else
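
To make the "27 blacks doesn’t mean due a red" rule concrete, here is a minimal simulation sketch. It assumes a simplified wheel with only red and black at 50/50 (no green pocket); the function name, the streak length of 5 and the seed are my own choices for illustration, not from the session.

```python
import random


def red_rate_after_black_streak(num_spins, streak_length, seed=0):
    """Return the fraction of reds seen right after a run of blacks.

    Assumes a simplified wheel: red or black only, each with probability 0.5.
    """
    rng = random.Random(seed)
    streak = 0      # current run of consecutive blacks
    followed = 0    # spins that came right after a long enough streak
    reds = 0        # how many of those spins were red
    for _ in range(num_spins):
        is_red = rng.random() < 0.5
        if streak >= streak_length:
            followed += 1
            reds += is_red
        streak = 0 if is_red else streak + 1
    return reds / followed if followed else float("nan")


rate = red_rate_after_black_streak(num_spins=200_000, streak_length=5)
print(f"P(red | at least 5 blacks in a row) is about {rate:.3f}")
```

The rate should hover around 0.5 no matter how long the streak is, since each spin is independent of the ones before it.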

Types of data

Structured

Relational data
Get connection, create cursor, fill cursor, close connection
Schema is important on data write.

Semi structured

ex: JSON/MongoDB.
Get connection, name collection, fill cursor, close connection
Schema important when read data

Unstructured

Blob (binary large object)
Stored in pages/blocks
Access via URL

Graph

Degrees of separation – can you deliver a message directly
People in room now more closely connected because in this session (and would stay so if shared contact info)
Wide network effect
Nodes tend to be nouns
Edges tend to be verbs. Can be unidirectional or bidirectional
Get connection, state query, fill cursor, close connection

AI/ML works on data types

Categorical – segregate data by category where order is not important (ex: blue eyes)
Ordinal – order is important but distance between is not important (ex: position in a race)
Numeric – order is important and the distance between values is the same (ex: counting)
Ratio – numeric but with a true zero, so only non-negative numbers

Can only do math with numeric and ratio types. A survey on a scale of 1-5 (Likert scale) is ordinal, not numeric/ratio. Can’t do average. Treat it like categorical data (ex: very happy, pissed off). Can do math with counts of categorical data but not single items.
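
A small sketch of what "do math with counts, not single items" could look like for Likert answers. The responses are made up, and using the low median (which only relies on order and never averages two answers) is my own suggestion rather than something from the session.

```python
from collections import Counter
from statistics import median_low

# Hypothetical survey answers on a 1-5 Likert scale
responses = [5, 4, 4, 2, 5, 1, 4, 3, 5, 4]

# Counts per category are genuine numbers, so math on them is fine
counts = Counter(responses)
most_common_answer, votes = counts.most_common(1)[0]

# The low median picks an actual answer from the data; it uses order only
middle_answer = median_low(responses)

# A proportion is math on counts, not on the answers themselves
share_satisfied = sum(1 for r in responses if r >= 4) / len(responses)

print(most_common_answer, votes, middle_answer, share_satisfied)
```

Note there is no mean of the answers anywhere; everything reported is either a count, a share of counts, or an answer that appears in the data.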

Exploratory Data analysis

Need to understand the variables. Ex: is it really a number
Handle missing values – depends on scenario. Ex: use mean or median (if not looking for that particular thing), delete row with incomplete data
Outlier detection – sometimes genuinely an outlier (ex: someone who is 8 feet tall), sometimes it is the important piece of data (ex: which exits people use in a fire; one person went the other way and want to know why). Need to determine why it is an outlier and if you care, so you don’t delete data you need
Univariate analysis – ex: histogram for categorical data
Bivariate analysis – correlated data; could be hidden variable. Don’t need both of them since one predicts the other. Want minimal variables in model so choose the one that brings in the most info.

Feature Selection

Preprocess the data
Normalize data – units have to be the same. Using variance doesn’t help because unit is now original unit squared. Can use Z-score (subtract the mean, then divide by the standard deviation) so everything is on the same unitless scale with mean 0 and standard deviation 1
Encode the categories – make so can do math
Booleans are numbers (0 and 1)
Word vector – can use math to represent a word. Complicated. Ok to have to look up every time.
Bi/multivariate analysis – high correlation means redundant info
Feature importance – check coefficients from regressions and scores from gradient boosting
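
A minimal sketch of the Z-score normalization mentioned above, using only the standard library. The sample heights and incomes are made up, and using the population standard deviation (pstdev) is an assumption on my part; the sample version would work the same way.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Rescale values so they have mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        raise ValueError("all values are identical; z-score is undefined")
    return [(v - mu) / sigma for v in values]


# Hypothetical features in very different units
heights_cm = [150, 160, 170, 180, 190]
incomes_usd = [30_000, 45_000, 60_000, 75_000, 90_000]

print(z_scores(heights_cm))
print(z_scores(incomes_usd))
```

Both lists come out on the same unitless scale, which is the point: centimeters and dollars can now sit in the same model.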

Model Selection

People have a favorite model
Use one or more models. See which gives best result before making any changes to the model.
Good to use a linear and non linear one. Normally the linear model is enough because normally dealing with people (directly or indirectly). Linear equations work for a normal distribution.
Make sure to find global minimum, not local/current one
Compete with yourself. Try to have your second best model beat your current best model. Once something in prod, start again

Train/test split

80/20 split
80% data for training
20% data for testing (held back to stand in for real data)
Model never sees the test data during training because can’t grade own homework
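
The 80/20 split can be sketched in a few lines. This is a plain standard library version with a hypothetical function name and seed; in practice a library helper would likely be used instead.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    test_size = round(len(shuffled) * test_fraction)
    return shuffled[test_size:], shuffled[:test_size]


rows = list(range(100))  # stand-in for 100 data rows
train, test = train_test_split(rows)
print(len(train), len(test))
```

Shuffling before splitting matters when the rows are sorted (for example by date), otherwise the test set is not representative.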

Model evaluation

Outcome – model + error
Error is difference between predicted and observed values.
Sample of population can be model. Get error because of sampling bias

Hyper Parameter Tuning

Every model has parameters to govern how it works.
Hyper param tuning is fiddling with these
Will be an optimal value for each of these parameters for your particular use case

Model Validation

Need to make sure model doesn’t work by chance
K-Fold Cross Validation – after do 80/20 split, can feed data back in and do again
Stratified Cross Validation – same as K-Fold but for unbalanced classes; each fold keeps the same class proportions
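
A rough sketch of how K-Fold hands out rows: every row lands in the test fold exactly once and in the training set the other times. The function name is hypothetical and this version does not shuffle or stratify; libraries such as scikit-learn provide ready-made versions.

```python
def k_fold_indices(num_rows, k):
    """Yield (train_indices, test_indices) pairs, one pair per fold."""
    if not 2 <= k <= num_rows:
        raise ValueError("k must be between 2 and num_rows")
    indices = list(range(num_rows))
    # Spread any remainder over the first folds so sizes differ by at most 1
    fold_sizes = [num_rows // k + (1 if i < num_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(num_rows=10, k=5))
for train_idx, test_idx in folds:
    print("test:", test_idx, "train:", train_idx)
```

Training and scoring the model once per fold and then comparing the scores helps show whether a good result was down to one lucky split.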

Bayesian inference in Real Life

P(h|e) = P(e|h) * P(h) / P(e)
In English: updated belief = prior belief adjusted by new evidence
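
A worked example of the formula, reusing the wet grass idea from earlier in the session. All three probabilities are numbers I made up for illustration, and the function name is hypothetical.

```python
def posterior(prior, likelihood, likelihood_if_false):
    """Return P(h|e) from P(h), P(e|h) and P(e|not h) via Bayes' rule."""
    # P(e) by the law of total probability over h and not-h
    evidence = likelihood * prior + likelihood_if_false * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: h = "it rained", e = "the grass is wet"
p_rain_given_wet = posterior(
    prior=0.3,                # P(h): belief in rain before looking outside
    likelihood=0.9,           # P(e|h): grass is wet if it rained
    likelihood_if_false=0.2,  # P(e|not h): wet anyway (dog, sprinkler, ...)
)
print(f"P(rain | wet grass) is about {p_rain_given_wet:.2f}")
```

With these numbers, seeing wet grass raises the belief in rain from 0.3 to roughly 0.66, which is the "prior adjusted by evidence" step in action.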

Estimation

Important to be able to estimate values when have no data
Dumb questions like “how many piano tuners are there in Chicago” were testing this. So few people could do it that they pulled the question. [I suspect the ridicule and people memorizing the answer was a factor too]
Easier to estimate a range than an actual value
Pick a minimum that it couldn’t possibly be below. You’d be surprised and skeptical if less than that.
Pick a maximum that it couldn’t possibly be above.
Pick a value that splits the range in two so that being above/below has equal probability. Call this the median. Resist temptation to pick the mean.
Repeat finding the minimum to median. Call this Q1
Then repeat finding the median to maximum to get Q3.
This gets you a five point description of a distribution
Use sampling to get mean of distribution
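
One possible way to do the sampling step, sketched under a loud assumption: each quarter of the five point description holds 25% of the probability and values are uniform within a quarter. The five guesses below are invented numbers, not from the talk.

```python
import random


def sample_from_five_points(minimum, q1, median, q3, maximum, rng):
    """Draw one value: pick a quarter at random, then a uniform point in it."""
    edges = [minimum, q1, median, q3, maximum]
    quarter = rng.randrange(4)  # each quarter holds 25% of the probability
    return rng.uniform(edges[quarter], edges[quarter + 1])


def estimate_mean(minimum, q1, median, q3, maximum, draws=100_000, seed=1):
    """Estimate the mean of the distribution by repeated sampling."""
    rng = random.Random(seed)
    total = sum(
        sample_from_five_points(minimum, q1, median, q3, maximum, rng)
        for _ in range(draws)
    )
    return total / draws


# Hypothetical guesses for "how many piano tuners are there in Chicago"
estimated_mean = estimate_mean(minimum=20, q1=60, median=100, q3=200, maximum=600)
print(f"estimated mean is about {estimated_mean:.0f}")
```

Because the guesses are skewed toward a long upper tail, the sampled mean lands well above the median of 100, which is a good reason not to confuse the two when picking the middle value.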

Lab part

The lab was to predict something you want to predict and make a model and/or predict a probability. Can do individually or in groups. He also gave the option to leave. I chose leave because there was a little over an hour left when he finished explaining the lab. I need to go over the material for my own workshop so doing that instead of the lab.

My take

This was a good intro and Gary is a good, engaging speaker. I learned (and re-learned) a bunch of stuff. Both concepts and terms. Having a bunch of rules and getting into them made it fun. (ex: math needs numbers). I like that the concept part was longer (except for the lack of a break), but it would have been better if it was advertised that way in the intro.

I disagree with Gary’s philosophy on not having a bathroom break. He started by saying there would be 60-90 minutes of lecture and then a lab. [wound up being just over 2.5 hours] And that we are all adults and can go to the bathroom whenever. Someone asked at the 90 minute mark if there would be a bathroom break and he repeated the all adults thing expanding that you’ll catch up and the slides will be online later. He also said people feel compelled to hold it until break or go when told it is break. However, the tradeoff is that you don’t want to go to the bathroom lest you miss something that will wind up being important during the session. It’s super frustrating to miss stuff and then struggle to understand later. It may be that this workshop isn’t cumulative but there’s no way to know. Also, by not having a break, you aren’t giving people’s brain a break. It’s not just about the bathroom.

Gary stated he puts the materials online after so people don’t read it during the session. That I agree with!


Down Home Country Coding With Scott Selikoff and Jeanne Boyarsky

Java/J2EE Software Development and Technology Discussion Blog
