[2023 kcdc] cve 101: the unfolding of a zero day attack

Speaker: Theresa Mammarella

Twitter: @t_mammarella

For more, see the table of contents.


Notes

  • Annual cost of cybercrime is predicted to top $8 trillion. Only the US and China have a GDP larger than that

Terminology

  • Vulnerability – weakness/flaw in system
  • Threat – attack vector, potential action
  • Risk – probable frequency of that loss.
  • Goal of cybersecurity is to minimize risk. Can’t control intent to do harm, so focus on vulnerability

CVEs

  • CVE – Common Vulnerabilities and Exposures
  • Format CVE-xxxx-yyyy. xxxx = year it came out. yyyy = identifier (four or more digits)
  • CVSS scoring – how bad is it on a scale of 0-10. Ten is worst
  • CVSS score has three parts – base (exploitability, impact), temporal, environmental. Good description here
  • Base is the one we see on the CVE
  • CVE can be rejected. The number is used and cannot be reused. Example: someone thought they found a vulnerability. The investigation was flawed and it was not an actual issue. Story about it here.
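
As a rough illustration of the ID format and the 0-10 scale, here is a minimal sketch. The function names are made up for these notes. The severity bands are the qualitative ratings from CVSS v3.x, which is an assumption about the version in use since the talk did not name one.

```python
import re

# CVE IDs look like CVE-<year>-<sequence>; the sequence has four or more digits.
CVE_ID = re.compile(r"^CVE-(\d{4})-(\d{4,})$")


def parse_cve_id(cve_id):
    """Return (year, sequence) for a well-formed CVE ID, else None."""
    match = CVE_ID.match(cve_id)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def cvss_severity(base_score):
    """Map a CVSS v3.x base score (0.0-10.0) to its qualitative rating."""
    if not 0.0 <= base_score <= 10.0:
        raise ValueError("CVSS base scores range from 0.0 to 10.0")
    if base_score == 0.0:
        return "None"
    if base_score < 4.0:
        return "Low"
    if base_score < 7.0:
        return "Medium"
    if base_score < 9.0:
        return "High"
    return "Critical"


print(parse_cve_id("CVE-2021-44228"))  # the original Log4Shell CVE -> (2021, 44228)
print(cvss_severity(7.5))              # -> High
```

The "high, not critical" distinction that comes up later in these notes is the boundary at 9.0 in this mapping.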

How to talk about

  • Private disclosure – organization can choose when/whether to fix/share
  • Coordinated/responsible disclosure – best practice – agreed upon time frame
  • Full/public disclosure – share everything
  • Best to report via company website, security.md file, security files on server, github private vulnerability reporting

Zero day vulnerability

Examples

  • Log4Shell – remote code loading. Was reported responsibly, but the fix was incomplete, so the follow-up CVEs were zero days
  • Could be as simple as a bounds check, as with OpenSSL. The project announced that something big was coming and to get ready. When it was announced, we learned it only affected OpenSSL 3 (not the older 1.x releases) and was rated high, not critical, so it was a boy-who-cried-wolf situation.

Security Practices for Developers

  • Insider threat includes poor training
  • A lot more developers than info security. Increasingly harder for security teams to keep up.
  • Cost of finding and fixing bugs increases over time
  • Does this touch the internet? take untrusted input/ handle sensitive data?
  • OWASP Top 10. Updated in 2021 to add insecure design, software/data integrity failures and server side request forgery (SSRF). Some merged such as injection.
  • Starting OWASP Top 10 for Large Language Model Applications. A draft version is available
  • mitre/hipcheck – scorecard for supply chain risk. Similarly, Sonatype security rating and OpenSSF Scorecard
  • Open source dependency management. Open source is embedded in many projects. On average, 90% of an app is open source. North Korea attacked many apps including PuTTY

Attack types

  • Typosquatting – look alike domain with one or two wrong characters
  • Open source repo attacks – attempt to get malware/weakness added into dependency source
  • Build tool attacks
  • Dependency confusion – different version that shows up as latest
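
To make the "one or two wrong characters" idea concrete, here is a minimal sketch that flags names within a small edit distance of a trusted name. It works the same for domains or package names. The trusted list and the threshold of 2 are assumptions for illustration, not something from the talk.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def possible_typosquats(name, trusted_names, max_distance=2):
    """Return trusted names that `name` is close to but not equal to."""
    return [
        trusted for trusted in trusted_names
        if 0 < edit_distance(name, trusted) <= max_distance
    ]


TRUSTED = ["requests", "numpy", "lodash"]  # hypothetical allow list
print(possible_typosquats("reqeusts", TRUSTED))  # -> ['requests']
print(possible_typosquats("requests", TRUSTED))  # -> []
```

An exact match returns nothing because a distance of zero means the name is the trusted one.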

Trust?

  • Sometimes third party projects. ex: OpenSSF Scorecard
  • npm and PyPI often have supply chain attacks. Maven Central is trusted more
  • Scanning tools to find issues can be helpful
  • You are responsible when things go wrong

My take

Good talk. Covered concepts and good real life examples. I learned a few things like the OWASP Top 10 for LLMs. Appreciated the shout out to “the Java people in the front row” when talking about log4j. I added a few links in my blog that weren’t in the original presentation for things I wanted to learn more about.

[2023 kcdc] data leakage – why your ML model knows too much

Speaker: Leah Berg

For more, see the table of contents.


Notes

Data Leakage

  • Also known as leakage or target leakage
  • Different meaning for information security (data leaking to outside organization)
  • Can be difficult to spot
  • Training data includes info about test.
  • Model trained on info not available in production

How models learn

  • Split data into training data and test data.
  • Test data – data the model has never seen before. Makes sure the model gets it right
  • Can also have an optional validation set
  • Randomly pick whether data points are training or test data. – Called random train/test split
  • More training data than test data

Don’t include data from the future

  • Using a random split of time series data doesn’t work because model has learned about future data.
  • Better to use a sliding window. Use first few months to predict next month. Then add that next value and predict the one after. And keep going. Adding up the errors gives you a measure of the model’s accuracy.
  • This works because the model only knows about data from before the point it is asked to predict.
  • Create timeline for when events happen. That way you make sure you aren’t using data from after the prediction
  • Don’t always know where/when data was created. Important to understand business process
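
The windowing idea above can be sketched like this. It is a minimal example, assuming a naive "repeat the last value" forecaster as a stand-in for a real model. The sales numbers and function names are made up.

```python
def walk_forward_error(series, initial_window, forecast):
    """Sum absolute errors of one-step-ahead forecasts over a growing window."""
    total_error = 0.0
    for t in range(initial_window, len(series)):
        history = series[:t]             # only values strictly before time t
        prediction = forecast(history)
        total_error += abs(series[t] - prediction)
    return total_error


def last_value_forecast(history):
    """Naive stand-in model: predict that the next value equals the latest one."""
    return history[-1]


monthly_sales = [100, 110, 120, 130, 140, 150]  # made-up numbers
print(walk_forward_error(monthly_sales, initial_window=3,
                         forecast=last_value_forecast))  # -> 30.0
```

The forecaster never receives series[t] or anything after it, which is the property a random split would break.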

Don’t randomly split groups

  • With a random split, the training set has some data from the same group (ex: student) you are then predicting
  • Problem when a new student shows up, because the prediction will be bad
  • scikit-learn has GroupShuffleSplit() to get full group in same set – testing or training
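
To show what "full group in same set" means without needing scikit-learn installed, here is a hand-rolled sketch of a group-aware split. This is not GroupShuffleSplit itself, only the same idea in plain Python. The function name, student IDs and 25% test fraction are assumptions.

```python
import random


def group_train_test_split(groups, test_fraction=0.25, seed=0):
    """Split row indices so each group lands entirely in train or in test."""
    unique_groups = sorted(set(groups))
    random.Random(seed).shuffle(unique_groups)
    n_test = max(1, round(len(unique_groups) * test_fraction))
    test_groups = set(unique_groups[:n_test])
    train_idx = [i for i, g in enumerate(groups) if g not in test_groups]
    test_idx = [i for i, g in enumerate(groups) if g in test_groups]
    return train_idx, test_idx


# Hypothetical data: one row per exam, several exams per student.
student_ids = ["ann", "ann", "bob", "bob", "cat", "cat", "dan", "dan"]
train_idx, test_idx = group_train_test_split(student_ids)
train_students = {student_ids[i] for i in train_idx}
test_students = {student_ids[i] for i in test_idx}
print(train_students.isdisjoint(test_students))  # -> True
```

Because the test students never appear in training, the test score reflects how the model does on a student it has not seen.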

Don’t forget your data is a snapshot

  • In school, have pristine data set.
  • In real world, data is always changing.
  • Could tell model about data that occurred after prediction. Again think about data on timeline

Don’t randomly split data when retraining

  • Want to use same training/test data on production and challenger models to see which better.
  • With a fresh random split, one model may have already seen, during training, data points that you are now testing on, so you don’t know if it is really better.
  • Challenger model can get more data that wasn’t available originally. Ok to split new data into test/train as long as original data part is split same way.
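
One possible way to keep the original split stable while still splitting new data is to derive the assignment from a hash of each record's ID. The talk did not prescribe this technique. It is a sketch under the assumption that every record has a stable ID, and the IDs and 20% test share are made up.

```python
import hashlib


def assign_split(record_id, test_percent=20):
    """Deterministically assign a record to 'train' or 'test' from its ID."""
    digest = hashlib.sha256(str(record_id).encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100   # stable bucket in 0..99
    return "test" if bucket < test_percent else "train"


original_ids = ["cust-001", "cust-002", "cust-003"]   # hypothetical IDs
new_ids = ["cust-004", "cust-005"]

first_run = {rid: assign_split(rid) for rid in original_ids}
retrain_run = {rid: assign_split(rid) for rid in original_ids + new_ids}

# Original records keep their assignment after new data arrives.
print(all(retrain_run[rid] == first_run[rid] for rid in original_ids))  # -> True
```

Both the production and challenger models can then be evaluated on the same test records, and neither has trained on them.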

Split data immediately

  • Risky to rescale before split because data isn’t represented same way. Min/max can vary if split after
  • Run normalization on different sets of data
  • Before split, do analysis with business, exploratory data analysis. Split data before start modeling
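
A small sketch of why the order matters for min/max scaling: the scaling parameters are learned from the training values only and then reused on the test values. This is one common reading of the advice above, not necessarily the speaker's exact procedure. The numbers and function names are made up.

```python
def fit_min_max(values):
    """Learn scaling parameters from the training values only."""
    return min(values), max(values)


def apply_min_max(values, low, high):
    """Scale values with previously learned parameters."""
    span = high - low
    if span == 0:
        return [0.0 for _ in values]
    return [(v - low) / span for v in values]


train = [10.0, 20.0, 30.0]   # made-up feature values
test = [25.0, 40.0]

low, high = fit_min_max(train)          # test values never influence this
train_scaled = apply_min_max(train, low, high)
test_scaled = apply_min_max(test, low, high)
print(train_scaled)  # -> [0.0, 0.5, 1.0]
print(test_scaled)   # -> [0.75, 1.5]
```

The test value of 1.5 falls outside 0 to 1, which is expected. If the max had been computed before splitting, the value 40.0 from the test set would have shaped how the training data was scaled.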

Use Cross Validation

  • KFold Validation – split training data into K parts
  • ex: 3 fold validation – two parts stay as training and one is validation. The test data remains as test data and is kept separate for final evaluation.
  • The validation set is for an initial test.
  • Gives more options to train model
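
The fold rotation can be sketched in a few lines. This is a plain Python illustration of the idea, without shuffling. The function name is made up, and the indices are assumed to refer to rows of the training data only, with the test set already set aside.

```python
def k_fold_indices(n_samples, k):
    """Yield (training_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        yield training, validation
        start += size


folds = list(k_fold_indices(n_samples=6, k=3))
for training, validation in folds:
    print(training, validation)
# -> [2, 3, 4, 5] [0, 1]
# -> [0, 1, 4, 5] [2, 3]
# -> [0, 1, 2, 3] [4, 5]
```

Each row is used for validation exactly once and for training in the other folds.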

Be Skeptical of High Performance

  • If validation much higher than train/test, suspicious.
  • If train/test/validation sets are all high/the same, suspicious.

Use scikit-learn pipeline

  • Helps avoid leaking test data into training data

Check for features correlated with target

  • If another attribute has a high match with what you are looking for, make sure you are not mixing up correlation/causation.
  • Also, avoid timeline errors from reverse causation. Ex: the thing you are looking for causes something else
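
A quick screen for this is to compute each feature's correlation with the target and look closely at anything near 1 or -1. This is a minimal sketch. The 0.95 threshold, the churn scenario and the feature names are all assumptions for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0   # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def suspicious_features(features, target, threshold=0.95):
    """Return names of features whose |correlation| with the target is very high."""
    return [name for name, values in features.items()
            if abs(pearson(values, target)) >= threshold]


# Hypothetical churn data: "refund_issued" is only recorded after a customer leaves.
target = [0, 1, 0, 1, 1, 0]
features = {
    "refund_issued": [0, 1, 0, 1, 1, 0],
    "months_active": [12, 3, 30, 8, 20, 5],
}
print(suspicious_features(features, target))  # -> ['refund_issued']
```

A flagged feature is not automatically leakage. It is a prompt to check on the timeline whether the value would exist at prediction time.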

My take

Great talk. Almost all of this was new to me. It was understandable and I learned a lot.

[2023 kcdc] 10 things about postman everyone should know

Speaker: Pooja Mistry

Twitter: @poojamakes

Public workspace- https://www.postman.com/devrel/workspace/2023-10-postman-features-everyone-should-know/overview

For more, see the table of contents.


Notes

  • Moving towards an API first world
  • Postman started in 2012 with a Chrome extension. Evolved into full API platform
  • More than just sending requests – ex: collections, documentation, servers
  • Web and app versions
  • Newman – CLI for postman
  • Collections, env vars, queries, etc have own id
  • Different life cycles for two personas: producer of APIs (define, design, develop, test, secure, deploy, observe, distribute) and consumer of APIs (discover, evaluate, integrate, test, deploy, observe)
  • Test tab to test the API. Example – pm.test("assert text", function () {})
  • Protocols – graphql, websocket, grpc, socket io, etc
  • Scripts – can run before and after graphql
  • Pre-request script – ex: debugging
  • Can pass in $randomXXX of various types in your postman call

Postman API

  • Sign in and fork workspace if want to play with the public workspace for this talk
  • Postman has own API. ex: CRUD for collections, envs etc
  • Some clients use collection as the deliverable and then get metrics on it.

Postman echo

  • Sends back whatever you send in.
  • When you pass in GET params, it sends back JSON with an args map containing your params.
  • POST sends the text back as the data key in the JSON.
  • Always echoes headers as well
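
Outside of Postman, the GET behavior can be tried from a short script. This is a sketch assuming the public https://postman-echo.com/get endpoint and the args key described above. The helper names and parameters are made up, and the network call is left uncalled so the snippet runs offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ECHO_GET_URL = "https://postman-echo.com/get"


def build_echo_url(params):
    """Build the GET URL whose query parameters Postman Echo should send back."""
    return f"{ECHO_GET_URL}?{urlencode(params)}"


def echoed_args(params, timeout=10):
    """Call Postman Echo and return the "args" map from its JSON reply (needs network)."""
    with urlopen(build_echo_url(params), timeout=timeout) as response:
        return json.load(response)["args"]


print(build_echo_url({"talk": "postman", "year": "2023"}))
# -> https://postman-echo.com/get?talk=postman&year=2023
```

With network access, echoed_args({"talk": "postman"}) should return the same parameters that were sent.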

Postman visualizer

  • Can build UI in postman
  • Visualize tab on result. Put pm.visualizer.set(template, { response: pm.response.json() }) in test tab.
  • Can use to make charts, maps, csv, etc
  • The template is HTML (which can contain JavaScript)
  • Postman provides a library of templates that you can copy/paste
  • Also see https://learning.postman.com/docs/sending-requests/visualizer/ and https://www.postman.com/postman/workspace/more-visualizer-examples/overview

Built in Libraries

  • Can automatically use faker.js, lodash, moment.js, chai.js and crypto-js
  • Ex: lodash.functionName()

Workflow Control

  • Scripting allows loops and conditionals
  • postman.setNextRequest() lets you change the order of requests in a collection
  • pm.sendRequest() lets a script send additional requests, so one run can call multiple APIs
  • Collection and environment variables let you communicate between APIs

Mock Servers

  • Create a mock server in UI
  • This gives you a URL
  • Can deactivate mock server
  • Set data to return

Code Generation

  • Includes Java, curl, Node.JS, etc for requests
  • For providers, fewer choices but still a number

Test Automation

  • Bread and butter of postman
  • Can run manually
  • Can schedule API runs
  • Can report on results of API over time – ex: monitoring
  • Can use Newman and generate the CLI commands to run on other CI/CD tools. ex: Jenkins, CircleCI, GitHub Actions, GitLab, etc
  • New: June 15 – can do performance testing using desktop client. Gives response time graph

Flows

  • Visual diagram showing order/connection/variables.
  • Can include dashboards in flow

Docs

  • Markdown syntax: https://daringfireball.net/projects/markdown/syntax
  • Can embed images
  • If documented well, can share with others
  • Explore tab shows all public APIs across Postman. Best ones are well documented.
  • Can include link to show what person/company created.
  • Can have creator workspace and aggregate your collections
  • Get help at – community.postman.com

Can try most for free. CLI not free

My take

I like that she used Postman (a public collection) and demos for most of the presentation. A lot of the features described were new to me. Excellent start to the morning.