[kcdc 2022] getting started with site reliability engineering

Speaker: Shradha Khard

For more, see the table of contents


  • Site Reliability Engineering
  • Operations is a software problem.
  • SRE is what you get when you treat ops as software and staff it with software engineers
  • Software dev: idea -> strategy -> dev (design, code, test) -> ops (build, deploy, support) -> deliver (real world)
  • Ops – maintenance, system upgrades and installs, security, compliance, cost, support help desk escalations, vendor contracts
  • Conflict – dev wants new features, ops wants to make sure nothing breaks


  • SRE implements DevOps.
  • SRE is a substream
  • Ensures durable focus on engineering. Need to make sure the product is up and running. Spend 50% of time automating to make sure that happens
  • ex: augment S3 bucket
  • See how fast you can make changes without violating the SLO
  • Error budget – metric for how unreliable a system is allowed to be
  • Monitoring is not just logging in the system. Need to alert and ticket too
  • Change management
  • Demand forecasting/capacity planning
  • Provisioning
  • Efficiency and Performance
  • SRE doesn’t replace DevOps people who deploy to cloud
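
The error budget idea in the list above can be made concrete with a little arithmetic. This is a minimal sketch, not something shown in the talk: the class name, method names, traffic numbers, and the "ship only while budget remains" policy are all assumptions for illustration. It treats the budget as the share of requests allowed to fail under an availability SLO.

```java
// Illustrative sketch only; names and numbers are assumptions, not from the talk.
final class ErrorBudgetSketch {

    // Requests that may fail in the window before the SLO is violated.
    static long allowedFailures(long totalRequests, double sloTarget) {
        return Math.round(totalRequests * (1.0 - sloTarget));
    }

    // Budget left after the failures seen so far; negative means overspent.
    static long remainingBudget(long totalRequests, double sloTarget, long failedRequests) {
        return allowedFailures(totalRequests, sloTarget) - failedRequests;
    }

    // Hypothetical policy: only keep releasing changes while some budget remains.
    static boolean canShipChanges(long totalRequests, double sloTarget, long failedRequests) {
        return remainingBudget(totalRequests, sloTarget, failedRequests) > 0;
    }

    public static void main(String[] args) {
        long total = 1_000_000L; // assumed traffic for the window
        double slo = 0.999;      // 99.9% availability target
        System.out.println("Allowed failures: " + allowedFailures(total, slo));
        System.out.println("Remaining after 400 failures: " + remainingBudget(total, slo, 400));
        System.out.println("Can ship after 1000 failures: " + canShipChanges(total, slo, 1000));
    }
}
```

With a 99.9% target and one million requests, the budget works out to 1,000 failed requests. Once those are spent, this policy would pause feature releases in favor of reliability work.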

Enabling SRE/How to Start

  • Centralized SRE team (core platform, networking)
  • Embedded (full team members of project team, teach devs how to manage, work with core team)
  • Need the same skillset as a dev to be an SRE


  • MTTR – mean time to recovery – how long to get system healthy again. Emergency response helps with this
  • Lead time to release or rollback
  • Improve monitoring to catch and detect issues earlier
  • Establish an error budget to have budget-based risk management
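
As a rough illustration of the MTTR metric in the list above, here is a minimal sketch that averages recovery times across incidents. The class name, method name, and sample durations are assumptions for illustration. Real tooling would usually derive the durations from incident start and resolution timestamps.

```java
import java.time.Duration;
import java.util.List;

// Illustrative sketch only; names and sample data are assumptions.
final class MttrSketch {

    // Mean time to recovery: total recovery time divided by the number of incidents.
    static Duration meanTimeToRecovery(List<Duration> recoveryTimes) {
        if (recoveryTimes.isEmpty()) {
            return Duration.ZERO; // no incidents, nothing to average
        }
        long totalMillis = 0;
        for (Duration d : recoveryTimes) {
            totalMillis += d.toMillis();
        }
        return Duration.ofMillis(totalMillis / recoveryTimes.size());
    }

    public static void main(String[] args) {
        // Three hypothetical incidents that took 30, 60, and 90 minutes to recover from.
        List<Duration> incidents = List.of(
                Duration.ofMinutes(30), Duration.ofMinutes(60), Duration.ofMinutes(90));
        System.out.println("MTTR in minutes: " + meanTimeToRecovery(incidents).toMinutes());
    }
}
```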

Service levels

  • SLA (service level agreement) – legal agreement. Often involves compensation if not met
  • SLO (service level objective) – target number the SLI should meet; if it misses, improvement is needed
  • SLI (service level indicator) – metric over time. Quantitative measure – ex: throughput, latency, error rate, utilization
  • 3 nines (99.9%) – 10 minutes per week, 8.8 hours per year
  • 4 nines – 1 minute per week, 52 minutes per year
  • 5 nines – 6 seconds per week, 5 minutes per year
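
The downtime figures for each number of nines follow from simple division. This is a minimal sketch under assumed names (`DowntimeSketch`, `allowedDowntime`). It uses a 7-day week and a 365-day year, so the results are approximate in the same way the rounded figures above are.

```java
import java.time.Duration;

// Illustrative sketch only; class and method names are assumptions.
final class DowntimeSketch {

    // N nines of availability leaves 1 / 10^N of the window as allowed downtime.
    static Duration allowedDowntime(int nines, Duration window) {
        long divisor = 1;
        for (int i = 0; i < nines; i++) {
            divisor *= 10;
        }
        return Duration.ofMillis(window.toMillis() / divisor);
    }

    public static void main(String[] args) {
        Duration week = Duration.ofDays(7);
        Duration year = Duration.ofDays(365);
        for (int nines = 3; nines <= 5; nines++) {
            System.out.println(nines + " nines: "
                    + allowedDowntime(nines, week).toSeconds() + " s per week, "
                    + allowedDowntime(nines, year).toMinutes() + " min per year");
        }
    }
}
```

For 3 nines this gives 604.8 seconds per week (about 10 minutes) and 8.76 hours per year, which matches the rounded numbers in the list.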

Incident Management

  • Goals: Restore service to normal and minimize business impact
  • Be able to get the people who can help solve it
  • Log of events so you can see when it started
  • Blameless post mortems


  • Book: “Seeking SRE”
  • Google book: “The Site Reliability Workbook”
  • Book: Implementing Service Level Objectives

My take

There was a lot of info, but it was easy to follow. It was great to see a structured intro vs the random things I’ve read online.

[kcdc 2022] diving into debugging spring boot applications

Speaker: Mark Heckler @mkheck



  • Developers don’t believe in magic
  • Most developers are bad at debugging. Or at least not as good as they could be
  • We get sloppy when we get used to thinking we know what’s happening
  • Important to isolate problem and not just symptoms

Code walkthrough

  • @SpringBootApplication – meta-annotation. Enables the other scanning annotations
    • @SpringBootConfiguration
    • @EnableAutoConfiguration
    • @ComponentScan
  • Starter parent pom has dependencies that have been tested together. Provided in dependency management so you can choose what you need
  • Proved @Component still creates a @Bean
  • SpringApplication.run returns a ConfigurableApplicationContext. We don’t typically use it directly, but can look into it.
  • ApplicationRunner (creates prop object from args) vs CommandLineRunner (has args as array). The latter is slightly more efficient.
  • @Value lets you get a property

Overwriting name

  • application.properties with wrong key name. Typo causes code not to use the value
  • application.yaml – ignored; still uses application.properties because it has higher precedence
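
The typo problem above is easy to reproduce outside Spring. This is a minimal sketch that uses plain `java.util.Properties` as a stand-in for Spring's property resolution (an assumption; the talk used `@Value` in a Spring Boot app). The key `greeting.name` and the fallback `default-name` are hypothetical. A misspelled key does not raise an error here; the lookup just falls back to the default, much as a placeholder with a default such as `@Value("${greeting.name:default-name}")` would.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Illustrative sketch only; java.util.Properties stands in for Spring's resolution.
final class PropertyTypoSketch {

    // Looks up the hypothetical key "greeting.name", falling back to a default.
    static String resolveName(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("greeting.name", "default-name");
    }

    public static void main(String[] args) {
        // Correct key: the configured value is used.
        System.out.println(resolveName("greeting.name=Mark\n"));
        // Misspelled key: the value is silently ignored and the default wins.
        System.out.println(resolveName("greeting.nmae=Mark\n"));
    }
}
```

Because nothing fails loudly, the symptom is a wrong value rather than an error, which is why it helps to check the key spelling before suspecting the code.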


Actuator

  • Can expose a lot of info
  • By default, exposes two endpoints: health (status) and one other. If you want detailed info, allow it by privilege.
  • Can expose everything via management.endpoints.web.exposure.include=* (don’t do this in prod)
  • localhost:8080/actuator – see endpoints
  • localhost:8080/actuator/env – see Java version, list of beans, etc. (so you can see order)

Remote debugging

  • In IDE config, set -agentlib:jdwp=transport=dt_socket,server.. (missed the end)


  • Can set config in Docker file.

Key point

You don’t know. You can suspect and hypothesize, but not assume.

My take

I like that Mark showed Spring source code to show what was happening. It took a long time to get to the first thing that went wrong (missing property): 40 minutes in, and another session was already applauding by then. Once he got to that part, I started learning stuff. Mark also seemed rushed at the end and that info went too fast for me. (A combo of it being new to me and, I think, him going faster.) Also, using audience members’ names in the example was fun.

[kcdc 2022] 4 deadly sins of mentorship

Speaker: Christina Aldan @luckygirliegirl



  • “Experiences are the sum total of who we are, but not of what we can become” – Christina’s first mentor
  • Pass on experience
  • Doesn’t need to be older, just needs to have an experience to share
  • Results: legacy, relevancy, avoiding mistakes, honest feedback, long-lasting friendship, employee retention, strengthened teams
  • Teach: work experience, new hires, resource suggestions, skill building, networking tips, how to give/receive feedback
  • Helps mentor – stay relevant, emotional intelligence, expand network, reinforce skills, is fulfilling
  • Idea: Mentor Mondays – start on a happy note
  • Helps mentee – gain practical advice, emotional intelligence, expand network, improve communication, get direction/focus
  • Mentoring is not making a copy of yourself. Learn from a variety of diverse people and pick what is relevant to you.

Arranging mentoring

  • Can be any duration. ex: coffee once a quarter
  • Can reach out to someone you follow online and ask to discuss experience.
  • 7 minutes is a specific time limit. 5 minutes sounds like an hour. Will probably renew for another time
  • https://www.polywork.com has less noise than LinkedIn
  • Measurable, finite, clear on what you want help with, mentor knows that they are the mentor
  • OK to be ongoing if you still set goals and are intentional.

Emotional Intelligence

  • Awareness of self – needed for empathy. Improve by naming emotions more specifically
  • Management of self
  • Awareness of others
  • Management of others
  • Neuroplasticity – the brain can make new connections. This starts as soon as 5-6 hours after you create a new response. The more you stimulate it, the stronger the connection becomes (thicker/more connections). If you don’t stimulate a neural pathway, it dissolves


The 4 deadly sins

  • Using guilt or shame
  • Not dictatorship
  • Motivating with bribes
  • Courtesy bias – don’t want to hurt feelings. However, growth happens in discomfort

More notes

  • Teach, don’t preach
  • Guide through process
  • Mentee needs to internalize
  • Be a talent detective to see what the mentee wants
  • Be honest while remaining kind
  • Host open hours, include in contractor fee, cross-dept, panel mentoring, join another team one day a month to see what they do (job shadow)
  • Define boundaries, identify and assign mentor, set a timeline
  • If no goal/end date, just fade away. Mentee wonders what’s up and mentor thinks all is fine. Reach out
  • Mentor fatigue – be aware of what you can offer. Needs to be sustainable
  • Mentee should be concise, engaged and proactive
  • Mentor – stay in lane (only there to share experiences, not a parent/therapist), set clear hours, communicate clearly

ALGEE (for mental health)

  • Assess
  • Listen
  • Give Reassurance
  • Encourage Self Care
  • Encourage professional help

My take

Christina began with a great story. I like that she did it before introducing herself. It results in a stronger audience connection and is something I’m trying to get better about. The content was excellent and relatable. She made it interactive so we got to hear a bunch of experiences. My only complaint is the room was too small for the audience. The captioning was A LOT (10 minutes?) behind. It started late so I don’t think anyone was relying on it though.