how containers have panned out – adrian trenaman – qcon

For more QCon posts, see my live blog table of contents. Adrian is from Gilt.

History

  • No off-the-shelf software exists to run a flash-sale business, so Gilt has to build something custom.
  • Started with Ruby on Rails in 2007. It didn’t scale well enough.
  • Moved to Java in 2011
  • Moved to microservices in 2015
  • In a 30-day period, moved the bulk of Gilt to Amazon

Problems

  • Isolation problem – nobody should be able to take down someone else’s work
  • A noon outage in 2013 – what happened
  • Impedance mismatch problems. “Developers often think of machines as something that’s all theirs, magically provided by the hardware fairy.”

Machines for Gilt Japan

  • Run 20-40 containers per machine.
  • Load balancer between two racks of three boxes each.
  • Separate machines for the database and email.
  • From a developer’s point of view, a machine is a machine.

What did Gilt Japan learn

  • Scalable by time of day
  • Solves impedance mismatch – developers see “a machine”
  • Limits damage one person can do
  • Infra/DevOps engineer embedded in the engineering team
  • Outstanding potential problems
    • Static infrastructure
    • Resource hogging

Docker topology

  • Dark canary – only for internal use
  • Canary – First prod install. Let it run for a while (e.g., through a noon cycle for Gilt)
  • Release – Once happy with canary, roll it out to other nodes
  • Gilt has a lot of read-only traffic, which limits the damage a bad release can do and reduces the need for a staging environment.
  • Gilt has one container per host/EC2 instance
  • Want to have as few moving parts/risk points as possible in the deployment process
  • “We could solve this now, or just wait six months and Amazon will provide a solution”
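
The dark canary, canary, release progression above can be sketched as a simple gate between stages. This is only an illustration of the idea, not Gilt's tooling: the `roll_out` function and the `is_healthy` callback are hypothetical names, and a real health check (for example, watching a canary through a noon cycle) would be supplied by the caller.

```python
from typing import Callable, List

# Stage names come from the notes above; everything else is a hypothetical sketch.
STAGES = ["dark canary", "canary", "release"]


def roll_out(version: str, is_healthy: Callable[[str, str], bool]) -> List[str]:
    """Promote a version through each stage in order.

    Stops at the first stage whose health check fails and returns the
    list of stages that were completed successfully.
    """
    completed: List[str] = []
    for stage in STAGES:
        # is_healthy(stage, version) is an assumed callback provided by the caller.
        if not is_healthy(stage, version):
            break
        completed.append(stage)
    return completed
```

If the canary check fails, the version never reaches the release stage, so only the nodes serving the canary are affected.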

Projects

  • ION Roller
    • Immutable deployment – for Docker upgrades, the original cluster is destroyed once the process is done.
    • Slow to set up/tear down environments.
    • Can be expensive for continuous deployment
    • Open source, but built in house.
  • Nova
    • Uses YAML to deploy
    • No Docker registry. Base images are on Docker. Releases aren’t needed there, so they go straight to Amazon
    • Less boilerplate
    • Immutable deployment on mutable infrastructure. The Docker container is immutable.
  • Fighting bit rot, chaos-monkey style
    • Don’t want things to run forever in Prod.
    • What if there is a security vulnerability?
    • Every day, kill an instance running the oldest AMI at random. This forces replacements onto the latest AMI with fixes and makes problems fail early.
    • Doesn’t solve vulnerability in Docker container. Would need new release with new base image for that. Hasn’t happened to Gilt yet.
  • Sundial
    • For running batch jobs
    • Automatically reschedules a job if it fails
    • Define a process – group of tasks with dependencies between them
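
The Sundial idea of a process (a group of tasks with dependencies, rerun on failure) can be sketched in a few lines. This is not Sundial's API: `run_process`, `depends_on`, and `max_attempts` are hypothetical names, and an immediate retry stands in for real rescheduling. It assumes Python 3.9+ for `graphlib`, and that every name in `depends_on` is also a key in `tasks`.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_process(
    tasks: Dict[str, Callable[[], None]],
    depends_on: Dict[str, Set[str]],
    max_attempts: int = 3,
) -> List[str]:
    """Run tasks in dependency order, retrying a failed task up to max_attempts.

    Returns the task names in the order they finished. If a task still
    fails on its last attempt, the exception propagates and later tasks
    are not run.
    """
    # Map each task to the set of tasks that must finish before it.
    graph = {name: depends_on.get(name, set()) for name in tasks}
    finished: List[str] = []
    for name in TopologicalSorter(graph).static_order():
        for attempt in range(1, max_attempts + 1):
            try:
                tasks[name]()
                break
            except Exception:
                if attempt == max_attempts:
                    raise
        finished.append(name)
    return finished
```

A real scheduler would also persist state between attempts and run independent tasks in parallel; the sketch runs everything sequentially to keep the dependency ordering easy to see.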

EC2

  • Less configuration
  • Automatic rollout
  • Integrations
  • IAM roles are at instance level, not container level

Using Docker as a local build platform

  • Different projects use different versions of build tools
  • Docker can be used as a versioned build container.
  • A year from now, you will still have everything you need to build and run the code
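
As a rough sketch of the versioned build container idea, the helper below assembles a `docker run` command that mounts the project into a build image pinned to an explicit tag. The function name, the `/workspace` mount point, and the example image name are assumptions for illustration; only the standard `docker run` flags `--rm`, `-v`, and `-w` are relied on.

```python
import os
from typing import List


def docker_build_command(image: str, build_cmd: List[str], project_dir: str) -> List[str]:
    """Return the argv for running build_cmd inside a pinned build image.

    Refuses untagged or ':latest' images, since the point is that the
    same toolchain version is still available a year from now.
    """
    # Look for the tag on the last path component so a registry port
    # such as 'localhost:5000/tools' is not mistaken for a tag.
    _, sep, tag = image.rsplit("/", 1)[-1].partition(":")
    if not sep or tag in ("", "latest"):
        raise ValueError("pin the build image to an explicit version tag")
    return [
        "docker", "run", "--rm",
        "-v", f"{os.path.abspath(project_dir)}:/workspace",  # mount the project
        "-w", "/workspace",                                  # build from the mount
        image, *build_cmd,
    ]


if __name__ == "__main__":
    # 'example/build-tools:1.0' is a hypothetical image name.
    print(" ".join(docker_build_command("example/build-tools:1.0", ["make", "test"], ".")))
```

Each project can then record its own image tag next to its source, so two projects that need different versions of the same build tool do not conflict on a developer's machine.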

Lessons

  • Containers let you separate what you deploy from how/where you deploy it
  • Still the wild west on how containers are deployed
  • Seek immutability in the container, not in the stack
  • Gilt’s competitive advantage is being able to deploy to production quickly, frequently, and safely, which lets it innovate faster. Gilt lets engineers deploy whenever they want without asking permission.
