DevNexus 2018 – Deep Dive into Dockerfiles

Title: Deep Dive into Dockerfiles
Speakers: Raju Gandhi

For more blog posts, see the DevNexus 2018 live blogging table of contents


Benefits

  • read only/immutable
  • unique identifier

What docker files do under the covers

# Start interactive linux container
docker run -it ubuntu:17.10 bash

# create new image (read only file system)
docker commit containerName demo
# list images (local only)
docker images
docker run -it demo bash
# list running containers
docker ps

Docker files – like a build file for docker
# base image
FROM ubuntu:17.10
# run command at command line
RUN touch java

Layers

  • each command creates an intermediate image with a new name
  • better to merge commands so there are fewer layers, ex:
    RUN command a \
    && command b
  • But need the code to be readable
  • same idea as git commits – a tree where each node points to its parent and knows the diff
  • each OS has a limit on the number of layers. On Mac, this is 127
  • if you run apt-get install and then apt-get clean in separate steps, the cleanup doesn’t shrink the image. There are two layers, one that adds the files and one that deletes them, and the added files still live in the first layer. Put both in the same step so the files never get committed to a layer.
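
A minimal sketch of the install-and-clean-in-one-step idea. The package (curl) is just a placeholder I picked, not something from the talk, and removing /var/lib/apt/lists is a common convention rather than something the speaker showed.

```dockerfile
# Base image; same tag as the interactive example earlier in these notes
FROM ubuntu:17.10

# Install and clean up in ONE instruction, so the package index
# and apt cache are never committed to a layer.
# "curl" is a placeholder package chosen for illustration.
RUN apt-get update \
    && apt-get install -y curl \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
```

Splitting the last two lines into their own RUN would add a layer without making the image any smaller.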

Cache

  • every Docker pull leverages the cache
  • RUN touch java and RUN touch    java have different sha hashes and are considered different commands, so the cache is not used
  • Do not write RUN ls -l or other commands for debugging. You can just open an interactive bash into that layer to debug
  • Put commands that move files towards the bottom. This allows for more reuse of the common parts of the image.
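
Here is roughly how the ordering advice might look. The package and the app/ and /opt/app/ paths are hypothetical; only the ordering matters.

```dockerfile
FROM ubuntu:17.10

# Rarely changing steps go first, so their cached layers get reused
# across builds.
RUN apt-get update \
    && apt-get install -y curl \
    && rm -rf /var/lib/apt/lists/*

# File copies go near the bottom: when a copied file changes, the cache
# is invalidated for this instruction and every instruction after it.
# "app/" and "/opt/app/" are hypothetical paths.
COPY app/ /opt/app/
```

With this order, editing a file under app/ only reruns the COPY, not the package install.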

General notes

Don’t install sshd. You can use docker commands (such as docker exec) to get a shell in a running container instead.

Dockerfile

  • FROM
    • Implies “ancestry” – what is parent image and what will you be inheriting. Ex: running whoami when inheriting jenkins image, it prints “jenkins”. This means you have to look at lineage if having permission issues. May have to keep looking at parent and grandparent and …
    • Must be the first instruction (only ARG may come before it)
    • Consider starting with Ubuntu and building yourself so know what is in there. Less security implications if do it yourself.
    • Rare, but you can write “FROM scratch” to start from an empty image. Usually only used for Go code.
    • While multiple FROMs are allowed, it is a terrible idea. Diamond problem; they can conflict on basic things like version of Ubuntu.
    • Do not use “latest” tag. Use an exact tag. “LATEST” is a lie. It is just a tag. You can tag versions after LATEST. If you don’t switch the tag, it is still the old one.
    • Inspect ancestors for USER, EXPOSE (ports), ENV, VOLUME, LABEL, etc
    • docker inspect – lets you see what is in there. Recommends tracing by hand to be more thorough.
  • RUN
    • Don’t run commands that upgrade the OS. Use a later base image instead.
    • Group commands with && so not adding more layers
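    • Beware of the cache. If you write RUN apt-get update on its own, the result is cached and the update does not run again. If you chain all related commands with &&, they form a single instruction, so changing any of them reruns the whole chain.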
    • Want each command on separate line starting with && and ending with \ (except first and last). This makes it easy to git diff to see what changed
  • ADD/COPY
    • Combine COPY and RUN
    • Much of RUN applies
  • LABEL
    • Use lots
    • Labels can read ENV variables
    • Can use image (compile) or container (run) scope
    • Ex: build number, scm location
    • Just like RUN, you can merge all labels into one LABEL instruction. Just end each line with \ to continue (no && like when running)
    • To read label, do docker inspect and “grep Labels”
  • ENTRYPOINT/CMD
    • Both can run commands
    • Both can take the command raw (xxx abc) or as an array of command/args (["xxx", "abc"]). Better to use the array so a shell doesn’t have to be forked
    • Use ["/bin/bash", "-c", "xxx $arg"] if you need variable expansion, to get bash involved
    • Better to have a shell script and call it with ENTRYPOINT. That way gets treated as a shell script without having to call bash. It is also easier to read since you aren’t writing bash as strings. The cost is that you add a layer because you have to copy the script into the image.
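    • docker stop id – can take 10 seconds because it sends a signal to PID 1 and waits for it to stop. If PID 1 doesn’t act on the signal, the container is only killed after the 10 second default timeout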
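
Pulling several of these instruction tips together, a sketch might look like this. The label keys and values, the package, and entrypoint.sh are all made up for illustration; the script is assumed to sit next to the Dockerfile and to be executable.

```dockerfile
# Pin an exact tag rather than "latest"
FROM ubuntu:17.10

# One LABEL instruction, continued with \ (no && needed).
# Keys and values are hypothetical examples.
LABEL build.number="42" \
      scm.location="https://example.com/repo.git"

# One RUN instruction; each command on its own line starting with &&,
# so a git diff shows exactly which command changed.
RUN apt-get update \
    && apt-get install -y curl \
    && rm -rf /var/lib/apt/lists/*

# Copy a startup script and call it in exec (array) form.
# "entrypoint.sh" is a hypothetical script; COPY keeps its file mode,
# so it must already be executable in the build context.
COPY entrypoint.sh /usr/local/bin/entrypoint.sh
ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]
```

If the script ends by exec-ing the real process, that process becomes PID 1 and receives the stop signal directly, which should help with the slow docker stop.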

Other practices

  • Create your own ancestry/hierarchy.
  • Containers are changing how we ship software. Don’t put Oracle’s JDK in an image and then put on dockerhub. Legal issues.
  • Consider using multi-stage builds.
    • FROM x as y …. FROM z – different scopes – binary executable vs build tools.
    • “as y” is what makes it a multi-stage build. Everything until the next “FROM” is not part of the container. The “FROM” without an “as” creates what will go in the container.
    • The final image only contains what you need, not the build tools. Smaller image. Less “extra” stuff.
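
A minimal sketch of a multi-stage build, assuming a single-file Go program. The golang:1.10 tag, main.go, and the /app path are my assumptions, not from the talk.

```dockerfile
# Build stage: has the compiler and build tools.
# The image tag and file names are assumptions for illustration.
FROM golang:1.10 AS build
WORKDIR /go/src/app
COPY main.go .
# Disable cgo to get a static binary that can run in an empty image
RUN CGO_ENABLED=0 go build -o /app main.go

# Final stage: only what is copied in here ends up in the shipped image.
FROM scratch
COPY --from=build /app /app
ENTRYPOINT ["/app"]
```

The Go toolchain stays behind in the build stage, and the final image holds just the one binary.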

My take

This was really good. It was clear if you didn’t know much about Docker while still having good info for those somewhat familiar. For the best practices, some people were taking notes on what to fix! I actually realized that I applied one of his anti-patterns to a Java compare program I wrote. I need to go and add some new lines in my generator for ease of diff! More gray/black background for code though. Blue (comments) on a dark background is hard to read.

Also, it is great that DevNexus has tables in the first few rows. Reward for sitting up front! (I chose to blog on my Mac as I expected to type a lot of commands).