In the last few years, once-popular monolithic application architectures have been dropped in favor of distributed microservice architectures, for reasons of improved resiliency, portability, and scalability. At the technological heart of this transition is the container: a form of lightweight process isolation that takes advantage of long-standing Linux features, specifically namespaces, cgroups, and chroot jails.
In addition to the exploding popularity of microservices, we have also seen an increasing emphasis on bringing processing power down from the cloud to the “edge”, closer to where data is collected. By doing so, we can perform compute-intensive tasks with improved response times and less dependence on network bandwidth to the cloud. While there is still much ambiguity about which classes of devices constitute the “edge”, for the purposes of this post we will assume these are server-class-capable devices running Linux. This includes everything from Raspberry Pis to potentially more powerful machines like the NVIDIA Xavier.
The intersection of these two trends makes for some interesting challenges and opportunities for innovation, as microservice tooling and infrastructure have largely focused on the cloud, though edge computing brings new design considerations and tradeoffs. One of these challenges, and the focus of this post, is keeping the disk space utilization of your application images down, for devices which have limited storage space. To do this, we will examine some tips and tricks to employ when crafting minimal container images, and discuss why it is important to do so when working with applications running at the edge.
Without question, the most popular toolkit for building and running containers is Docker. To construct container images, users write Dockerfiles: files containing the list of commands that produce the environment your application will run in. If you are not careful when creating these files, the images they build can rapidly consume substantial portions of the disk space available to your device. Constructing concise, maintainable Dockerfiles that produce small images is most certainly an art form. Figure 1 below shows the logical construction of a Docker image based on Ubuntu 15.04.
Figure 1: Docker Image Layer Diagram (original source: https://docs.docker.com/v17.09/engine/userguide/storagedriver/images/container-layers.jpg)
Let’s take a look at an example Dockerfile which produces an image for PulseAudio, a popular audio management system for Linux, which can be used on a Raspberry Pi:
FROM arm32v6/alpine:3.8

RUN apk update && \
    apk add --no-cache pulseaudio pulseaudio-alsa

COPY ["default.pa", "daemon.conf", "/etc/pulse/"]
COPY asound.conf /etc/asound.conf

EXPOSE 4713

ENTRYPOINT ["pulseaudio", "-v"]
This Dockerfile can be broken down as follows: we start from the arm32v6/alpine:3.8 base image, install the pulseaudio and pulseaudio-alsa packages without caching the package index, copy our PulseAudio and ALSA configuration files into the image, document that the server listens on port 4713, and set pulseaudio as the command to run when a container is started from the image.
When running a Docker build on a Dockerfile, each of the “RUN”, “COPY”, and “ADD” commands creates a new layer in the resultant image. You can think of image layers as snapshots of the root file system at a given point in time. Each executed command results in a new layer which encodes only the difference between itself and the previous layer. In the above example, we would start with a base of all the layers included in the alpine:3.8 image, then add 3 more layers to create our final PulseAudio image.
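If you want to see these layers for yourself, docker history lists every layer of a local image along with the command that created it and its size (the image tag below is just a placeholder; substitute whatever you have built or pulled locally):

# List each layer of an image, newest first, with its size and the command that created it
$ docker history pulseaudio:latest

# The base image layers can be inspected separately as well
$ docker history arm32v6/alpine:3.8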
Now, with a brief refresher on how images are created and stored, let’s examine some best practices when creating minimal Docker images.
Without paying attention to the structure of your Dockerfiles, there are a few ways you can add unnecessary bloat to your images: caching package manager indices, pulling in recommended but unneeded packages, and leaving build-time tools and intermediate files behind in the final image.
Not only do unnecessarily large images consume excessive disk space, they also have the adverse effect of increasing the attack surface of your containerized application. If an adversary were to gain access to your running container, they would have a large set of tools already available from within the container to carry out an attack.
Further, when deploying and updating containerized applications at the edge, network bandwidth may be restricted, and you cannot afford to be pulling down new images which are gigabytes in size on a consistent basis.
Finally, if you are using a board with limited disk space (like you might find at the edge), there is even more pressure to carefully construct minimal application images. As you roll out updates with new images, old, stale images are left behind on disk, and it takes human intervention or a scheduled job to periodically reclaim that space. If you leave unused, bulky images behind, you can quickly run out of disk space on your device.
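As a rough sketch of such a cleanup job (the schedule and docker binary path are just examples; adapt them to your device), you can periodically prune unused images:

# Remove dangling images (untagged layers left behind by rebuilds)
$ docker image prune -f

# More aggressive: remove every image not referenced by a container.
# Only do this if you are sure you can re-pull whatever you need.
$ docker image prune -a -f

# Example cron entry: prune unused images every night at 2am
0 2 * * * /usr/bin/docker image prune -af > /dev/null 2>&1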
In my experience, image slimming splits into two logically separate approaches, bottom-up and top-down. Each has its own pros and cons and should be selected based on your use case.
In the bottom-up approach, we start from a minimal base image and add only the necessities our project needs to run.
Use Alpine Linux:
One common approach in the Docker community is to build images utilizing Alpine Linux base images (~5MB). Compared to Ubuntu (~64MB) or Debian (~114MB), Alpine provides a much leaner starting point.
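You can verify these numbers on your own machine by pulling the base images and listing them; the SIZE column shows the uncompressed size each one occupies on disk (exact figures vary slightly between tags and architectures):

# Pull the base images and compare their on-disk sizes
$ docker pull alpine:3.8
$ docker pull ubuntu:18.04
$ docker images | grep -E 'alpine|ubuntu'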
Let’s start by building a PulseAudio image using Ubuntu as per the Dockerfile below:
FROM ubuntu:18.04

RUN apt update && \
    apt install -yqq pulseaudio

COPY ["default.pa", "daemon.conf", "/etc/pulse/"]
COPY asound.conf /etc/asound.conf

EXPOSE 4713

ENTRYPOINT ["pulseaudio", "-v"]
This results in a 226MB image. Not so great, and we can most certainly do better. Now let's make sure that we clean up anything that was unnecessarily added to the image. Running apt update automatically caches the package indices, which bloats the image, and apt will also install packages which are recommended but not required unless we use the “--no-install-recommends” flag. We fix both issues in the following Dockerfile:
FROM ubuntu:18.04

RUN apt update && \
    apt install -yqq --no-install-recommends pulseaudio && \
    apt autoremove -yqq && \
    apt clean -y && \
    rm -rf /var/lib/apt/lists/*

COPY ["default.pa", "daemon.conf", "/etc/pulse/"]
COPY asound.conf /etc/asound.conf

EXPOSE 4713

ENTRYPOINT ["pulseaudio", "-v"]
Even using best practices for cleanup with apt, the resultant image is still 86MB! There is only so much you can do when starting from an Ubuntu base image while maintaining a sane-looking Dockerfile. So finally, let's swap out Ubuntu for Alpine, as per the Dockerfile used in the introduction:
We made sure to avoid caching our package indexes when installing PulseAudio, and the result is a 22MB image, with a very clean Dockerfile. This is a 10x size reduction from the original 226MB image!
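To reproduce the measurement, build the image from the Dockerfile above and check the size Docker reports (the tag name here is just an example):

# Build the Alpine-based image from the Dockerfile above
$ docker build -t pulseaudio:alpine .

# The SIZE column shows how much space the image occupies on disk
$ docker images pulseaudio:alpine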
Great for: new projects where you control the whole software stack and want the smallest practical starting point.
Challenges: Alpine is based on musl libc rather than glibc, so some prebuilt binaries and less common packages may not work out of the box or may be missing from the apk repositories.
Share Base Layers:
Another method categorized under the bottom-up approach is designing your images such that they share as many common base layers as possible. When machines pull Docker images, they bring down each layer of the image independently. This saves time because the machine pulls only the layers which don’t exist locally.
As an example, assume you created two simple Python apps, one that prints “Hello World!” and one that prints out the current local weather. Assuming that both images use python:3.7 as a base, if you have already pulled your “Hello World” Python application, then when pulling down the weather app you would only incur the cost of pulling down the layers beyond those included in the python:3.7 image. Further, only one copy of the 918MB base image is stored on disk.
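A quick way to convince yourself that the base layers really are shared (directory and tag names here are hypothetical) is to build both applications, compare their layer histories, and look at the de-duplicated totals that docker system df reports:

# Build both apps from Dockerfiles that start with "FROM python:3.7"
$ docker build -t hello-world-app ./hello
$ docker build -t weather-app ./weather

# The layers inherited from python:3.7 show up identically in both histories
$ docker history hello-world-app
$ docker history weather-app

# "docker system df -v" reports shared size, so the python:3.7 layers are stored only once
$ docker system df -v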
If you are building a microservice-based system from scratch, see if you can use the same base image for multiple microservices. This will greatly reduce the amount of time required to pull your images down onto your machine, as well as roll out updates to your system.
Great For: microservice systems where you control several images and can standardize them on a common base, keeping both pull times and total disk usage down.
In the top-down approach, we start out with a “fat” image which already contains all the packages we need to build our application. Then, once the application is built, we strip out everything except the final binary and any files it utilizes. Generally, this leads to enormous reductions in image size, as all that is left in the final image is the bare minimum that the application needs. There are two methods that I personally use when taking this approach. One is Docker multi-stage builds, and the other is a neat open-source tool which I've added Arm support to: DockerSlim.
Multi-Stage Builds:
Available in Docker 17.05 and beyond, multi-stage builds allow you to break the construction of your image into multiple stages and handpick only the files you want from previous stages in subsequent ones. As an example, here is a Dockerfile used to build a Go application:
############################
# STEP 1 build application
############################
FROM golang:alpine as builder

RUN apk update && \
    apk add --no-cache curl git xz && \
    curl https://raw.githubusercontent.com/golang/dep/master/install.sh | sh

WORKDIR $GOPATH/src/accounting/app/
COPY listener.go .
COPY Gopkg.toml .

RUN dep ensure
RUN CGO_ENABLED=0 GOOS=linux GOARCH=arm go build -a -v -installsuffix cgo -ldflags="-s" -o /go/bin/listener

############################
# STEP 2 build a small image
############################
FROM scratch

COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /go/bin/listener /go/bin/listener

EXPOSE 5000

ENTRYPOINT ["/go/bin/listener"]
In the first stage of the build, which we name “builder”, we start with a base image which contains all the tools we need to build our application. Here we don't care so much about the image size; rather, we just want to avoid worrying about whether the build tools we need are available in the base image.
Once the first stage finishes, we move on to the second, which creates our final “slim” image. First, we specify that we want to start from a scratch file system. We then cherry-pick only the files we want from the “builder” stage, using a special form of the “COPY” command which copies from the image constructed in the “builder” stage rather than from our host machine. In this example we fetch only the CA certificates and the compiled binary.
Finally, we specify what port to expose and what command to execute when a container is run using this image.
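A related tip while iterating on multi-stage Dockerfiles: the --target flag builds only a named stage, which is handy for debugging since the final scratch image contains no shell to poke around in (the tags below are just examples):

# Build only the "builder" stage so you can inspect the build environment
$ docker build --target builder -t listener:builder .

# Build the full multi-stage Dockerfile to produce the slim final image
$ docker build -t listener:slim .
$ docker images listener:slim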
The output of building this image is 11MB, compared to the single-stage build shown below which produces a 486MB image, a 47x reduction in size!
FROM golang:alpine

WORKDIR $GOPATH/src/accounting/app/
COPY ["listener.go", "Gopkg.toml", "./"]

RUN apk update && \
    apk add --no-cache curl git xz && \
    curl https://raw.githubusercontent.com/golang/dep/master/install.sh | sh && \
    dep ensure && \
    CGO_ENABLED=0 GOOS=linux GOARCH=arm go build -a -v -installsuffix cgo -ldflags="-s" -o /go/bin/listener

EXPOSE 5000

ENTRYPOINT ["/go/bin/listener"]
An extra trick for reducing the file size of your applications is to compress your final binary using a tool named UPX. Modifying the original multi-stage example:
############################
# STEP 1 build application
############################
FROM golang:alpine as builder

RUN apk update && \
    apk add --no-cache curl git xz && \
    curl https://raw.githubusercontent.com/golang/dep/master/install.sh | sh

WORKDIR $GOPATH/src/accounting/app/
COPY listener.go .
COPY Gopkg.toml .

RUN dep ensure
RUN curl -LO https://github.com/upx/upx/releases/download/v3.95/upx-3.95-amd64_linux.tar.xz
RUN tar -xvf upx-3.95-amd64_linux.tar.xz
RUN CGO_ENABLED=0 GOOS=linux GOARCH=arm go build -a -v -installsuffix cgo -ldflags="-s" -o /go/bin/listener
RUN ./upx-3.95-amd64_linux/upx /go/bin/listener

############################
# STEP 2 build a small image
############################
FROM scratch

COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /go/bin/listener /go/bin/listener

EXPOSE 5000

ENTRYPOINT ["/go/bin/listener"]
This results in a final image size of 3.6MB, about a third of the original multi-stage build size of 11MB!
DockerSlim:
Another useful tool to minimize your image size is DockerSlim. This tool analyzes everything that your application utilizes at runtime and generates a new slimmed image, containing only the bare necessities your application needs. As an added benefit, it also generates Seccomp and AppArmor profiles. To slim your image, run a container based on your “fat” image using the docker-slim tool, interact with the running application as you normally would, then terminate the program. I've added support for Arm, with Arm64 support in the works.
To demonstrate, let’s minimize our above example Go application starting from our “fat” 486MB image:
Fetch the docker-slim arm release:
$ wget https://github.com/dockerslim/dockerslim/releases/download/1.25.3/dist_linux_arm.tar.gz
$ tar -xvzf dist_linux_arm.tar.gz
$ cp dist_linux_arm/* /usr/local/bin
Now we create a JSON file used by the tool to specify the HTTP REST requests to issue against our running application. Here is a simple example for our application:
{ "commands": [ { "protocol": "http", "method": "POST", "resource": "/poc/api/v1.0/post", "headers": ["Content-Type:application/json"], "body": "{\"key1\": \"value1\", \"key2\" : \"value2\"}" } ] }
Assume that we built our fat image with the tag fat_go_app:latest. Now we use the tool to create a slim image:
$ sudo docker-slim build \
    --env MY_NODE_NAME=test_node \
    --env MONGO_CONN=mongodb://mongo-db-server:80 \
    --http-probe-cmd-file probeCmd.json \
    --tag slim_go_app:latest \
    fat_go_app:latest
Here we supply, via the “--env” flags, all the environment variables which the application looks for. In this case I run a test MongoDB instance in a separate VM and pass the connection string as an environment variable, along with a mock node name, which would normally be set when running this application under Kubernetes. We also pass our HTTP probe command file as an argument and specify what to name the resultant slimmed image. Snippets of the output of the command are shown below:
pi@raspberrypi~/simple-go-app $ sudo docker-slim build --env MY_NODE_NAME=test_node --env MONGO_CONN=mongodb://mongo-db-server:80 --http-probe-cmd-file probeCmd.json --tag fat_go_app:slim fat_go_app:latest
docker-slim[build]: state=http.probe.starting message='WAIT FOR HTTP PROBE TO FINISH'
docker-slim[build]: info=prompt message='USER INPUT REQUIRED, PRESS <ENTER> WHEN YOU ARE DONE USING THE CONTAINER'
docker-slim[build]: state=http.probe.running
docker-slim[build]: info=http.probe.call status=200 method=POST target=http://127.0.0.1:32799/poc/api/v1.0/post attempt=1 time=2019-08-12T16:29:58Z
docker-slim[build]: info=http.probe.call status=404 method=GET target=http://127.0.0.1:32799/ attempt=1 time=2019-08-12T16:29:58Z
docker-slim[build]: info=http.probe.summary total=2 failures=0 successful=2
docker-slim[build]: state=http.probe.done
docker-slim[build]: state=container.inspection.done
docker-slim[build]: state=building message='building minified image'
docker-slim[build]: state=completed
docker-slim[build]: info=results status='MINIFIED BY 45.19X [485506346 (486 MB) => 10742985 (11 MB)]'
When we get the prompt that all the HTTP requests have been issued, we kill the process by pressing enter, and wait for the tool to build the resultant image based on the information it gathered. In this case, it was able to slim the image down to 11MB! Notice that it performed essentially the same as the multi-stage build, keeping only the files the application needs to function.
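As a final sanity check, you can run the slimmed image with the same environment and re-issue the request from the probe file to confirm the application still works (the tag, port mapping, and environment values are the hypothetical ones used throughout this example):

# Run the slimmed image with the environment the app expects
$ docker run -d -p 5000:5000 \
    --env MY_NODE_NAME=test_node \
    --env MONGO_CONN=mongodb://mongo-db-server:80 \
    --name slim_listener slim_go_app:latest

# Re-issue the request from the probe file against the running container
$ curl -X POST -H "Content-Type: application/json" \
    -d '{"key1": "value1", "key2": "value2"}' \
    http://localhost:5000/poc/api/v1.0/post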
Carefully crafting your Dockerfiles proves to be important not only for minimizing your container's attack surface, but also for speeding up the deployment of applications running both in the cloud and at the edge.
For more information on the construction and lifecycle of containers, I would refer you to the Docker docs.
Depending on the application image you are looking to minimize, the tools and techniques listed above should serve as good guidelines for getting the greatest size reductions with the least headache. Should you have any questions, feel free to contact me using the link below.
Contact the Author