In the last few years, once-popular monolithic application architectures have been dropped in favor of distributed microservice architectures, for reasons of improved resiliency, portability, and scalability. At the technological heart of this transition is the container: a form of lightweight process isolation that takes advantage of long-standing Linux features, specifically namespaces, cgroups, and chroot jails.
In addition to the exploding popularity of microservices, we have also seen an increasing emphasis on bringing processing power down from the cloud to the “edge”, closer to where data is collected. By doing so, we can perform compute-intensive tasks with improved response times and less dependence on network bandwidth to the cloud. While there is still much ambiguity about which classes of devices constitute the “edge”, for the purposes of this post we will assume these are server-class-capable devices running Linux. This includes everything from Raspberry Pis to potentially more powerful machines like the NVIDIA Xavier.
The intersection of these two trends makes for some interesting challenges and opportunities for innovation, as microservice tooling and infrastructure have largely focused on the cloud, though edge computing brings new design considerations and tradeoffs. One of these challenges, and the focus of this post, is keeping the disk space utilization of your application images down, for devices which have limited storage space. To do this, we will examine some tips and tricks to employ when crafting minimal container images, and discuss why it is important to do so when working with applications running at the edge.
Without question, the most popular toolkit for building and running containers is Docker. To construct container images, users write Dockerfiles: files containing the list of commands that produce the environment your application will run in. If you are not careful when creating these files, the images they build can rapidly consume substantial portions of the disk space available to your device. Constructing concise, maintainable Dockerfiles that produce small images is most certainly an art form. Figure 1 below shows the logical construction of a Docker image based on Ubuntu 15.04.
Figure 1: Docker Image Layer Diagram (original source: https://docs.docker.com/v17.09/engine/userguide/storagedriver/images/container-layers.jpg)
Let’s take a look at an example Dockerfile which produces an image for PulseAudio, a popular audio management system for Linux, which can be used on a Raspberry Pi:
FROM arm32v6/alpine:3.8

RUN apk update && \
    apk add --no-cache pulseaudio pulseaudio-alsa

COPY ["default.pa", "daemon.conf", "/etc/pulse/"]
COPY asound.conf /etc/asound.conf

EXPOSE 4713

ENTRYPOINT ["pulseaudio", "-v"]
This Dockerfile can be broken down as follows: we start from the arm32v6/alpine:3.8 base image, install the pulseaudio and pulseaudio-alsa packages without caching the package index, copy our PulseAudio and ALSA configuration files into the image, document that the server listens on port 4713, and set pulseaudio as the command to run when a container is started from the image.
When running a Docker build on a Dockerfile, each of the “RUN”, “COPY”, and “ADD” commands creates a new layer in the resultant image. You can think of image layers as snapshots of the root file system at a given point in time. Each executed command results in a new layer which encodes only the difference between itself and the previous layer. In the above example, we would start with a base of all the layers included in the alpine:3.8 image, then add 3 more layers to create our final PulseAudio image.
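If you want to see these layers for yourself, docker history lists every layer of a local image along with the command that created it and its size (the image tag below is just a placeholder; substitute whatever you have built or pulled locally):

# List each layer of an image, newest first, with its size and the command that created it
$ docker history pulseaudio:latest

# The base image layers can be inspected separately as well
$ docker history arm32v6/alpine:3.8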
Now, with a brief refresher on how images are created and stored, let’s examine some best practices when creating minimal Docker images.
Without paying attention to the structure of your Dockerfiles, there are a few ways you can add unnecessary bloat to your images: caching package manager indices, pulling in recommended but unneeded packages, and leaving build-time tools and intermediate files behind in the final image.
Not only do unnecessarily large images consume excessive disk space, they also have the adverse effect of increasing the attack surface of your containerized application. If an adversary were to gain access to your running container, they would have a large set of tools already available from within the container to carry out an attack.
Further, when deploying and updating containerized applications at the edge, network bandwidth may be restricted, and you cannot afford to be pulling down new images which are gigabytes in size on a consistent basis.
Finally, if you are using a board with limited disk space (like you might find at the edge), there is even more pressure to carefully construct minimal application images. As you roll out updates with new images, old, stale images are left behind on disk, and it takes human intervention or a scheduled job to periodically reclaim that space. If you leave unused, bulky images behind, you can quickly run out of disk space on your device.
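As a rough sketch of such a cleanup job (the schedule and docker binary path are just examples; adapt them to your device), you can periodically prune unused images:

# Remove dangling images (untagged layers left behind by rebuilds)
$ docker image prune -f

# More aggressive: remove every image not referenced by a container.
# Only do this if you are sure you can re-pull whatever you need.
$ docker image prune -a -f

# Example cron entry: prune unused images every night at 2am
0 2 * * * /usr/bin/docker image prune -af > /dev/null 2>&1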
In my experience, image slimming splits into two logically separate approaches, bottom-up and top-down. Each has its own pros and cons and should be selected based on your use case.
In the bottom-up approach, we start from a minimal base image and add only the necessities our project needs to run.
Use Alpine Linux:
One common approach in the Docker community is to build images utilizing Alpine Linux base images (~5MB). Compared to Ubuntu (~64MB) or Debian (~114MB), Alpine provides a much leaner starting point.
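You can verify these numbers on your own machine by pulling the base images and listing them; the SIZE column shows the uncompressed size each one occupies on disk (exact figures vary slightly between tags and architectures):

# Pull the base images and compare their on-disk sizes
$ docker pull alpine:3.8
$ docker pull ubuntu:18.04
$ docker images | grep -E 'alpine|ubuntu'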
Let’s start by building a PulseAudio image using Ubuntu as per the Dockerfile below:
FROM ubuntu:18.04

RUN apt update && \
    apt install -yqq pulseaudio

COPY ["default.pa", "daemon.conf", "/etc/pulse/"]
COPY asound.conf /etc/asound.conf

EXPOSE 4713

ENTRYPOINT ["pulseaudio", "-v"]
This results in a 226MB image. Not so great, and we can most certainly do better. Now let's make sure that we clean up anything that was unnecessarily added to the image. Running apt update automatically caches the package indices, which bloats the image, and apt will also install packages which are recommended but not required unless we use the “--no-install-recommends” flag. We fix both issues in the following Dockerfile:
FROM ubuntu:18.04

RUN apt update && \
    apt install -yqq --no-install-recommends pulseaudio && \
    apt autoremove -yqq && \
    apt clean -y && \
    rm -rf /var/lib/apt/lists/*

COPY ["default.pa", "daemon.conf", "/etc/pulse/"]
COPY asound.conf /etc/asound.conf

EXPOSE 4713

ENTRYPOINT ["pulseaudio", "-v"]
Even using best practices for cleanup with apt, the resultant image is still 86MB! There is only so much you can do when starting from an Ubuntu base image while maintaining a sane-looking Dockerfile. So finally, let's swap out Ubuntu for Alpine, as per the Dockerfile used in the introduction:
We made sure to avoid caching our package indexes when installing PulseAudio, and the result is a 22MB image, with a very clean Dockerfile. This is a 10x size reduction from the original 226MB image!
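To reproduce the measurement, build the image from the Dockerfile above and check the size Docker reports (the tag name here is just an example):

# Build the Alpine-based image from the Dockerfile above
$ docker build -t pulseaudio:alpine .

# The SIZE column shows how much space the image occupies on disk
$ docker images pulseaudio:alpine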
Great for: new projects where you control the whole software stack and want the smallest practical starting point.
Challenges: Alpine is based on musl libc rather than glibc, so some prebuilt binaries and less common packages may not work out of the box or may be missing from the apk repositories.
Share Base Layers:
Another method categorized under the bottom-up approach is designing your images such that they share as many common base layers as possible. When machines pull Docker images, they bring down each layer of the image independently. This saves time because the machine pulls only the layers which don’t exist locally.
As an example, assume you created two simple Python apps, one that prints “Hello World!” and one that prints out the current local weather. Assuming that both images use python:3.7 as a base, if you have already pulled your “Hello World” Python application, then when pulling down the weather app you would only incur the cost of pulling down the layers beyond those included in the python:3.7 image. Further, only one copy of the 918MB base image is stored on disk.
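A quick way to convince yourself that the base layers really are shared (directory and tag names here are hypothetical) is to build both applications, compare their layer histories, and look at the de-duplicated totals that docker system df reports:

# Build both apps from Dockerfiles that start with "FROM python:3.7"
$ docker build -t hello-world-app ./hello
$ docker build -t weather-app ./weather

# The layers inherited from python:3.7 show up identically in both histories
$ docker history hello-world-app
$ docker history weather-app

# "docker system df -v" reports shared size, so the python:3.7 layers are stored only once
$ docker system df -v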
If you are building a microservice-based system from scratch, see if you can use the same base image for multiple microservices. This will greatly reduce the amount of time required to pull your images down onto your machine, as well as roll out updates to your system.
Great For: microservice systems where you control several images and can standardize them on a common base, keeping both pull times and total disk usage down.
In the top-down approach, we start out with a “fat” image which already contains all the packages we need to build our application. Then, once the application is built, we strip out everything except the final binary and any files it utilizes. Generally, this leads to enormous reductions in image size, as all that is left in the final image is the bare minimum that the application needs. There are two methods that I personally use when taking this approach. One is Docker multi-stage builds, and the other is a neat open-source tool which I've added Arm support to: DockerSlim.
Multi-Stage Builds:
Available in Docker 17.05 and beyond, multi-stage builds allow you to break the construction of your image into multiple stages and handpick only the files you want from previous stages in subsequent ones. As an example, here is a Dockerfile used to build a Go application:
############################
# STEP 1 build application
############################
FROM golang:alpine as builder

RUN apk update && \
    apk add --no-cache curl git xz && \
    curl https://raw.githubusercontent.com/golang/dep/master/install.sh | sh

WORKDIR $GOPATH/src/accounting/app/
COPY listener.go .
COPY Gopkg.toml .

RUN dep ensure
RUN CGO_ENABLED=0 GOOS=linux GOARCH=arm go build -a -v -installsuffix cgo -ldflags="-s" -o /go/bin/listener

############################
# STEP 2 build a small image
############################
FROM scratch

COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /go/bin/listener /go/bin/listener

EXPOSE 5000

ENTRYPOINT ["/go/bin/listener"]
In the first stage of the build, which we name “builder”, we start with a base image which contains all the tools we need to build our application. Here we don't care so much about the image size; rather, we just want to avoid worrying about whether the build tools we need are available in the base image.
Once the first stage finishes, we move on to the second, which creates our final “slim” image. First, we specify that we want to start from a scratch file system. We then cherry-pick only the files we want from the “builder” stage, using a special form of the “COPY” command which copies from the image constructed in the “builder” stage rather than from our host machine. In this example we fetch only the CA certificates and the compiled binary.
Finally, we specify what port to expose and what command to execute when a container is run using this image.
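A related tip while iterating on multi-stage Dockerfiles: the --target flag builds only a named stage, which is handy for debugging since the final scratch image contains no shell to poke around in (the tags below are just examples):

# Build only the "builder" stage so you can inspect the build environment
$ docker build --target builder -t listener:builder .

# Build the full multi-stage Dockerfile to produce the slim final image
$ docker build -t listener:slim .
$ docker images listener:slim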
The output of building this image is 11MB, compared to the single-stage build shown below which produces a 486MB image, a 47x reduction in size!
FROM golang:alpine

WORKDIR $GOPATH/src/accounting/app/
COPY ["listener.go", "Gopkg.toml", "./"]

RUN apk update && \
    apk add --no-cache curl git xz && \
    curl https://raw.githubusercontent.com/golang/dep/master/install.sh | sh && \
    dep ensure && \
    CGO_ENABLED=0 GOOS=linux GOARCH=arm go build -a -v -installsuffix cgo -ldflags="-s" -o /go/bin/listener

EXPOSE 5000

ENTRYPOINT ["/go/bin/listener"]
An extra trick for reducing the file size of your applications is to compress your final binary using a tool named UPX. Modifying the original multi-stage example:
############################
# STEP 1 build application
############################
FROM golang:alpine as builder

RUN apk update && \
    apk add --no-cache curl git xz && \
    curl https://raw.githubusercontent.com/golang/dep/master/install.sh | sh

WORKDIR $GOPATH/src/accounting/app/
COPY listener.go .
COPY Gopkg.toml .

RUN dep ensure
RUN curl -LO https://github.com/upx/upx/releases/download/v3.95/upx-3.95-amd64_linux.tar.xz
RUN tar -xvf upx-3.95-amd64_linux.tar.xz
RUN CGO_ENABLED=0 GOOS=linux GOARCH=arm go build -a -v -installsuffix cgo -ldflags="-s" -o /go/bin/listener
RUN ./upx-3.95-amd64_linux/upx /go/bin/listener

############################
# STEP 2 build a small image
############################
FROM scratch

COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /go/bin/listener /go/bin/listener

EXPOSE 5000

ENTRYPOINT ["/go/bin/listener"]
This results in a final image size of 3.6MB, about a third of the original multi-stage build size of 11MB!
DockerSlim:
Another useful tool to minimize your image size is DockerSlim. This tool analyzes everything that your application utilizes at runtime and generates a new slimmed image, containing only the bare necessities your application needs. As an added benefit, it also generates Seccomp and AppArmor profiles. To slim your image, run a container based on your “fat” image using the docker-slim tool, interact with the running application as you normally would, then terminate the program. I've added support for Arm, with Arm64 support in the works.
To demonstrate, let’s minimize our above example Go application starting from our “fat” 486MB image:
Fetch the docker-slim arm release:
$ wget https://github.com/dockerslim/dockerslim/releases/download/1.25.3/dist_linux_arm.tar.gz
$ tar -xvzf dist_linux_arm.tar.gz
$ cp dist_linux_arm/* /usr/local/bin
Now we create a JSON file used by the tool to specify the HTTP REST requests to issue against our running application. Here is a simple example for our application:
{ "commands": [ { "protocol": "http", "method": "POST", "resource": "/poc/api/v1.0/post", "headers": ["Content-Type:application/json"], "body": "{\"key1\": \"value1\", \"key2\" : \"value2\"}" } ] }
Assume that we built our fat image with the tag fat_go_app:latest. Now we use the tool to create a slim image:
$ sudo docker-slim build \
    --env MY_NODE_NAME=test_node \
    --env MONGO_CONN=mongodb://mongo-db-server:80 \
    --http-probe-cmd-file probeCmd.json \
    --tag slim_go_app:latest \
    fat_go_app:latest
Here we supply, via the “--env” flags, all the environment variables which the application looks for. In this case I run a test MongoDB instance in a separate VM and pass the connection string as an environment variable, along with a mock node name, which would normally be set when running this application under Kubernetes. We also pass our HTTP probe command file as an argument and specify what to name the resultant slimmed image. Snippets of the output of the command are shown below:
pi@raspberrypi~/simple-go-app $ sudo docker-slim build --env MY_NODE_NAME=test_node --env MONGO_CONN=mongodb://mongo-db-server:80 --http-probe-cmd-file probeCmd.json --tag fat_go_app:slim fat_go_app:latest
docker-slim[build]: state=http.probe.starting message='WAIT FOR HTTP PROBE TO FINISH'
docker-slim[build]: info=prompt message='USER INPUT REQUIRED, PRESS <ENTER> WHEN YOU ARE DONE USING THE CONTAINER'
docker-slim[build]: state=http.probe.running
docker-slim[build]: info=http.probe.call status=200 method=POST target=http://127.0.0.1:32799/poc/api/v1.0/post attempt=1 time=2019-08-12T16:29:58Z
docker-slim[build]: info=http.probe.call status=404 method=GET target=http://127.0.0.1:32799/ attempt=1 time=2019-08-12T16:29:58Z
docker-slim[build]: info=http.probe.summary total=2 failures=0 successful=2
docker-slim[build]: state=http.probe.done
docker-slim[build]: state=container.inspection.done
docker-slim[build]: state=building message='building minified image'
docker-slim[build]: state=completed
docker-slim[build]: info=results status='MINIFIED BY 45.19X [485506346 (486 MB) => 10742985 (11 MB)]'
When we get the prompt that all the HTTP requests have been issued, we kill the process by pressing enter, and wait for the tool to build the resultant image based on the information it gathered. In this case, it was able to slim the image down to 11MB! Notice that it performed essentially the same as the multi-stage build, keeping only the files the application needs to function.
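As a final sanity check, you can run the slimmed image with the same environment and re-issue the request from the probe file to confirm the application still works (the tag, port mapping, and environment values are the hypothetical ones used throughout this example):

# Run the slimmed image with the environment the app expects
$ docker run -d -p 5000:5000 \
    --env MY_NODE_NAME=test_node \
    --env MONGO_CONN=mongodb://mongo-db-server:80 \
    --name slim_listener slim_go_app:latest

# Re-issue the request from the probe file against the running container
$ curl -X POST -H "Content-Type: application/json" \
    -d '{"key1": "value1", "key2": "value2"}' \
    http://localhost:5000/poc/api/v1.0/post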
Carefully crafting your Dockerfiles proves to be important not only for minimizing your container's attack surface, but also for speeding up the deployment of applications running both in the cloud and at the edge.
For more information on the construction and lifecycle of containers, I would refer you to the Docker docs.
Depending on the application image you are looking to minimize, the tools and techniques listed above should serve as good guidelines for getting the greatest size reductions with the least headache. Should you have any questions, feel free to contact me using the link below.
Contact the Author