Architecture Agnostic Container Build Systems

In Cloud Management Tools On Arm, we discussed a few ways a container build system can be made HW architecture agnostic. In this post, we will show a few real-world examples of how to do this.

Why Make A Build System Architecture Agnostic?

Heterogeneous compute models are becoming more common. Arm-based systems are being deployed in data centers, and edge computing will continue to introduce a variety of HW platforms. For SW projects to remain competitive, they will need to run in these new environments. Therefore, it's important that a project's CI/CD loop is set up for heterogeneous computing environments.

Our Setup And Assumptions

The examples below were built on a SoftIron Overdrive 1000. This machine has a quad-core Cortex-A57 aarch64 (arm64) CPU. Even though we are building on aarch64 HW, the modifications shown below will allow for building natively on other architectures as well. The code base used for the examples is Project Calico tag v3.1.1, which is available on GitHub. Project Calico doesn't currently support Arm-based platforms (this is a work in progress), so it makes for a good real-world example. Last, understand that this post is not intended to provide comprehensive build instructions for Project Calico; we're just using this project to illustrate multi-architecture support concepts. Official build instructions can be found on GitHub.
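
Note that tools report this architecture under two names: uname calls it aarch64, while Go and Docker call it arm64. A quick check on the build host looks like this:

uname -m    # prints aarch64 on this machine; Go and Docker refer to the same architecture as arm64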

What Is Project Calico?

Project Calico is a cloud networking solution. It creates a virtual network that spans a cluster to facilitate container communications. For this post, knowledge of overlay networks or Project Calico is not necessary. The focus will solely be on Project Calico's build system. If you are interested in learning more about overlays and Project Calico, these are discussed in Understanding And Deploying Overlay Networks.

Prereq: go-build

To build any of the Project Calico components, you will need Project Calico's containerized golang build environment. The go-build repo is located on the Project Calico GitHub page. This containerized golang environment will build for Arm without any modification to the existing instructions. It will produce a Docker image called calico/go-build tagged with latest-arm64. When building the Project Calico components, we tell Make to use this build environment by setting the Makefile's GO_BUILD_VER variable to latest-arm64.
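
Once the image is built, a quick listing confirms it is available locally before moving on to the component builds; this is plain Docker tooling, nothing Calico specific:

docker images calico/go-build    # the latest-arm64 tag should appear in this list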

Example 1: Typha

Clone the Typha repo and build it without modification using the command shown below. Notice the ARCH and GO_BUILD_VER variables are set.

ARCH=arm64 GO_BUILD_VER=latest-arm64 make calico/typha

Eventually, an error occurs that looks like this:

docker build --pull -t calico/typha docker-image -f docker-image/Dockerfile
Sending build context to Docker daemon  50.88MB
Step 1/10 : FROM alpine:3.4
3.4: Pulling from library/alpine
no matching manifest for linux/arm64 in the manifest list entries
Makefile:77: recipe for target 'calico/typha' failed
make: *** [calico/typha] Error 1

This error occurs during a call to the docker build command on the Dockerfile at ./docker-image/Dockerfile (line 1 above). The failure happens in the first step of this Dockerfile (line 3 above). The FROM command in step 1 is what selects which base image to use, and line 5 states that there is no matching manifest for linux/arm64.

Docker manifests are how multiple architectures can be supported with a single image name and tag. Each Docker image has a manifest file which states which architectures the image supports. Many of the popular Docker Hub images support Arm, including alpine; the issue here is that Arm support in the alpine images only started with v3.6. This makes for an easy fix: change FROM alpine:3.4 to FROM alpine:3.6. You can check an image's Docker Hub page for information on which architectures it supports, or query its manifest list directly, as shown below.
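
With a reasonably recent Docker client, the manifest list can be queried from the command line (the docker manifest subcommand is experimental in older Docker releases and may need to be enabled in the client configuration); the grep simply pulls out the architecture entries:

docker manifest inspect alpine:3.6 | grep architecture

An arm64 entry should appear in the output. After making the version number change and trying the build again, we see another error: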

Step 7/10 : ADD bin/calico-typha-amd64 /code
lstat bin/calico-typha-amd64: no such file or directory
Makefile:77: recipe for target 'calico/typha' failed
make: *** [calico/typha] Error 1

This error is happening on step 7 of the same Dockerfile. Line 2 of this error is our clue: it says that a binary called calico-typha-amd64 cannot be found in the bin/ directory of the Docker build context. This is because we're building on an Arm machine, so the binary that was copied there is calico-typha-arm64 instead. Clearly, something isn't right about step 7. Here are steps 6 through 9 of the Dockerfile:

RUN mkdir /code
ADD bin/calico-typha-amd64 /code
WORKDIR /code
RUN ln -s /code/calico-typha-amd64 /usr/bin/calico-typha

Lines 2 (step 7) and 4 (step 9) are the problem: they are hard-coded for amd64 binaries. We could take the easy way out and simply change amd64 to arm64, or we can implement a more robust solution using a Dockerfile argument. Dockerfile arguments are variables that can be used within a Dockerfile, and they can be given default values. The argument with its default assignment looks like this, placed near the top of the Dockerfile (after the FROM line, so that later instructions can see it):

ARG ARCHITECTURE=amd64

Next, use the argument (ARCHITECTURE) to replace the hard-coded architecture on lines 2 and 4, as shown below:

RUN mkdir /code
ADD bin/calico-typha-$ARCHITECTURE /code
WORKDIR /code
RUN ln -s /code/calico-typha-$ARCHITECTURE /usr/bin/calico-typha

Recall that to build Typha, we call Make on a Makefile, and it's this Makefile that calls the docker build command. Thus, the Makefile will have to set the ARCHITECTURE argument when calling docker build. Below is the snippet of the Makefile which calls docker build; the docker build command is on line 7:

# Build the calico/typha docker image, which contains only Typha.
.PHONY: calico/typha
calico/typha: bin/calico-typha-$(ARCH)
    rm -rf docker-image/bin
    mkdir -p docker-image/bin
    cp bin/calico-typha-$(ARCH) docker-image/bin/
    docker build --pull -t calico/typha$(ARCHTAG) docker-image -f docker-image/Dockerfile$(ARCHTAG)

We need to modify the above so that the ARCHITECTURE argument gets set when docker build is called (line 7 above). Since the Makefile already has a variable called ARCH, that variable can be used to set ARCHITECTURE. This is done with the --build-arg switch of docker build, as shown on line 7 below:

# Build the calico/typha docker image, which contains only Typha.
.PHONY: calico/typha
calico/typha: bin/calico-typha-$(ARCH)
    rm -rf docker-image/bin
    mkdir -p docker-image/bin
    cp bin/calico-typha-$(ARCH) docker-image/bin/
    docker build --build-arg ARCHITECTURE=$(ARCH) --pull -t calico/typha$(ARCHTAG) docker-image -f docker-image/Dockerfile$(ARCHTAG)

With these modifications, Typha will now support arm64 and any other architectures that are supported by the alpine v3.6 image.
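
As a quick sanity check, standard Docker tooling (not part of the Calico Makefiles) can report which architecture the resulting image was built for; adjust the image name if your Makefile appends an ARCHTAG suffix:

docker image inspect --format '{{.Architecture}}' calico/typha    # should print arm64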

Example 2: Felix

Clone the Felix repo and build it without modification using the following command:

ARCH=arm64 GO_BUILD_VER=latest-arm64 make calico/felix

Eventually, an error occurs that looks like this:

Step 1/13 : FROM alpine:3.4
3.4: Pulling from library/alpine
no matching manifest for linux/arm64 in the manifest list entries
Makefile:177: recipe for target 'calico/felix' failed
make: *** [calico/felix] Error 1

This should look familiar: it's the same alpine base image issue encountered when building Typha. Once the alpine version is bumped to 3.6 and the build is attempted again, the following error is encountered:

(2/2) Installing glibc-bin (2.23-r3)
Executing glibc-bin-2.23-r3.trigger
/usr/glibc-compat/sbin/ldconfig: line 1: syntax error: unexpected "("
ERROR: glibc-bin-2.23-r3.trigger: script exited with error 2
OK: 11 MiB in 17 packages
/usr/glibc-compat/sbin/ldconfig: line 1: syntax error: unexpected "("
The command '/bin/sh -c apk --no-cache add wget ca-certificates libgcc &&     wget -q -O /etc/apk/keys/sgerrand.rsa.pub https://raw.githubusercontent.com/sgerrand/alpine-pkg-glibc/master/sgerrand.rsa.pub &&     wget https://github.com/sgerrand/alpine-pkg-glibc/releases/download/2.23-r3/glibc-2.23-r3.apk &&     wget https://github.com/sgerrand/alpine-pkg-glibc/releases/download/2.23-r3/glibc-bin-2.23-r3.apk &&     apk add glibc-2.23-r3.apk glibc-bin-2.23-r3.apk &&     /usr/glibc-compat/sbin/ldconfig /lib /usr/glibc/usr/lib &&     apk del wget &&     rm -f glibc-2.23-r3.apk glibc-bin-2.23-r3.apk' returned a non-zero code: 2
Makefile:177: recipe for target 'calico/felix' failed
make: *** [calico/felix] Error 2

Similar to Typha, the Makefile calls docker build on a file called ./docker-image/Dockerfile. Step 5 of this Dockerfile is where the above error appears. The output shows that the issue is with the glibc-bin-2.23-r3 package, whose install trigger exits with an error (line 4 above). Below is what step 5 looks like in the Dockerfile:

# Download and install glibc in one layer
RUN apk --no-cache add wget ca-certificates libgcc && \
    wget -q -O /etc/apk/keys/sgerrand.rsa.pub https://raw.githubusercontent.com/sgerrand/alpine-pkg-glibc/master/sgerrand.rsa.pub && \
    wget https://github.com/sgerrand/alpine-pkg-glibc/releases/download/2.23-r3/glibc-2.23-r3.apk && \
    wget https://github.com/sgerrand/alpine-pkg-glibc/releases/download/2.23-r3/glibc-bin-2.23-r3.apk && \
    apk add glibc-2.23-r3.apk glibc-bin-2.23-r3.apk && \
    /usr/glibc-compat/sbin/ldconfig /lib /usr/glibc/usr/lib && \
    apk del wget && \
    rm -f glibc-2.23-r3.apk glibc-bin-2.23-r3.apk

To debug issues with a Dockerfile, manually run its steps until the issue is reproduced. In the case of the Felix Dockerfile, that means starting with an alpine container, which can be launched with the following command:

docker run -ti alpine:3.6 sh
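
Inside that shell, the commands from the Dockerfile can be replayed one at a time until the failure reproduces. Copied from step 5 of the Dockerfile above, they are:

apk --no-cache add wget ca-certificates libgcc
wget -q -O /etc/apk/keys/sgerrand.rsa.pub https://raw.githubusercontent.com/sgerrand/alpine-pkg-glibc/master/sgerrand.rsa.pub
wget https://github.com/sgerrand/alpine-pkg-glibc/releases/download/2.23-r3/glibc-2.23-r3.apk
wget https://github.com/sgerrand/alpine-pkg-glibc/releases/download/2.23-r3/glibc-bin-2.23-r3.apk
apk add glibc-2.23-r3.apk glibc-bin-2.23-r3.apk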

Replaying these commands confirms that, as expected, the error occurs when adding the package glibc-bin-2.23-r3.apk. To stop this error from occurring, remove the installation of this package from the Dockerfile. Keep in mind that it's important to make sure a change like this will not negatively impact the container's operation; in our testing, not installing this package had no negative effect. After removing the package installation and trying the build again, another error appears:

/bin/sh: /usr/glibc-compat/sbin/ldconfig: not found
The command '/bin/sh -c apk --no-cache add wget ca-certificates libgcc &&     wget -q -O /etc/apk/keys/sgerrand.rsa.pub https://raw.githubusercontent.com/sgerrand/alpine-pkg-glibc/master/sgerrand.rsa.pub &&     wget https://github.com/sgerrand/alpine-pkg-glibc/releases/download/2.23-r3/glibc-2.23-r3.apk &&     wget https://github.com/sgerrand/alpine-pkg-glibc/releases/download/2.23-r3/glibc-bin-2.23-r3.apk &&     apk add glibc-2.23-r3.apk &&     /usr/glibc-compat/sbin/ldconfig /lib /usr/glibc/usr/lib &&     apk del wget &&     rm -f glibc-2.23-r3.apk glibc-bin-2.23-r3.apk' returned a non-zero code: 127
Makefile:177: recipe for target 'calico/felix' failed
make: *** [calico/felix] Error 127

This error is clear: /usr/glibc-compat/sbin/ldconfig is missing. Maybe this is because glibc-bin-2.23-r3.apk wasn't installed, or maybe ldconfig is in a different location on an arm64 alpine image. To check whether it has anything to do with glibc-bin-2.23-r3.apk not being installed, find an amd64 machine, install glibc-bin-2.23-r3.apk, and see where ldconfig ends up. We're not showing it here, but when we tried this, ldconfig was not present in /usr/glibc-compat/sbin/ either; it was located in /sbin/. It was also located there in the arm64 version of the alpine image, so this appears to be a Dockerfile bug and not a base image issue. In any case, change the path of ldconfig from /usr/glibc-compat/sbin/ldconfig to /sbin/ldconfig (or make a symlink). With all of the above changes, step 5 in the Dockerfile should look like this:

# Download and install glibc in one layer
RUN apk --no-cache add wget ca-certificates libgcc && \
    wget -q -O /etc/apk/keys/sgerrand.rsa.pub https://raw.githubusercontent.com/sgerrand/alpine-pkg-glibc/master/sgerrand.rsa.pub && \
    wget https://github.com/sgerrand/alpine-pkg-glibc/releases/download/2.23-r3/glibc-2.23-r3.apk && \
    wget https://github.com/sgerrand/alpine-pkg-glibc/releases/download/2.23-r3/glibc-bin-2.23-r3.apk && \
    apk add glibc-2.23-r3.apk && \
    /sbin/ldconfig /lib /usr/glibc/usr/lib && \
    apk del wget && \
    rm -f glibc-2.23-r3.apk glibc-bin-2.23-r3.apk
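
If you want to verify the ldconfig locations yourself, the same throwaway alpine:3.6 container used for debugging works; the paths below are what we observed and may differ on other image versions:

ls /usr/glibc-compat/sbin/ldconfig    # not present in our testing
ls /sbin/ldconfig                     # present on both the amd64 and arm64 alpine images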

With these modifications, Felix will now support arm64 and any other architectures that are supported by the alpine v3.6 image.

Example 3: Calico CNI Plugin

Clone the cni-plugin repo and build it without modification using the following command:

ARCH=arm64 GO_BUILD_VER=latest-arm64 make docker-image

Eventually an error occurs that looks like this:

Step 3/14 : ADD dist/amd64/calico /opt/cni/bin/calico
lstat dist/amd64/calico: no such file or directory
Makefile:135: recipe for target 'cni_deploy_container-arm64.created' failed
make: *** [cni_deploy_container-arm64.created] Error 1

The Makefile calls docker build on a file called ./Dockerfile (the docker build command is not visible above because the full build output is too long to show). Step 3 of this Dockerfile is where the error appears. The error shows that a file called dist/amd64/calico doesn't exist. Just like with Typha, the Dockerfile is trying to add an amd64 binary to the container, which indicates another hard-coding issue. Here are steps 3 through 8 of the Dockerfile:

ADD dist/amd64/calico /opt/cni/bin/calico
ADD dist/amd64/flannel /opt/cni/bin/flannel
ADD dist/amd64/loopback /opt/cni/bin/loopback
ADD dist/amd64/host-local /opt/cni/bin/host-local
ADD dist/amd64/portmap /opt/cni/bin/portmap
ADD dist/amd64/calico-ipam /opt/cni/bin/calico-ipam

As expected, amd64 is hard-coded in the Dockerfile. The same solution used in the Typha example can be applied here: add an ARCHITECTURE argument to the Dockerfile, and update the Makefile to pass its ARCH variable into the docker build command with the --build-arg switch.
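
As a sketch of what that looks like in this Dockerfile, following the same pattern as the Typha fix and keeping amd64 as the default so existing builds are unaffected:

ARG ARCHITECTURE=amd64    # declared after the FROM line, as in the Typha example
ADD dist/$ARCHITECTURE/calico /opt/cni/bin/calico
ADD dist/$ARCHITECTURE/flannel /opt/cni/bin/flannel
ADD dist/$ARCHITECTURE/loopback /opt/cni/bin/loopback
ADD dist/$ARCHITECTURE/host-local /opt/cni/bin/host-local
ADD dist/$ARCHITECTURE/portmap /opt/cni/bin/portmap
ADD dist/$ARCHITECTURE/calico-ipam /opt/cni/bin/calico-ipam

After making these changes and running the build command again, another error appears: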

Step 3/14 : ADD dist/arm64/calico /opt/cni/bin/calico
lstat dist/arm64/calico: no such file or directory
Makefile:135: recipe for target 'cni_deploy_container-arm64.created' failed
make: *** [cni_deploy_container-arm64.created] Error 1

This error is similar to the previous one, except that the missing file path now reads dist/arm64/calico. Looking inside the ./dist/ directory of the repo, we do in fact see ./dist/arm64/calico. This means the file is not missing; it is being ignored. When it looks as though a file is being ignored by the docker build command, the first thing to check is the .dockerignore file. This file contains the following:

*
!k8s-install/scripts/install-cni.sh
!k8s-install/scripts/calico.conf.default
!dist/amd64/calico
!dist/amd64/calico-ipam
!dist/amd64/flannel
!dist/amd64/loopback
!dist/amd64/host-local
!dist/amd64/portmap
!dist/ppc64le/calico
!dist/ppc64le/calico-ipam
!dist/ppc64le/flannel
!dist/ppc64le/loopback
!dist/ppc64le/host-local
!dist/ppc64le/portmap

The asterisk (*) at the top of the file tells Docker to ignore every file in the repo. Below the asterisk are the exceptions, designated with exclamation points. As can be seen, the arm64 binaries are not part of this list, so the fix is to add them as exceptions in the .dockerignore file. After updating, it should look like this:

*
!k8s-install/scripts/install-cni.sh
!k8s-install/scripts/calico.conf.default
!dist/amd64/calico
!dist/amd64/calico-ipam
!dist/amd64/flannel
!dist/amd64/loopback
!dist/amd64/host-local
!dist/amd64/portmap
!dist/ppc64le/calico
!dist/ppc64le/calico-ipam
!dist/ppc64le/flannel
!dist/ppc64le/loopback
!dist/ppc64le/host-local
!dist/ppc64le/portmap
!dist/arm64/calico
!dist/arm64/calico-ipam
!dist/arm64/flannel
!dist/arm64/loopback
!dist/arm64/host-local
!dist/arm64/portmap

With these modifications, cni-plugin will now build for arm64, as well as any other architecture whose binaries are listed in the Dockerfile and the .dockerignore exceptions.

Closing Remarks

Using Project Calico, we showed a few examples of how to update a container build system to support multiple architectures. The modifications needed are typically small in number and simple to implement. We want to encourage engineers to make their CI/CD loops architecture agnostic; this will make SW projects ready to deploy in heterogeneous computing environments. As for Project Calico, there has been progress on Arm support, but more work is needed. We suggest getting involved in Project Calico: take a look at the Cross build docker images GitHub issue and post a message expressing interest in helping. Even though that work focuses on setting up cross compiling, the code is written in golang, so solving the cross-compiling problem should also solve the native aarch64 compile problem.
