In Cloud Management Tools On Arm, we discussed a few ways a container build system can be made hardware-architecture agnostic. In this post, we show a few real-world examples of how to do this.
Heterogeneous compute models are becoming more common. Arm-based systems are being deployed in data centers, and edge computing will continue to introduce a variety of hardware platforms. For software projects to remain competitive, they will need to run in these new environments. Therefore, it's important that a project's CI/CD loop is set up for heterogeneous computing environments.
The examples below were built on a SoftIron Overdrive 1000. This machine has a quad-core Cortex-A57 aarch64 (arm64) CPU. Even though we are building on aarch64 hardware, the modifications shown below will allow for building natively on other architectures as well. The code base used for the examples is Project Calico tag v3.1.1, which is available on GitHub. Project Calico currently doesn't support Arm-based platforms (this is a work in progress), so it makes for a good source of real-world examples. Last, understand that this post is not intended to provide comprehensive build instructions for Project Calico; we're just using this project to illustrate multi-architecture support concepts. Official build instructions can be found on GitHub.
Project Calico is a cloud networking solution. It creates a virtual network that spans a cluster to facilitate container communications. For this post, knowledge of overlay networks or Project Calico is not necessary. The focus will solely be on Project Calico's build system. If you are interested in learning more about overlays and Project Calico, these are discussed in Understanding And Deploying Overlay Networks.
To build any of the Project Calico components, you will need Project Calico's containerized golang build environment. The go-build repo is located on the Project Calico GitHub page. This containerized golang environment builds for Arm without any modification to the existing instructions. It produces a Docker image called calico/go-build tagged with latest-arm64. When building the Project Calico components, we tell Make to use this build environment by setting the Makefile's GO_BUILD_VER variable to latest-arm64.
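As a quick sanity check before moving on, you can confirm the image exists with the standard Docker CLI (the exact columns and sizes in the output will vary):

docker images calico/go-build

The latest-arm64 tag should appear in the output.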
Clone the Typha repo and build it without modification using the command shown below. Notice the ARCH and GO_BUILD_VER variables are set.
ARCH=arm64 GO_BUILD_VER=latest-arm64 make calico/typha
Eventually, an error occurs that looks like this:
docker build --pull -t calico/typha docker-image -f docker-image/Dockerfile
Sending build context to Docker daemon  50.88MB
Step 1/10 : FROM alpine:3.4
3.4: Pulling from library/alpine
no matching manifest for linux/arm64 in the manifest list entries
Makefile:77: recipe for target 'calico/typha' failed
make: *** [calico/typha] Error 1
This error occurs during a call to the docker build command on the Dockerfile at ./docker-image/Dockerfile (line 1 above). The failure happens in the first step of this Dockerfile (line 3 above). The FROM command in step 1 is what selects which base image to use. Further, line 5 states that there is no matching manifest for linux/arm64. Docker manifests are how multiple architectures can be supported with a single image name and tag. Each Docker image has a manifest file which states which architectures are supported by the image. Many of the popular Docker Hub images support Arm, including alpine. The issue above is that Arm support on alpine images started in v3.6. This makes for an easy fix: change FROM alpine:3.4 to FROM alpine:3.6. Note that you can check the Docker Hub image page for information on which architectures are supported by an image.
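You can also check from the command line. Recent Docker clients can inspect an image's manifest list directly (on older Docker versions this may require enabling experimental CLI features):

# List the platforms alpine:3.6 was published for; each entry is an os/architecture pair
docker manifest inspect alpine:3.6 | grep architecture

After making the version number change and trying the build again, we see another error: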
Step 7/10 : ADD bin/calico-typha-amd64 /code
lstat bin/calico-typha-amd64: no such file or directory
Makefile:77: recipe for target 'calico/typha' failed
make: *** [calico/typha] Error 1
This error happens on step 7 of the same Dockerfile. Line 2 of this error is our clue: it says that a binary called calico-typha-amd64 cannot be found in the bin/ directory. This is because we're building on an Arm machine; instead, there is a binary called calico-typha-arm64 in the bin/ directory. Clearly, something isn't right about step 7. Here are steps 6 through 9 of the Dockerfile:
RUN mkdir /code
ADD bin/calico-typha-amd64 /code
WORKDIR /code
RUN ln -s /code/calico-typha-amd64 /usr/bin/calico-typha
Lines 2 (step 7) and 4 (step 9) are the problem. These lines are hard coded for amd64 binaries. We could take the easy way out and simply change amd64 to arm64, or we could implement a more robust solution: a Dockerfile argument. Dockerfile arguments are variables that can be used within a Dockerfile, and default values can be assigned to them. The argument with its default assignment will look like this at the top of the Dockerfile:
ARG ARCHITECTURE=amd64
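A note on how this behaves: a plain docker build uses the default value, while --build-arg overrides it. For example (the paths here assume the Typha repo layout used by the Makefile shown later):

# Uses the default, ARCHITECTURE=amd64
docker build -t calico/typha docker-image -f docker-image/Dockerfile

# Overrides the default for an arm64 build
docker build --build-arg ARCHITECTURE=arm64 -t calico/typha docker-image -f docker-image/Dockerfile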
Next, use the argument (ARCHITECTURE) to fix the hard-coded architecture on lines 2 and 4, as shown below:
RUN mkdir /code
ADD bin/calico-typha-$ARCHITECTURE /code
WORKDIR /code
RUN ln -s /code/calico-typha-$ARCHITECTURE /usr/bin/calico-typha
Recall that to build Typha, we call Make on a Makefile. It's this Makefile that calls the docker build command. Thus, the Makefile will have to set the ARCHITECTURE argument when calling docker build. Below is the snippet of the Makefile which calls docker build; the docker build command is on line 7:
# Build the calico/typha docker image, which contains only Typha.
.PHONY: calico/typha
calico/typha: bin/calico-typha-$(ARCH)
	rm -rf docker-image/bin
	mkdir -p docker-image/bin
	cp bin/calico-typha-$(ARCH) docker-image/bin/
	docker build --pull -t calico/typha$(ARCHTAG) docker-image -f docker-image/Dockerfile$(ARCHTAG)
We need to modify the above so that the ARCHITECTURE argument gets set when docker build is called (line 7 above). Since the Makefile already has a variable called ARCH, this variable can be used to set ARCHITECTURE. This is done with the docker build switch --build-arg, as shown on line 7 below:
# Build the calico/typha docker image, which contains only Typha.
.PHONY: calico/typha
calico/typha: bin/calico-typha-$(ARCH)
	rm -rf docker-image/bin
	mkdir -p docker-image/bin
	cp bin/calico-typha-$(ARCH) docker-image/bin/
	docker build --build-arg ARCHITECTURE=$(ARCH) --pull -t calico/typha$(ARCHTAG) docker-image -f docker-image/Dockerfile$(ARCHTAG)
With these modifications, Typha will now support arm64 and any other architectures that are supported by the alpine v3.6 image.
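To double-check that the image produced really is an arm64 image, Docker can report the architecture recorded in the image metadata (the image name here assumes the default calico/typha tag from the Makefile):

# Prints the architecture stored in the image config; expect arm64 on our build machine
docker image inspect --format '{{.Architecture}}' calico/typha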
Clone the Felix repo and build it without modification using the following command:
ARCH=arm64 GO_BUILD_VER=latest-arm64 make calico/felix
Step 1/13 : FROM alpine:3.4
3.4: Pulling from library/alpine
no matching manifest for linux/arm64 in the manifest list entries
Makefile:177: recipe for target 'calico/felix' failed
make: *** [calico/felix] Error 1
This should look familiar: it's the same alpine image issue encountered when building Typha. Once the alpine version is set to v3.6 and the build is attempted again, the following error is encountered:
(2/2) Installing glibc-bin (2.23-r3)
Executing glibc-bin-2.23-r3.trigger
/usr/glibc-compat/sbin/ldconfig: line 1: syntax error: unexpected "("
ERROR: glibc-bin-2.23-r3.trigger: script exited with error 2
OK: 11 MiB in 17 packages
/usr/glibc-compat/sbin/ldconfig: line 1: syntax error: unexpected "("
The command '/bin/sh -c apk --no-cache add wget ca-certificates libgcc && wget -q -O /etc/apk/keys/sgerrand.rsa.pub https://raw.githubusercontent.com/sgerrand/alpine-pkg-glibc/master/sgerrand.rsa.pub && wget https://github.com/sgerrand/alpine-pkg-glibc/releases/download/2.23-r3/glibc-2.23-r3.apk && wget https://github.com/sgerrand/alpine-pkg-glibc/releases/download/2.23-r3/glibc-bin-2.23-r3.apk && apk add glibc-2.23-r3.apk glibc-bin-2.23-r3.apk && /usr/glibc-compat/sbin/ldconfig /lib /usr/glibc/usr/lib && apk del wget && rm -f glibc-2.23-r3.apk glibc-bin-2.23-r3.apk' returned a non-zero code: 2
Makefile:177: recipe for target 'calico/felix' failed
make: *** [calico/felix] Error 2
Similar to Typha, the Makefile calls docker build on a file called ./docker-image/Dockerfile. Step 5 of this Dockerfile is where the above error appears. The error shows that the issue is with the glibc-bin-2.23-r3 package trigger. Below is what step 5 looks like in the Dockerfile:
# Download and install glibc in one layer
RUN apk --no-cache add wget ca-certificates libgcc && \
    wget -q -O /etc/apk/keys/sgerrand.rsa.pub https://raw.githubusercontent.com/sgerrand/alpine-pkg-glibc/master/sgerrand.rsa.pub && \
    wget https://github.com/sgerrand/alpine-pkg-glibc/releases/download/2.23-r3/glibc-2.23-r3.apk && \
    wget https://github.com/sgerrand/alpine-pkg-glibc/releases/download/2.23-r3/glibc-bin-2.23-r3.apk && \
    apk add glibc-2.23-r3.apk glibc-bin-2.23-r3.apk && \
    /usr/glibc-compat/sbin/ldconfig /lib /usr/glibc/usr/lib && \
    apk del wget && \
    rm -f glibc-2.23-r3.apk glibc-bin-2.23-r3.apk
To debug issues with Dockerfiles, manually run the steps in the Dockerfile until the issue is encountered. In the case of the Felix Dockerfile, this means we start by launching an alpine container. The following command launches the container:
docker run -ti alpine:3.6 sh
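In the container shell, replay the commands from step 5 one at a time. Here is a sketch of that session, with the commands copied from the Dockerfile above (the apk add is split in two so the failing package can be isolated):

apk --no-cache add wget ca-certificates libgcc
wget -q -O /etc/apk/keys/sgerrand.rsa.pub https://raw.githubusercontent.com/sgerrand/alpine-pkg-glibc/master/sgerrand.rsa.pub
wget https://github.com/sgerrand/alpine-pkg-glibc/releases/download/2.23-r3/glibc-2.23-r3.apk
wget https://github.com/sgerrand/alpine-pkg-glibc/releases/download/2.23-r3/glibc-bin-2.23-r3.apk
apk add glibc-2.23-r3.apk          # succeeds
apk add glibc-bin-2.23-r3.apk      # fails on arm64 with the ldconfig syntax error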
As expected, the error occurs when installing the glibc-bin-2.23-r3.apk package. To stop this error from occurring, remove the installation of this package from the Dockerfile. Keep in mind that it's important to make sure a change like this will not negatively impact the container's operation; in our testing, not installing this package had no negative effect. After removing the package installation and trying the build again, another error appears:
/bin/sh: /usr/glibc-compat/sbin/ldconfig: not found
The command '/bin/sh -c apk --no-cache add wget ca-certificates libgcc && wget -q -O /etc/apk/keys/sgerrand.rsa.pub https://raw.githubusercontent.com/sgerrand/alpine-pkg-glibc/master/sgerrand.rsa.pub && wget https://github.com/sgerrand/alpine-pkg-glibc/releases/download/2.23-r3/glibc-2.23-r3.apk && wget https://github.com/sgerrand/alpine-pkg-glibc/releases/download/2.23-r3/glibc-bin-2.23-r3.apk && apk add glibc-2.23-r3.apk && /usr/glibc-compat/sbin/ldconfig /lib /usr/glibc/usr/lib && apk del wget && rm -f glibc-2.23-r3.apk glibc-bin-2.23-r3.apk' returned a non-zero code: 127
Makefile:177: recipe for target 'calico/felix' failed
make: *** [calico/felix] Error 127
This error is clear: /usr/glibc-compat/sbin/ldconfig is missing. Maybe this is because glibc-bin-2.23-r3.apk wasn't installed, or maybe ldconfig is in a different location on an arm64 alpine image. To check whether this has something to do with glibc-bin-2.23-r3.apk not getting installed, find an amd64 machine, install glibc-bin-2.23-r3.apk, and see where ldconfig is located. We're not showing it here, but when we tried this, we noticed that ldconfig was not present in /usr/glibc-compat/sbin/ even on amd64; instead, it was located in /sbin/. It was also located there in the arm64 version of the alpine image. Thus, this appears to be a Dockerfile bug and not a base image issue.
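A quick way to confirm this from inside the arm64 alpine:3.6 container is shown below (a sketch reflecting what we observed; the paths come from the error above and our testing):

ls -l /usr/glibc-compat/sbin/ldconfig   # No such file or directory
ls -l /sbin/ldconfig                    # present

In any case, change the path of ldconfig from /usr/glibc-compat/sbin/ldconfig to /sbin/ldconfig (or make a symlink). With all of the above changes, step 5 in the Dockerfile should look like this: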
# Download and install glibc in one layer
RUN apk --no-cache add wget ca-certificates libgcc && \
    wget -q -O /etc/apk/keys/sgerrand.rsa.pub https://raw.githubusercontent.com/sgerrand/alpine-pkg-glibc/master/sgerrand.rsa.pub && \
    wget https://github.com/sgerrand/alpine-pkg-glibc/releases/download/2.23-r3/glibc-2.23-r3.apk && \
    wget https://github.com/sgerrand/alpine-pkg-glibc/releases/download/2.23-r3/glibc-bin-2.23-r3.apk && \
    apk add glibc-2.23-r3.apk && \
    /sbin/ldconfig /lib /usr/glibc/usr/lib && \
    apk del wget && \
    rm -f glibc-2.23-r3.apk glibc-bin-2.23-r3.apk
With these modifications, Felix will now support arm64 and any other architectures that are supported by the alpine v3.6 image.
Clone the cni-plugin repo and build it without modification using the following command:
ARCH=arm64 GO_BUILD_VER=latest-arm64 make docker-image
Eventually an error occurs that looks like this:
Step 3/14 : ADD dist/amd64/calico /opt/cni/bin/calico
lstat dist/amd64/calico: no such file or directory
Makefile:135: recipe for target 'cni_deploy_container-arm64.created' failed
make: *** [cni_deploy_container-arm64.created] Error 1
The Makefile calls docker build on a file called ./Dockerfile (not shown because the logs are too long). Step 3 of this Dockerfile is where the above error appears. The error shows that a file called dist/amd64/calico doesn't exist. Just like with Typha, the Dockerfile is trying to add an amd64 binary to the container, which indicates another hard-coding issue. Here are steps 3 through 8 of the Dockerfile:
ADD dist/amd64/calico /opt/cni/bin/calico
ADD dist/amd64/flannel /opt/cni/bin/flannel
ADD dist/amd64/loopback /opt/cni/bin/loopback
ADD dist/amd64/host-local /opt/cni/bin/host-local
ADD dist/amd64/portmap /opt/cni/bin/portmap
ADD dist/amd64/calico-ipam /opt/cni/bin/calico-ipam
As expected, there is a hard-coded amd64 in the Dockerfile. The same solution used in the Typha example can be applied here: add an ARCHITECTURE argument to the Dockerfile, and update the Makefile to pass its ARCH variable into the docker build command with the --build-arg switch.
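Applied to this Dockerfile, the change would look something like the sketch below; the argument name mirrors the one used for Typha, and the surrounding steps are omitted:

ARG ARCHITECTURE=amd64

ADD dist/$ARCHITECTURE/calico /opt/cni/bin/calico
ADD dist/$ARCHITECTURE/flannel /opt/cni/bin/flannel
ADD dist/$ARCHITECTURE/loopback /opt/cni/bin/loopback
ADD dist/$ARCHITECTURE/host-local /opt/cni/bin/host-local
ADD dist/$ARCHITECTURE/portmap /opt/cni/bin/portmap
ADD dist/$ARCHITECTURE/calico-ipam /opt/cni/bin/calico-ipam

After making these changes and running the build again, another error appears: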
Step 3/14 : ADD dist/arm64/calico /opt/cni/bin/calico
lstat dist/arm64/calico: no such file or directory
Makefile:135: recipe for target 'cni_deploy_container-arm64.created' failed
make: *** [cni_deploy_container-arm64.created] Error 1
The error is similar to the previous one, only now the missing file path reads dist/arm64/calico. When we look inside the ./dist/ directory of the repo, we do in fact see ./dist/arm64/calico. This means the file is not missing; it is being ignored. When it looks as though a file is being ignored by the docker build command, the first thing to check is the .dockerignore file. This file contains the following:
*
!k8s-install/scripts/install-cni.sh
!k8s-install/scripts/calico.conf.default
!dist/amd64/calico
!dist/amd64/calico-ipam
!dist/amd64/flannel
!dist/amd64/loopback
!dist/amd64/host-local
!dist/amd64/portmap
!dist/ppc64le/calico
!dist/ppc64le/calico-ipam
!dist/ppc64le/flannel
!dist/ppc64le/loopback
!dist/ppc64le/host-local
!dist/ppc64le/portmap
The asterisk (*) at the top of the file tells Docker to ignore every file in the repo. Under the asterisk are the exceptions, which are designated with exclamation points. As can be seen, the arm64 binaries are not part of this list. Thus, the solution is to add the arm64 binaries to the .dockerignore file. After updating, it should look like this:
*
!k8s-install/scripts/install-cni.sh
!k8s-install/scripts/calico.conf.default
!dist/amd64/calico
!dist/amd64/calico-ipam
!dist/amd64/flannel
!dist/amd64/loopback
!dist/amd64/host-local
!dist/amd64/portmap
!dist/ppc64le/calico
!dist/ppc64le/calico-ipam
!dist/ppc64le/flannel
!dist/ppc64le/loopback
!dist/ppc64le/host-local
!dist/ppc64le/portmap
!dist/arm64/calico
!dist/arm64/calico-ipam
!dist/arm64/flannel
!dist/arm64/loopback
!dist/arm64/host-local
!dist/arm64/portmap
With these modifications, cni-plugin will now support arm64 and any other architectures supported by its base image.
Using Project Calico, we showed a few examples of how to update a container build system to support multiple architectures. The modifications needed are typically small in number and simple to implement. We want to encourage engineers to make their CI/CD loops architecture agnostic; this will make software projects ready to deploy in heterogeneous computing environments. As for Project Calico, there has been progress on Arm support, but more work is needed. We suggest getting involved in Project Calico: take a look at the Cross build docker images GitHub issue and post a message expressing interest in helping. Even though that work focuses on setting up cross compiling, the code is written in golang, so solving the cross-compiling problem should also solve the native aarch64 compile problem.
View the Cross build docker images issue on GitHub: https://github.com/projectcalico/calico/issues/1865