Docker images for TensorFlow and PyTorch running on Ubuntu 18.04 for Arm are now available. This article explains how to build and use these Docker images for TensorFlow and PyTorch on Arm.
TensorFlow and PyTorch are two of the most popular machine learning frameworks. Both are seeing increased usage on Arm, ranging from smaller systems like the Raspberry Pi to larger systems for server and high-performance computing (HPC). Even though there is some support for AArch64 in packages already, users may want to compile everything from source. Reasons include using specific tools, targeting a different runtime environment, and experimenting with performance improvements from underlying libraries. Arm continues work to make ML on Arm well supported and to contribute optimizations to achieve the highest possible performance.
We hope these Docker images, and the recipes to create them, are helpful to anybody looking to use TensorFlow and PyTorch on AArch64.
Scripts to build an Ubuntu 18.04-based Docker image are available from the Arm Tool-Solutions repository on GitHub.
The finished TensorFlow and PyTorch images contain:
The TensorFlow image can be configured to build TensorFlow 1 or TensorFlow 2 and can optionally be built with oneDNN 0.21.3 using a mixture of C++ reference and OpenBLAS kernels.
The TensorFlow image also contains a Python3 environment built from CPython 3.7 containing:
The PyTorch image also contains a Python3 environment built from CPython 3.7 containing:
To build and run the Docker images, make sure the machine being used is Arm AArch64:
$ uname -m
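On an AArch64 machine, uname -m reports aarch64. A small sketch of this check, suitable for the top of a build script:

```shell
# Abort early if the host is not 64-bit Arm; uname -m reports the machine type.
arch=$(uname -m)
if [ "$arch" = "aarch64" ]; then
  echo "AArch64 host detected"
else
  echo "This host is $arch, not aarch64; the images will not build here" >&2
fi
```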
The newly available M6g instance, powered by AWS Graviton2, is a great way to try out the steps and run the examples. For more information about the M6g, look at the slides and recording of the recent AWS webinar. I found the examples in the TensorFlow and PyTorch images were more than 2x faster on an M6g instance than on an A1 instance with the same number of vCPUs.
The Docker Community Engine is recommended for Linux. Instructions on how to install Docker CE are available for various Linux distributions such as CentOS and Ubuntu.
To install git and Docker on Ubuntu for a user named ubuntu, run:
$ sudo apt update
$ sudo apt upgrade -y
$ sudo apt install -y git
$ curl -fsSL get.docker.com -o get-docker.sh && sh get-docker.sh
$ sudo usermod -aG docker ubuntu ; newgrp docker
$ docker run hello-world
Similar steps can be used for other Linux distributions.
Start by cloning the repository:
$ git clone https://github.com/ARM-software/Tool-Solutions.git
To use TensorFlow change to the tensorflow-aarch64/ directory:
$ cd Tool-Solutions/docker/tensorflow-aarch64
For PyTorch change directory to the pytorch-aarch64 directory:
$ cd Tool-Solutions/docker/pytorch-aarch64
Each framework has a five-stage Dockerfile so incremental progress can be saved and reused as needed.
The build.sh script builds the images and has a help flag for reviewing the options. Use the build-type flag to specify the set of images to build.
The images use Docker BuildKit for the best build performance, which requires Docker 18.09.1 or later. If the installed version of Docker is older, the following line must be removed from the build.sh script:
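One way to check whether the installed Docker is new enough is a simple version comparison with sort -V. The version string below is a placeholder; on a real host, substitute the output of docker version:

```shell
# Compare a Docker version string against the BuildKit minimum (18.09.1).
# "19.03.6" is a placeholder; on a real host use:
#   version=$(docker version --format '{{.Server.Version}}')
required="18.09.1"
version="19.03.6"
lowest=$(printf '%s\n%s\n' "$required" "$version" | sort -V | head -n1)
if [ "$lowest" = "$required" ]; then
  echo "BuildKit is supported"
else
  echo "Docker is too old; remove the BuildKit line from build.sh"
fi
```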
Look at the scripts and directories to see the details of the build steps; modify them as needed.
To build all images use:
$ ./build.sh --build-type full
The default for TensorFlow is TensorFlow 1. To build TensorFlow 2, use the command-line option --tf_version 2. The images are tagged with -v1 or -v2 depending on the selected version of TensorFlow.
Building TensorFlow is prone to running out of memory, but the bazel_memory_limit flag can be used to avoid exhausting available memory. For example, to build TensorFlow 2 successfully on a machine with 32 GB of memory, set a limit such as:
$ ./build.sh --build-type full --tf_version 2 --bazel_memory_limit 30000 --jobs 16
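The right limit depends on the machine. A hypothetical helper that derives a limit from /proc/meminfo, leaving roughly 2 GB of headroom for the rest of the system, might look like:

```shell
# Hypothetical helper: suggest a --bazel_memory_limit value (in MB) from
# total RAM, leaving about 2 GB of headroom for the OS and other processes.
# /proc/meminfo reports MemTotal in kB, so divide by 1024 to get MB.
total_mb=$(awk '/^MemTotal/ {print int($2 / 1024)}' /proc/meminfo)
limit=$((total_mb - 2048))
echo "Suggested --bazel_memory_limit: $limit"
```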
Once the images are built, use the docker tag and docker push commands to save them to your favorite image repository. I use Docker Hub to save the images:
$ docker tag tensorflow-v2 jasonrandrews/tensorflow-v2
$ docker login   # enter your Docker Hub username and password
$ docker push jasonrandrews/tensorflow-v2
Next, let us see how to run the images.
For TensorFlow, some benchmarks are included. On any AArch64 machine with Docker installed, use the commands below to run a benchmark:
$ docker pull jasonrandrews/tensorflow-v2
$ docker tag jasonrandrews/tensorflow-v2 tensorflow-v2
$ docker run -it --init tensorflow-v2
Now in the container:
$ cd benchmarks/scripts/tf_cnn_benchmarks
$ python tf_cnn_benchmarks.py --device=CPU --batch_size=64 --model=resnet50 --variable_update=parameter_server --data_format=NHWC
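The benchmark reports throughput on a line of the form total images/sec: N. A quick way to pull that number out of a saved log; the log contents below are a stand-in for a real run:

```shell
# Extract the headline throughput from a tf_cnn_benchmarks log.
# /tmp/bench.log is filled with sample output standing in for a real run.
printf '100 images in 8.1 sec\ntotal images/sec: 12.34\n' > /tmp/bench.log
grep 'total images/sec' /tmp/bench.log | awk '{print $NF}'
```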
For PyTorch, pull and run the image in the same way:
$ docker pull jasonrandrews/pytorch
$ docker tag jasonrandrews/pytorch pytorch
$ docker run -it --init pytorch
Now in the container, run an example using Python and an example using C++:
$ cd examples/mnist
$ python main.py --save-model --epochs 5
$ cd ; cd examples/cpp/mnist
$ mkdir build
$ cd build
$ Torch_DIR=/home/ubuntu/python3-venv/lib/python3.7/site-packages/torch/share/cmake/Torch cmake -DCMAKE_PREFIX_PATH=~/python3-venv/lib/python3.7/site-packages/torch/lib/libtorch.so ..
$ make
Docker images for TensorFlow and PyTorch on AArch64 are now available to use directly or as a starting point to build custom images for these machine learning frameworks. We welcome any feedback to make them easier to use or to increase performance.