If you’d like to develop your Convolutional Neural Networks using just the Compute Library and a Raspberry Pi, this step-by-step guide will show you how… and it comes complete with all the tools you’ll need to get up and running.
If you follow all the steps outlined here, by the end of the post you’ll be up and running with one of the first deep Convolutional Neural Networks (CNNs) designed to recognize 1000 different object classes: AlexNet!
If you haven’t read my previous blog on how to apply a cartoon effect with the Compute Library, I’d suggest starting with that. It’s a simple example, but it will give you all the information you need to compile or cross-compile the library for Raspberry Pi.
In addition to some basic knowledge of the Compute Library, this tutorial assumes some knowledge of CNNs. You don’t need to be an expert; you just need an idea of the main functions.
Everything else can be found in the following archive, which contains the AlexNet model (trainable parameters), the input images and the labels file.
Please download the required files to your host machine (Debian based) or to your Raspberry Pi:
compute_library_alexnet files: https://developer.arm.com/-/media/43359E999DEF433BAF63523C529D21AD.ashx?revision=c1a232fa-f328-451f-9bd6-250b83511e01
Within the folder "alexnet_tutorial" you should have everything for this tutorial.
The requirements for your Raspberry Pi and host machine are:
In release 17.09 of the Compute Library, we introduced an important feature to make life easier for developers, and anyone else benchmarking the library: the graph API.
The graph API’s primary function is to reduce boilerplate code, but it can also reduce errors in your code and improve its readability. It’s simple and easy to use, with a stream interface designed to feel similar to other C++ stream objects.
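To illustrate what a stream-style interface means in C++, here is a toy sketch. `ToyLayer` and `ToyGraph` are hypothetical names invented for this example and are not part of the Compute Library; the only point is that `operator<<` returns a reference to the graph, so layers can be appended in one chained expression.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical layer description (just a name). Not a Compute Library type.
struct ToyLayer {
    std::string name;
};

// Hypothetical graph that records layers in the order they are streamed in.
class ToyGraph {
public:
    // Returning *this is what allows chaining: graph << a << b << c;
    ToyGraph& operator<<(ToyLayer layer) {
        layers_.push_back(std::move(layer.name));
        return *this;
    }

    const std::vector<std::string>& layers() const { return layers_; }

private:
    std::vector<std::string> layers_;
};
```

For example, `graph << ToyLayer{"input"} << ToyLayer{"conv1"};` appends two layers in order, which is the same reading order you get when the real graph API describes a network.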
At the current stage, the graph API only supports the ML functions (e.g. convolution, fully connected, activation, pooling) and can only be used if the library has been compiled with both NEON and OpenCL enabled (neon=1 and opencl=1).
Note: if your platform doesn't support OpenCL (e.g. Raspberry Pi), don't worry: the graph API will automatically fall back to NEON. However, you still need to compile the Compute Library with both NEON and OpenCL enabled.
In terms of building blocks, the graph API represents the third computation block, together with core and runtime. In terms of hierarchy, the graph API lies just above the runtime, which in turn lies above the core block.
In 2012, AlexNet shot to fame when it won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), an annual challenge that aims to evaluate algorithms for object detection and image classification.
The ILSVRC evaluates the success of image classification solutions using two important metrics: the top-1 and top-5 errors. Given a set of N images (usually called “test images”), each mapped to a target class, the top-1 error checks whether the class with the highest predicted probability matches the target class, while the top-5 error checks whether the target class is among the five classes with the highest predicted probabilities.
For both, the error is calculated as "the number of times the prediction does not match the target class, divided by the total number of test images". In other words, a lower score is better.
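Both metrics can be sketched in a few lines of C++. This is a minimal illustration, assuming each test image comes with a vector of per-class scores (for example, the Softmax output) and an integer target class; `in_top_k` and `top_k_error` are hypothetical helper names, not Compute Library functions. With k = 1 the function returns the top-1 error, and with k = 5 the top-5 error.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// True if `target` is among the k classes with the highest scores.
// Ties between equal scores are broken arbitrarily in this sketch.
bool in_top_k(const std::vector<float>& scores, std::size_t target, std::size_t k) {
    assert(target < scores.size() && k >= 1 && k <= scores.size());
    std::vector<std::size_t> idx(scores.size());
    std::iota(idx.begin(), idx.end(), 0);  // class indices 0, 1, ..., C-1
    const auto kth = idx.begin() + static_cast<std::ptrdiff_t>(k);
    // Move the k highest-scoring class indices to the front.
    std::partial_sort(idx.begin(), kth, idx.end(),
                      [&](std::size_t a, std::size_t b) { return scores[a] > scores[b]; });
    return std::find(idx.begin(), kth, target) != kth;
}

// Top-k error: fraction of test images whose target class is NOT in the top k.
double top_k_error(const std::vector<std::vector<float>>& all_scores,
                   const std::vector<std::size_t>& targets, std::size_t k) {
    assert(all_scores.size() == targets.size() && !targets.empty());
    std::size_t misses = 0;
    for (std::size_t i = 0; i < targets.size(); ++i) {
        if (!in_top_k(all_scores[i], targets[i], k)) {
            ++misses;
        }
    }
    return static_cast<double>(misses) / static_cast<double>(targets.size());
}
```

`std::partial_sort` is used instead of a full sort because only the k best classes matter, which keeps the cost close to linear in the number of classes for small k.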
The authors of AlexNet (Alex Krizhevsky, Geoffrey Hinton, and Ilya Sutskever of the SuperVision group) achieved a top-5 error of around 16%, which was a staggeringly good result back in 2012. To put it into context, until that year no one had been able to go below 20%. AlexNet was also more than 10 percentage points better than the runner-up.
After 2012, more accurate and deeper CNNs began to proliferate, as the graph below shows.
AlexNet is made up of eight trainable layers: five convolution layers and three fully connected layers. All the trainable layers are followed by a ReLU activation function, except for the last fully connected layer, which is followed by the Softmax function.
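A quick way to see how the feature maps shrink through the network is to apply the usual output-size formula, (in + 2*pad - kernel)/stride + 1, stage by stage. The sketch below assumes the hyperparameters of the reference Caffe AlexNet model (227x227 input, 3x3 max pooling with stride 2 after conv1, conv2 and conv5, and the kernel, stride and padding values in the comments); they are not taken from this post, so treat them as assumptions.

```cpp
#include <cassert>

// Output size of a convolution or pooling window along one spatial dimension.
constexpr int out_size(int in, int kernel, int stride, int pad) {
    return (in + 2 * pad - kernel) / stride + 1;
}

// Spatial size after each stage (Caffe-style AlexNet, 227x227 input assumed).
constexpr int conv1 = out_size(227, 11, 4, 0);   // 11x11, stride 4        -> 55
constexpr int pool1 = out_size(conv1, 3, 2, 0);  // 3x3 max pool, stride 2 -> 27
constexpr int conv2 = out_size(pool1, 5, 1, 2);  // 5x5, pad 2             -> 27
constexpr int pool2 = out_size(conv2, 3, 2, 0);  //                        -> 13
constexpr int conv3 = out_size(pool2, 3, 1, 1);  // 3x3, pad 1             -> 13
constexpr int conv4 = out_size(conv3, 3, 1, 1);  //                        -> 13
constexpr int conv5 = out_size(conv4, 3, 1, 1);  //                        -> 13
constexpr int pool5 = out_size(conv5, 3, 2, 0);  //                        -> 6

// conv5 produces 256 feature maps, so the first fully connected layer
// receives a flattened vector of 6 * 6 * 256 = 9216 values.
constexpr int fc6_inputs = pool5 * pool5 * 256;

static_assert(pool5 == 6, "unexpected spatial size after pool5");
```

Everything is `constexpr`, so the compiler evaluates the whole chain and the `static_assert` fails at build time if a hyperparameter is mistyped.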
Besides the trainable layers, the network also has:
If you look at the table above, you’ll notice that some convolution layers are actually ‘grouped convolutions’: an efficient engineering trick that allowed the network to be split across two GPUs without sacrificing accuracy.
If the number of groups is set to two, the first half of the filters is connected only to the first half of the input feature maps, and the second half only to the second half.
A grouped convolution not only allows you to spread the workload over multiple GPUs; with two groups it also halves the number of multiply-accumulate operations (MACs) needed for the layer, because each filter only sees half of the input channels.
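The halving is easy to check with a MAC count. The sketch below assumes the usual AlexNet conv2 shape (27x27 output, 96 input channels, 256 filters of 5x5); `conv_macs` and `group_of` are hypothetical helpers written only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// MAC count of a 2D convolution layer. Each output value needs
// (in_channels / groups) * kernel * kernel MACs, because with grouping
// a filter only reads its own slice of the input channels.
std::int64_t conv_macs(std::int64_t out_h, std::int64_t out_w, std::int64_t out_channels,
                       std::int64_t in_channels, std::int64_t kernel, std::int64_t groups) {
    assert(groups > 0 && in_channels % groups == 0 && out_channels % groups == 0);
    return out_h * out_w * out_channels * (in_channels / groups) * kernel * kernel;
}

// Group that output channel `oc` belongs to. Its filter reads only input channels
// [g * in_channels / groups, (g + 1) * in_channels / groups).
std::int64_t group_of(std::int64_t oc, std::int64_t out_channels, std::int64_t groups) {
    assert(oc >= 0 && oc < out_channels && out_channels % groups == 0);
    return oc / (out_channels / groups);
}
```

With these numbers, `conv_macs(27, 27, 256, 96, 5, 1)` gives 447,897,600 MACs, while `conv_macs(27, 27, 256, 96, 5, 2)` gives 223,948,800, exactly half. `group_of` shows the connectivity described above: filters 0 to 127 belong to group 0 and filters 128 to 255 to group 1.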
A C++ implementation of AlexNet using the graph API is provided in examples/graph_alexnet.cpp.
To run the AlexNet example we need four command line arguments:
./graph_alexnet <target> <cnn_data> <input_image> <labels>
Where:
In the following sections, I’ll describe the key aspects of this example.
In order to use the graph API we need to include three header files:
// Contains the definitions for the graph
#include "arm_compute/graph/Graph.h"
// Contains the definitions for the nodes (convolution, pooling, fully connected)
#include "arm_compute/graph/Nodes.h"
// Contains the utility functions for the graph, such as the accessors for the input, trainable and output nodes.
// The accessors are presented later, when we talk about the graph.
#include "utils/GraphUtils.h"
A pre-processing stage is needed to prepare the input RGB image before feeding it to the network: we subtract the channel mean from each individual colour channel. This operation centres the red, green and blue values around zero.
For simplicity, we’ve already hard-coded the mean values to use in the example:
constexpr float mean_r = 122.68f; /* Mean value to subtract from red channel */
constexpr float mean_g = 116.67f; /* Mean value to subtract from green channel */
constexpr float mean_b = 104.01f; /* Mean value to subtract from blue channel */
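The mean subtraction itself is a simple per-pixel loop. This is an illustrative sketch, not the code used by graph_alexnet: it assumes an interleaved float buffer (R, G, B, R, G, B, ...) and repeats the three constants so that it compiles on its own. `subtract_channel_means` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr float mean_r = 122.68f;  // mean of the red channel
constexpr float mean_g = 116.67f;  // mean of the green channel
constexpr float mean_b = 104.01f;  // mean of the blue channel

// Subtracts the per-channel means in place from an interleaved RGB image.
void subtract_channel_means(std::vector<float>& rgb) {
    assert(rgb.size() % 3 == 0);  // whole pixels only
    for (std::size_t i = 0; i < rgb.size(); i += 3) {
        rgb[i] -= mean_r;      // red
        rgb[i + 1] -= mean_g;  // green
        rgb[i + 2] -= mean_b;  // blue
    }
}
```

A pixel equal to the dataset mean becomes (0, 0, 0), and brighter or darker values become positive or negative offsets from it.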
If you haven’t heard of mean subtraction pre-processing before, have a look at the Compute Image Mean section on the Caffe website.
The body of the network is described through the graph API.
The graph consists of three main parts:
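As you will notice from the example, the Tensor objects (input and output) and all the trainable layers accept a function object called an "accessor", which fills the node’s tensor with data (for example the input image or the trained weights) or reads data from it (for example the output predictions).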
If you are curious to know how the accessor works, take a look at utils/GraphUtils.h, where you can find a few ready-to-use accessors for your Tensor objects and trainable layers.
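The idea behind accessors can be shown with a self-contained mock. Everything below (`ToyTensor`, `ToyAccessor`, `ConstantAccessor`, `ToyNode`) is hypothetical and only mirrors the pattern; the real interfaces live in the Compute Library headers and in utils/GraphUtils.h. The node owns a tensor but delegates filling it to whatever accessor it was given.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical minimal tensor: a flat buffer of floats.
struct ToyTensor {
    std::vector<float> data;
};

// Hypothetical accessor interface: called when a tensor must be filled
// (input, weights) or read (output), so the node never knows where data comes from.
class ToyAccessor {
public:
    virtual ~ToyAccessor() = default;
    virtual bool access(ToyTensor& tensor) = 0;
};

// Fills every element with a constant, e.g. to try a layer without real weights.
class ConstantAccessor : public ToyAccessor {
public:
    explicit ConstantAccessor(float value) : value_(value) {}
    bool access(ToyTensor& tensor) override {
        for (float& v : tensor.data) {
            v = value_;
        }
        return true;
    }

private:
    float value_;
};

// Hypothetical node that owns a tensor and lets its accessor fill it.
class ToyNode {
public:
    ToyNode(std::size_t size, std::unique_ptr<ToyAccessor> accessor)
        : tensor_{std::vector<float>(size)}, accessor_(std::move(accessor)) {
        assert(size > 0);
    }
    bool run_accessor() { return accessor_ != nullptr && accessor_->access(tensor_); }
    const ToyTensor& tensor() const { return tensor_; }

private:
    ToyTensor tensor_;
    std::unique_ptr<ToyAccessor> accessor_;
};
```

Swapping `ConstantAccessor` for one that reads a file or an image changes where the data comes from without touching the node, which is the design choice that keeps the network description itself free of I/O code.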
Now it is time to turn on your Raspberry Pi and test AlexNet with the same images.
Note: the following steps assume you are in the home directory of your Raspberry Pi or host machine.
On your Raspberry Pi, enter the following commands:
# Install unzip
sudo apt-get install unzip
# Download the zip file with the AlexNet model, input images and labels
wget <url to archive>
# Create a new folder
mkdir assets_alexnet
# Unzip
unzip compute_library_alexnet.zip -d assets_alexnet
If you are compiling natively on your Raspberry Pi, use the following instructions. If you’re cross-compiling, see the appropriate section below.
On your Raspberry Pi:
# Clone Compute Library
git clone https://github.com/Arm-software/ComputeLibrary.git
# Enter ComputeLibrary folder
cd ComputeLibrary
# Native build the library and the examples
scons Werror=1 debug=0 asserts=0 neon=1 opencl=1 examples=1 build=native -j2
Once the library has been compiled, we are ready to classify our go-kart!
export LD_LIBRARY_PATH=build/
PATH_ASSETS=../assets_alexnet
./build/examples/graph_alexnet 0 $PATH_ASSETS $PATH_ASSETS/go_kart.ppm $PATH_ASSETS/labels.txt
If you’re cross-compiling, on your host machine:
# Clone Compute Library
git clone https://github.com/Arm-software/ComputeLibrary.git
# Enter ComputeLibrary folder
cd ComputeLibrary
# Build the library and the examples
scons Werror=1 debug=0 asserts=0 neon=1 opencl=1 examples=1 os=linux arch=armv7a -j4
# Copy the example and the dynamic libraries to the Raspberry Pi
scp build/examples/graph_alexnet build/libarm_compute.so build/libarm_compute_core.so build/libarm_compute_graph.so <username_raspberrypi>@<ip_addr_raspberrypi>:Desktop
where:
Open an SSH session from your host machine:
ssh <username_raspberrypi>@<ip_addr_raspberrypi>
Within the SSH session:
cd Desktop
# The example and the libraries were copied directly into Desktop
export LD_LIBRARY_PATH=./
PATH_ASSETS=../assets_alexnet
./graph_alexnet 0 $PATH_ASSETS $PATH_ASSETS/go_kart.ppm $PATH_ASSETS/labels.txt
Whether or not you’re building the library natively, the output should look like this:
Congratulations – you got there! I hope you had fun and, more importantly, I hope this will help you to develop even more exciting and performant intelligent vision solutions on Arm.
Ciao for now!
Gian Marco
To find this tutorial, and many other resources, visit the Machine Learning Developer Community.
Hi xlla, thanks for trying the example. The inference time looks too high for AlexNet on a Raspberry Pi. Could you check whether you are compiling the library with debug=1? Could you also tell me which Arm Compute Library version you are using and the build command? I also recommend opening an issue for this on GitHub in the ACL repo so that the team can help you promptly. Hope this helps. Gian Marco
It works, but slowly.
I ran it on a Raspberry Pi 4B with 4 GB of RAM, and it took about 29 s to recognize a goldfish.
<pre>
---------- Top 5 predictions ----------
1.0000 - [id = 1], n01443537 goldfish, Carassius auratus
0.0000 - [id = 27], n01631663 eft
0.0000 - [id = 29], n01632777 axolotl, mud puppy, Ambystoma mexicanum
0.0000 - [id = 124], n01985128 crayfish, crawfish, crawdad, crawdaddy
0.0000 - [id = 310], n02219486 ant, emmet, pismire
Test passed
Can't load libOpenCL.so: libOpenCL.so: cannot open shared object file: No such file or directory
Can't load libGLES_mali.so: libGLES_mali.so: cannot open shared object file: No such file or directory
Can't load libmali.so: libmali.so: cannot open shared object file: No such file or directory
Couldn't find any OpenCL library.
</pre>
Hi Gian,
I wish to learn about the article mentioned at the beginning, but the link was broken.
how to apply a cartoon effect with the Compute Library
Could you please update it?
Thanks!
I got this output: ./build/examples/graph_alexnet
Threads : 2
Target : NEON
Data type : F32
Data layout : NHWC
Tuner enabled? : false
Cache enabled? : false
Tuner mode : Normal
Tuner file :
Fast math enabled? : false
Data path : /home/pi/Documents/assets_alexnet
Image file : /home/pi/Documents/assets_alexnet/go_kart.ppm
Labels file : /home/pi/Documents/assets_alexnet/labels.txt
1.0000 - [id = 672], n03792972 mountain tent
1.0000 - [id = 657], n03773504 missile
1.0000 - [id = 658], n03775071 mitten
1.0000 - [id = 659], n03775546 mixing bowl
1.0000 - [id = 660], n03776460 mobile home, manufactured home
Any idea? I tested this example on two devices and the output is the same.
Has anyone tried this recently? I am getting wrong results and the program is not working for GC