Running AlexNet on Raspberry Pi with Compute Library

February 5, 2018


If you’d like to develop your Convolutional Neural Networks using just the Compute Library and a Raspberry Pi, this step-by-step guide will show you how… and it comes complete with all the tools you’ll need to get up and running.

If you follow all the steps outlined here, by the end of the post you’ll be up and running with one of the first deep Convolutional Neural Networks (CNNs) designed to recognize 1000 different objects: AlexNet!

AlexNet on Raspberry Pi

Getting started

If you haven’t read my previous blog on how to apply a cartoon effect with the Compute Library, I’d suggest starting with that. It’s a simple example, but it will give you all the information you need to compile or cross-compile the library for Raspberry Pi.

In addition to some basic knowledge of the Compute Library, this tutorial assumes some knowledge of a CNN; you don’t need to be an expert, just have an idea of the main functions.

Everything else can be found in the following .7z file, which contains:

The AlexNet model (the same one as in the Caffe Model Zoo)
A text file, containing the ImageNet labels required to map the predicted objects to the name of the classes
Several images in ppm file format ready to be used with the network

Please download the required files to your host machine (Debian based) or to your Raspberry Pi:

compute_library_alexnet files: https://developer.arm.com/-/media/43359E999DEF433BAF63523C529D21AD.ashx?revision=c1a232fa-f328-451f-9bd6-250b83511e01

Within the folder "alexnet_tutorial" you should have everything for this tutorial.

The requirements for your Raspberry Pi and host machine are:

Raspberry Pi 2 or 3 with Ubuntu Mate 16.04.02
A blank microSD card: we highly recommend a Class 6 or Class 10 microSDHC card with 8 GB of storage (6 GB minimum)
Router + Ethernet cable

A word about the Graph API

In release 17.09 of the Compute Library, we introduced an important feature to make life easier for developers, and anyone else benchmarking the library: the graph API.

The graph API’s primary function is to reduce the boilerplate code, but it can also reduce errors in your code and improve its readability. It’s simple and easy-to-use, with a stream interface that’s designed to be similar to other C++ objects.
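To give a feel for what a stream interface means in practice, here is a toy sketch. It is not the Compute Library's API: `ToyGraph`, `ToyLayer` and `describe` are hypothetical names I made up for illustration. It only shows how overloading `operator<<` lets you chain layers the same way you chain values into `std::cout`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a layer description; NOT a Compute Library type.
struct ToyLayer
{
    std::string name;
};

// Hypothetical graph that simply records layers in the order they are streamed in.
class ToyGraph
{
public:
    // Returning *this is what allows chaining: g << a << b << c;
    ToyGraph &operator<<(ToyLayer layer)
    {
        assert(!layer.name.empty());
        _layers.push_back(std::move(layer));
        return *this;
    }

    // Join the recorded layer names with " -> " so the order can be inspected.
    std::string describe() const
    {
        std::string out;
        for(std::size_t i = 0; i < _layers.size(); ++i)
        {
            if(i != 0)
            {
                out += " -> ";
            }
            out += _layers[i].name;
        }
        return out;
    }

private:
    std::vector<ToyLayer> _layers;
};
```

With these definitions, `ToyGraph g; g << ToyLayer{"input"} << ToyLayer{"conv1"};` records the two layers in the order they were streamed.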

At the current stage, the graph API only supports the ML functions (e.g. convolution, fully connected, activation, pooling) and can only be used if the library has been compiled with both NEON and OpenCL enabled (neon=1 and opencl=1).

Note: if your platform doesn't have OpenCL (e.g. Raspberry Pi), don't worry: the graph API will automatically fall back to NEON. You do, however, still need to compile the Compute Library with both NEON and OpenCL enabled.

In terms of building blocks, the graph API represents the third computation block, together with core and runtime. In terms of hierarchy, the graph API lies just above the runtime, which in turn lies above the core block.

Hierarchy image

Introducing AlexNet

In 2012, AlexNet shot to fame when it won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), an annual challenge that aims to evaluate algorithms for object detection and image classification.

The ILSVRC evaluates image classification solutions using two important metrics: the top-5 and top-1 errors. Given a set of N images (usually called “test images”), each mapped to a target class:

top-1 error checks if the top predicted class is the same as the target class
top-5 error checks if the target class is one of the top five predictions

For both, the top error is calculated as, "the number of times the predicted class does not match the target class, divided by the total number of test images". In other words, a lower score is better.
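The definition above can be sketched in a few lines of C++. This is a minimal illustration rather than the official ILSVRC evaluation code: the function names `in_top_k` and `top_k_error` are my own, and I assume each test image comes with one score per class. Ties are resolved in favour of the target class.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true if `target` is among the k highest-scoring classes of `scores`.
// A class only "beats" the target if its score is strictly higher.
bool in_top_k(const std::vector<float> &scores, std::size_t target, std::size_t k)
{
    assert(target < scores.size());
    const float target_score = scores[target];
    const auto  better       = std::count_if(scores.begin(), scores.end(),
                                             [target_score](float s) { return s > target_score; });
    return static_cast<std::size_t>(better) < k;
}

// Top-k error: fraction of test images whose target class is NOT in the top k predictions.
double top_k_error(const std::vector<std::vector<float>> &all_scores,
                   const std::vector<std::size_t> &targets, std::size_t k)
{
    assert(!all_scores.empty() && all_scores.size() == targets.size());
    std::size_t misses = 0;
    for(std::size_t i = 0; i < all_scores.size(); ++i)
    {
        if(!in_top_k(all_scores[i], targets[i], k))
        {
            ++misses;
        }
    }
    return static_cast<double>(misses) / static_cast<double>(all_scores.size());
}
```

Calling `top_k_error(all_scores, targets, 1)` gives the top-1 error and `top_k_error(all_scores, targets, 5)` the top-5 error for the same set of predictions.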

The authors of AlexNet (Alex Krizhevsky, Geoffrey Hinton, and Ilya Sutskever of the SuperVision group) achieved a top-5 error of around 16%, which was a staggeringly good result back in 2012. To put it into context, until that year no one had been able to go under 20%. AlexNet was also more than 10 percentage points better than the runner-up.

After 2012, more accurate and deeper CNNs began to proliferate, as the graph below shows:

ImageNet Classification, top-5 error image

What are the ingredients of AlexNet?

AlexNet is made up of eight trainable layers: five convolution layers and three fully connected layers. All the trainable layers are followed by a ReLU activation function, except for the last fully connected layer, where the Softmax function is used.

Besides the trainable layers, the network also has:

Three pooling layers
Two normalization layers
One dropout layer (only used during training, to reduce overfitting)

AlexNet table image
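If you want to check how the feature maps shrink through the network, the sketch below follows the spatial size from the input to the last pooling layer. Please treat the kernel, stride and padding values as assumptions: they are the ones I know from the reference Caffe AlexNet definition, not values stated in this post, and `output_size` and `alexnet_final_spatial_size` are hypothetical helper names.

```cpp
#include <cassert>

// Output width/height of a convolution or pooling layer (no dilation):
// out = (in + 2 * pad - kernel) / stride + 1, using integer division.
constexpr unsigned int output_size(unsigned int in, unsigned int kernel, unsigned int stride, unsigned int pad)
{
    return (in + 2 * pad - kernel) / stride + 1;
}

// Follows the spatial size of the feature maps from the 227x227 input down to
// the last pooling layer. Layer parameters are assumed (reference Caffe AlexNet).
// Normalization and activation layers do not change the spatial size.
unsigned int alexnet_final_spatial_size()
{
    unsigned int s = 227;
    s = output_size(s, 11, 4, 0); // conv1 -> 55
    s = output_size(s, 3, 2, 0);  // pool1 -> 27
    s = output_size(s, 5, 1, 2);  // conv2 -> 27
    s = output_size(s, 3, 2, 0);  // pool2 -> 13
    s = output_size(s, 3, 1, 1);  // conv3 -> 13
    s = output_size(s, 3, 1, 1);  // conv4 -> 13
    s = output_size(s, 3, 1, 1);  // conv5 -> 13
    s = output_size(s, 3, 2, 0);  // pool5 -> 6
    assert(s > 0);
    return s;
}
```

Under these assumptions the last pooling layer produces 6x6 feature maps, which are then flattened and fed to the first fully connected layer.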

Grouping

If you look at the table above, you’ll notice that some convolution layers are actually ‘grouping convolutions’ – an efficient engineering trick that allows the acceleration of the network over two GPUs, without sacrificing accuracy.

If the group size is set to two, the first half of the filters will be connected just to the first half of the input feature maps; the second half will connect to the second half.

AlexNet Convolution

The grouping convolution not only allows you to spread the workload over multiple GPUs, but also halves the number of MACs needed for the layer.
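A quick way to convince yourself of the saving is to count the multiply-accumulate operations directly. The helper below is a minimal sketch (`conv_macs` is a hypothetical name, and bias additions are ignored): each filter only sees `in_channels / groups` input feature maps, so with two groups the count is halved.

```cpp
#include <cassert>
#include <cstdint>

// Multiply-accumulate count for one convolution layer.
// Each of the `out_channels` filters is applied at every output position and
// only reads in_channels / groups input feature maps.
std::uint64_t conv_macs(std::uint64_t out_w, std::uint64_t out_h, std::uint64_t out_channels,
                        std::uint64_t kernel_w, std::uint64_t kernel_h, std::uint64_t in_channels,
                        std::uint64_t groups)
{
    assert(groups > 0 && in_channels % groups == 0 && out_channels % groups == 0);
    return out_w * out_h * out_channels * kernel_w * kernel_h * (in_channels / groups);
}
```

For example, with illustrative dimensions of a 27x27 output, 256 filters of size 5x5 and 96 input feature maps, `conv_macs(27, 27, 256, 5, 5, 96, 1)` is exactly twice `conv_macs(27, 27, 256, 5, 5, 96, 2)`.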

A closer look at the code: examples/graph_alexnet.cpp

A C++ implementation of AlexNet using the graph API is provided in examples/graph_alexnet.cpp.

To run the AlexNet example we need four command line arguments:

./graph_alexnet <target> <cnn_data> <input_image> <labels>

Where:

<target>: Type of acceleration (NEON=0 or OpenCL=1)
<cnn_data>: Path to cnn_data
<input_image>: Path to your input image (only ppm files are supported)
<labels>: Path to your ImageNet labels

In the following sections I describe the key aspects of this example.

Header files

In order to use the graph API we need to include three header files:

// Contains the definitions for the graph 
#include "arm_compute/graph/Graph.h" 

// Contains the definitions for the nodes (convolution, pooling, fully connected) 
#include "arm_compute/graph/Nodes.h" 

// Contains the utility functions for the graph, such as the accessors for the input, trainable and output nodes. The accessors are presented later, when we talk about the graph.
#include "utils/GraphUtils.h"

Mean subtraction pre-processing

A pre-processing stage is needed for preparing the input RGB image before feeding the network, so we’re going to subtract the channel means from each individual colour channel. This operation will centre the red, green and blue channels around the origin.

Pre-processing stage:

r_norm(x,y) = r(x,y) - r_mean
g_norm(x,y) = g(x,y) - g_mean
b_norm(x,y) = b(x,y) - b_mean

Where:

r_norm(x,y), g_norm(x,y), b_norm(x,y) are the RGB values at coordinates x,y after the mean subtraction
r(x,y), g(x,y), b(x,y) are the RGB values at coordinates x,y before the mean subtraction
r_mean, g_mean and b_mean are the mean values to use for the RGB channels

For simplicity, we’ve already hard-coded the mean values to use in the example:

constexpr float mean_r = 122.68f; /* Mean value to subtract from red channel */ 
constexpr float mean_g = 116.67f; /* Mean value to subtract from green channel */ 
constexpr float mean_b = 104.01f; /* Mean value to subtract from blue channel */
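To make the operation concrete, here is a minimal sketch of the mean subtraction using the same mean values. It is not the code used by the example: `subtract_mean` is a hypothetical helper, and I assume the image is stored as interleaved 8-bit RGB (r, g, b, r, g, b, ...), which may differ from how the library's own accessor lays out the data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr float mean_r = 122.68f; /* Mean value to subtract from red channel */
constexpr float mean_g = 116.67f; /* Mean value to subtract from green channel */
constexpr float mean_b = 104.01f; /* Mean value to subtract from blue channel */

// Converts interleaved 8-bit RGB pixels to floats and subtracts the mean of
// the corresponding channel from every value. The interleaved layout is an assumption.
std::vector<float> subtract_mean(const std::vector<std::uint8_t> &rgb)
{
    assert(rgb.size() % 3 == 0);
    const float        means[3] = { mean_r, mean_g, mean_b };
    std::vector<float> out(rgb.size());
    for(std::size_t i = 0; i < rgb.size(); ++i)
    {
        // i % 3 selects the channel: 0 = red, 1 = green, 2 = blue
        out[i] = static_cast<float>(rgb[i]) - means[i % 3];
    }
    return out;
}
```

A pure red pixel (255, 0, 0) would become roughly (132.32, -116.67, -104.01), so values end up on both sides of zero.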

If you’ve not heard of mean subtraction pre-processing before, have a look at the Compute Image Mean section on the Caffe website.

Network description

The body of the network is described through the graph API.

The graph consists of three main parts:

MANDATORY: one input "Tensor object". This layer describes the geometry of the input data along with the data type to use. In this case, we’ll have a 3D input image with shape 227x227x3, using the FP32 data type
The Convolutional Neural Network layers – or ‘nodes’ in the graph's terminology – needed for the network
MANDATORY: one output "Tensor object", used to get the result back from the network

As you will notice from the example, the Tensor objects (input and output) and all the trainable layers accept an input function called "accessor".

The accessor is the only way to access the internal Tensors.

The accessor used by the input Tensor object can initialize the input Tensor of the network. This function can also be responsible for the mean subtraction pre-processing and reading the input image from a file or camera
The accessor used by the trainable layers (e.g. convolution, fully connected) can initialize the weights and the biases, for example by reading the values from a NumPy file
The accessor used by the output Tensor object can return the result of the classification, along with the score

If you are curious to know how the accessor works, take a look at utils/GraphUtils.h, where you can find a few ready-to-use accessors for your Tensor objects and trainable layers.
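As an illustration of what an output accessor has to do, the sketch below picks the best-scoring classes from a vector of scores. It is a standalone example, not the implementation in GraphUtils.h: `top_k_indices` is a hypothetical helper, and I assume the network output has already been copied into a `std::vector<float>` with one score per class.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns the indices of the k highest scores, best first.
std::vector<std::size_t> top_k_indices(const std::vector<float> &scores, std::size_t k)
{
    assert(k <= scores.size());
    std::vector<std::size_t> idx(scores.size());
    std::iota(idx.begin(), idx.end(), std::size_t{ 0 }); // 0, 1, 2, ...
    // Only the first k positions need to be sorted, by descending score.
    std::partial_sort(idx.begin(), idx.begin() + static_cast<std::ptrdiff_t>(k), idx.end(),
                      [&scores](std::size_t a, std::size_t b) { return scores[a] > scores[b]; });
    idx.resize(k);
    return idx;
}
```

Each returned index can then be used to look up the matching line in the labels file and print it next to `scores[index]`.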

Time to classify!

Now it is time to turn on your Raspberry Pi and test AlexNet with the same images.

Note: the following steps assume you are in the home directory of your Raspberry Pi or host machine.

On your Raspberry Pi enter the following commands

# Install unzip
sudo apt-get install unzip

# Download the zip file with the AlexNet model, input images and labels
wget <url to archive> 

# Create a new folder
mkdir assets_alexnet

# Unzip
unzip compute_library_alexnet.zip -d assets_alexnet

If you are compiling natively on your Raspberry Pi, use the following instructions. If you’re cross-compiling, see the appropriate section below.

On your Raspberry Pi:

# Clone Compute Library 
git clone https://github.com/Arm-software/ComputeLibrary.git  

# Enter ComputeLibrary folder 
cd ComputeLibrary  

# Native build the library and the examples 
scons Werror=1 debug=0 asserts=0 neon=1 opencl=1 examples=1 build=native -j2

Once the library has been compiled, we are ready to classify our go-kart!

export LD_LIBRARY_PATH=build/

PATH_ASSETS=../assets_alexnet

./build/examples/graph_alexnet 0 $PATH_ASSETS $PATH_ASSETS/go_kart.ppm $PATH_ASSETS/labels.txt

If you’re cross-compiling, on your host machine:

# Clone Compute Library 
git clone https://github.com/Arm-software/ComputeLibrary.git  

# Enter ComputeLibrary folder 
cd ComputeLibrary  

# Build the library and the examples 
scons Werror=1 debug=0 asserts=0 neon=1 opencl=1 examples=1 os=linux arch=armv7a -j4 

# Copy the example and dynamic libraries to the Raspberry Pi
scp build/examples/graph_alexnet build/libarm_compute.so build/libarm_compute_core.so build/libarm_compute_graph.so <username_raspberrypi>@<ip_addr_raspberrypi>:Desktop

where:

<username_raspberrypi>: username used on your Raspberry Pi
<ip_addr_raspberrypi>: IP address of your Raspberry Pi

Open the SSH session from your host machine:

ssh <username_raspberrypi>@<ip_addr_raspberrypi>

Within the SSH session:

cd Desktop

# The example and the libraries were copied straight into this folder
export LD_LIBRARY_PATH=.

PATH_ASSETS=../assets_alexnet

./graph_alexnet 0 $PATH_ASSETS $PATH_ASSETS/go_kart.ppm $PATH_ASSETS/labels.txt

Whether or not you’re building the library natively, the output should look like this:

AlexNet Output

And that’s it!

Congratulations – you got there! I hope you had fun and, more importantly, I hope this will help you to develop even more exciting and performant intelligent vision solutions on Arm.

Ciao for now!

Gian Marco

To find this tutorial, and many other resources, visit the Machine Learning Developer Community.

Top Comments

Qiang Han over 3 years ago +1

Hi Gian, I wish to learn about the article mentioned at the beginning, but the link was broken. how to apply a cartoon effect with the Compute Library Could you please update it? Thanks!

Gian Marco Iodice over 2 years ago in reply to xlla

Hi xlla, thanks for trying the example. The inference time looks too high for AlexNet on Raspberry Pi. Could you check whether you are compiling the library with debug=1? Could you also tell me which Arm Compute Library version you are using, and the build command? I also recommend opening this thread on GitHub in the ACL repo so that the team can help you promptly with this issue. Hope this helps. Gian Marco
xlla over 2 years ago

It works but slowly.

I am running it on a Raspberry Pi 4B with 4 GB of RAM; it took about 29 s to recognize a goldfish.

---------- Top 5 predictions ----------

1.0000 - [id = 1], n01443537 goldfish, Carassius auratus
0.0000 - [id = 27], n01631663 eft
0.0000 - [id = 29], n01632777 axolotl, mud puppy, Ambystoma mexicanum
0.0000 - [id = 124], n01985128 crayfish, crawfish, crawdad, crawdaddy
0.0000 - [id = 310], n02219486 ant, emmet, pismire

Test passed

Can't load libOpenCL.so: libOpenCL.so: cannot open shared object file: No such file or directory
Can't load libGLES_mali.so: libGLES_mali.so: cannot open shared object file: No such file or directory
Can't load libmali.so: libmali.so: cannot open shared object file: No such file or directory
Couldn't find any OpenCL library.
meitiever over 3 years ago

I got this output:
./build/examples/graph_alexnet

Threads : 2
Target : NEON
Data type : F32
Data layout : NHWC
Tuner enabled? : false
Cache enabled? : false
Tuner mode : Normal
Tuner file :
Fast math enabled? : false
Data path : /home/pi/Documents/assets_alexnet
Image file : /home/pi/Documents/assets_alexnet/go_kart.ppm
Labels file : /home/pi/Documents/assets_alexnet/labels.txt

---------- Top 5 predictions ----------

1.0000 - [id = 672], n03792972 mountain tent
1.0000 - [id = 657], n03773504 missile
1.0000 - [id = 658], n03775071 mitten
1.0000 - [id = 659], n03775546 mixing bowl
1.0000 - [id = 660], n03776460 mobile home, manufactured home

Test passed

Any idea? I tested this example on two devices and the output is the same.
taehyun over 4 years ago

Has anyone tried this recently? I am getting wrong results and the program is not working for GC