Augmented reality (AR) and machine learning (ML) are two leading-edge technologies. AR brings virtual objects into the real world by overlaying digital content on top of physical objects, while ML helps a program recognize those physical objects in the real world. By combining these two technologies, we can create some innovative projects.
Recently, Arm showcased a demo at Unite Beijing 2018 that combined AR and ML, using a Mali GPU to accelerate the computing tasks. This blog shares our experience of developing the AR and ML demo using Google ARCore and Arm NN in Unity. Below is a video of the end result, and throughout this guide I'll point out the most important steps involved in creating the demo.
You will need a Google ARCore supported device to run our demo. We have tested on a Samsung Galaxy S8, a Galaxy S9 and a Huawei P20, all of which have a Mali GPU inside.
We worked with Arm NN because it can bring a more than 4x performance boost on Arm Cortex-A CPUs and Mali GPUs. See the "Arm NN for Android" section of the Arm NN SDK webpage for more details.
We use Arm NN with the YOLO v1 tiny model to do object detection:
Then we use Google ARCore to handle the AR parts of the demo:
First of all, follow these instructions to prepare your hardware and software environments. Build and run the HelloAR sample; we will start our project from there. For example, you could name your Unity project "AR Detector".
For a better experience, change the orientation in "Player Settings > Resolution and Presentation > Default Orientation" from Auto Rotation to Landscape Left, and set "Player Settings > Configuration > Scripting Runtime Version" to Experimental (.NET 4.6 Equivalent).
Copy the HelloAR scene: duplicate the HelloAR scene in "Assets > GoogleARCore > Examples > HelloAR > Scenes" and rename the copy to Main.
You should get these in your Unity project:
The example scene visualizes the detected planes and places an Andy 3D model when you touch a detected plane on the screen. We need the plane visualizer, but not the screen-touch function.
To remove it, open the HelloARController script attached to the Example Controller game object and comment out the touch-handling code in its Update method (see the sketch below). Then build and run to check that the touch function has been disabled.
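As a rough sketch, the commented-out Update method ends up looking something like this. The exact code depends on your ARCore SDK version, so treat the method names below (such as _UpdateApplicationLifecycle) as assumptions rather than a verbatim copy.

// Sketch only: based on the HelloARController shipped with the ARCore Unity SDK at the time.
// Method and field names may differ in your SDK version.
public void Update()
{
    _UpdateApplicationLifecycle();   // keep the lifecycle/permission handling

    // Comment out the touch handling so tapping the screen no longer places an Andy:
    // Touch touch;
    // if (Input.touchCount < 1 || (touch = Input.GetTouch(0)).phase != TouchPhase.Began)
    // {
    //     return;
    // }
    //
    // ... raycast against the touch position and instantiate AndyAndroidPrefab ...
}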
You'll need to build the Arm NN shared libraries yourself and manually integrate them into Unity as native plugins. To do that, first create a standalone NDK toolchain. We use the armeabi-v7a compiler rather than arm64 because Google ARCore and Unity only support armeabi-v7a right now.
Read the instructions about "Building Open Source Projects Using Standalone Toolchains" and use this command to create the standalone toolchain:
$NDK/build/tools/make_standalone_toolchain.py \
    --arch arm \
    --api 26 \
    --stl=libc++ \
    --install-dir=my-toolchain
Then configure and build Arm NN with Caffe parser support using the newly created standalone toolchain (read the build instructions). Enable the OpenCL option in order to benefit from GPU acceleration. You will also need to build the protobuf shared library for both the host and armeabi-v7a, and set the -DPROTOBUF_ROOT=/path/to/protobuf/armeabi-v7a_install option when configuring Arm NN.
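For reference, a configure-and-build invocation might look roughly like the sketch below. The option names follow the Arm NN Android build guide of that era, but all paths are placeholders and your Arm NN version may expose a different set of options, so treat this as an outline rather than a recipe.

# Sketch only: placeholder paths; check the option names against your Arm NN version.
cd armnn && mkdir -p build && cd build
CXX=/path/to/my-toolchain/bin/arm-linux-androideabi-clang++ \
CC=/path/to/my-toolchain/bin/arm-linux-androideabi-clang \
cmake .. \
    -DBUILD_CAFFE_PARSER=1 \
    -DCAFFE_GENERATED_SOURCES=/path/to/caffe/build/src \
    -DPROTOBUF_ROOT=/path/to/protobuf/armeabi-v7a_install \
    -DARMCOMPUTE_ROOT=/path/to/ComputeLibrary \
    -DARMCOMPUTE_BUILD_DIR=/path/to/ComputeLibrary/build \
    -DARMCOMPUTENEON=1 \
    -DARMCOMPUTECL=1 \
    -DBOOST_ROOT=/path/to/boost_install
make -j8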
After doing that, you should have "libarmnn.so", "libarmnnCaffeParser.so" and the "UnitTests" binary in the build directory, and you should be able to push and run the UnitTests on your Android phone.
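For example, a quick smoke test over adb could look like this; the /data/local/tmp directory is just an illustrative choice of a writable location on the device.

# Sketch only: libprotobuf.so comes from your armeabi-v7a protobuf build.
adb push libarmnn.so /data/local/tmp/
adb push libarmnnCaffeParser.so /data/local/tmp/
adb push libprotobuf.so /data/local/tmp/
adb push UnitTests /data/local/tmp/
adb shell "cd /data/local/tmp && chmod +x UnitTests && LD_LIBRARY_PATH=/data/local/tmp ./UnitTests"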
Our demo uses Arm NN to perform object detection. We chose a YOLO v1 tiny model pre-trained on the COCO dataset, which can be downloaded from GitHub. Download the YOLO COCO tiny model from the "Legacy models" section and push it to your Android device:
adb shell mkdir -p /mnt/sdcard/Android/data/com.yourCompany.ARDetector/files/
adb push coco_tiny.caffemodel /mnt/sdcard/Android/data/com.yourCompany.ARDetector/files/
Since we need to call the C++ API that Arm NN provides, we use the Native Plugins feature in Unity; see the Unity documentation for more detail. We implemented a shared library named "libyoloDetector.so" and exported two C APIs for Unity to use.
The initDetector C API loads the machine learning model and initializes the Arm NN network. It should be called when the app starts.
// Optimize the network for a specific runtime compute device, e.g. CpuAcc, GpuAcc
static armnn::IRuntimePtr s_Runtime = armnn::IRuntime::Create(armnn::Compute::GpuAcc);
static armnn::NetworkId s_NetworkIdentifier;
static std::pair<armnn::LayerBindingId, armnn::TensorInfo> s_InputBindingInfo;
static std::pair<armnn::LayerBindingId, armnn::TensorInfo> s_OutputBindingInfo;
static float *s_OutputBuffer;

static char k_ModelFileName[] = "/mnt/sdcard/Android/data/com.yourCompany.ARDetector/files/coco_tiny.caffemodel";
static char k_InputTensorName[] = "data";
static char k_OutputTensorName[] = "result";

const unsigned int k_YoloImageWidth = 448;
const unsigned int k_YoloImageHeight = 448;
const unsigned int k_YoloChannelNums = 3;
const unsigned int k_YoloImageBatchSize = 1;
const unsigned int k_YoloOutputSize = 7 * 7 * (5 * 3 + 80);

extern "C" __attribute__ ((visibility ("default")))
void initDetector()
{
    auto parser = armnnCaffeParser::ICaffeParser::Create();
    auto network = parser->CreateNetworkFromBinaryFile(
        k_ModelFileName,
        { {k_InputTensorName, {k_YoloImageBatchSize, k_YoloChannelNums, k_YoloImageHeight, k_YoloImageWidth}} },
        { k_OutputTensorName });

    // Find the binding points for the input and output nodes
    s_InputBindingInfo = parser->GetNetworkInputBindingInfo(k_InputTensorName);
    s_OutputBindingInfo = parser->GetNetworkOutputBindingInfo(k_OutputTensorName);

    armnn::IOptimizedNetworkPtr optNet = armnn::Optimize(*network, s_Runtime->GetDeviceSpec());

    // Load the optimized network onto the runtime device
    armnn::Status ret = s_Runtime->LoadNetwork(s_NetworkIdentifier, std::move(optNet));
    if (ret == armnn::Status::Failure)
    {
        throw armnn::Exception("IRuntime::LoadNetwork failed");
    }

    s_OutputBuffer = (float*)malloc(sizeof(float) * k_YoloOutputSize);
}
The detectObjects C API is called repeatedly to detect objects in the raw camera data.
// Helper function to make input tensors
armnn::InputTensors MakeInputTensors(const std::pair<armnn::LayerBindingId, armnn::TensorInfo>& input,
                                     const void* inputTensorData)
{
    return { { input.first, armnn::ConstTensor(input.second, inputTensorData) } };
}

// Helper function to make output tensors
armnn::OutputTensors MakeOutputTensors(const std::pair<armnn::LayerBindingId, armnn::TensorInfo>& output,
                                       void* outputTensorData)
{
    return { { output.first, armnn::Tensor(output.second, outputTensorData) } };
}

extern "C" __attribute__ ((visibility ("default")))
int detectObjects(float *inputPtr, float *result)
{
    float *outputPtr = s_OutputBuffer;

    armnn::Status ret = s_Runtime->EnqueueWorkload(s_NetworkIdentifier,
                                                   MakeInputTensors(s_InputBindingInfo, inputPtr),
                                                   MakeOutputTensors(s_OutputBindingInfo, outputPtr));
    if (ret == armnn::Status::Failure)
    {
        throw armnn::Exception("IRuntime::EnqueueWorkload failed");
    }

    // Convert the raw YOLO v1 output into detections packed into "result"
    // (ParseOutputTensorsYoloV1 is implemented separately, see below).
    return ParseOutputTensorsYoloV1(outputPtr, result);
}
You will need to implement the ParseOutputTensorsYoloV1 function yourself; GitHub has some useful code snippets that may help you write the YOLO v1 parser, and a rough sketch follows below.
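To give an idea of what that parser can look like, here is a minimal sketch. It assumes the classic darknet YOLO v1 output layout (all class probabilities, then box confidences, then box coordinates), a 7x7 grid with 80 classes and 3 boxes per cell to match k_YoloOutputSize above, and a simple packing of six floats per detection into "result". The threshold and packing are illustrative choices, not the demo's exact code, so adapt them to your model.

// Sketch only: assumes the darknet YOLO v1 output layout and an illustrative
// packing of 6 floats per detection (class, probability, x, y, w, h), all normalized.
static int ParseOutputTensorsYoloV1(const float* output, float* result)
{
    const int side = 7;
    const int numBoxes = 3;        // "numScales" in the Arm NN Yolo test code
    const int numClasses = 80;
    const float threshold = 0.3f;  // illustrative confidence threshold

    const float* classProbs  = output;                               // side*side*numClasses values
    const float* confidences = output + side * side * numClasses;    // side*side*numBoxes values
    const float* boxes       = confidences + side * side * numBoxes; // side*side*numBoxes*4 values

    int detections = 0;
    for (int cell = 0; cell < side * side; ++cell)
    {
        // Keep only the best box and class per cell, so at most side*side detections
        // (this matches RESULT_SIZE = 7 * 7 * 6 on the C# side).
        float bestScore = 0.0f;
        int bestClass = 0;
        int bestBox = 0;
        for (int b = 0; b < numBoxes; ++b)
        {
            float confidence = confidences[cell * numBoxes + b];
            for (int c = 0; c < numClasses; ++c)
            {
                float score = classProbs[cell * numClasses + c] * confidence;
                if (score > bestScore) { bestScore = score; bestClass = c; bestBox = b; }
            }
        }
        if (bestScore < threshold)
        {
            continue;
        }

        const float* box = boxes + (cell * numBoxes + bestBox) * 4;
        float x = (box[0] + cell % side) / side;  // centre x in [0,1]
        float y = (box[1] + cell / side) / side;  // centre y in [0,1]
        float w = box[2];                         // width in [0,1]
        float h = box[3];                         // height in [0,1]
        // Note: YOLO v1 models trained with the sqrt option store sqrt(w) and sqrt(h);
        // square them here if your model does.

        float* dst = result + detections * 6;
        dst[0] = (float)bestClass;
        dst[1] = bestScore;
        dst[2] = x;
        dst[3] = y;
        dst[4] = w;
        dst[5] = h;
        ++detections;
    }
    return detections;
}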
Use the NDK standalone toolchain to compile the above code and generate the "libyoloDetector.so" shared library. To call it from Unity, create a folder named "Assets > Plugins > Android" in your Unity project and copy the armeabi-v7a shared libraries into it. Here are the libraries I copied:
libarmnn.so
libarmnnCaffeParser.so
libprotobuf.so
libc++_shared.so
libyoloDetector.so
Let's switch back to the Unity project we created. We need the raw camera data to feed into the object detection model, but Google ARCore has already taken control of the camera. Fortunately, Google ARCore anticipated that other code might want to access the raw camera data as well, and provides an example for this; see "Assets > GoogleARCore > Examples > ComputerVision" for more detail. We can use its TextureReader script to do the same thing in our project.
You should get this in the end.
Create two C# scripts for this demo.
In the "ArmNNCaffeDetector.cs" script, call the initDetector native function in the constructor:
private static int INPUT_SIZE = 448;
private static int RESULT_SIZE = 7 * 7 * 6;
private float[] fetchResults = new float[RESULT_SIZE];

[DllImport ("yoloDetector")]
private static extern void initDetector();

public ArmNNCaffeDetector()
{
    inPtr = Marshal.AllocHGlobal(3 * INPUT_SIZE * INPUT_SIZE * sizeof(float));
    outPtr = Marshal.AllocHGlobal(RESULT_SIZE * sizeof(float));

    initDetector();
}
Then call the detectObjects native function in the newly created DetectAsync method:
[DllImport ("yoloDetector")]
private static extern int detectObjects(IntPtr input, IntPtr output);

public Task<List<KeyValuePair<DetectResults, float>>> DetectAsync(byte[] camImage)
{
    return Task.Run(() =>
    {
        // Prepare input here
        ...
        Marshal.Copy(inputBuffer, 0, inPtr, inputBuffer.Length);

        int detectObjectNums = detectObjects(inPtr, outPtr);

        Marshal.Copy(outPtr, fetchResults, 0, RESULT_SIZE);
        ...
        // Parse and return the results here
    });
}
Before the detectObjects call, you need to convert the camera data into the channel-first (C, H, W) order that Arm NN expects. Here is the code snippet to do that:
float[] inputBuffer = new float[INPUT_SIZE * INPUT_SIZE * 3];
int h = INPUT_SIZE;
int w = INPUT_SIZE;
int c = 4;

for (int j = 0; j < h; ++j)
{
    for (int i = 0; i < w; ++i)
    {
        int r, g, b;
        r = camImage[j * w * c + i * c + 0];
        g = camImage[j * w * c + i * c + 1];
        b = camImage[j * w * c + i * c + 2];

        // ArmNN order: C, H, W
        int rDstIndex = 0 * h * w + j * w + i;
        int gDstIndex = 1 * h * w + j * w + i;
        int bDstIndex = 2 * h * w + j * w + i;

        inputBuffer[rDstIndex] = (float)r / 255.0f;
        inputBuffer[gDstIndex] = (float)g / 255.0f;
        inputBuffer[bDstIndex] = (float)b / 255.0f;
    }
}
In the "ArmNNCaffeParserController.cs" script, instantiate the ArmNNCaffeDetector class and set up a callback function for the TextureReader:
TextureReader TextureReaderComponent;

private ArmNNCaffeDetector detector;
private int m_ImageWidth = 0;
private int m_ImageHeight = 0;
private byte[] m_CamImage = null;
private bool m_IsDetecting = false;

void Start ()
{
    this.detector = new ArmNNCaffeDetector();

    TextureReaderComponent = GetComponent<TextureReader> ();

    // Registers the TextureReader callback.
    TextureReaderComponent.OnImageAvailableCallback += OnImageAvailable;

    Screen.sleepTimeout = SleepTimeout.NeverSleep;
}
Implement OnImageAvailable to receive the camera data and then call the ArmNNDetect method:
public void OnImageAvailable(TextureReaderApi.ImageFormatType format, int width, int height, IntPtr pixelBuffer, int bufferSize)
{
    if (format != TextureReaderApi.ImageFormatType.ImageFormatColor)
    {
        Debug.Log("No object detected due to incorrect image format.");
        return;
    }

    if (m_IsDetecting)
    {
        return;
    }

    if (m_CamImage == null || m_ImageWidth != width || m_ImageHeight != height)
    {
        m_CamImage = new byte[width * height * 4];
        m_ImageWidth = width;
        m_ImageHeight = height;
    }

    System.Runtime.InteropServices.Marshal.Copy(pixelBuffer, m_CamImage, 0, bufferSize);

    m_IsDetecting = true;
    Invoke(nameof(ArmNNDetect), 0f);
}
Call the DetectAsync method from the ArmNNDetect method:
private async void ArmNNDetect()
{
    var probabilities_and_bounding_boxes = await this.detector.DetectAsync (m_CamImage);
    ...
    // Visualize the bounding boxes and probabilities on the screen.
    // Use "Frame.Raycast", which ARCore provides, to find the 3D pose of the detected objects,
    // and render a related virtual object at the pose.
}
The DetectAsync method returns the probabilities and bounding boxes of the detected objects. From there you can do whatever you like with them, e.g. visualize the bounding boxes and place some virtual content near the physical objects.
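As an illustration only (this is not the demo's exact rendering code), the detections could be drawn in ArmNNCaffeParserController.cs with Unity's immediate-mode GUI, roughly as below. The m_LastResults field and the Label and BoundingBox members of DetectResults are assumed names introduced for this sketch, not part of the code above.

// Sketch only: caches the latest DetectAsync output and draws it with GUI.Box.
// m_LastResults, Label and BoundingBox (a normalized Rect) are assumed names.
private List<KeyValuePair<DetectResults, float>> m_LastResults;

void OnGUI()
{
    if (m_LastResults == null)
    {
        return;
    }

    foreach (var detection in m_LastResults)
    {
        Rect box = detection.Key.BoundingBox;  // normalized to [0,1]
        Rect screenRect = new Rect(box.x * Screen.width,
                                   box.y * Screen.height,
                                   box.width * Screen.width,
                                   box.height * Screen.height);
        GUI.Box(screenRect, detection.Key.Label + " " + detection.Value.ToString("P0"));
    }
}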
How do you use the "Frame.Raycast" function to get the 3D pose of a detected object? Remember the code you commented out in the Update method of the "HelloARController.cs" script? You can refer to that code, using the bounding box coordinates instead of the touch point coordinates:
// Raycast against the location where the object was detected to search for planes.
TrackableHit hit;
TrackableHitFlags raycastFilter = TrackableHitFlags.PlaneWithinPolygon |
                                  TrackableHitFlags.FeaturePointWithSurfaceNormal;

if (Frame.Raycast(boundingbox.position.x, boundingbox.position.y, raycastFilter, out hit))
{
    var andyObject = Instantiate(AndyAndroidPrefab, hit.Pose.position, hit.Pose.rotation);

    // Create an anchor to allow ARCore to track the hitpoint as understanding of the physical
    // world evolves.
    var anchor = hit.Trackable.CreateAnchor(hit.Pose);

    // Andy should look at the camera but still be flush with the plane.
    if ((hit.Flags & TrackableHitFlags.PlaneWithinPolygon) != TrackableHitFlags.None)
    {
        // Get the camera position and match the y-component with the hit position.
        Vector3 cameraPositionSameY = FirstPersonCamera.transform.position;
        cameraPositionSameY.y = hit.Pose.position.y;

        // Have Andy look toward the camera respecting his "up" perspective, which may be from ceiling.
        andyObject.transform.LookAt(cameraPositionSameY, andyObject.transform.up);
    }

    // Make Andy model a child of the anchor.
    andyObject.transform.parent = anchor.transform;
}
There we have it. You should have your own AR/ML demo up and running! Did you do something differently? Why not share it with us in the comments?
Arm NN SDK is a free of charge set of open-source Linux software and tools that enables machine learning workloads on power-efficient devices. It provides a bridge between existing neural network frameworks and power-efficient Arm Cortex CPUs, Arm Mali GPUs or the Arm Machine Learning processor.
Learn more about Arm NN SDK: https://developer.arm.com/products/processors/machine-learning/arm-nn
Hello, the above tutorial no longer works these days, as the technologies have moved on. If anyone has a similar project, or is working on one, I would be happy if they shared it with us.
Hi, I'm using the ArmNN framework from release 19.02 to perform TensorFlow-based object detection similar to the one presented in this blog post. I'm testing on a Samsung S9+ with the Exynos 9810 SoC, which has a Mali-G72 GPU. Even though I specify a GpuAcc backend, as can be seen in the code snippet below, the network processing is only performed on the CPU. Could you help me figure out what I'm missing? I also tried to follow the code of this demo, but it uses the ArmNN API defined in version 18.05 or earlier, which I'm not able to compile since it shows several compilation errors. Moreover, do you have any file/example that tests the GPU acceleration support? Thank you in advance, Joana
I think my code is outdated. Please refer to the latest ArmNN sample code.
https://github.com/ARM-software/armnn/blob/master/samples/SimpleSample.cpp
Hi Emrys,
To keep the blog brief, I didn't show the header files I included.
You should be able to find them in the install folder of the ArmNN build.
Here are some of the headers I included:
#include "armnn/ArmNN.hpp"
#include "armnn/Exceptions.hpp"
#include "armnn/Tensor.hpp"
#include "armnn/INetwork.hpp"
#include "armnnCaffeParser/ICaffeParser.hpp"
Until we fix the formatting of the code snippets, you can refer to the Chinese version of this blog:
https://community.arm.com/cn/b/blog/posts/ar-meet-ml-cn-2018
For "ParseOutputTensorsYoloV1", you can learn more about the output of YOLO v1 here:
https://pjreddie.com/darknet/yolov1/
https://arxiv.org/pdf/1506.02640.pdf
The purpose of this function is to convert the output of the neural network into the detection results, which answer:
What are the top N detections?
Which class do we predict for each detection?
What are the confidence value and bounding box of each detection?
For my part, I packed the above info into a flattened float array (the "float *result") in order to show the bounding boxes in the Java code.
As I mentioned in the blog, you can refer to this code snippet: https://github.com/ARM-software/armnn/blob/master/tests/YoloInferenceTest.hpp#L33
All you need to do is change some data structures, set "numClasses" to 80 and set "numScales" to 3.
Joel
If you could upload your completed project to GitHub or somewhere similar, it would help a great deal.