Augmented reality (AR) and machine learning (ML) are two leading-edge technologies. AR brings virtual objects into the real world by overlaying digital content on top of physical objects, while ML helps a program recognize those physical objects in the real world. By combining these two technologies, we can create some innovative projects.
Recently, Arm showcased a demo at Unite Beijing 2018 that combined AR and ML, using a Mali GPU to accelerate the computing tasks. This blog shares our experience of developing the AR and ML demo using Google ARCore and Arm NN in Unity. Below is a video of the end result, and throughout this guide I'll point out the most important steps involved in creating the demo.
You will need a Google ARCore supported device to run our demo. We have tested on a Samsung Galaxy S8, a Galaxy S9 and a Huawei P20, all of which have a Mali GPU inside.
We worked with Arm NN because it can bring a more than 4x performance boost on Arm Cortex-A CPUs and Mali GPUs. See the "Arm NN for Android" section of the Arm NN SDK webpage for more details.
We use Arm NN with the YOLO v1 tiny model to do object detection:
Then we use Google ARCore to handle the AR parts of the demo:
First of all, follow these instructions to prepare your hardware and software environments. Build and run the HelloAR sample; we will start our project from there. For example, you could name your Unity project "AR Detector".
For a better experience, change the orientation in "Player Settings > Resolution and Presentation > Default Orientation" from Auto Rotation to Landscape Left, and set "Player Settings > Configuration > Scripting Runtime Version" to Experimental (.NET 4.6 Equivalent).
Copy the HelloAR scene: duplicate the HelloAR scene in "Assets > GoogleARCore > Examples > HelloAR > Scenes" and rename the copy to Main.
You should get these in your Unity project:
The example scene visualizes the detected planes and places an Andy 3D model when you touch a detected plane on the screen. We need the plane visualizer, but not the screen-touch function.
To remove it, open the HelloARController script attached to the Example Controller game object and comment out the touch-handling code in its Update method (see the sketch below). Then build and run to check that the touch function has been disabled.
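As a rough sketch, the commented-out Update method ends up looking something like this. The exact code depends on your ARCore SDK version, so treat the method names below (such as _UpdateApplicationLifecycle) as assumptions rather than a verbatim copy.

// Sketch only: based on the HelloARController shipped with the ARCore Unity SDK at the time.
// Method and field names may differ in your SDK version.
public void Update()
{
    _UpdateApplicationLifecycle();   // keep the lifecycle/permission handling

    // Comment out the touch handling so tapping the screen no longer places an Andy:
    // Touch touch;
    // if (Input.touchCount < 1 || (touch = Input.GetTouch(0)).phase != TouchPhase.Began)
    // {
    //     return;
    // }
    //
    // ... raycast against the touch position and instantiate AndyAndroidPrefab ...
}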
You'll need to build the Arm NN shared libraries yourself and manually integrate them into Unity as native plugins. To do that, first create a standalone NDK toolchain. We use the armeabi-v7a compiler rather than arm64 because Google ARCore and Unity only support armeabi-v7a right now.
Read the instructions about "Building Open Source Projects Using Standalone Toolchains" and use this command to create the standalone toolchain:
$NDK/build/tools/make_standalone_toolchain.py \
    --arch arm \
    --api 26 \
    --stl=libc++ \
    --install-dir=my-toolchain
Then configure and build Arm NN with Caffe parser support using the newly created standalone toolchain (read the build instructions). Enable the OpenCL option in order to benefit from GPU acceleration. You will also need to build the protobuf shared library for both the host and armeabi-v7a, and set the -DPROTOBUF_ROOT=/path/to/protobuf/armeabi-v7a_install option when configuring Arm NN.
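For reference, a configure-and-build invocation might look roughly like the sketch below. The option names follow the Arm NN Android build guide of that era, but all paths are placeholders and your Arm NN version may expose a different set of options, so treat this as an outline rather than a recipe.

# Sketch only: placeholder paths; check the option names against your Arm NN version.
cd armnn && mkdir -p build && cd build
CXX=/path/to/my-toolchain/bin/arm-linux-androideabi-clang++ \
CC=/path/to/my-toolchain/bin/arm-linux-androideabi-clang \
cmake .. \
    -DBUILD_CAFFE_PARSER=1 \
    -DCAFFE_GENERATED_SOURCES=/path/to/caffe/build/src \
    -DPROTOBUF_ROOT=/path/to/protobuf/armeabi-v7a_install \
    -DARMCOMPUTE_ROOT=/path/to/ComputeLibrary \
    -DARMCOMPUTE_BUILD_DIR=/path/to/ComputeLibrary/build \
    -DARMCOMPUTENEON=1 \
    -DARMCOMPUTECL=1 \
    -DBOOST_ROOT=/path/to/boost_install
make -j8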
After doing that, you should have "libarmnn.so", "libarmnnCaffeParser.so" and the "UnitTests" binary in the build directory, and you should be able to push and run the UnitTests on your Android phone.
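For example, a quick smoke test over adb could look like this; the /data/local/tmp directory is just an illustrative choice of a writable location on the device.

# Sketch only: libprotobuf.so comes from your armeabi-v7a protobuf build.
adb push libarmnn.so /data/local/tmp/
adb push libarmnnCaffeParser.so /data/local/tmp/
adb push libprotobuf.so /data/local/tmp/
adb push UnitTests /data/local/tmp/
adb shell "cd /data/local/tmp && chmod +x UnitTests && LD_LIBRARY_PATH=/data/local/tmp ./UnitTests"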
Our demo uses Arm NN to perform object detection. We chose a YOLO v1 tiny model pre-trained on the COCO dataset, which can be downloaded from GitHub. Download the YOLO COCO tiny model from the "Legacy models" section and push it to your Android device:
adb shell mkdir -p /mnt/sdcard/Android/data/com.yourCompany.ARDetector/files/
adb push coco_tiny.caffemodel /mnt/sdcard/Android/data/com.yourCompany.ARDetector/files/
Since we need to call the C++ API that Arm NN provides, we use the Native Plugins feature in Unity; see the Unity documentation for more detail. We implemented a shared library named "libyoloDetector.so" and exported two C APIs for Unity to use.
The initDetector C API loads the machine learning model and initializes the Arm NN network. It should be called when the app starts.
// Optimize the network for a specific runtime compute device, e.g. CpuAcc, GpuAcc
static armnn::IRuntimePtr s_Runtime = armnn::IRuntime::Create(armnn::Compute::GpuAcc);
static armnn::NetworkId s_NetworkIdentifier;
static std::pair<armnn::LayerBindingId, armnn::TensorInfo> s_InputBindingInfo;
static std::pair<armnn::LayerBindingId, armnn::TensorInfo> s_OutputBindingInfo;
static float *s_OutputBuffer;

static char k_ModelFileName[] = "/mnt/sdcard/Android/data/com.yourCompany.ARDetector/files/coco_tiny.caffemodel";
static char k_InputTensorName[] = "data";
static char k_OutputTensorName[] = "result";

const unsigned int k_YoloImageWidth = 448;
const unsigned int k_YoloImageHeight = 448;
const unsigned int k_YoloChannelNums = 3;
const unsigned int k_YoloImageBatchSize = 1;
const unsigned int k_YoloOutputSize = 7 * 7 * (5 * 3 + 80);

extern "C" __attribute__ ((visibility ("default")))
void initDetector()
{
    auto parser = armnnCaffeParser::ICaffeParser::Create();
    auto network = parser->CreateNetworkFromBinaryFile(
        k_ModelFileName,
        { {k_InputTensorName, {k_YoloImageBatchSize, k_YoloChannelNums, k_YoloImageHeight, k_YoloImageWidth}} },
        { k_OutputTensorName });

    // Find the binding points for the input and output nodes
    s_InputBindingInfo = parser->GetNetworkInputBindingInfo(k_InputTensorName);
    s_OutputBindingInfo = parser->GetNetworkOutputBindingInfo(k_OutputTensorName);

    armnn::IOptimizedNetworkPtr optNet = armnn::Optimize(*network, s_Runtime->GetDeviceSpec());

    // Load the optimized network onto the runtime device
    armnn::Status ret = s_Runtime->LoadNetwork(s_NetworkIdentifier, std::move(optNet));
    if (ret == armnn::Status::Failure)
    {
        throw armnn::Exception("IRuntime::LoadNetwork failed");
    }

    s_OutputBuffer = (float*)malloc(sizeof(float) * k_YoloOutputSize);
}
The detectObjects C API is called repeatedly to detect objects in the raw camera data.
// Helper function to make input tensors
armnn::InputTensors MakeInputTensors(const std::pair<armnn::LayerBindingId, armnn::TensorInfo>& input,
                                     const void* inputTensorData)
{
    return { { input.first, armnn::ConstTensor(input.second, inputTensorData) } };
}

// Helper function to make output tensors
armnn::OutputTensors MakeOutputTensors(const std::pair<armnn::LayerBindingId, armnn::TensorInfo>& output,
                                       void* outputTensorData)
{
    return { { output.first, armnn::Tensor(output.second, outputTensorData) } };
}

extern "C" __attribute__ ((visibility ("default")))
int detectObjects(float *inputPtr, float *result)
{
    float *outputPtr = s_OutputBuffer;

    armnn::Status ret = s_Runtime->EnqueueWorkload(s_NetworkIdentifier,
                                                   MakeInputTensors(s_InputBindingInfo, inputPtr),
                                                   MakeOutputTensors(s_OutputBindingInfo, outputPtr));
    if (ret == armnn::Status::Failure)
    {
        throw armnn::Exception("IRuntime::EnqueueWorkload failed");
    }

    // Convert the raw YOLO v1 output into detections packed into "result"
    // (ParseOutputTensorsYoloV1 is implemented separately, see below).
    return ParseOutputTensorsYoloV1(outputPtr, result);
}
You will need to implement the ParseOutputTensorsYoloV1 function yourself; GitHub has some useful code snippets that may help you write the YOLO v1 parser, and a rough sketch follows below.
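To give an idea of what that parser can look like, here is a minimal sketch. It assumes the classic darknet YOLO v1 output layout (all class probabilities, then box confidences, then box coordinates), a 7x7 grid with 80 classes and 3 boxes per cell to match k_YoloOutputSize above, and a simple packing of six floats per detection into "result". The threshold and packing are illustrative choices, not the demo's exact code, so adapt them to your model.

// Sketch only: assumes the darknet YOLO v1 output layout and an illustrative
// packing of 6 floats per detection (class, probability, x, y, w, h), all normalized.
static int ParseOutputTensorsYoloV1(const float* output, float* result)
{
    const int side = 7;
    const int numBoxes = 3;        // "numScales" in the Arm NN Yolo test code
    const int numClasses = 80;
    const float threshold = 0.3f;  // illustrative confidence threshold

    const float* classProbs  = output;                               // side*side*numClasses values
    const float* confidences = output + side * side * numClasses;    // side*side*numBoxes values
    const float* boxes       = confidences + side * side * numBoxes; // side*side*numBoxes*4 values

    int detections = 0;
    for (int cell = 0; cell < side * side; ++cell)
    {
        // Keep only the best box and class per cell, so at most side*side detections
        // (this matches RESULT_SIZE = 7 * 7 * 6 on the C# side).
        float bestScore = 0.0f;
        int bestClass = 0;
        int bestBox = 0;
        for (int b = 0; b < numBoxes; ++b)
        {
            float confidence = confidences[cell * numBoxes + b];
            for (int c = 0; c < numClasses; ++c)
            {
                float score = classProbs[cell * numClasses + c] * confidence;
                if (score > bestScore) { bestScore = score; bestClass = c; bestBox = b; }
            }
        }
        if (bestScore < threshold)
        {
            continue;
        }

        const float* box = boxes + (cell * numBoxes + bestBox) * 4;
        float x = (box[0] + cell % side) / side;  // centre x in [0,1]
        float y = (box[1] + cell / side) / side;  // centre y in [0,1]
        float w = box[2];                         // width in [0,1]
        float h = box[3];                         // height in [0,1]
        // Note: YOLO v1 models trained with the sqrt option store sqrt(w) and sqrt(h);
        // square them here if your model does.

        float* dst = result + detections * 6;
        dst[0] = (float)bestClass;
        dst[1] = bestScore;
        dst[2] = x;
        dst[3] = y;
        dst[4] = w;
        dst[5] = h;
        ++detections;
    }
    return detections;
}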
Use the NDK standalone toolchain to compile the above code and generate the "libyoloDetector.so" shared library. To call it from Unity, create a folder named "Assets > Plugins > Android" in your Unity project and copy the armeabi-v7a shared libraries into it. Here are the libraries I copied:
libarmnn.so
libarmnnCaffeParser.so
libprotobuf.so
libc++_shared.so
libyoloDetector.so
Let's switch back to the Unity project we created. We need the raw camera data to feed into the object detection model, but Google ARCore has already taken control of the camera. Fortunately, Google ARCore anticipated that other code might want to access the raw camera data as well, and provides an example for this; see "Assets > GoogleARCore > Examples > ComputerVision" for more detail. We can use its TextureReader script to do the same thing in our project.
You should get this in the end.
Create two C# scripts for this demo.
In the "ArmNNCaffeDetector.cs" script, call the initDetector native function in the constructor:
private static int INPUT_SIZE = 448;
private static int RESULT_SIZE = 7 * 7 * 6;
private float[] fetchResults = new float[RESULT_SIZE];

[DllImport ("yoloDetector")]
private static extern void initDetector();

public ArmNNCaffeDetector()
{
    inPtr = Marshal.AllocHGlobal(3 * INPUT_SIZE * INPUT_SIZE * sizeof(float));
    outPtr = Marshal.AllocHGlobal(RESULT_SIZE * sizeof(float));

    initDetector();
}
Then call the detectObjects native function in the newly created DetectAsync method:
[DllImport ("yoloDetector")]
private static extern int detectObjects(IntPtr input, IntPtr output);

public Task<List<KeyValuePair<DetectResults, float>>> DetectAsync(byte[] camImage)
{
    return Task.Run(() =>
    {
        // Prepare input here
        ...
        Marshal.Copy(inputBuffer, 0, inPtr, inputBuffer.Length);

        int detectObjectNums = detectObjects(inPtr, outPtr);

        Marshal.Copy(outPtr, fetchResults, 0, RESULT_SIZE);
        ...
        // Parse and return the results here
    });
}
Before the detectObjects call, you need to convert the camera data into the channel-first (C, H, W) order that Arm NN expects. Here is the code snippet to do that:
float[] inputBuffer = new float[INPUT_SIZE * INPUT_SIZE * 3];
int h = INPUT_SIZE;
int w = INPUT_SIZE;
int c = 4;

for (int j = 0; j < h; ++j)
{
    for (int i = 0; i < w; ++i)
    {
        int r, g, b;
        r = camImage[j * w * c + i * c + 0];
        g = camImage[j * w * c + i * c + 1];
        b = camImage[j * w * c + i * c + 2];

        // ArmNN order: C, H, W
        int rDstIndex = 0 * h * w + j * w + i;
        int gDstIndex = 1 * h * w + j * w + i;
        int bDstIndex = 2 * h * w + j * w + i;

        inputBuffer[rDstIndex] = (float)r / 255.0f;
        inputBuffer[gDstIndex] = (float)g / 255.0f;
        inputBuffer[bDstIndex] = (float)b / 255.0f;
    }
}
In the "ArmNNCaffeParserController.cs" script, instantiate the ArmNNCaffeDetector class and set up a callback function for the TextureReader:
TextureReader TextureReaderComponent;

private ArmNNCaffeDetector detector;
private int m_ImageWidth = 0;
private int m_ImageHeight = 0;
private byte[] m_CamImage = null;
private bool m_IsDetecting = false;

void Start ()
{
    this.detector = new ArmNNCaffeDetector();

    TextureReaderComponent = GetComponent<TextureReader> ();

    // Registers the TextureReader callback.
    TextureReaderComponent.OnImageAvailableCallback += OnImageAvailable;

    Screen.sleepTimeout = SleepTimeout.NeverSleep;
}
Implement OnImageAvailable to receive the camera data and then call the ArmNNDetect method:
public void OnImageAvailable(TextureReaderApi.ImageFormatType format, int width, int height, IntPtr pixelBuffer, int bufferSize)
{
    if (format != TextureReaderApi.ImageFormatType.ImageFormatColor)
    {
        Debug.Log("No object detected due to incorrect image format.");
        return;
    }

    if (m_IsDetecting)
    {
        return;
    }

    if (m_CamImage == null || m_ImageWidth != width || m_ImageHeight != height)
    {
        m_CamImage = new byte[width * height * 4];
        m_ImageWidth = width;
        m_ImageHeight = height;
    }

    System.Runtime.InteropServices.Marshal.Copy(pixelBuffer, m_CamImage, 0, bufferSize);

    m_IsDetecting = true;
    Invoke(nameof(ArmNNDetect), 0f);
}
Call the DetectAsync method from the ArmNNDetect method:
private async void ArmNNDetect()
{
    var probabilities_and_bounding_boxes = await this.detector.DetectAsync (m_CamImage);
    ...
    // Visualize the bounding boxes and probabilities on the screen.
    // Use "Frame.Raycast", which ARCore provides, to find the 3D pose of the detected objects,
    // and render a related virtual object at the pose.
}
The DetectAsync method returns the probabilities and bounding boxes of the detected objects. From there you can do whatever you like with them, e.g. visualize the bounding boxes and place some virtual content near the physical objects.
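As an illustration only (this is not the demo's exact rendering code), the detections could be drawn in ArmNNCaffeParserController.cs with Unity's immediate-mode GUI, roughly as below. The m_LastResults field and the Label and BoundingBox members of DetectResults are assumed names introduced for this sketch, not part of the code above.

// Sketch only: caches the latest DetectAsync output and draws it with GUI.Box.
// m_LastResults, Label and BoundingBox (a normalized Rect) are assumed names.
private List<KeyValuePair<DetectResults, float>> m_LastResults;

void OnGUI()
{
    if (m_LastResults == null)
    {
        return;
    }

    foreach (var detection in m_LastResults)
    {
        Rect box = detection.Key.BoundingBox;  // normalized to [0,1]
        Rect screenRect = new Rect(box.x * Screen.width,
                                   box.y * Screen.height,
                                   box.width * Screen.width,
                                   box.height * Screen.height);
        GUI.Box(screenRect, detection.Key.Label + " " + detection.Value.ToString("P0"));
    }
}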
How do you use the "Frame.Raycast" function to get the 3D pose of a detected object? Remember the code you commented out in the Update method of the "HelloARController.cs" script? You can refer to that code, using the bounding box coordinates instead of the touch point coordinates:
// Raycast against the location where the object was detected to search for planes.
TrackableHit hit;
TrackableHitFlags raycastFilter = TrackableHitFlags.PlaneWithinPolygon |
                                  TrackableHitFlags.FeaturePointWithSurfaceNormal;

if (Frame.Raycast(boundingbox.position.x, boundingbox.position.y, raycastFilter, out hit))
{
    var andyObject = Instantiate(AndyAndroidPrefab, hit.Pose.position, hit.Pose.rotation);

    // Create an anchor to allow ARCore to track the hitpoint as understanding of the physical
    // world evolves.
    var anchor = hit.Trackable.CreateAnchor(hit.Pose);

    // Andy should look at the camera but still be flush with the plane.
    if ((hit.Flags & TrackableHitFlags.PlaneWithinPolygon) != TrackableHitFlags.None)
    {
        // Get the camera position and match the y-component with the hit position.
        Vector3 cameraPositionSameY = FirstPersonCamera.transform.position;
        cameraPositionSameY.y = hit.Pose.position.y;

        // Have Andy look toward the camera respecting his "up" perspective, which may be from ceiling.
        andyObject.transform.LookAt(cameraPositionSameY, andyObject.transform.up);
    }

    // Make Andy model a child of the anchor.
    andyObject.transform.parent = anchor.transform;
}
There we have it. You should have your own AR/ML demo up and running! Did you do something differently? Why not share it with us in the comments?
Arm NN SDK is a free of charge set of open-source Linux software and tools that enables machine learning workloads on power-efficient devices. It provides a bridge between existing neural network frameworks and power-efficient Arm Cortex CPUs, Arm Mali GPUs or the Arm Machine Learning processor.
Learn more about Arm NN SDK: https://developer.arm.com/products/processors/machine-learning/arm-nn
Hello, the above tutorial no longer works these days, as the technologies have moved on. If anyone has a similar project, or is working on one, I would be happy if they shared it with us.
Hi, I'm using the ArmNN framework from release 19.02 to perform TensorFlow-based object detection similar to the one presented in this blog post. I'm testing on a Samsung S9+ with the Exynos 9810 SoC, which has a Mali-G72 GPU. Even though I specify a GpuAcc backend, as can be seen in the code snippet below, the network processing is only performed on the CPU. Could you help me figure out what I'm missing? I also tried to follow the code of this demo, but it uses the ArmNN API defined in version 18.05 or earlier, which I'm not able to compile since it shows several compilation errors. Moreover, do you have any file/example that tests the GPU acceleration support? Thank you in advance, Joana
I think my code is outdated. Please refer to the latest ArmNN sample code.
https://github.com/ARM-software/armnn/blob/master/samples/SimpleSample.cpp
Hi Emrys,
To keep the blog brief, I didn't show the header files I included.
You should be able to find them in the install folder of the ArmNN build.
Here are some of the headers I included:
#include "armnn/ArmNN.hpp"
#include "armnn/Exceptions.hpp"
#include "armnn/Tensor.hpp"
#include "armnn/INetwork.hpp"
#include "armnnCaffeParser/ICaffeParser.hpp"
Until we fix the formatting of the code snippets, you can refer to the Chinese version of this blog:
https://community.arm.com/cn/b/blog/posts/ar-meet-ml-cn-2018
For "ParseOutputTensorsYoloV1", you can learn more about the output of YOLO v1 here:
https://pjreddie.com/darknet/yolov1/
https://arxiv.org/pdf/1506.02640.pdf
The purpose of this function is to convert the output of the neural network into the detection results, which answer:
What are the top N detections?
Which class do we predict for each detection?
What are the confidence value and bounding box of each detection?
For my part, I packed the above info into a flattened float array (the "float *result") in order to show the bounding boxes in the Java code.
As I mentioned in the blog, you can refer to this code snippet: https://github.com/ARM-software/armnn/blob/master/tests/YoloInferenceTest.hpp#L33
All you need to do is change some data structures, set "numClasses" to 80 and set "numScales" to 3.
Joel
If you could upload your completed project to GitHub or somewhere similar, it would help a great deal.