When Augmented Reality (AR) Meets Machine Learning (ML): A Unity Example Combining Google ARCore and Arm NN

Song Bin 宋斌

July 13, 2018

6 minute read time.

Background
Enabling Google ARCore in Unity
Building the Arm NN shared libraries
A C++ object detector as a Unity native plugin
Integrating Arm NN into Unity

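Background

Augmented reality and machine learning are two of today's most cutting-edge technologies. Augmented reality can render virtual objects into the real world with lifelike realism.

Machine learning, in turn, helps programs recognize objects in the real world. By using both technologies together, we can create a whole range of novel applications.

Recently, at Unity Unite Beijing 2018, Arm showed a demo that combines AR and ML and uses a Mali GPU to accelerate the computation. The purpose of this blog is to share some of the experience we gained while developing that demo. The demo uses Google ARCore and Arm NN (Arm's neural network SDK) in Unity, and I will show you the key steps for building it.

You need a device that supports Google ARCore to run our demo. We have tested it on Mali GPU-based Samsung Galaxy S8/S9 phones and on the Huawei P20.

We chose Arm NN because it can deliver a more than 4x performance boost on Cortex-A CPUs and Mali GPUs. For details, see the "Arm NN for Android" section of the Arm NN SDK page.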

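We use the YOLO v1 tiny neural network model in Arm NN for object detection:

Object detection: a deep learning neural network detects physical objects in the camera input.
Classification: the model is pre-trained on the COCO dataset and can recognize 80 object classes.
Localization: a bounding box is drawn around each detected object, locating it in 2D space.

Google ARCore handles the AR part of the demo:

Plane detection: detects horizontal planes; the 2D position of each recognized object is raycast onto a detected plane to obtain its position in 3D space.
Motion tracking: tracks the motion of the camera.
Anchors: pin specific locations in the real world, so that rendered virtual objects stay fixed in the same place.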
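Under the hood, a raycast like this is a hit test: a ray from the camera through the object's 2D screen position is intersected with a detected plane. The C++ sketch below shows only that geometry for a horizontal plane y = planeY. It is an illustration, not ARCore's implementation; the `Vec3` type and `IntersectHorizontalPlane` function are hypothetical, and the sketch ignores the plane's boundary polygon, which ARCore can take into account (see the PlaneWithinPolygon filter used later).

```cpp
#include <cassert>
#include <cmath>

// Minimal 3D vector for this sketch (hypothetical helper type).
struct Vec3 { float x, y, z; };

// Intersects the ray origin + t * dir (t >= 0) with the horizontal plane
// y = planeY. Returns false if the ray is parallel to the plane or the plane
// lies behind the ray origin; otherwise writes the hit point to *hit.
bool IntersectHorizontalPlane(const Vec3& origin, const Vec3& dir, float planeY, Vec3* hit)
{
    const float kEpsilon = 1e-6f;
    if (std::fabs(dir.y) < kEpsilon)
        return false;                       // ray runs parallel to the plane
    const float t = (planeY - origin.y) / dir.y;
    if (t < 0.0f)
        return false;                       // plane is behind the camera
    *hit = Vec3{origin.x + t * dir.x, origin.y + t * dir.y, origin.z + t * dir.z};
    return true;
}
```

For example, a camera 1.5 m above the floor looking 45 degrees downward hits the floor 1.5 m in front of it.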

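Enabling Google ARCore in Unity

First, follow this quickstart guide to prepare your hardware and software environment, and try building and running the HelloAR sample. We will use it as the basis for our project. You can name your Unity project "AR Detector".

For a better user experience, change "PlayerSettings > Resolution and Presentation > Default Orientation" from Auto Rotation to Landscape Left. Also set "PlayerSettings > Configuration > Scripting Runtime Version" to Experimental (.NET 4.6 Equivalent); the Task and async/await code used later in this post needs the .NET 4.x runtime.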

Copy the HelloAR scene:

Create a Scenes subfolder in Assets.
In "Assets > GoogleARCore > Examples > HelloAR > Scenes", duplicate the HelloAR scene.
Move the copy of the HelloAR scene to "Assets > Scenes" and rename it Main.
Double-click "Assets > Scenes > Main" to open the scene.

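After these steps, your Unity project should look like this:

This sample scene visualizes the detected horizontal planes and, when you touch a detected plane on the screen, places a small robot on it. We want the plane visualization, but not the touch interaction.

In the Hierarchy tab, click Example Controller.
In the "Inspector > Hello AR Controller (Script) > Script" component, double-click HelloARController to open the C# script.
In the Update method, comment out all the code that follows the comment "If the player has not touched the screen, we are done with this update."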

Build and run the app to check that touching the screen no longer has any effect.

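Building the Arm NN shared libraries

You need to build the Arm NN shared libraries yourself and integrate them into the Unity project as native plugins. For the build, you need to create an NDK standalone toolchain. We use the armeabi-v7a compiler rather than arm64, because Google ARCore and Unity currently support only armeabi-v7a.

Read the guide "Building Open Source Projects Using Standalone Toolchains" here, and use this command to create the NDK standalone toolchain: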

$NDK/build/tools/make_standalone_toolchain.py \
  --arch arm \
  --api 26 \
  --stl=libc++ \
  --install-dir=my-toolchain

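Then use the newly created standalone toolchain to configure and build Arm NN with Caffe parser support; the build guide is here. Enable the OpenCL option so that the GPU can be used for acceleration. You may also need to build protobuf twice: a host build, which provides the protoc compiler, and an armeabi-v7a shared library. When building Arm NN, set the -DPROTOBUF_ROOT=/path/to/protobuf/armeabi-v7a_install option.

Once this is done, the build directory will contain "libarmnn.so", "libarmnnCaffeParser.so", and "UnitTests". You should also be able to push UnitTests to your Android phone and run it there.

A C++ object detector as a Unity native plugin

Our demo uses Arm NN for object detection. You can download the YOLO v1 tiny model we chose, pre-trained on the COCO dataset, here: in the "Legacy models" section, download YOLO CoCo tiny. Then push it to your Android device: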

adb shell mkdir -p /mnt/sdcard/Android/data/com.yourCompany.ARDetector/files/
adb push coco_tiny.caffemodel /mnt/sdcard/Android/data/com.yourCompany.ARDetector/files/

Because we need to call Arm NN's C++ API, we use Unity's native plugin feature; see here for details on native plugins. We implemented a shared library named "libyoloDetector.so" that exposes two C APIs to Unity.
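Since Unity calls these functions through P/Invoke, a C++ exception that escapes them (for example the armnn::Exception thrown on failure in the code below) will typically crash the app instead of reaching C#. A common safeguard, shown here as a hypothetical sketch that is not part of the original plugin, is to catch everything at the C boundary and return an error code. `CallNoThrow`, `RiskyDetect`, and `detectObjectsSafe` are invented names; armnn::Exception derives from std::exception, so the first handler would catch it.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>

// Runs f() and converts any C++ exception into an error code, so that no
// exception crosses the extern "C" boundary into managed (C#) code.
template <typename F>
int CallNoThrow(F&& f, int errorCode = -1)
{
    try
    {
        return f();
    }
    catch (const std::exception&)
    {
        return errorCode;  // a real plugin might log e.what() here
    }
    catch (...)
    {
        return errorCode;
    }
}

// Hypothetical stand-in for work that may throw, such as running inference.
static int RiskyDetect(int value)
{
    if (value < 0)
        throw std::runtime_error("inference failed");
    return value;
}

// Hypothetical exported entry point that never lets an exception escape.
extern "C" __attribute__ ((visibility ("default")))
int detectObjectsSafe(int value)
{
    return CallNoThrow([&] { return RiskyDetect(value); });
}
```

With this pattern, the C# caller can check for a negative return value instead of losing the whole app.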

The initDetector C API loads the machine learning model and initializes the Arm NN network. It should be called when the application starts.

#include <cstdlib>
#include <utility>

#include "armnn/ArmNN.hpp"
#include "armnnCaffeParser/ICaffeParser.hpp"

// Create the runtime for a specific compute device, e.g. CpuAcc, GpuAcc
static armnn::IRuntimePtr s_Runtime = armnn::IRuntime::Create(armnn::Compute::GpuAcc);
static armnn::NetworkId s_NetworkIdentifier;

static std::pair<armnn::LayerBindingId, armnn::TensorInfo> s_InputBindingInfo;
static std::pair<armnn::LayerBindingId, armnn::TensorInfo> s_OutputBindingInfo;

static float *s_OutputBuffer;
static char k_ModelFileName[] = "/mnt/sdcard/Android/data/com.yourCompany.ARDetector/files/coco_tiny.caffemodel";
static char k_InputTensorName[] = "data";
static char k_OutputTensorName[] = "result";
const unsigned int k_YoloImageWidth = 448;
const unsigned int k_YoloImageHeight = 448;
const unsigned int k_YoloChannelNums = 3;
const unsigned int k_YoloImageBatchSize = 1;
// 7 x 7 grid; per cell: 3 boxes x (x, y, w, h, confidence) + 80 class scores
const unsigned int k_YoloOutputSize = 7 * 7 * (5 * 3 + 80);

extern "C" __attribute__ ((visibility ("default")))
void initDetector()
{
    auto parser = armnnCaffeParser::ICaffeParser::Create();

    // Parse the Caffe model with an NCHW input of 1 x 3 x 448 x 448
    auto network = parser->CreateNetworkFromBinaryFile(
        k_ModelFileName,
        { {k_InputTensorName, {k_YoloImageBatchSize, k_YoloChannelNums, k_YoloImageHeight, k_YoloImageWidth}} },
        { k_OutputTensorName });

    // Find the binding points for the input and output nodes
    s_InputBindingInfo = parser->GetNetworkInputBindingInfo(k_InputTensorName);
    s_OutputBindingInfo = parser->GetNetworkOutputBindingInfo(k_OutputTensorName);

    // Optimize the network for the runtime's compute device
    armnn::IOptimizedNetworkPtr optNet =
        armnn::Optimize(*network, s_Runtime->GetDeviceSpec());

    // Load the optimized network onto the runtime device
    armnn::Status ret = s_Runtime->LoadNetwork(s_NetworkIdentifier, std::move(optNet));
    if (ret == armnn::Status::Failure)
    {
        throw armnn::Exception("IRuntime::LoadNetwork failed");
    }

    // Buffer for the raw network output, reused by every detectObjects call
    s_OutputBuffer = static_cast<float*>(std::malloc(sizeof(float) * k_YoloOutputSize));
}
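As a quick sanity check on the buffer sizes, this hypothetical sketch recomputes the tensor sizes from the same constants at compile time: the input is 1 x 3 x 448 x 448 floats (about 2.3 MiB per frame, matching what the C# side allocates with Marshal.AllocHGlobal), and the output is the 4655 floats of k_YoloOutputSize. The `yolo_sizes` namespace is invented for this example.

```cpp
#include <cassert>
#include <cstddef>

namespace yolo_sizes
{
// Same values as the detector constants above.
constexpr std::size_t kWidth = 448, kHeight = 448, kChannels = 3, kBatch = 1;
constexpr std::size_t kGrid = 7, kBoxes = 3, kClasses = 80;

// Input tensor: N x C x H x W floats.
constexpr std::size_t kInputElements = kBatch * kChannels * kHeight * kWidth;
// YOLO v1 output: per grid cell, B boxes x (x, y, w, h, confidence) + C class scores.
constexpr std::size_t kOutputElements = kGrid * kGrid * (5 * kBoxes + kClasses);

static_assert(kInputElements == 602112, "1 x 3 x 448 x 448");
static_assert(kOutputElements == 4655, "7 x 7 x 95");
// Assumes a 4-byte float, as on Android armeabi-v7a.
static_assert(kInputElements * sizeof(float) == 2408448, "about 2.3 MiB per frame");
}  // namespace yolo_sizes
```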

The other C API, detectObjects, receives camera data and performs object detection continuously.

// Helper function to make input tensors
armnn::InputTensors MakeInputTensors(const std::pair<armnn::LayerBindingId, armnn::TensorInfo>& input,
                                     const void* inputTensorData)
{
    return { { input.first, armnn::ConstTensor(input.second, inputTensorData) } };
}

// Helper function to make output tensors
armnn::OutputTensors MakeOutputTensors(const std::pair<armnn::LayerBindingId, armnn::TensorInfo>& output,
                                       void* outputTensorData)
{
    return { { output.first, armnn::Tensor(output.second, outputTensorData) } };
}

// Decodes the raw YOLO v1 output into `result` and returns the number of
// detected objects. You have to implement it yourself; this signature is an assumption.
int ParseOutputTensorsYoloV1(const float* output, float* result);

extern "C" __attribute__ ((visibility ("default")))
int detectObjects(float *inputPtr, float *result)
{
    float *outputPtr = s_OutputBuffer;
    // Run inference; inputPtr must hold 1 x 3 x 448 x 448 floats in CHW order
    armnn::Status ret = s_Runtime->EnqueueWorkload(s_NetworkIdentifier,
        MakeInputTensors(s_InputBindingInfo, inputPtr),
        MakeOutputTensors(s_OutputBindingInfo, outputPtr));
    if (ret == armnn::Status::Failure)
    {
        throw armnn::Exception("IRuntime::EnqueueWorkload failed");
    }

    // Interpret the raw network output and write the detections to `result`
    return ParseOutputTensorsYoloV1(outputPtr, result);
}

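You will probably need to implement the ParseOutputTensorsYoloV1 method yourself to interpret the output. Here are some useful code snippets to help you implement that YOLO v1 parser.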
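As a starting point, here is a hypothetical C++ sketch of such a parser. It assumes the network output keeps the original darknet YOLO v1 layout (all class probabilities, then the box confidences, then the box coordinates, with width and height stored as square roots), that the result buffer holds one (class, score, x, y, w, h) record per detection, and a 0.2 score threshold; verify all of these against your model. It keeps only the best box per grid cell and omits non-maximum suppression.

```cpp
#include <cassert>
#include <cmath>

// Assumed YOLO v1 parameters for the COCO tiny model (see k_YoloOutputSize).
constexpr int kGrid = 7;                 // S: the image is split into S x S cells
constexpr int kBoxesPerCell = 3;         // B: boxes predicted per cell
constexpr int kClasses = 80;             // C: COCO classes
constexpr int kResultStride = 6;         // hypothetical record: class, score, x, y, w, h
constexpr float kScoreThreshold = 0.2f;  // hypothetical score cut-off

// Hypothetical YOLO v1 decoder for the layout
// [S*S*C class probabilities | S*S*B confidences | S*S*B*4 boxes (x, y, w, h)].
// Writes at most one detection per cell and returns the number written.
int ParseOutputTensorsYoloV1(const float* output, float* result)
{
    const int cells = kGrid * kGrid;
    const float* classProbs = output;
    const float* confidences = classProbs + cells * kClasses;
    const float* boxes = confidences + cells * kBoxesPerCell;

    int count = 0;
    for (int cell = 0; cell < cells; ++cell)
    {
        const int row = cell / kGrid;
        const int col = cell % kGrid;

        // Find the (box, class) pair with the highest score in this cell.
        float bestScore = kScoreThreshold;
        int bestClass = -1;
        int bestBox = -1;
        for (int b = 0; b < kBoxesPerCell; ++b)
        {
            const float confidence = confidences[cell * kBoxesPerCell + b];
            for (int c = 0; c < kClasses; ++c)
            {
                const float score = confidence * classProbs[cell * kClasses + c];
                if (score > bestScore)
                {
                    bestScore = score;
                    bestClass = c;
                    bestBox = b;
                }
            }
        }
        if (bestClass < 0)
            continue;  // nothing above the threshold in this cell

        const float* box = boxes + (cell * kBoxesPerCell + bestBox) * 4;
        float* out = result + count * kResultStride;
        out[0] = static_cast<float>(bestClass);
        out[1] = bestScore;
        out[2] = (col + box[0]) / kGrid;  // box center x, normalized to [0, 1]
        out[3] = (row + box[1]) / kGrid;  // box center y, normalized to [0, 1]
        out[4] = box[2] * box[2];         // width, undoing the square root
        out[5] = box[3] * box[3];         // height, undoing the square root
        ++count;
    }
    return count;
}
```

Keeping at most one detection per cell caps the output at 49 records, which fits the 7 * 7 * 6 floats that the C# code copies back as RESULT_SIZE.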

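Build the code above with the NDK standalone toolchain to produce the "libyoloDetector.so" shared library. So that Unity can call it, create an "Assets > Plugins > Android" folder and copy the armeabi-v7a shared libraries into that folder of the Unity project. These are the shared libraries I copied: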

libarmnn.so
libarmnnCaffeParser.so
libprotobuf.so
libc++_shared.so
libyoloDetector.so

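Integrating Arm NN into Unity

Let's go back to the Unity project we created earlier. We need the camera image as input to the object detection model, but Google ARCore has full control of the camera. Fortunately, Google anticipated that other code may also need access to the camera image and provides a sample for it; see the code in "Assets > GoogleARCore > Examples > ComputerVision" for more information. We can use that sample's TextureReader script to do the same thing in our demo. In the Hierarchy tab, select the Main scene.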

Choose "GameObject > Create Empty" to create an empty GameObject.
Rename the empty GameObject to "ArmNNCaffeParserController".
Click the "Add Component" button in the Inspector.
Search for "Texture Reader" and add it to "ArmNNCaffeParserController".
Set Image Width and Image Height to 448.
Set "Image Sample Mode" to "Keep Aspect Ratio".
Set "Image Format" to "Image Format Color".

You should end up with this:

Create two C# scripts for the demo:

Create an "Assets > Scripts" folder.
Create two C# scripts named "ArmNNCaffeParserController.cs" and "ArmNNCaffeDetector.cs".
In the "Main" scene, select "ArmNNCaffeParserController".
Click "Add Component", then search for and add "Arm NN Caffe Parser Controller".

In the "ArmNNCaffeDetector.cs" script, call the native initDetector function from the constructor.

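    // Requires: using System; using System.Runtime.InteropServices;
    private static int INPUT_SIZE = 448;
    // Room for 7 x 7 x 6 floats of parsed results written by detectObjects
    private static int RESULT_SIZE = 7 * 7 * 6;
    private float[] fetchResults = new float[RESULT_SIZE];

    // Unmanaged buffers shared with the native plugin
    private IntPtr inPtr;
    private IntPtr outPtr;

    [DllImport ("yoloDetector")]
    private static extern void initDetector();

    public ArmNNCaffeDetector()
    {
        // 3 x 448 x 448 float input tensor in CHW order
        inPtr = Marshal.AllocHGlobal(3 * INPUT_SIZE * INPUT_SIZE * sizeof(float));
        outPtr = Marshal.AllocHGlobal(RESULT_SIZE * sizeof(float));
        initDetector();
    }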

Then call the native detectObjects function from a new DetectAsync method.

    // Requires: using System.Collections.Generic; using System.Threading.Tasks;
    [DllImport ("yoloDetector")]
    private static extern int detectObjects(IntPtr input, IntPtr output);

    public Task<List<KeyValuePair<DetectResults, float>>> DetectAsync(byte[] camImage)
    {
        return Task.Run(() =>
        {
            // Prepare input here
            ...
            Marshal.Copy(inputBuffer, 0, inPtr, inputBuffer.Length);
            int detectObjectNums = detectObjects(inPtr, outPtr);
            Marshal.Copy(outPtr, fetchResults, 0, RESULT_SIZE);
            ...
            // Parse and return the results here
        });
    }

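Before calling detectObjects, you may need to convert the camera image into the layout Arm NN expects: TextureReader delivers 4 bytes (RGBA) per pixel in row-major H, W, C order, while the network takes three float planes in C, H, W order with values scaled to [0, 1]. Here is the relevant code snippet: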

    float[] inputBuffer = new float[INPUT_SIZE * INPUT_SIZE * 3];
    int h = INPUT_SIZE;
    int w = INPUT_SIZE;
    int c = 4;
    for (int j = 0; j < h; ++j)
    {
        for (int i = 0; i < w; ++i)
        {
            int r, g, b;
            r =  camImage[j * w * c + i * c + 0];
            g =  camImage[j * w * c + i * c + 1];
            b =  camImage[j * w * c + i * c + 2];

            // Arm NN order: C, H, W
            int rDstIndex = 0 * h * w + j * w + i;
            int gDstIndex = 1 * h * w + j * w + i;
            int bDstIndex = 2 * h * w + j * w + i;

            inputBuffer[rDstIndex] = (float)r/255.0f;
            inputBuffer[gDstIndex] = (float)g/255.0f;
            inputBuffer[bDstIndex] = (float)b/255.0f;
        }
    }
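The same conversion could also be done on the native side if the C# loop turns out to be too slow. The hypothetical C++ sketch below (`RgbaToChwFloat` is not part of the original plugin) performs the identical RGBA (H, W, C) to normalized-float (C, H, W) rearrangement and drops the alpha channel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts an interleaved RGBA8 image (row-major H x W x 4) into three
// planar float channels (C x H x W) scaled to [0, 1]; alpha is dropped.
std::vector<float> RgbaToChwFloat(const std::uint8_t* rgba, std::size_t width, std::size_t height)
{
    const std::size_t plane = width * height;
    std::vector<float> chw(3 * plane);
    for (std::size_t y = 0; y < height; ++y)
    {
        for (std::size_t x = 0; x < width; ++x)
        {
            const std::uint8_t* px = rgba + (y * width + x) * 4;
            const std::size_t dst = y * width + x;
            chw[0 * plane + dst] = px[0] / 255.0f;  // R plane
            chw[1 * plane + dst] = px[1] / 255.0f;  // G plane
            chw[2 * plane + dst] = px[2] / 255.0f;  // B plane
        }
    }
    return chw;
}
```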

In the "ArmNNCaffeParserController.cs" script, instantiate the ArmNNCaffeDetector class and register a callback with the TextureReader.

    TextureReader TextureReaderComponent;
    private ArmNNCaffeDetector detector;

    private int m_ImageWidth = 0;
    private int m_ImageHeight = 0;
    private byte[] m_CamImage = null;

    private bool m_IsDetecting = false;

    void Start () {
        this.detector = new ArmNNCaffeDetector();

        TextureReaderComponent = GetComponent<TextureReader> ();

        // Registers the TextureReader callback.
        TextureReaderComponent.OnImageAvailableCallback += OnImageAvailable;
        Screen.sleepTimeout = SleepTimeout.NeverSleep;
    }

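Implement the OnImageAvailable method to receive the camera image and call the ArmNNDetect method. Frames that arrive while a detection is still running are skipped.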

    public void OnImageAvailable(TextureReaderApi.ImageFormatType format, int width, int height, IntPtr pixelBuffer, int bufferSize)
    {
        if (format != TextureReaderApi.ImageFormatType.ImageFormatColor)
        {
            Debug.Log("No object detected due to incorrect image format.");
            return;
        }

        if (m_IsDetecting) {
            return;
        }

        if (m_CamImage == null || m_ImageWidth != width || m_ImageHeight != height)
        {
            m_CamImage = new byte[width * height * 4];
            m_ImageWidth = width;
            m_ImageHeight = height;
        }
        System.Runtime.InteropServices.Marshal.Copy(pixelBuffer, m_CamImage, 0, bufferSize);

        m_IsDetecting = true;
        Invoke(nameof(ArmNNDetect), 0f);
    }
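The m_IsDetecting flag acts as a "skip if busy" gate: frames that arrive while a detection is running are dropped rather than queued, so latency does not build up. The flag has to be cleared again once the detection finishes, otherwise only the first frame is ever processed. In Unity the callback and the await continuation should both run on the main thread, so a plain bool is enough there; where several threads are involved, the equivalent gate needs an atomic compare-and-swap, as in this hypothetical C++ sketch (`FrameGate` is an invented name).

```cpp
#include <atomic>
#include <cassert>

// Lets at most one job run at a time; callers that arrive while a job is
// running are told to drop their frame instead of queueing it.
class FrameGate
{
public:
    // Returns true if the caller may start a job, and marks the gate busy.
    bool TryEnter()
    {
        bool expected = false;
        return m_Busy.compare_exchange_strong(expected, true);
    }

    // Call when the job finishes so that the next frame can be processed.
    void Leave() { m_Busy.store(false); }

private:
    std::atomic<bool> m_Busy{false};
};
```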

Call the DetectAsync method from ArmNNDetect.

    private async void ArmNNDetect()
    {
        var probabilities_and_bounding_boxes = await this.detector.DetectAsync (m_CamImage);
        ...
        // Visualize the bounding boxes and probabilities on the screen.
        // Use "Frame.Raycast", provided by ARCore, to find the 3D pose of the detected objects,
        // and render a related virtual object at that pose.

        // Allow OnImageAvailable to start the next detection
        m_IsDetecting = false;
    }

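The DetectAsync method returns the probabilities and bounding boxes of the detected objects. From there you can do whatever you like, for example visualize the bounding boxes and place virtual content next to the detected real-world objects.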
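To draw the boxes or pass them to Frame.Raycast, which takes Unity screen coordinates in pixels with the origin at the bottom-left, a normalized box center has to be mapped to the screen. The hypothetical C++ sketch below assumes the detector reports centers normalized to [0, 1] with the origin at the top-left and that the 448 x 448 image spans the whole screen. It ignores any cropping introduced by "Keep Aspect Ratio" and any row flipping done by TextureReader, so check the mapping on your device. `ScreenPoint` and `NormalizedToScreen` are invented names.

```cpp
#include <cassert>

// A point in screen pixels (hypothetical helper type).
struct ScreenPoint { float x, y; };

// Maps a normalized image position (origin top-left, y pointing down) to
// Unity-style screen pixels (origin bottom-left, y pointing up).
ScreenPoint NormalizedToScreen(float nx, float ny, int screenWidth, int screenHeight)
{
    return { nx * screenWidth, (1.0f - ny) * screenHeight };
}
```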

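How do you use the "Frame.Raycast" method to find the 3D position of a detected object? Remember the code you commented out in the Update method of the "HelloARController.cs" script? You can base your code on it, using the 2D coordinates of the bounding box instead of the touch coordinates.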

    // Raycast against the location of the detected object to search for planes.
    TrackableHit hit;
    TrackableHitFlags raycastFilter = TrackableHitFlags.PlaneWithinPolygon |
        TrackableHitFlags.FeaturePointWithSurfaceNormal;

    if (Frame.Raycast(boundingbox.position.x, boundingbox.position.y, raycastFilter, out hit))
    {
        var andyObject = Instantiate(AndyAndroidPrefab, hit.Pose.position, hit.Pose.rotation);

        // Create an anchor to allow ARCore to track the hitpoint as understanding of the physical
        // world evolves.
        var anchor = hit.Trackable.CreateAnchor(hit.Pose);

        // Andy should look at the camera but still be flush with the plane.
        if ((hit.Flags & TrackableHitFlags.PlaneWithinPolygon) != TrackableHitFlags.None)
        {
            // Get the camera position and match the y-component with the hit position.
            Vector3 cameraPositionSameY = FirstPersonCamera.transform.position;
            cameraPositionSameY.y = hit.Pose.position.y;

            // Have Andy look toward the camera respecting his "up" perspective, which may be from ceiling.
            andyObject.transform.LookAt(cameraPositionSameY, andyObject.transform.up);
        }

        // Make Andy model a child of the anchor.
        andyObject.transform.parent = anchor.transform;
    }
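Copying the hit point's y into the camera position before LookAt keeps Andy upright: the turn reduces to a single yaw around the vertical axis. For a horizontal, upward-facing plane this yaw is atan2(dx, dz) under Unity's convention that a yaw of 0 faces +Z; the hypothetical C++ helper `YawTowards` below illustrates just that arithmetic.

```cpp
#include <cassert>
#include <cmath>

// Yaw angle in radians (about +Y) that turns an object at (ox, oz) to face a
// target at (tx, tz), ignoring height; yaw 0 faces +Z, as in Unity.
float YawTowards(float ox, float oz, float tx, float tz)
{
    return std::atan2(tx - ox, tz - oz);
}
```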

That's it: you now have your own AR + ML demo.

The video can be watched and downloaded on Arm's Youku channel:

https://v.youku.com/v_show/id_XMzcyMDc3NDg3Mg==.html
