Using Arm v8 for Vision at the Edge

Mary Bennion
October 31, 2019
4 minute read time.

When developing vision applications, the most common knowledge gap we encounter is a lack of understanding of the performance required and of what can be achieved with a given hardware architecture. The confusion partly stems from the dissimilar benchmarks used to measure the performance of GPUs and AI accelerators, and is compounded by a rapidly evolving software ecosystem of networks, tools and frameworks. Truly determining performance requires considerable experimentation.

Instead, let’s take a reductionist approach. Let’s start with the CPU core and understand what can be achieved using this unit of processing. If we can understand the type of vision detection pipeline that can be created using the Arm v8 architecture, then we can apply this knowledge to a broad swath of existing hardware and, in the process, define the point at which the cost and complexity of specialized acceleration is justified.

Fortunately, a lot of work has already been done to optimize vision algorithm primitives and inference to maximize performance on the Arm v8 architecture, specifically by taking advantage of NEON (Arm’s SIMD/DSP instruction set) and the Floating-Point Unit (FPU).

Similarly, in the area of deep learning, significant advances have produced network models such as MobileNet_v2 SSD. This new generation of detectors is significantly more efficient than its predecessors, while retaining a similar level of accuracy.

This improvement in performance makes it possible to run these state-of-the-art models on the Arm architecture and to optimize them further using quantization and ArmNN. The efficiency gain from quantization comes from converting the model inputs and weights from float32 to uint8. To do this, we use TensorFlow Lite to create a quantized version of the MobileNet_v2 SSD model. ArmNN provides parsers that read TensorFlow Lite flatbuffer models, optimize them and execute them on the available compute devices. Since we are using CpuAcc (an Arm v8 CPU with NEON), matrix and vector math is handled by the underlying Arm Compute Library using NEON SIMD instructions.
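As a rough illustration, here is a minimal sketch of post-training quantization with the TensorFlow Lite converter. The SavedModel path and the random calibration data are placeholders, not from the original post; a real conversion would feed representative camera frames, and an SSD detection model needs its post-processing op handled appropriately:

```python
import numpy as np
import tensorflow as tf

def representative_dataset():
    # Placeholder calibration data; real use would yield preprocessed
    # 300x300 frames sampled from the target camera feed.
    for _ in range(100):
        yield [np.random.rand(1, 300, 300, 3).astype(np.float32)]

# Hypothetical SavedModel export of ssd_mobilenet_v2.
converter = tf.lite.TFLiteConverter.from_saved_model("ssd_mobilenet_v2/saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8   # uint8 inputs, as described above
converter.inference_output_type = tf.uint8

with open("ssd_mobilenet_v2_quant.tflite", "wb") as f:
    f.write(converter.convert())
```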

To illustrate the efficiency of ArmNN, the chart below compares ArmNN to other common inference methods such as OpenCV, which does not support quantized models.


Figure 1: Performance comparison of ArmNN and other common inference methods such as OpenCV

Methodology summary for each configuration:

  • 6 images, each resized to 300 x 300
  • OpenCV and ArmNN, both using 4 threads
  • ArmNN is using CpuAcc (i.e. with NEON acceleration)
  • Model is ssd_mobilenet_v2; OpenCV loads the TensorFlow .pb/.pbtxt while ArmNN uses .tflite (for both the quantized and non-quantized runs)
  • Model is pretrained on MS-COCO, taken directly from the TensorFlow Model Zoo
  • Tests run on an NXP i.MX 8M Mini (4 x Arm Cortex-A53 @ 1.8 GHz)
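
For a sense of what the ArmNN side of this setup looks like, here is a sketch using the pyarmnn bindings to load a .tflite model and run it on the CpuAcc backend. The file name and the dummy input frame are illustrative:

```python
import numpy as np
import pyarmnn as ann

# Parse the TensorFlow Lite flatbuffer model.
parser = ann.ITfLiteParser()
network = parser.CreateNetworkFromBinaryFile("ssd_mobilenet_v2_quant.tflite")

# Optimize for CpuAcc (NEON-accelerated CPU), falling back to CpuRef.
options = ann.CreationOptions()
runtime = ann.IRuntime(options)
backends = [ann.BackendId("CpuAcc"), ann.BackendId("CpuRef")]
opt_network, _ = ann.Optimize(network, backends,
                              runtime.GetDeviceSpec(), ann.OptimizerOptions())
net_id, _ = runtime.LoadNetwork(opt_network)

# Bind the input tensor and all output tensors (boxes, classes, scores, count).
graph_id = 0
input_name = parser.GetSubgraphInputTensorNames(graph_id)[0]
input_binding = parser.GetNetworkInputBindingInfo(graph_id, input_name)
output_bindings = [parser.GetNetworkOutputBindingInfo(graph_id, name)
                   for name in parser.GetSubgraphOutputTensorNames(graph_id)]

image = np.zeros((1, 300, 300, 3), dtype=np.uint8)  # placeholder 300x300 frame
input_tensors = ann.make_input_tensors([input_binding], [image])
output_tensors = ann.make_output_tensors(output_bindings)

runtime.EnqueueWorkload(net_id, input_tensors, output_tensors)
results = ann.workload_tensors_to_ndarray(output_tensors)
```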

By combining inference with algorithmic processing, such as tracking, we have the fundamentals of a vision system capable of detecting basic safety and security events such as intrusion, zone incursion and boundary crossing. These capabilities have broad application to many vision tasks, and when applied together with higher-level logic they form a vision pipeline. Consider a simple use case: detecting the theft of a package from outside your house.

Figure 2: Video of package theft using the vision pipeline

In the example video, we use ArmNN to detect and classify people and packages, and an algorithmic pipeline to track the important objects in the field of view (a simple direction estimate is sketched after the list below). This allows us to answer the following questions:

  • Is there a package?
  • Is there a person?
  • What direction is the person moving?
  • What direction is the package moving?
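
A direction estimate can be as simple as averaging displacement over a track’s recent centroids. This sketch assumes a hypothetical track history produced by any basic tracker (for example, centroid or IoU matching over the SSD detections):

```python
import numpy as np

def movement_direction(history, window=5):
    # history: list of (x, y) centroids for one tracked object, oldest first.
    if len(history) < 2:
        return np.zeros(2)
    pts = np.asarray(history[-window:], dtype=float)
    # Average per-frame displacement over the window gives a smoothed
    # direction vector; its sign answers "which way is it moving?".
    return (pts[-1] - pts[0]) / (len(pts) - 1)
```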

To aid our higher-level logic, we add an incursion line to form a boundary. A person crossing this boundary the wrong way creates an intrusion event; an object crossing it the wrong way creates a removed-object event. Together, this event sequence provides enough situational awareness to determine, with high probability, that your package won’t be there when you get home.
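
One way to implement the boundary check is a side-of-line test on consecutive tracked centroids; a sign change means the track crossed the line. The event names and the choice of which direction counts as the "wrong way" are illustrative, not taken from the post:

```python
def side_of_line(p, a, b):
    # Cross-product sign: > 0 if p is left of the directed line a -> b,
    # < 0 if right, 0 if exactly on the line.
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def boundary_event(prev_pt, curr_pt, line_a, line_b, label):
    before = side_of_line(prev_pt, line_a, line_b)
    after = side_of_line(curr_pt, line_a, line_b)
    if before > 0 >= after:  # crossed from left to right: the "wrong way"
        return "intrusion" if label == "person" else "removed object"
    return None
```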

Arcturus specializes in developing vision pipelines and can make use of additional functions such as background subtraction, optical flow and specialized neural networks for re-identification or segmentation. This capability is supported by comprehensive video pre-processing, post-processing, streaming and storage subsystems, combined with IoT-like event notifications and a UI/UX.

This makes it possible to create powerful edge-based vision analytics systems that eliminate the need to continuously stream pixel data for external processing. The result is better use of the data network, fewer privacy concerns and a premises-based system from which local actions can take place.

Arcturus develops full-stack solutions for smart city and smart building applications. You can check out more of our work, including how we are helping to bring intelligence to public transportation networks.

Watch the demo of Arcturus here.

Big thanks goes out to David Steele, Director of Innovation at Arcturus Networks (www.arcturusnetworks.com), who provided the content for this blog.

Get started with ArmNN
