When developing vision applications, the most common knowledge gap we encounter is a lack of understanding of the performance required and what can be achieved with a given hardware architecture. The confusion partly stems from the dissimilar benchmarks used to measure the performance of GPUs and AI accelerators. This is compounded by a rapidly evolving software ecosystem of networks, tools and frameworks. To truly determine performance, considerable experimentation is required.
Instead, let’s take a reductionist approach. Let’s start with the CPU core and understand what can be achieved using this unit of processing. If we can understand the type of vision detection pipeline that can be created using an Arm v8 architecture, we can apply this knowledge to a broad swath of existing hardware, and in the process define the point at which the cost and complexity of specialized acceleration is justified. Fortunately, a lot of work has already been done to optimize vision algorithm primitives and inference for the Arm v8 architecture, specifically by taking advantage of NEON (SIMD/DSP instructions) and the Floating-Point Unit (FPU).
Similarly, in the area of deep learning, significant advances have produced network models such as MobileNet_v2 SSD. This new generation of detectors is significantly more efficient than its predecessors, while retaining a similar level of accuracy.
This improvement in efficiency makes it possible to apply these state-of-the-art models to the Arm architecture and optimize them further using quantization and ArmNN. The efficiency gain from quantization comes from converting the model inputs and weights from float32 to uint8. To do this, we use TensorFlow Lite to create a quantized version of the MobileNet_v2 SSD model. ArmNN provides parsers to read TensorFlow Lite FlatBuffer models, optimize them and execute them on the available compute devices. Since we are using CpuAcc (an Arm v8 CPU with NEON), matrix and vector math is handled by the underlying Arm Compute Library using NEON SIMD instructions.
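To see why uint8 inference is cheaper, it helps to look at the affine mapping quantization uses: each float32 value is represented by an 8-bit code plus a per-tensor scale and zero-point. The sketch below (plain Python; the scale and zero-point are illustrative, not taken from the actual MobileNet_v2 SSD model) shows the round trip:

```python
# Affine quantization: real_value ≈ scale * (quantized_value - zero_point).
# The scale and zero_point below are illustrative examples only.

def quantize(x, scale, zero_point):
    """Map a float32 value to a uint8 code in [0, 255]."""
    q = round(x / scale) + zero_point
    return max(0, min(255, q))  # clamp to the uint8 range

def dequantize(q, scale, zero_point):
    """Recover an approximate float32 value from a uint8 code."""
    return scale * (q - zero_point)

# Example: activations spanning roughly [-1.0, 1.0).
scale = 2.0 / 256   # ~0.0078 real units per uint8 step
zero_point = 128    # the uint8 code that represents 0.0

q = quantize(0.5, scale, zero_point)       # 192
x = dequantize(q, scale, zero_point)       # 0.5 (exact here; generally approximate)
```

Because all the heavy matrix math then operates on 8-bit integers, four values fit in the space of one float32, which maps directly onto wide NEON SIMD lanes.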
To illustrate the efficiency of ArmNN, the table below compares ArmNN to other common inference methods such as OpenCV, which does not support quantized models.
Figure 1: Performance comparison of ArmNN and other common inference methods such as OpenCV
Methodology summary for each configuration:
By combining inference with algorithmic processing, such as tracking, we have the fundamentals of a vision system capable of detecting basic safety and security events such as intrusion, zone incursion and boundary crossing. These capabilities have broad application to many vision tasks. When they are applied together with higher-level logic, they form a vision pipeline. Consider a simple use case: detecting the theft of a package from outside your house.
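The tracking step is what turns per-frame detections into persistent objects that higher-level logic can reason about. As a minimal sketch (plain Python; real pipelines typically use IoU matching, Kalman filters or appearance embeddings rather than this naive scheme), a centroid tracker associates each existing track with its nearest new detection:

```python
import math

# Minimal centroid tracker: matches detections across frames by nearest
# distance. Illustrative only -- production trackers are more robust.

class CentroidTracker:
    def __init__(self, max_distance=50.0):
        self.next_id = 0
        self.objects = {}            # object_id -> (x, y) centroid
        self.max_distance = max_distance

    def update(self, centroids):
        assigned = {}
        unmatched = list(centroids)
        for obj_id, prev in self.objects.items():
            if not unmatched:
                break
            # Greedily match this track to its nearest unclaimed detection.
            best = min(unmatched, key=lambda c: math.dist(c, prev))
            if math.dist(best, prev) <= self.max_distance:
                assigned[obj_id] = best
                unmatched.remove(best)
        for c in unmatched:          # leftover detections become new tracks
            assigned[self.next_id] = c
            self.next_id += 1
        self.objects = assigned
        return assigned

tracker = CentroidTracker()
tracker.update([(10, 10), (100, 100)])   # frame 1: ids 0 and 1 created
tracker.update([(12, 11), (98, 103)])    # frame 2: same ids follow the objects
```

Stable IDs are what make it possible to say "the person who crossed the line is the one now carrying the package", rather than treating every frame in isolation.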
Figure 2: Video of package theft using vision pipeline
In the example video we use ArmNN to detect and classify people and packages and use an algorithmic pipeline to track the important objects in the field of view. This allows us to answer the following questions:
To aid our higher-level logic, we add an incursion line to form a boundary. A person crossing this boundary the wrong way creates an intrusion event. An object crossing the boundary the wrong way creates a removed-object event. Together, this event sequence provides adequate situational awareness to determine, with high probability, that your package won’t be there when you get home.
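One simple way to implement such an incursion line is a side-of-line test: the sign of the 2-D cross product tells which side of the boundary a tracked centroid is on, and a sign change between frames is a crossing event. The sketch below (plain Python; the line endpoints are illustrative, not taken from the demo video) shows the idea:

```python
# Side-of-line test via the 2-D cross product. The boundary runs from
# point a to point b; the sign tells which side point p falls on.
# Coordinates here are illustrative pixel positions.

def side(a, b, p):
    """> 0: p is left of a->b; < 0: right; 0: on the line."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def crossed(a, b, prev, curr):
    """True when a centroid moved from one side of the line to the other."""
    s0, s1 = side(a, b, prev), side(a, b, curr)
    return s0 != 0 and s1 != 0 and (s0 > 0) != (s1 > 0)

# Horizontal incursion line across a 640x480 frame, e.g. a porch edge.
line_a, line_b = (0, 240), (640, 240)

crossed(line_a, line_b, (320, 200), (320, 300))  # moved across: event
crossed(line_a, line_b, (320, 200), (320, 210))  # same side: no event
```

Comparing the direction of travel against the line's orientation (the sign of `s0` versus `s1`) is also what distinguishes a person entering from a person leaving, so the same test drives both the intrusion and removed-object events.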
Arcturus specializes in developing vision pipelines and can make use of additional functions such as background subtraction, optical flow and specialized neural networks for re-identification or segmentation. This capability is supported by comprehensive video pre-processing, post-processing, streaming and storage subsystems, combined with IoT-like event notifications and a UI/UX. This makes it possible to create powerful edge-based vision analytics systems that eliminate the need to continuously stream pixel data for external processing. The result is better use of the data network, reduced privacy concerns and a premises-based system from which local actions can take place. Arcturus develops full-stack solutions for smart city and smart building applications. You can check out more of our work, including how we are helping to bring intelligence to public transportation networks. Watch the demo of Arcturus here.

Big thanks goes out to David Steele, Director of Innovation at Arcturus Networks (www.arcturusnetworks.com), who provided the content for this blog.
[CTAToken URL = "https://www.arm.com/products/silicon-ip-cpu/machine-learning/arm-nn" target="_blank" text="Get started with ArmNN" class ="green"]