Artificial intelligence (AI) is a technology that is already shaping our lives, from recognizing who’s in our photos on social media and spotting patterns in medical data to shaping the evolution of self-driving cars and running real-time fraud detection in our networks.
In its broadest sense, AI enables compute systems to mimic human intelligence, using machine learning (ML) and neural networks to predict outcomes with a high degree of accuracy.
Advances in compute processing and AI algorithms have enabled applications, live training, and inference to move to the edge of the network, improving responsiveness, reliability, and security, and allowing systems to react better to ever-changing customer requirements.
Self-driving cars, for example, will need AI-enabled edge compute to react to changes in the immediate environment without the inherent delays of sending data back and forth to the cloud.
Google, amongst others, has also been exploring a concept called “federated learning” that uses edge devices to collaboratively learn a shared prediction model, removing the need to store users’ training data in the cloud.
This improves security and should allow market segments with strict data privacy regulations like healthcare or banking to finally take full advantage of ML.
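As a rough illustration of the idea (a generic sketch, not Google’s implementation, and with hypothetical function and variable names), the aggregation step might look like the following: a server combines the weight vectors reported by edge devices, weighted by how much data each one trained on locally, while the raw training data stays on the device.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// Hypothetical aggregation step of federated averaging: each edge device
// trains locally and reports only its model weights; the server combines
// them, weighted by local training-set size. Raw user data never leaves
// the device.
std::vector<float> federated_average(
    const std::vector<std::vector<float>>& client_weights,  // one weight vector per device
    const std::vector<std::size_t>& client_samples)         // local sample count per device
{
    const std::size_t dims = client_weights.front().size();
    std::size_t total = 0;
    for (std::size_t s : client_samples) total += s;

    std::vector<float> global(dims, 0.0f);
    for (std::size_t c = 0; c < client_weights.size(); ++c) {
        const float w = static_cast<float>(client_samples[c]) / static_cast<float>(total);
        for (std::size_t d = 0; d < dims; ++d)
            global[d] += w * client_weights[c][d];
    }
    return global;  // the updated shared model, pushed back out to the devices
}

int main() {
    // Two devices with a two-parameter model; device 0 trained on 100 samples, device 1 on 300.
    std::vector<float> g = federated_average({{1.0f, 2.0f}, {3.0f, 4.0f}}, {100, 300});
    std::cout << g[0] << " " << g[1] << "\n";  // prints 2.5 3.5
}
```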
While next-generation technologies are being designed with machine learning in mind, today’s CPUs and GPUs are already running ML algorithms across many segments. To cover this wide range of use cases and platforms, machine learning solutions need a mix of processing options, from CPUs with SIMD extensions to general-purpose GPUs and dedicated hardware accelerators.
Even though hardware acceleration is normally more efficient and performant, some use cases are better suited to staying on the CPU, as the overhead of moving data out to an accelerator and retrieving the result adds prohibitive latency.
Additionally, machine learning on CPUs offers advantages of its own.
To ease deployment, Arm has created the Compute Library, which includes a comprehensive set of functions for ML frameworks such as Google’s TensorFlow, as well as for imaging and vision projects. The purpose of the library is to provide portable code that can run across a wide range of Arm system configurations.
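For illustration, here is a minimal sketch of dispatching a single-precision matrix multiply through the Compute Library’s NEON backend. It is based on the library’s public C++ interface, but exact header paths and configure() signatures vary between releases, and the shapes here are arbitrary, so treat it as a sketch rather than a drop-in example.

```cpp
// Minimal sketch: an FP32 GEMM through the Compute Library's NEON backend.
// Header paths and function signatures differ between library versions.
#include "arm_compute/runtime/NEON/NEFunctions.h"
#include "arm_compute/runtime/Tensor.h"

using namespace arm_compute;

int main() {
    // C (M x N) = A (M x K) * B (K x N); TensorShape takes (width, height).
    const unsigned int M = 64, N = 64, K = 64;
    Tensor a, b, c;
    a.allocator()->init(TensorInfo(TensorShape(K, M), 1, DataType::F32));
    b.allocator()->init(TensorInfo(TensorShape(N, K), 1, DataType::F32));
    c.allocator()->init(TensorInfo(TensorShape(N, M), 1, DataType::F32));

    NEGEMM gemm;
    gemm.configure(&a, &b, nullptr, &c, 1.0f, 0.0f);  // alpha = 1, beta = 0

    a.allocator()->allocate();
    b.allocator()->allocate();
    c.allocator()->allocate();

    // ... fill a and b with input data here ...

    gemm.run();  // executes on the CPU using NEON-optimized kernels
    return 0;
}
```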
These ML systems use neural networks, of which many different types exist. They learn tasks by considering examples and often do not require very high numerical precision, meaning that calculations can usually be performed on 16-bit or even 8-bit data rather than larger 32- or 64-bit values.
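To make that concrete, the sketch below quantizes FP32 values to INT8 using a simple symmetric scale; this is one common convention for network weights, not any particular framework’s API.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative symmetric INT8 quantization: a single scale maps the range
// [-max_abs, +max_abs] onto [-127, 127]. One byte per value instead of four,
// at the cost of some precision.
struct Quantized {
    std::vector<int8_t> values;
    float scale;  // multiply by this to recover approximate FP32 values
};

Quantized quantize_int8(const std::vector<float>& x) {
    float max_abs = 0.0f;
    for (float v : x) max_abs = std::max(max_abs, std::fabs(v));
    const float scale = max_abs > 0.0f ? max_abs / 127.0f : 1.0f;

    Quantized q{std::vector<int8_t>(x.size()), scale};
    for (std::size_t i = 0; i < x.size(); ++i) {
        const int32_t v = static_cast<int32_t>(std::round(x[i] / scale));
        q.values[i] = static_cast<int8_t>(std::clamp(v, -127, 127));
    }
    return q;
}

float dequantize(int8_t v, float scale) { return v * scale; }
```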
Knowing that the majority of neural network processing uses 8-bit fixed-point matrix multiplication, the Armv8.2-A architecture introduced half-precision floating-point (FP16) and 8-bit integer dot-product (INT8) NEON SIMD (single instruction, multiple data) instructions to accelerate neural network processing.
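As a sketch of what those instructions buy you, the function below computes an INT8 dot product with the SDOT instruction via ACLE NEON intrinsics; it assumes an AArch64 compiler with dot-product support enabled (for example, -march=armv8.2-a+dotprod) and handles any leftover elements with scalar code.

```cpp
#include <arm_neon.h>
#include <cstdint>

// INT8 dot product using the Armv8.2-A SDOT instruction: vdotq_s32 multiplies
// sixteen pairs of int8 values and accumulates them into four int32 lanes in
// a single instruction.
int32_t dot_int8(const int8_t* a, const int8_t* b, int n) {
    int32x4_t acc = vdupq_n_s32(0);
    int i = 0;
    for (; i + 16 <= n; i += 16) {
        int8x16_t va = vld1q_s8(a + i);
        int8x16_t vb = vld1q_s8(b + i);
        acc = vdotq_s32(acc, va, vb);   // 16 multiply-accumulates per instruction
    }
    int32_t sum = vaddvq_s32(acc);      // horizontal add of the four int32 lanes
    for (; i < n; ++i)                  // scalar tail for the remaining elements
        sum += a[i] * b[i];
    return sum;
}
```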
Machine Learning and SGEMM
ML algorithms typically rely on single-precision floating-point general matrix multiply (SGEMM). On an Armv8.2 core, moving from FP32 to FP16 gives roughly a 2x improvement in performance, and moving from 16-bit to 8-bit data gives a further 2x improvement.
This reduces memory and cache requirements and greatly improves performance and memory bandwidth in power-constrained edge compute devices.
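As a rough illustration of where those savings come from, here is a naive reference GEMM templated on the storage type; the same loop nest can run on 4-byte FP32 inputs or 1-byte INT8 inputs with a wider accumulator, so each step down in precision shrinks the amount of data that has to be moved through memory and the caches.

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

// Naive reference GEMM: C (M x N) = A (M x K) * B (K x N), accumulating in TAcc.
// Per-element storage: float = 4 bytes, FP16 = 2 bytes, int8_t = 1 byte, which
// is where the memory and bandwidth savings come from.
template <typename TIn, typename TAcc>
void gemm(const std::vector<TIn>& A, const std::vector<TIn>& B,
          std::vector<TAcc>& C, int M, int N, int K) {
    for (int m = 0; m < M; ++m)
        for (int n = 0; n < N; ++n) {
            TAcc acc = 0;
            for (int k = 0; k < K; ++k)
                acc += static_cast<TAcc>(A[m * K + k]) * static_cast<TAcc>(B[k * N + n]);
            C[m * N + n] = acc;
        }
}

int main() {
    const int M = 2, N = 2, K = 2;
    std::vector<float> Af{1, 2, 3, 4}, Bf{5, 6, 7, 8}, Cf(M * N);
    gemm<float, float>(Af, Bf, Cf, M, N, K);      // FP32 (SGEMM-style) path

    std::vector<int8_t> Ai{1, 2, 3, 4}, Bi{5, 6, 7, 8};
    std::vector<int32_t> Ci(M * N);
    gemm<int8_t, int32_t>(Ai, Bi, Ci, M, N, K);   // INT8 inputs, INT32 accumulation

    std::cout << Cf[0] << " " << Ci[0] << "\n";   // both print 19
}
```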
Looking at it from a system point of view, Arm has also coupled its agile cache interconnect IP (CoreLink CMN-600) more closely to the core, considerably reducing data-access latencies and, in turn, significantly improving application response times in demanding computational tasks like machine learning.
AI is still in the early stages of adoption, but it is clear that ML will move to the edge of the network to enable these new use cases.
Arm technology is enabling this new era: the Arm Infrastructure ecosystem, the Arm ML developer community, and the Arm architecture are driving innovation and choice across the network and providing the foundation for your next-generation edge compute AI applications.
We will talk more about ML acceleration in our next blog.