Artificial intelligence (AI) is a technology that is already shaping our lives, from recognizing who’s in our photos on social media and spotting patterns in medical data to shaping the evolution of self-driving cars and running real-time fraud detection in our networks.
In its broadest sense, AI enables compute systems to mimic human intelligence, using machine learning (ML) and neural networks to predict outcomes with a high degree of accuracy.
Advances in compute processing and AI algorithms have enabled applications, live training, and inference to move to the edge of the network, improving responsiveness, reliability, and security, and allowing systems to react better to ever-changing customer requirements.
Self-driving cars, for example, will need AI-enabled edge compute to react to changes in the immediate environment without the inherent delays of sending data back and forth to the cloud.
Google, amongst others, has also been exploring a concept called “federated learning” that uses edge devices to collaboratively learn a shared prediction model, removing the need to store users’ training data in the cloud.
This improves security and should allow market segments with strict data privacy regulations like healthcare or banking to finally take full advantage of ML.
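As a rough illustration of the idea (a generic sketch, not Google’s implementation, and with hypothetical function and variable names), the aggregation step might look like the following: a server combines the weight vectors reported by edge devices, weighted by how much data each one trained on locally, while the raw training data stays on the device.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// Hypothetical aggregation step of federated averaging: each edge device
// trains locally and reports only its model weights; the server combines
// them, weighted by local training-set size. Raw user data never leaves
// the device.
std::vector<float> federated_average(
    const std::vector<std::vector<float>>& client_weights,  // one weight vector per device
    const std::vector<std::size_t>& client_samples)         // local sample count per device
{
    const std::size_t dims = client_weights.front().size();
    std::size_t total = 0;
    for (std::size_t s : client_samples) total += s;

    std::vector<float> global(dims, 0.0f);
    for (std::size_t c = 0; c < client_weights.size(); ++c) {
        const float w = static_cast<float>(client_samples[c]) / static_cast<float>(total);
        for (std::size_t d = 0; d < dims; ++d)
            global[d] += w * client_weights[c][d];
    }
    return global;  // the updated shared model, pushed back out to the devices
}

int main() {
    // Two devices with a two-parameter model; device 0 trained on 100 samples, device 1 on 300.
    std::vector<float> g = federated_average({{1.0f, 2.0f}, {3.0f, 4.0f}}, {100, 300});
    std::cout << g[0] << " " << g[1] << "\n";  // prints 2.5 3.5
}
```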
While next-generation technologies are being designed with machine learning in mind, today’s CPUs and GPUs are already running ML algorithms across many segments. To cover this wide range of use cases and platforms, machine learning solutions need a mix of processing options, from CPUs with SIMD extensions to general-purpose GPUs and dedicated hardware accelerators.
Even though hardware acceleration is normally more efficient and performant, some use cases are better suited to staying on the CPU, as the overhead of moving data out to an accelerator and retrieving the result adds prohibitive latency.
Additionally, machine learning on CPUs offers advantages of its own.
To ease deployment, Arm has created the Compute Library, which includes a comprehensive set of functions for ML frameworks such as Google’s TensorFlow, as well as for imaging and vision projects. The purpose of the library is to provide portable code that can run across a wide range of Arm system configurations.
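For illustration, here is a minimal sketch of dispatching a single-precision matrix multiply through the Compute Library’s NEON backend. It is based on the library’s public C++ interface, but exact header paths and configure() signatures vary between releases, and the shapes here are arbitrary, so treat it as a sketch rather than a drop-in example.

```cpp
// Minimal sketch: an FP32 GEMM through the Compute Library's NEON backend.
// Header paths and function signatures differ between library versions.
#include "arm_compute/runtime/NEON/NEFunctions.h"
#include "arm_compute/runtime/Tensor.h"

using namespace arm_compute;

int main() {
    // C (M x N) = A (M x K) * B (K x N); TensorShape takes (width, height).
    const unsigned int M = 64, N = 64, K = 64;
    Tensor a, b, c;
    a.allocator()->init(TensorInfo(TensorShape(K, M), 1, DataType::F32));
    b.allocator()->init(TensorInfo(TensorShape(N, K), 1, DataType::F32));
    c.allocator()->init(TensorInfo(TensorShape(N, M), 1, DataType::F32));

    NEGEMM gemm;
    gemm.configure(&a, &b, nullptr, &c, 1.0f, 0.0f);  // alpha = 1, beta = 0

    a.allocator()->allocate();
    b.allocator()->allocate();
    c.allocator()->allocate();

    // ... fill a and b with input data here ...

    gemm.run();  // executes on the CPU using NEON-optimized kernels
    return 0;
}
```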
These ML systems use neural networks, of which many different types exist. They learn tasks by considering examples and often do not require very high numerical precision, meaning that calculations can usually be performed on 16-bit or even 8-bit data rather than larger 32- or 64-bit values.
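To make that concrete, the sketch below quantizes FP32 values to INT8 using a simple symmetric scale; this is one common convention for network weights, not any particular framework’s API.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative symmetric INT8 quantization: a single scale maps the range
// [-max_abs, +max_abs] onto [-127, 127]. One byte per value instead of four,
// at the cost of some precision.
struct Quantized {
    std::vector<int8_t> values;
    float scale;  // multiply by this to recover approximate FP32 values
};

Quantized quantize_int8(const std::vector<float>& x) {
    float max_abs = 0.0f;
    for (float v : x) max_abs = std::max(max_abs, std::fabs(v));
    const float scale = max_abs > 0.0f ? max_abs / 127.0f : 1.0f;

    Quantized q{std::vector<int8_t>(x.size()), scale};
    for (std::size_t i = 0; i < x.size(); ++i) {
        const int32_t v = static_cast<int32_t>(std::round(x[i] / scale));
        q.values[i] = static_cast<int8_t>(std::clamp(v, -127, 127));
    }
    return q;
}

float dequantize(int8_t v, float scale) { return v * scale; }
```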
Knowing that the majority of neural network processing uses 8-bit fixed-point matrix multiplication, the Armv8.2-A architecture introduced half-precision floating-point (FP16) and 8-bit integer dot-product (INT8) NEON SIMD (single instruction, multiple data) instructions to accelerate neural network processing.
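As a sketch of what those instructions buy you, the function below computes an INT8 dot product with the SDOT instruction via ACLE NEON intrinsics; it assumes an AArch64 compiler with dot-product support enabled (for example, -march=armv8.2-a+dotprod) and handles any leftover elements with scalar code.

```cpp
#include <arm_neon.h>
#include <cstdint>

// INT8 dot product using the Armv8.2-A SDOT instruction: vdotq_s32 multiplies
// sixteen pairs of int8 values and accumulates them into four int32 lanes in
// a single instruction.
int32_t dot_int8(const int8_t* a, const int8_t* b, int n) {
    int32x4_t acc = vdupq_n_s32(0);
    int i = 0;
    for (; i + 16 <= n; i += 16) {
        int8x16_t va = vld1q_s8(a + i);
        int8x16_t vb = vld1q_s8(b + i);
        acc = vdotq_s32(acc, va, vb);   // 16 multiply-accumulates per instruction
    }
    int32_t sum = vaddvq_s32(acc);      // horizontal add of the four int32 lanes
    for (; i < n; ++i)                  // scalar tail for the remaining elements
        sum += a[i] * b[i];
    return sum;
}
```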
Machine Learning and SGEMM
ML algorithms typically rely on single-precision floating-point general matrix multiply (SGEMM). On an Armv8.2 core, moving from FP32 to FP16 gives roughly a 2x improvement in performance, and moving from 16-bit to 8-bit data gives a further 2x improvement.
This reduces memory and cache requirements and greatly improves performance and memory bandwidth in power-constrained edge compute devices.
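As a rough illustration of where those savings come from, here is a naive reference GEMM templated on the storage type; the same loop nest can run on 4-byte FP32 inputs or 1-byte INT8 inputs with a wider accumulator, so each step down in precision shrinks the amount of data that has to be moved through memory and the caches.

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

// Naive reference GEMM: C (M x N) = A (M x K) * B (K x N), accumulating in TAcc.
// Per-element storage: float = 4 bytes, FP16 = 2 bytes, int8_t = 1 byte, which
// is where the memory and bandwidth savings come from.
template <typename TIn, typename TAcc>
void gemm(const std::vector<TIn>& A, const std::vector<TIn>& B,
          std::vector<TAcc>& C, int M, int N, int K) {
    for (int m = 0; m < M; ++m)
        for (int n = 0; n < N; ++n) {
            TAcc acc = 0;
            for (int k = 0; k < K; ++k)
                acc += static_cast<TAcc>(A[m * K + k]) * static_cast<TAcc>(B[k * N + n]);
            C[m * N + n] = acc;
        }
}

int main() {
    const int M = 2, N = 2, K = 2;
    std::vector<float> Af{1, 2, 3, 4}, Bf{5, 6, 7, 8}, Cf(M * N);
    gemm<float, float>(Af, Bf, Cf, M, N, K);      // FP32 (SGEMM-style) path

    std::vector<int8_t> Ai{1, 2, 3, 4}, Bi{5, 6, 7, 8};
    std::vector<int32_t> Ci(M * N);
    gemm<int8_t, int32_t>(Ai, Bi, Ci, M, N, K);   // INT8 inputs, INT32 accumulation

    std::cout << Cf[0] << " " << Ci[0] << "\n";   // both print 19
}
```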
Looking at it from a system point of view, Arm has also coupled its agile cache interconnect IP (CoreLink CMN-600) more closely to the core, considerably reducing data-access latencies and, in turn, significantly improving application response times in demanding computational tasks like machine learning.
AI is still in the early stages of adoption, but it is clear that ML will move to the edge of the network to enable these new use cases.
Arm technology is enabling this new era: the Arm Infrastructure ecosystem, the Arm ML developer community, and the Arm architecture are driving innovation and choice across the network and providing the foundation for your next-generation edge compute AI applications.
We will talk more about ML acceleration in our next blog.