A new approach to software is needed to unleash the full power of multicore processing

October 1, 2020

When combined with a distributed-microkernel OS, today’s highly efficient multicore processors can endow autonomous systems with human-like skills, explains Rolland Dudemaine, VP Engineering, eSOL Europe.

Today’s autonomous machines and advanced industrial automation are increasingly expected to perform functions that humans are particularly good at, such as object recognition, contextual awareness, and decision making. Many are classified as edge devices that are independently able to capture, filter, grade, and use or discard vast quantities of information. Minimizing data exchanged with the cloud helps reduce bandwidth costs and protects time-critical processes against the adverse effects of latency.

Deterministic real-time performance is usually needed in applications such as autonomous-driving systems, industrial robots, and precision industrial automation, to ensure the reliability of control loops. At the same time, power budgets are usually tight, often because the system is battery-powered or because a bulky power supply is undesirable given size, weight, and power (SWaP) constraints. Some applications, such as automotive systems, drones, or mobile robots, are subject to all of these considerations.

Although we are looking for these systems to replicate human capabilities, the human brain itself sets a high bar in terms of energy efficiency. Its enormous range of skills and real-time multi-tasking abilities represent processing performance equivalent to somewhere between 10 teraFLOPS (10^13) and 10 yottaFLOPS (10^25), while consuming only about 20 watts.

Conventional CPUs, DSPs, GPUs, and multicore processors, including hybrid designs, have not come close to the brain’s combination of high performance and low power. A leading GPU such as the Nvidia K80 (GK210), for example, tops out at about 1.87 TFLOPS while consuming 300W. Moreover, a GPU can only run specific, dedicated algorithms; it cannot combine compute with general-purpose software.
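
To put the gap in perspective, the figures quoted above can be turned into performance per watt. The short calculation below is only a back-of-the-envelope sketch: it uses the article's own numbers (the lower and upper brain estimates, 20 W, 1.87 TFLOPS, 300 W), and the function and constant names are chosen here purely for illustration.

```python
def flops_per_watt(flops, watts):
    """Return sustained floating-point operations per second per watt."""
    return flops / watts


# Figures quoted in the text above (names are illustrative).
BRAIN_LOW_FLOPS = 10e12    # 10 teraFLOPS  = 1e13
BRAIN_HIGH_FLOPS = 10e24   # 10 yottaFLOPS = 1e25
BRAIN_WATTS = 20
GPU_FLOPS = 1.87e12        # about 1.87 TFLOPS
GPU_WATTS = 300

brain_low = flops_per_watt(BRAIN_LOW_FLOPS, BRAIN_WATTS)
brain_high = flops_per_watt(BRAIN_HIGH_FLOPS, BRAIN_WATTS)
gpu = flops_per_watt(GPU_FLOPS, GPU_WATTS)

# How many times more efficient the brain is, using the lowest brain estimate.
ratio_low = brain_low / gpu

if __name__ == "__main__":
    print(f"brain (low estimate):  {brain_low:.2e} FLOPS/W")
    print(f"brain (high estimate): {brain_high:.2e} FLOPS/W")
    print(f"GPU:                   {gpu:.2e} FLOPS/W")
    print(f"brain/GPU at the low estimate: about {ratio_low:.0f}x")
```

Even at the most conservative brain estimate, the efficiency gap is roughly 80 times; at the upper estimate it grows by a further twelve orders of magnitude.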

Massively Parallel Single-Chip

The best we have today is the emerging class of distributed, heterogeneous multi-/many-core processors. These contain large numbers of independent cores with closely coupled memory, interconnected through a high-speed network-on-chip (NoC) infrastructure. These types of devices are becoming mainstream for specialized datacenter systems.

In addition to high performance and power efficiency, safety and security are key requirements of industrial and automotive edge applications. Freedom From Interference (FFI) between different modules in the system is an important principle in Functional Safety (FuSa) design. Spatial and temporal isolation is another important FuSa principle, as it prevents modules from affecting each other and makes sure abnormal programs cannot sap the performance of others.
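
One common way to picture temporal isolation is a fixed time budget per module within a repeating frame. The sketch below is a simplified model, not a description of how eMCOS or any particular RTOS enforces isolation: the partition names, the millisecond budgets, and the `run_major_frame` function are all assumptions made for illustration.

```python
def run_major_frame(budgets_ms, demands_ms):
    """Model one scheduling frame with a fixed time window per partition.

    budgets_ms: partition name -> guaranteed window length (ms), in run order.
    demands_ms: partition name -> CPU time the partition would like (ms).
    Returns a list of (name, window_start_ms, time_actually_used_ms).
    """
    timeline = []
    window_start = 0
    for name, budget in budgets_ms.items():
        # A partition that wants more than its budget is cut off at the budget.
        used = min(demands_ms.get(name, 0), budget)
        timeline.append((name, window_start, used))
        # The next window starts at a fixed offset, whatever was actually used.
        window_start += budget
    return timeline


if __name__ == "__main__":
    budgets = {"control": 2, "vision": 5, "logging": 3}
    # "vision" misbehaves and asks for ten times its budget.
    demands = {"control": 2, "vision": 50, "logging": 1}
    for entry in run_major_frame(budgets, demands):
        print(entry)
```

In this model the misbehaving "vision" partition is held to its 5 ms window, so "logging" still starts at the 7 ms mark and "control" is unaffected, which is the behavior temporal isolation is meant to guarantee.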

Go Multikernel to Unleash Multicore

To maximize the performance of the emerging heterogeneous multi-/many-core processors while ensuring safety and security through isolation and protection (FFI), a new approach to software is needed. This particularly applies to the architecture of the operating system (OS) that brings together the various computing elements.

A multikernel architecture, which can also be described as an arrangement of “distributed microkernels” (as distinct from a microkernel OS), brings these goals within reach. It provides a platform for a system of systems, containing a network of single-core, message-based kernels, as shown in Figure 1. Lightweight message-passing allows fast, deterministic communication at the OS level.

Figure 1. eMCOS “Multikernel” design maximizes Heterogeneous Multicore architectures.

A key property of this architecture is that no kernel instance on any given core can block a kernel instance on another core. This ensures much better parallelism, concurrency, and determinism at both the kernel and the application level.
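
The structure described above, one kernel per core with each kernel owning a private mailbox, can be modeled in a few lines. This is a toy model under stated assumptions, not eMCOS code: `CoreKernel`, `wire`, and `demo` are hypothetical names, Python threads stand in for cores, and Python's `queue.Queue` uses an internal lock, so the sketch shows only the shape of the design (posting a message returns immediately and no kernel waits on another kernel's state), not real lock-free behavior.

```python
import queue
import threading


class CoreKernel:
    """Hypothetical stand-in for one single-core kernel instance."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.inbox = queue.Queue()  # unbounded mailbox, read only by this kernel
        self.peers = {}             # core_id -> CoreKernel, filled in by wire()
        self.log = []               # messages this kernel has handled

    def send(self, dest_id, payload):
        # Posting to an unbounded queue returns immediately; the sender never
        # waits for the receiving kernel to process the message.
        self.peers[dest_id].inbox.put_nowait((self.core_id, payload))

    def run(self):
        while True:
            sender, payload = self.inbox.get()
            if payload is None:     # shutdown sentinel
                return
            self.log.append((sender, payload))
            if payload == "ping":
                self.send(sender, "pong")


def wire(kernels):
    """Let every kernel address every other kernel by core id."""
    for k in kernels:
        k.peers = {p.core_id: p for p in kernels if p is not k}


def demo():
    kernels = [CoreKernel(i) for i in range(3)]
    wire(kernels)
    threads = [threading.Thread(target=k.run) for k in kernels]
    for t in threads:
        t.start()
    kernels[0].send(1, "ping")
    kernels[0].send(2, "ping")
    # Stop kernels 1 and 2 first so their replies are queued before kernel 0 stops.
    for i in (1, 2):
        kernels[i].inbox.put((None, None))
        threads[i].join()
    kernels[0].inbox.put((None, None))
    threads[0].join()
    return kernels


if __name__ == "__main__":
    for k in demo():
        print(k.core_id, k.log)
```

Each kernel only ever reads its own mailbox, so a slow or stalled kernel delays the handling of its own messages but does not hold up the kernels that sent them.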

eSOL’s eMCOS is a distributed-microkernel OS that now enables an independent microkernel to run on each core of the multicore processor while providing a unified platform for high-speed message passing and other functions. Figure 2 illustrates the underlying structure and the ability to handle hard and soft real-time workloads.

Figure 2. eMCOS enables a single, independent microkernel to run on each core.

Moreover, eMCOS POSIX is a multi-process RTOS that provides extended POSIX support and permits multi-cluster organization on one or more separate processors or SoCs. Threads from the same process may run on any combination of cores within a cluster, while communicating with other threads locally or remotely across clusters.
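
The idea of local and remote communication can be illustrated with a simple addressing model in which the sender names only the destination and the delivery path follows from the two addresses. Everything in this sketch is hypothetical: `ThreadAddr`, `delivery_path`, and the three path labels are illustrative names, not part of the eMCOS or POSIX APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreadAddr:
    """Hypothetical address of a thread: which cluster, core, and thread."""
    cluster: int
    core: int
    thread: int


def delivery_path(src, dst):
    """Pick an (illustrative) delivery path from the two addresses alone."""
    if src.cluster != dst.cluster:
        return "remote: inter-cluster link"
    if src.core != dst.core:
        return "local: inter-core message"
    return "local: same-core message"


if __name__ == "__main__":
    sender = ThreadAddr(cluster=0, core=1, thread=7)
    for dst in (ThreadAddr(0, 1, 8), ThreadAddr(0, 3, 2), ThreadAddr(1, 0, 4)):
        print(dst, "->", delivery_path(sender, dst))
```

The point of such a model is that application code looks the same whether the peer thread is on the same core, another core in the cluster, or a different cluster altogether.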

One advantage of this approach is that the kernel is very portable by design. It can therefore serve small SoCs based on Arm Cortex®-R or even Cortex-M cores, and can scale up to much bigger systems such as Arm Cortex-A SoCs.

Combining eMCOS POSIX with high-performance Arm cores lets developers take advantage of the full power of the platform. Interestingly, the benefits are visible not only when the core count is high, but also in existing systems. Independent per-core scheduling improves general runtime performance and gives more accurate single-core determinism in a multicore system.

Figure 3. eMCOS Multikernel architecture enables true parallelism with minimum task/thread switch time.

The incremental scalability of this distributed-microkernel OS, eMCOS, gives developers the flexibility to innovate with a fully optimized new architecture while also leveraging the best-performing aspects of their existing platforms. For more information about our presentation, join us at Arm DevSummit in our technical session: Getting the Best Performance of Today's and Tomorrow's Arm Cores with eSOL.
