Arm Community
Arm Community blogs
High Performance Computing (HPC) blog

A new approach to software is needed to unleash the full power of multicore processing

Rolland Dudemaine
October 1, 2020
4 minute read time.

When combined with a distributed-microkernel OS, today’s highly efficient multicore processors can endow autonomous systems with human-like skills, explains Rolland Dudemaine, VP Engineering, eSOL Europe.

Today’s autonomous machines and advanced industrial automation are increasingly expected to perform functions that humans are particularly good at, such as object recognition, contextual awareness, and decision making. Many are classified as edge devices that are independently able to capture, filter, grade, and use or discard vast quantities of information. Minimizing data exchanged with the cloud helps reduce bandwidth costs and protects time-critical processes against the adverse effects of latency.

Deterministic real-time performance is usually needed in applications such as autonomous-driving systems, industrial robots, and precision industrial automation, to ensure the reliability of control loops. At the same time, power constraints are usually tight, often because the system is battery-powered, or because a large, bulky power supply is undesirable from a size, weight, and power (SWaP) standpoint. Some applications, such as automotive systems, drones, or mobile robots, are subject to all of these considerations.

Although we are looking for these systems to replicate human capabilities, the human brain itself sets a high bar in terms of energy efficiency. Its enormous range of skills and real-time multitasking abilities represent processing performance equivalent to somewhere between 10 TFLOPS (10^13 FLOPS) and 10 YFLOPS (10^25 FLOPS), while consuming only about 20 W.

Conventional CPUs, DSPs, GPUs, and multicore processors, including hybrid designs, have not come close to the brain’s combination of high performance and low power. A leading GPU such as the Nvidia K80 (GK210), for example, tops out at about 1.87 TFLOPS while consuming 300 W. Moreover, a GPU is only capable of running specific, dedicated algorithms; it cannot combine compute and general-purpose software.
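To put the comparison above in terms of energy efficiency, the short calculation below divides each quoted throughput figure by its quoted power draw. It is only a back-of-the-envelope sketch: it assumes the low end of the brain estimate (10 TFLOPS), and the function and variable names are illustrative.

```python
# Figures quoted in the text. The brain value is the LOW end of the quoted
# range (10 TFLOPS), so the resulting ratio is a conservative estimate.
BRAIN_FLOPS_LOW = 10e12   # 10 TFLOPS
BRAIN_POWER_W = 20.0      # about 20 W
GPU_FLOPS = 1.87e12       # Nvidia K80 figure quoted in the text
GPU_POWER_W = 300.0       # 300 W


def flops_per_watt(flops: float, watts: float) -> float:
    """Return throughput per unit of power (FLOPS/W)."""
    return flops / watts


brain_eff = flops_per_watt(BRAIN_FLOPS_LOW, BRAIN_POWER_W)
gpu_eff = flops_per_watt(GPU_FLOPS, GPU_POWER_W)
ratio = brain_eff / gpu_eff

print(f"brain (low estimate): {brain_eff:.2e} FLOPS/W")
print(f"GPU:                  {gpu_eff:.2e} FLOPS/W")
print(f"ratio:                about {ratio:.0f}x")
```

Even with the most conservative brain estimate, the gap is roughly two orders of magnitude in FLOPS per watt; with the upper estimate it would be far larger.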

Massively Parallel Single-Chip

The best we have today is the emerging class of distributed, heterogeneous multi-/many-core processors. These contain large numbers of independent cores with closely coupled memory, interconnected through a high-speed Network-on-Chip (NoC) infrastructure. These types of devices are becoming mainstream for specialized datacenter systems.

In addition to high performance and power efficiency, safety and security are key requirements of industrial and automotive edge applications. Freedom From Interference (FFI) between different modules in the system is an important principle in Functional Safety (FuSa) design. Spatial and temporal isolation is another important FuSa principle, as it prevents modules from affecting each other and ensures that abnormal programs cannot sap the performance of others.

Go Multikernel to Unleash Multicore

To maximize the performance of the emerging heterogeneous multi-/many-core processors while ensuring safety and security through isolation and protection (FFI), a new approach to software is needed. This particularly applies to the architecture of the operating system (OS) that brings together the various computing elements.

A multikernel architecture, which can also be described as an arrangement of “distributed microkernels” (as distinct from a microkernel OS), brings these goals within reach. It provides a platform for a system of systems, containing a network of single-core, message-based kernels, as shown in Figure 1. Lightweight message-passing allows fast, deterministic communication at the OS level.

Figure 1. eMCOS “Multikernel” design maximizes Heterogeneous Multicore architectures.

A key property of this architecture is the fact that no kernel instance on any given core can block a kernel instance on another core. This simultaneously ensures much better parallelism, concurrency, and determinism, both at the kernel and the application level.
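The idea of single-core, message-based kernels that never block one another can be illustrated with a small conceptual model. The sketch below is not the eMCOS API: `KernelInstance`, `send`, and `demo` are hypothetical names, Python threads stand in for cores, and an unbounded queue stands in for each kernel's mailbox. Python threads do not run truly in parallel, so this only models the communication pattern, not the performance.

```python
import queue
import threading


class KernelInstance:
    """Hypothetical stand-in for one single-core kernel with its own mailbox."""

    def __init__(self, core_id, peers):
        self.core_id = core_id
        self.peers = peers              # shared map: core_id -> KernelInstance
        self.mailbox = queue.Queue()    # unbounded, so a sender never waits
        self.handled = []               # messages this instance has processed
        self.thread = threading.Thread(target=self._run)

    def send(self, dest, payload):
        # Posting only enqueues into the destination mailbox; the sender
        # does not wait for the receiver to process the message.
        self.peers[dest].mailbox.put((self.core_id, payload))

    def _run(self):
        # Each instance services only its own mailbox, independently.
        while True:
            sender, payload = self.mailbox.get()
            if payload is None:         # shutdown sentinel
                break
            self.handled.append((sender, payload))
            if payload == "ping":
                self.send(sender, "pong")


def demo(n_cores=4):
    peers = {}
    for i in range(n_cores):
        peers[i] = KernelInstance(i, peers)
    for kernel in peers.values():
        kernel.thread.start()

    # For simplicity, the main thread posts pings on behalf of core 0.
    for dest in range(1, n_cores):
        peers[0].send(dest, "ping")

    # Shut down cores 1..n-1 first. Mailboxes are FIFO, so each one answers
    # its ping before it sees the sentinel.
    for dest in range(1, n_cores):
        peers[dest].mailbox.put((-1, None))
        peers[dest].thread.join()

    # All pongs are now queued for core 0; let it drain them, then stop.
    peers[0].mailbox.put((-1, None))
    peers[0].thread.join()
    return {core_id: kernel.handled for core_id, kernel in peers.items()}


if __name__ == "__main__":
    for core_id, messages in demo().items():
        print(core_id, messages)
```

In this model there is no shared scheduler state or global lock at the "kernel" level: the only interaction between instances is a message dropped into another instance's mailbox, which is the property the paragraph above describes.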

eSOL’s eMCOS is a distributed-microkernel OS that now enables an independent microkernel to run on each core of the multicore processor while providing a unified platform for high-speed message passing and other functions. Figure 2 illustrates the underlying structure and the ability to handle hard and soft real-time workloads.

Figure 2. eMCOS enables a single, independent microkernel to run on each core.

Moreover, eMCOS POSIX is a multi-process RTOS that provides extended POSIX support and permits multi-cluster organization on one or more separate processors or SoCs. Threads from the same process may run on any combination of cores within a cluster, while communicating with other threads locally or remotely across clusters.

One advantage of this approach is that it makes the kernel very portable by design. It can therefore serve small core types, in SoCs based on Arm Cortex®-R or even Cortex-M, and can also scale up to much bigger systems such as Arm Cortex-A SoCs.

Combining eMCOS POSIX with high-performance Arm cores makes it possible to take advantage of the full power of the platform. Interestingly, the benefits are visible not only when the core count is high, but also in existing systems. Independent per-core scheduling improves general runtime performance and gives more accurate single-core determinism in a multicore system.

Figure 3. eMCOS Multikernel architecture enables true parallelism with minimum task/thread switch time.

The incremental scalability of this distributed-microkernel OS, eMCOS, gives developers the flexibility to innovate with a fully optimized new architecture while also leveraging the best-performing aspects of their existing platforms. For more information, join us at Arm DevSummit in our technical session: Getting the Best Performance of Today's and Tomorrow's Arm Cores with eSOL.
