Rise of Smart Accelerators to Service 1 Trillion Devices (part 1)

Jeff Defilippi
May 24, 2018

This is a two-part blog on smart acceleration for distributed cloud computing. Part 1 will focus on the technical and commercial drivers along with an overview of the solutions available today. Part 2 will focus on future trends and will take a closer look at building smart acceleration systems.

Introduction

The increasing number of connected devices, and the data they generate, are creating opportunities for new applications and business models. These in turn are driving a new set of infrastructure requirements from the edge back to the datacenter.

Figure 1 highlights the transition to distributed edge computing. IoT devices have dramatically different requirements, ranging from real-time latency for industrial and automotive use cases to high bandwidth for AR/VR. This shift forces more compute to move to the data source.

Containers also promise to accelerate edge application deployment cycles and increase cloud agility in fundamental ways. Scaling applications and moving distributed compute to the data source requires the same deployment methodologies and container isolation for guest services that run in the cloud today.

Fig. 1 Moving to a distributed compute model

Acceleration enabling edge compute

Combining many different types of processors and accelerators is needed to maximize efficiency and meet the real-time demands at the network edge. In a heterogeneous system, the processors and the accelerators must share data, and moving it about can be a headache (especially in the virtualized / container world). Having the memory across these devices operate in a coherent manner – meaning that all devices can address all memory attached to those devices in a single, consistent way – is one of the holy grails of heterogeneous computing.
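As a loose software analogy only (not a model of any real coherency protocol), the sketch below contrasts a copy-based offload with one where both sides address the same buffer. The function names `offload_with_copy` and `offload_shared` and the bit-inversion "task" are hypothetical, chosen just to make the difference visible.

```python
def offload_with_copy(host_buffer: bytearray) -> bytes:
    """Copy-based model: data is copied to the 'device', processed there,
    and the result must be copied back by the caller."""
    device_buffer = bytes(host_buffer)            # explicit host -> device copy
    return bytes(b ^ 0xFF for b in device_buffer)  # result lives in a separate buffer


def offload_shared(shared: memoryview) -> None:
    """Shared-memory model: the 'device' works in place on memory that the
    host also addresses, so no copies are needed in either direction."""
    for i in range(len(shared)):
        shared[i] ^= 0xFF


if __name__ == "__main__":
    host = bytearray(b"\x00\x01\x02")

    result = offload_with_copy(host)
    print("copy model:   host =", bytes(host).hex(), " result =", result.hex())

    offload_shared(memoryview(host))
    print("shared model: host =", bytes(host).hex())
```

In the copy model the host buffer is untouched and the result has to be moved back explicitly. In the shared model the host sees the update directly, which is the convenience that coherent memory aims to provide across real devices.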

Traditional view of Host-Accelerator

PCIe block data transfers (512B-4KB) between Host and Accelerator

Before discussing future technology solutions, it is useful to look back at traditional Host-Accelerator systems and align on terminology. A traditional accelerator system includes:

  • Host Processor – multi-core processor that runs the OS, the virtual machines, and the application software stacks.
  • Accelerator – fixed-function device (including IO) that performs a specialized task. Network, storage, and crypto engines are some examples.

The ubiquitous connectivity between Host and Accelerator is PCIe with block transfers typically in 512B to 4KB chunks.
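To get a feel for what these block sizes imply, here is a back-of-the-envelope sketch of how many block transfers per second would be needed to keep a link of a given rate full. The helper name `transfers_per_second` is hypothetical, and the calculation assumes every bit on the link is payload, ignoring PCIe and network protocol overhead.

```python
def transfers_per_second(link_gbps: float, block_bytes: int) -> float:
    """Block transfers per second needed to fill a link of the given rate.

    Simplifying assumption: all link bandwidth carries payload
    (no protocol headers, no encoding overhead).
    """
    bits_per_block = block_bytes * 8
    return link_gbps * 1e9 / bits_per_block


if __name__ == "__main__":
    # Link rates are illustrative; block sizes are the 512B and 4KB
    # bounds quoted in the text.
    for gbps in (10, 25, 100):
        for block in (512, 4096):
            rate = transfers_per_second(gbps, block)
            print(f"{gbps:>3} Gb/s, {block:>4} B blocks: "
                  f"{rate / 1e6:6.2f} M transfers/s")
```

Under these assumptions, smaller blocks at higher link rates quickly push the required transfer rate into the millions per second.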

Rise of Smart-Acceleration

Network & Storage Processing

The increased use of virtualized guest OSes and containers drove new virtualized network and storage stacks, which initially ran on the Host Processor. As network speeds increased (10GbE, 25GbE, 100GbE) and NVM storage performance grew from thousands to millions of I/O operations per second (IOPS), these software stacks were consuming 50 percent or more of the highly valuable host processing capacity, leaving less for the actual application. Open vSwitch (OVS), a multi-layer, open source virtual switch, is an example of a stack that runs on the host processor.

To address this issue, Smart-Accelerators such as Smart-NICs and NVMe storage devices have been deployed. These devices combine efficient compute with accelerators and IO virtualization techniques (VirtIO, SR-IOV, etc.) to reduce the host processing overhead to less than 5 percent.
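The effect of that reduction can be put into rough numbers. The sketch below takes the 50 percent and 5 percent overhead figures from the text as bounds; the 32-core host and the helper name `application_cores` are hypothetical, picked only for illustration.

```python
def application_cores(total_cores: int, overhead_fraction: float) -> float:
    """Cores' worth of host capacity left for applications after
    network/storage stack overhead is subtracted."""
    if not 0.0 <= overhead_fraction <= 1.0:
        raise ValueError("overhead_fraction must be between 0 and 1")
    return total_cores * (1.0 - overhead_fraction)


if __name__ == "__main__":
    total = 32  # hypothetical host core count

    before = application_cores(total, 0.50)  # stacks running on the host
    after = application_cores(total, 0.05)   # stacks off-loaded to a smart accelerator

    print(f"Before off-load: {before:.1f} cores for applications")
    print(f"After off-load:  {after:.1f} cores for applications")
    print(f"Capacity gain:   {after / before:.2f}x")
```

Since the text gives "50 percent or more" and "less than 5 percent", the real gain on a given system would be at least what this calculation shows for the same core count.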

Evolution to self-hosted acceleration

 Shared Virtual Memory

Smart acceleration did not stop evolving at off-load. Today, Smart-NICs have become self-hosted, meaning they have enough computing power and functionality to run full OS stacks with hypervisors, virtual machines and containers. Connecting other acceleration elements to the self-hosted NIC is another growing trend. These accelerators include security off-load, distributed memory and ML, to name a few. As a result, the traditional host is completely freed up to run high-value applications and services.

A good overview of the path to self-hosted acceleration is given in the AWS Nitro introduction blog, which describes the evolution of AWS EC2 instances over time.

Arm smart acceleration solutions

The ability to develop heterogeneous solutions with the right levels of compute and acceleration to balance performance and power efficiency has been a hallmark of the Arm architecture and partnership activity. An example is the cell phone in your pocket, which has an integrated Arm processor balanced with GPU, video, and display to provide a rich user experience while maintaining the energy efficiency needed to significantly extend battery life. Within the infrastructure space, the performance, scale and variety of devices are obviously very different from a cell phone, but the ultimate benefits of implementing designs using a heterogeneous architecture remain the same.

Today, Arm smart acceleration technology is being deployed across the distributed computing infrastructure for a wide range of solutions, including cellular 4G/5G baseband and radio, edge computing, NFV, core network, security appliance and datacenter network designs. Example product families include Broadcom NetXtreme, Cavium OCTEON TX, Mellanox BlueField, Marvell ARMADA, NXP Layerscape, Socionext SynQuacer and Xilinx Zynq UltraScale+ MPSoC.

To be continued… stay tuned for part two where we will take a closer look at future architecture trends and dive into how these smart acceleration devices and systems are built. Some of the topics that will be covered include:

  • Emerging accelerated use cases impacting future system designs
  • Deployment of acceleration functions with shared virtual memory
  • Integrating accelerators with compute on-chip
  • Multichip connectivity with PCIe and CCIX