The amount of data being generated is exploding. The volume of data created, captured, or replicated is expected to increase from 33 zettabytes[1] in 2018 to 175 zettabytes in 2025[2], according to analyst firm IDC. To realize value from this data, we must be able to process it into meaningful insights. It is becoming clear that compute-centric architectures will not continue to scale, and the focus is now on generating insights from the vast volumes of data where it resides: in storage devices. This is driving the rapid development of data-centric computational storage.
Today, a growing amount of data is being stored on drives, but the storage and the compute that processes the data are almost never in the same place. Moving large amounts of data between storage and compute (drives commonly hold 16TB today, and capacities keep increasing) cannot scale, and it makes it difficult to extract insights from data that can be converted into added value for the organizations that hold it.
In the traditional storage model, data is stored on hard disk drives (HDDs) and solid-state drives (SSDs) and accessed and transferred to some external compute, typically a server. Computational storage puts data processing on the drive where the data is stored, enabling the generation of insights and value directly from the data.
Computational storage is all about making storage devices smarter to process the data directly where it is stored. This approach reduces the movement of large amounts of data to external processing and delivers myriad benefits, such as reduced latency and bandwidth usage, increased security and energy savings. In other words, data workloads are processed directly on the storage controller itself.
Applying computational storage is critical to address the real-time processing requirements of many machine learning (ML) and analytics applications, and other use cases from IoT to edge computing. In the case of IoT, the accelerating pace of deployments will produce huge amounts of raw, unstructured data that is moved to, stored in, and processed on a server. However, not all of the captured data is relevant.
Let us take an example: a surveillance camera system in a big parking lot records license plate numbers and the times when cars enter and leave, both to enable billing for parking time and for security purposes. The information of interest is the license plates, so it would be highly inefficient to move all of the large images or video streams to the server for image processing, whether or not cars are actually entering or leaving the lot.
With computational storage, each camera streams to its local drive, and the compute on that drive recognizes the license plates directly. Performing ML and image recognition on the storage drive and returning only the insight from that data, the license plate numbers and the times, to the server is highly efficient. And with one drive per camera, every camera added brings another drive and more compute, exactly where it is needed, making the system both efficient and highly scalable.
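To make this concrete, here is a minimal sketch of what such an on-drive workload might look like, assuming the camera streams JPEG frames into a directory on the drive. The paths and the recognize_plate() function are hypothetical stand-ins for the drive's actual layout and ML model:

```python
# Hypothetical on-drive workload: watch the frames the camera streams to
# the local drive and report only license plates and timestamps upstream.
import json
import time
from pathlib import Path
from typing import Optional, Set

FRAME_DIR = Path("/mnt/camera/frames")        # frames landing on this drive
RESULTS = Path("/mnt/camera/results.jsonl")   # tiny insight log the server reads

def recognize_plate(frame: Path) -> Optional[str]:
    """Stand-in for the on-drive ML/OCR inference step."""
    return None  # replace with a real model; returns e.g. "ABC-123"

def process_new_frames(seen: Set[Path]) -> None:
    for frame in sorted(FRAME_DIR.glob("*.jpg")):
        if frame in seen:
            continue
        seen.add(frame)
        plate = recognize_plate(frame)
        if plate:
            # Only the insight leaves the drive: a few bytes, not the image.
            event = {"plate": plate, "time": time.time()}
            with RESULTS.open("a") as f:
                f.write(json.dumps(event) + "\n")

if __name__ == "__main__":
    seen: Set[Path] = set()
    while True:
        process_new_frames(seen)
        time.sleep(1.0)
```

Each drive runs its own copy of this loop, which is what makes the per-camera scaling automatic.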
There are many other use cases where computational storage can have a significant impact.
Today, with a traditional storage drive, data is moved from the device all the way to the server to be computed, which adds latency, consumes bandwidth and energy, and exposes the data in transit.
If the backhaul, the connection to the servers, in these systems provides limited bandwidth or is expensive, then computational storage can reduce total cost of ownership (TCO) significantly. Additional benefits include:
Faster response time and reduced latency
Moving intelligence to where it is needed allows results to be delivered in near real-time. The data does not need to be encapsulated in protocols, then moved and copied through routers and switches, and unpacked on the server before it can be processed.
Reduced energy
No more huge data transfers that require energy and generate heat.
Security and privacy
The data does not leave the drive, only the insight is returned, reducing the risk of leaking information.
Scalability
Since the compute is on the drive, adding more drives means adding more compute where the data is stored.
A computational storage drive (CSD) is a storage device that provides persistent data storage and computational services. Computational storage is about coupling compute and storage to run applications locally on the data, reducing the processing required on the remote server and reducing data movement. To do that, a processor on the drive is dedicated to processing the data directly on that drive, which frees the remote host processor to work on other tasks.
In a traditional storage system, when the compute wants to do some processing on the data, it requests the data from the drive, waits for it to arrive over the interconnect, and processes it on the host.
In a computational storage system, the compute does not request the data. It sends the task to the drive instead: the drive processes the data in place, and only the result travels back.
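A toy model of the two flows makes the difference visible. The Drive class below is purely illustrative; a real CSD exposes this behavior through NVMe and SNIA-defined interfaces:

```python
# Toy contrast between the two models. The Drive class is illustrative only.
from typing import Callable, Iterable, List

class Drive:
    """Simulated drive holding a list of data blocks."""
    def __init__(self, blocks: Iterable[bytes]):
        self._blocks: List[bytes] = list(blocks)

    def read_blocks(self):
        # Traditional path: every block crosses the wire to the host.
        yield from self._blocks

    def run(self, task: Callable[[Iterable[bytes]], int]) -> int:
        # Computational path: the task executes next to the data and
        # only the small result travels back to the host.
        return task(self._blocks)

def count_pattern(blocks: Iterable[bytes], pattern: bytes = b"cat") -> int:
    return sum(block.count(pattern) for block in blocks)

drive = Drive([b"cat dog", b"catalog", b"dog"])

host_result = count_pattern(drive.read_blocks())  # host pulls all the data
csd_result = drive.run(count_pattern)             # host receives one integer

assert host_result == csd_result == 2
```

The answer is identical either way; what changes is how many bytes had to move to produce it.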
Read our guide to computational storage for more insight.
There are multiple ways of implementing computational storage. The main requirement, however, is embedding processing capability in the drive controller that can run a rich operating system, such as Linux, together with its software components. This has key benefits:
Open source software with a vast Linux developer community
Open source software with a vast Linux developer community, plus standard tools that are used industry-wide, makes the development experience easier. Developers can create workloads and deploy them to the drive using standard Linux-based systems, while still following the SNIA standard, which simplifies the system and eases software development.
Readily available tools
With Linux, the vast ecosystem of tools and open source software is available to develop, deploy, and manage computational storage workloads. This enables the developer community to quickly migrate tasks to computational storage drives.
Intelligent storage enabled
In a standard NVMe drive, the drive is sent blocks of data, which it breaks up and stores as pages in its NAND dies. When the server asks for a block of data, the drive fetches it from the NAND, reassembles it into a block, and sends it to the host. The drive never knows that these blocks make up, for example, a JPEG image, because it does not understand the file system. With Linux running on the drive, intelligent storage becomes possible: the drive can mount the standard file system, so CSD applications can understand what files the blocks of data actually represent and act on the data directly.
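As a rough sketch of what that file-level awareness buys, assume the drive's Linux environment mounts its own data partition; the device path and mount point below are illustrative and depend on the drive's actual partition layout:

```python
# Illustrative only: on-drive Linux mounts the file system read-only,
# turning anonymous blocks into files the CSD application can reason about.
import subprocess
from pathlib import Path

DEVICE = "/dev/nvme0n1p1"       # example data partition on this drive
MOUNTPOINT = Path("/mnt/data")  # example mount point inside the drive's OS

MOUNTPOINT.mkdir(parents=True, exist_ok=True)
subprocess.run(["mount", "-o", "ro", DEVICE, str(MOUNTPOINT)], check=True)

# The same blocks the NVMe layer shuffles around are now, for example,
# JPEG files that an on-drive application can open and process directly.
for jpeg in MOUNTPOINT.rglob("*.jpg"):
    print(f"found image {jpeg} ({jpeg.stat().st_size // 1024} KB)")
```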
The drive as a mini server
With Linux running on the drive, you can manage the drive, develop workloads, and download new workloads, all using current, standard open source systems. It turns the drive into a mini server at the lowest possible cost.
Now you may wonder: is Linux really suited to computational storage? The answer is yes.
Isn't it too big? The answer is no.
Storage drives today already have gigabytes of RAM, terabytes of storage, and fast compute to handle the massive data movement in and out of the drive. Linux may bring to mind large software installations for big servers, ill-suited to on-device storage and compute, but the requirements here are much smaller than for a big server: the software can be significantly reduced in size.
With Linux, there is no need for display drivers, several other functions are not applicable, and the system can be simplified and tailored to your controller. For example, Debian 9 requires only 512MB of RAM and 2GB of storage.
Administration of the CSD can be performed using the standard open source tools already used in these complex systems. Workloads can be downloaded and managed using common tools such as Kubernetes, Docker, or extended Berkeley Packet Filter (eBPF), enabling secure execution of applications and scripts.
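For example, if the drive runs Linux with a Docker daemon, a workload could be deployed using the standard Docker SDK for Python. The drive address and image name below are hypothetical placeholders:

```python
# Hedged sketch: deploying a CSD workload with the Docker SDK for Python
# (pip install docker). Address and image name are placeholders.
import docker

# Connect to the Docker daemon running in the drive's Linux environment.
# (A real deployment would secure this endpoint with TLS.)
client = docker.DockerClient(base_url="tcp://csd-drive-01:2375")

client.images.pull("registry.example.com/plate-reader:latest")
container = client.containers.run(
    "registry.example.com/plate-reader:latest",
    detach=True,
    # Give the workload read-only access to the data stored on the drive.
    volumes={"/mnt/data": {"bind": "/data", "mode": "ro"}},
)
print("workload started:", container.short_id)
```

From the orchestrator's point of view, the drive is just another small Linux node, which is exactly the point.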
The Arm storage solution offers easy, fast, and cost-effective technology, support, and a vast ecosystem for success in computational storage.
Arm processor portfolio
Arm Cortex-A processors are optimized for low power and high performance in complex computing tasks on storage devices. The Cortex-R82 processor, meanwhile, is optimized for high-performance real-time processing alongside high-level operating system applications. These application-capable processors enable computational storage with:
Porting and optimization of applications
Arm and its partners have ported and optimized all the leading Linux distributions and open source applications. These are actively maintained and run on any Cortex-A processor or the Cortex-R82 without adaptation. Arm has also tuned these applications, both internally and through Linaro, to make sure everything runs optimally on Arm.
Software ecosystem of tools and libraries
With support from the Arm software ecosystem, the programming work is minimal. ML software libraries that run on the Cortex-A processors and the Cortex-R82 accelerate searching through images and other files.
Arm’s partner, NGD Systems, is exploring the use of computational storage to help airlines improve the analysis of flight data. Today, airlines generate multiple terabytes of telemetry data per hour and offloading and analyzing that data can take hours, which is time operators cannot spare. With computational storage, flight analytics can be provided to the right people at the right time, helping to improve safety in the air.
There are many other, non-Linux, types of compute possible on CSDs. For fixed-purpose compute functions such as encryption, compression, or deduplication, low-level real-time software, hardware acceleration, or neural processing units (NPUs) can all be used in a CSD system. These specific functions and accelerators can be built into CSDs and accessed directly through the CSD protocol extensions defined by the industry.
These low-level functions can also be accessed from high-level operating systems where available. The flexibility and ease of customization that a high-level operating system provides, combined with a huge developer community and low-level accelerators, can deliver very high-performance, efficient CSD solutions.
Devices based on Arm processors are already available today from multiple partners, alongside an industry-wide effort to align all storage developers and players on a common implementation. Arm is actively involved in the SNIA Computational Storage Technical Working Group, working with 45 companies and 202 members to define a standard. This standard will encourage the adoption and development of computational storage by removing the risk of fragmentation and incompatibility.
Visit our computational storage website to learn more about the Arm storage solution.
[CTAToken URL = "https://www.arm.com/resources/contact-us/computational-storage-consultation" target="_blank" text="Talk to an expert" class ="green"]
[1] 1 zettabyte (ZB) = 1,000,000,000,000,000,000,000 bytes = 1,000 exabytes (EB) = 1 million petabytes (PB) = 1 billion terabytes (TB)
[2] Source: Data Age 2025. The Digitization of the World From Edge to Core. An IDC White Paper – #US44413318, sponsored by Seagate. https://www.seagate.com/files/www-content/our-story/trends/files/idc-seagate-dataage-whitepaper.pdf