Serverless computing, also known as Function-as-a-Service (FaaS), offers a compelling new paradigm: it enables users to run their code (a small application dedicated to a specific task) without worrying about operational issues. Under this model, the Cloud provider is responsible for server provisioning and resource management. For the last two years, serverless has been the fastest-growing Cloud service, growing 50% in 2019 over the previous year [1]. Since the appearance of AWS Lambda in 2014, numerous Cloud providers have released alternative serverless platforms.
Serverless applications are intended to be event-driven and stateless. A function instance is created on demand (triggered by an event), executes its predefined task, and is shut down when finished. In a commercial system, a user is charged on a per-invocation basis, without paying for unused or idle resources. The serverless model favors applications with good parallelism (for example, video-encoding applications, where different frames can be processed concurrently) and devices with intermittent activity, such as data processing triggered by Edge devices. This makes it a natural fit for Internet of Things (IoT) environments.
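To make the execution model concrete, here is a minimal sketch of a stateless, event-driven function in C. The handler convention (`handle_event` and its arguments) is hypothetical rather than any specific platform's API; the point is that each invocation is self-contained, and the instance can be torn down as soon as it returns.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical entry point, invoked once per event by the platform. */
int handle_event(const char *payload, char *response, size_t response_len)
{
    /* All work happens within this single invocation; no state survives
     * between calls, so any needed context must arrive with the event. */
    snprintf(response, response_len, "processed %zu bytes", strlen(payload));
    return 0; /* the instance may be torn down as soon as we return */
}
```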
Many novel IoT applications require low-latency data processing and near real-time responses, as in connected and autonomous cars. Imagine a world where cars and related data services can alert drivers about dangerous road conditions because of their ability to communicate. You might hear, “Black ice on the road in front of you - right lane in 200 meters”. Real-time performance is expected for detection and control in many industrial and enterprise systems; some scenarios require a response within 10 milliseconds (ms). While Cloud computing provides a good solution for applications designed at human perception speeds, it becomes inadequate for novel latency-critical applications that rely on fast, automated decisions made with no human in the loop. To satisfy the performance requirements of such workloads, we must process the data closer to its source, an approach known as Edge computing.
Existing Cloud-based serverless frameworks execute function instances in short-lived Virtual Machines (VMs) or containers, which provide application process isolation and resource provisioning. These frameworks are too heavyweight for Edge systems: they have a large memory footprint (from 100s of MBs up to GBs) and a high function invocation time (125 ms to 1 s). There is a lot of unnecessary redundancy and little resource sharing in such deployments, as shown in Figure 1 (a, b), where blue shading indicates shared software components. Another critical difference between Cloud and Edge computing is that the Cloud draws on the "unlimited" computing resources available in multiple data centers, whereas the Edge is a limited, resource-constrained environment whose resources must be managed very carefully. In the Edge environment, long-lived and over-provisioned containers/VMs can quickly exhaust the limited node resources and become impractical for serving many IoT devices. Supporting a high number of serverless functions while providing a low response time, say 10 ms, is one of the main performance challenges for resource-constrained Edge computing nodes.
Figure 1: (a) VM-based Serverless (for example, AWS Lambda using Firecracker, Microsoft Azure Functions using Hyper-V, and so on). (b) Container-based Serverless (OpenWhisk, Google Cloud Functions, and more). (c) Container + Processes-based Serverless (Nuclio). (d) Sledge: Wasm-based Approach for Serverless at the Edge.
WebAssembly (Wasm) is a nascent but fast-evolving technology that provides strong memory isolation (through sandboxing) with a much smaller memory footprint than VMs and containers. Wasm enables users to write functions in different languages (for example, C, C++, C#, Go, and Rust), which are compiled into a platform-independent bytecode. Wasm runtimes can leverage various hardware and software technologies to provide isolation and manage desirable resource allocations.
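As a simple illustration of this workflow, the sketch below compiles a trivial C function to Wasm bytecode using clang's LLVM Wasm backend, one common toolchain for this purpose (aWsm, described next, provides its own AoT pipeline).

```c
/* add.c: a trivial function compiled to platform-independent Wasm bytecode.
 * One common toolchain invocation (clang with the LLVM Wasm backend):
 *
 *   clang --target=wasm32 -nostdlib -O2 \
 *         -Wl,--no-entry -Wl,--export-all -o add.wasm add.c
 *
 * The resulting add.wasm can run on any Wasm runtime, on any architecture. */
int add(int a, int b) { return a + b; }
```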
Since many existing Wasm compilers and runtimes exhibit significant overheads compared to native execution, we implemented our own LLVM-based ahead-of-time (AoT) Wasm compiler, named aWsm (pronounced “awesome”). It offers configurable sandboxing and is optimized for performance. Several recent Wasm papers, written over a period of three years, have been devoted to optimizing Wasm compilers and the performance of the resulting code. In 2017, only 7 out of 30 PolyBench/C benchmarks performed within 1.1 times of native execution [2]. By May 2019, improved Wasm compilers had brought that number to 13 out of 30 [3]. In our paper, though the focus is on the serverless runtime, we demonstrate that the aWsm compiler performs within 1.1 times of native execution for 24 out of 30 PolyBench/C benchmarks. We evaluated the aWsm compiler and its runtime on the x86_64 and AArch64 architectures, showing an average performance overhead for the PolyBench/C benchmarks (compared to native code execution) within 13% and 7%, respectively. Additionally, we compared aWsm with various existing LLVM- and Cranelift-based Wasm compilers and runtimes to demonstrate its efficiency and performance. Please see our ACM/IFIP/USENIX Middleware 2020 paper for these details.
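To give a flavor of what such sandboxing involves, here is a sketch, in C, of the kind of bounds-checked linear-memory access an AoT Wasm compiler can emit. This is illustrative only; aWsm's actual generated code and its configurable checking strategies differ.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of a Wasm module's linear memory: one contiguous region. */
struct sandbox_memory {
    uint8_t  *base;  /* start of the linear memory    */
    uint32_t  size;  /* current memory size in bytes  */
};

/* A bounds-checked 32-bit load, illustrating the software sandboxing an
 * AoT compiler can insert before every Wasm memory access. */
static uint32_t checked_load_u32(const struct sandbox_memory *m, uint32_t addr)
{
    if ((uint64_t)addr + sizeof(uint32_t) > m->size)
        abort();                           /* out-of-bounds access traps */
    uint32_t v;
    memcpy(&v, m->base + addr, sizeof v);  /* alignment-safe load */
    return v;
}
```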
In our work, we propose a new serverless-first infrastructure, Sledge (ServerLess at the Edge runtime), optimized for low-latency serverless execution at the Edge. Toward this, we focus on serverless runtimes for single-host servers, ranging from powerful multiprocessor servers to low-cost systems like the Raspberry Pi. Sledge provides lightweight function instantiation and isolation facilities. The memory footprint of functions has a significant impact on “cold-start” performance and scalability at the resource-constrained Edge. The single-process Sledge runtime binary is only 359 KB, and it enables functions to share library dependencies while providing strong spatial and temporal isolation for multi-tenant function executions. The AoT-compiled shared objects are between 108 KB and 112 KB, which is significantly smaller than VM- and container-based function isolation, often in the 10s to 100s of MBs. Our framework enables lightweight function startup times (30 μs) and efficient management of the high churn of request rates in Edge systems.
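The sketch below illustrates this deployment model: the runtime process is shared, each function arrives as a small AoT-compiled shared object, and every request gets its own private linear memory. All names here (`fn_resize.so`, `wasm_main`, the entry-point signature) are hypothetical and do not reflect Sledge's actual module interface.

```c
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical entry-point signature for an AoT-compiled function. */
typedef int (*wasm_entry_fn)(void *linear_memory);

int main(void)
{
    /* Each function ships as a small (~100 KB) AoT-compiled shared object;
     * the runtime process and its libraries are shared by all tenants. */
    void *module = dlopen("./fn_resize.so", RTLD_NOW); /* hypothetical name */
    if (!module) { fprintf(stderr, "%s\n", dlerror()); return 1; }

    wasm_entry_fn entry = (wasm_entry_fn)dlsym(module, "wasm_main");
    if (!entry) { fprintf(stderr, "%s\n", dlerror()); return 1; }

    /* Per-request sandbox state: a private linear memory keeps tenants
     * spatially isolated even though they share one runtime process. */
    void *linear_memory = calloc(1, 64 * 1024);  /* one 64 KB Wasm page */
    int rc = entry(linear_memory);

    free(linear_memory);
    dlclose(module);
    return rc;
}
```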
Sledge uses kernel bypass to optimize the framework's efficiency and enable custom (specialized) serverless function scheduling. It leverages the short-lived execution properties of serverless to specialize system scheduling, decoupling work distribution and load balancing across system cores for scalability. The Sledge runtime focuses squarely on the efficiency of serverless functions and enables strong spatial and temporal isolation of multi-tenant function executions. These lightweight sandboxes are designed to support high-density computation, with fast startup and teardown times to handle high client request rates. An extensive evaluation of Sledge with varying workloads and real-world serverless applications demonstrates the effectiveness of the designed serverless-first runtime for the Edge: Sledge supports up to 4 times higher throughput and 4 times lower latencies compared to Nuclio, one of the fastest open-source, container-based serverless frameworks.
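As a rough illustration of this scheduling idea (not Sledge's implementation), the sketch below pins a worker per core, gives each core its own run queue, and has a listener distribute requests round-robin, so dispatch happens entirely in user space rather than through the kernel scheduler.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define NCORES 4
#define QCAP   256

/* Per-core single-producer/single-consumer ring of pending requests.
 * One listener enqueues; each worker dequeues and runs sandboxes in
 * user space, so no kernel scheduling decision is on the fast path. */
struct runqueue {
    _Atomic unsigned head, tail;
    int req[QCAP];
};
static struct runqueue rq[NCORES];

static int rq_push(struct runqueue *q, int r)
{
    unsigned t = atomic_load(&q->tail);
    if (t - atomic_load(&q->head) == QCAP) return -1;  /* full */
    q->req[t % QCAP] = r;
    atomic_store(&q->tail, t + 1);                     /* publish */
    return 0;
}

static int rq_pop(struct runqueue *q, int *r)
{
    unsigned h = atomic_load(&q->head);
    if (h == atomic_load(&q->tail)) return -1;         /* empty */
    *r = q->req[h % QCAP];
    atomic_store(&q->head, h + 1);
    return 0;
}

static void *worker(void *arg)
{
    struct runqueue *q = arg;
    for (;;) {                                /* spin; a real runtime parks */
        int r;
        if (rq_pop(q, &r) == 0)
            printf("run sandbox for request %d\n", r);  /* placeholder */
    }
    return NULL;
}

int main(void)
{
    pthread_t tid[NCORES];
    for (int c = 0; c < NCORES; c++)
        pthread_create(&tid[c], NULL, worker, &rq[c]);

    /* Listener: distribute incoming requests round-robin across cores,
     * separating work distribution from each core's local scheduling. */
    for (int r = 0; ; r++)
        while (rq_push(&rq[r % NCORES], r) != 0)
            ;  /* simple backpressure: wait until the queue drains */
}
```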
This demonstrates that a serverless runtime optimized by leveraging lightweight Wasm-based isolation and bypassing traditional kernel scheduling holds significant promise for the demanding requirements of future Edge computing solutions. The proposed framework opens a set of interesting opportunities for customized performance management of users’ serverless functions, which we plan to investigate in our future work.
Moreover, the aWsm AoT compiler leverages LLVM to optimize code and target different architectural backends. Using the PolyBench/C benchmarks, we evaluated aWsm on x86_64, AArch64 (Raspberry Pi), and Thumb (Arm Cortex-M4 and M7).
aWsm performance is within 10% of native on the microprocessors and within 40% on the microcontrollers. To explore aWsm on Cortex-M, please see our paper published at EMSOFT 2020, “eWASM: Practical Software Fault Isolation for Reliable Embedded Devices”.
This work started during P. K. Gadepalli’s summer internship with Arm Research in 2019 and has evolved into a collaborative project with George Washington University.
Read the full paper
Explore the open-sourced aWsm
Questions? Contact me
[1] Flexera 2020 State of the Cloud Report
[2] Andreas Haas, Andreas Rossberg, Derek L. Schuff, Ben L. Titzer, Michael Holman, Dan Gohman, Luke Wagner, Alon Zakai, and JF Bastien. "Bringing the Web Up to Speed with WebAssembly". In Proc. of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’17).
[3] Abhinav Jangda, Bobby Powers, Emery D. Berger, and Arjun Guha. "Not So Fast: Analyzing the Performance of WebAssembly vs. Native Code". In Proc. of the USENIX Annual Technical Conference (ATC ’19).