This blog introduces how we deploy a Serverless platform with multiple runtimes on an Arm64 server, and concludes with a look at the container runtimes and Wasm, including some performance evaluation. Hopefully, it gives you a general idea of Serverless and how these cloud-native projects work on Arm64 servers.
In this blog, we explain what Serverless is, how we deployed a Serverless platform on an Arm64 server, and what you can do with it in practice.
According to the CNCF (Cloud Native Computing Foundation) Cloud Native Surveys of 2020 and 2021, 30% of respondents (2020) and 39% of respondents (2021) use Serverless technologies in production. Serverless has become widely accepted by users and is a hot topic in cloud computing.
Serverless does not mean there is no server. It is more of a metaphor: server-side work, like provisioning, maintaining, and scaling the server infrastructure, is done by the cloud provider, so customers and developers never see the server.
It is based on microservices. Developers simply submit their application code, and the Serverless service takes care of the rest, such as building, deploying, and running it on demand.
The application can make use of various third-party services, that is, BaaS (Backend-as-a-Service) offerings like authentication services, cloud-accessible databases, and encryption. In addition, Serverless services usually support event-triggered models: applications are launched only as needed. When an event triggers application code to run, the cloud provider dynamically allocates resources for the code. Most cloud providers have their own Serverless service, like AWS Lambda, Azure Functions, and Google Cloud Functions.
Serverless provides some benefits, such as no server management, automatic scaling, and paying only for the resources you use.
But there are also some drawbacks of Serverless, such as cold-start latency, vendor lock-in, and limits on long-running workloads.
These pros and cons make Serverless suitable for stateless, ephemeral, asynchronous, and parallel workloads. For example, processing data at scale, running interactive web and mobile backends, and enabling powerful ML (Machine Learning) insights can all be set up as serverless services.
You can find some useful resources about Serverless in the Reference section.
We deployed a Serverless platform on an Arm64 server, and here is the basic information about the platform.
The following image shows the architecture of this Serverless platform. It contains three layers: Knative as the Serverless framework on top, Kubernetes for container orchestration in the middle, and the container runtimes at the bottom.
Let us start with the middle layer, Kubernetes. Kubernetes is an open-source system for automating the deployment, scaling, and management of containerized applications. It groups containers that make up an application into logical units for easy management and discovery. With its help, we can easily deploy a cluster that manages the resources of hundreds of servers and runs container workloads on the cluster. It can also integrate with the Container Network Interface (CNI) and Container Storage Interface (CSI) to manage networking and storage.
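Once the cluster is up, a quick way to confirm that the nodes really are Arm64 is to query the node info (a minimal check, assuming kubectl is configured against the cluster):

$ kubectl get nodes -o jsonpath='{.items[*].status.nodeInfo.architecture}'
arm64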
Knative is an open-source, enterprise-level solution to build Serverless and event-driven applications. Knative integrates Kubernetes, a service mesh, message channels and brokers, and some other extensions. It puts the serverless concept into practice. It contains two primary components: Serving and Eventing.
Besides the runc runtime that is supported by default, we integrated three more runtimes into this platform: Kata Containers, gVisor, and WasmEdge. Each runtime has its own area of strength, like lightweight applications or a secure environment. Integrating these runtimes into the serverless platform gives us more flexible and well-rounded choices when deploying our applications.
Please refer to the official guides below to install Kata Containers, Knative, and Kubernetes. Here we explain in more detail how to integrate WasmEdge with Kubernetes.
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/
https://github.com/kata-containers/kata-containers/tree/main/docs/install
https://knative.dev/docs/install/
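Once each runtime is installed and registered with Kubernetes, you can list the available RuntimeClasses. The handler names below are typical (gVisor usually registers its handler as runsc), but they depend on how you installed each runtime, so the output will vary:

$ kubectl get runtimeclass
NAME     HANDLER   AGE
crun     crun      2d
gvisor   runsc     2d
kata     kata      2d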
The crun project has WasmEdge support baked in, but we need to compile it ourselves.
# Install dependencies
$ sudo apt update
$ sudo apt install -y make git gcc build-essential pkgconf libtool \
    libsystemd-dev libprotobuf-c-dev libcap-dev libseccomp-dev libyajl-dev \
    go-md2man libtool autoconf python3 automake

# Compile crun
$ git clone https://github.com/containers/crun
$ cd crun
$ ./autogen.sh
$ ./configure --with-wasmedge
$ make
$ sudo make install
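To confirm that the build actually picked up WasmEdge, you can check the feature flags printed by the binary. The output below is abbreviated and will vary with your build; the point is the +WASM:wasmedge flag:

$ crun --version
crun version 1.x
...
+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +WASM:wasmedge +YAJL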
Add the following lines to /etc/containerd/config.toml to enable the crun runtime.
[plugins]
  ...
  [plugins.cri.containerd.runtimes.crun]
    runtime_type = "io.containerd.runc.v2"
    pod_annotations = ["*.wasm.*", "wasm.*", "module.wasm.image/*", "*.module.wasm.image", "module.wasm.image/variant.*"]
    privileged_without_host_devices = false
    [plugins.cri.containerd.runtimes.crun.options]
      BinaryName = "/usr/local/bin/crun"

# restart containerd
$ sudo systemctl restart containerd
$ cat > runtime.yaml <<EOF
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: crun
handler: crun
EOF
$ kubectl apply -f runtime.yaml

# Verify
$ kubectl run -it --rm --restart=Never wasi-demo --image=wasmedge/example-wasi:latest --annotations="module.wasm.image/variant=compat-smart" --overrides='{"kind":"Pod", "apiVersion":"v1", "spec": {"hostNetwork": true, "runtimeClassName": "crun"}}' /wasi_example_main.wasm 50000000
Random number: 1534679888
Random bytes: [88, 170, 82, 181, 231, 47, 31, 34, 195, 243, 134, 247, 211, 145, 28, 30, 162, 127, 234, 208, 213, 192, 205, 141, 83, 161, 121, 206, 214, 163, 196, 141, 158, 96, 137, 151, 49, 172, 88, 234, 195, 137, 44, 152, 7, 130, 41, 33, 85, 144, 197, 25, 104, 236, 201, 91, 210, 17, 59, 248, 80, 164, 19, 10, 46, 116, 182, 111, 112, 239, 140, 16, 6, 249, 89, 176, 55, 6, 41, 62, 236, 132, 72, 70, 170, 7, 248, 176, 209, 218, 214, 160, 110, 93, 232, 175, 124, 199, 33, 144, 2, 147, 219, 236, 255, 95, 47, 15, 95, 192, 239, 63, 157, 103, 250, 200, 85, 237, 44, 119, 98, 211, 163, 26, 157, 248, 24, 0]
Printed from wasi: This is from a main function
This is from a main function
The env vars are as follows.
The args are as follows.
/wasi_example_main.wasm
50000000
File content is This is in a file
pod "wasi-demo" deleted

# Enable the runtime class feature gate in Knative
$ kubectl patch configmap/config-features -n knative-serving --type merge --patch '{"data":{"kubernetes.podspec-runtimeclassname":"enabled"}}'
A chat bot is a good use case for the Serverless platform, as its workload is unpredictable and well suited to on-demand scaling.
Here we write a chat bot service that includes a frontend and a backend. We can choose the desired runtime to run the service. The source code can be found in the backend and frontend branches of https://github.com/zhlhahaha/flask-chatterbot.
Set up DNS and make sure we can visit the serverless services via DNS directly. Here is the official guide: https://knative.dev/docs/install/yaml-install/serving/install-serving-with-yaml/#configure-dns.
We use dnsmasq to set up DNS directly.
// add external ip for kourier
$ kubectl edit services -n kourier-system kourier
spec:
  allocateLoadBalancerNodePorts: true
  clusterIP: 10.105.58.134
  clusterIPs:
  - 10.105.58.134
  externalIPs:
  - 192.168.100.100
  externalTrafficPolicy: Cluster
  internalTrafficPolicy: Cluster

// config the dnsmasq
$ cat /etc/NetworkManager/NetworkManager.conf
[main]
plugins=ifupdown,keyfile
dns=dnsmasq

[ifupdown]
managed=false

[device]
wifi.scan-rand-mac-address=no

$ sudo rm /etc/resolv.conf ; sudo ln -s /var/run/NetworkManager/resolv.conf /etc/resolv.conf
$ echo 'address=/.knative.example.com/192.168.100.100' | sudo tee /etc/NetworkManager/dnsmasq.d/knative.example.com-wildcard.conf
$ sudo systemctl reload NetworkManager

// verify if dnsmasq works
$ dig knative.example.com +short
192.168.100.100

// set up the auto-generated serverless service url
$ kubectl patch configmap/config-domain \
  --namespace knative-serving \
  --type merge \
  --patch '{"data":{"knative.example.com":""}}'
Here we deploy three serverless services, for runc, gVisor, and Kata. Knative auto-generates three URLs, as shown below.
$ kubectl apply -f services.yaml
$ cat services.yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: chatterbot-runc
spec:
  template:
    spec:
      timeoutSeconds: 10
      containers:
      - image: zhlhahaha/flask-chatterbot:runc
        command: ['python', 'app.py']
        ports:
        - containerPort: 5000
---
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: chatterbot-gvisor
spec:
  template:
    spec:
      runtimeClassName: gvisor
      timeoutSeconds: 10
      containers:
      - image: zhlhahaha/flask-chatterbot:gvisor
        command: ['python', 'app.py']
        ports:
        - containerPort: 5000
---
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: chatterbot-kata
spec:
  template:
    spec:
      runtimeClassName: kata
      timeoutSeconds: 10
      containers:
      - image: zhlhahaha/flask-chatterbot:kata
        command: ['python', 'app.py']
        ports:
        - containerPort: 5000

$ kubectl get ksvc
NAME                URL                                                    LATESTCREATED             LATESTREADY               READY   REASON
chatterbot-gvisor   http://chatterbot-gvisor.default.knative.example.com   chatterbot-gvisor-00001   chatterbot-gvisor-00001   True
chatterbot-kata     http://chatterbot-kata.default.knative.example.com     chatterbot-kata-00001     chatterbot-kata-00001     True
chatterbot-runc     http://chatterbot-runc.default.knative.example.com     chatterbot-runc-00001     chatterbot-runc-00001     True

// verify if the services work
$ curl http://chatterbot-runc.default.knative.example.com/version
version1
As dnsmasq only works in the local environment, the frontend service and web browser are expected to run on the same machine as the Serverless backend.
// start the frontend service
$ git clone https://github.com/zhlhahaha/flask-chatterbot
$ cd flask-chatterbot
$ git checkout frontend
$ pip install -r requirements.txt
$ python app.py
 * Serving Flask app "app" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)

// use a web browser to visit the web page at http://localhost:5000
$ chromium-browser --no-sandbox
Here is a demo of the Serverless platform on Arm64:
All these features have been verified on the Arm64 Serverless platform.
Autoscaling allows the cluster to dynamically adjust the number of pods according to the load. For example, if the number of service requests grows, Knative automatically scales up the number of service pods to handle them. As Knative integrates an autoscaling service, we only need to set up the autoscaling rules in the service config file. Here is an example to show you how simple it is.
metadata:
  annotations:
    # Knative concurrency-based autoscaling (default).
    autoscaling.knative.dev/class: kpa.autoscaling.knative.dev
    autoscaling.knative.dev/metric: concurrency
    # Target 10 requests in-flight per pod.
    autoscaling.knative.dev/target: "10"
As you can see in the previous configuration, when the number of in-flight requests to a service pod exceeds 10, Knative automatically creates another pod to handle the requests. Knative also supports scaling down to zero, which means no resources are consumed when no requests come in.
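To watch the autoscaler in action, you can generate concurrent load against one of the services and observe the pods. This is a sketch using the hey load generator (installed separately); the URL is one of the auto-generated service URLs from above:

# send 50 concurrent requests for 30 seconds
$ hey -z 30s -c 50 http://chatterbot-runc.default.knative.example.com/
# in another terminal, watch the pods scale up and, once the load stops, back down to zero
$ kubectl get pods -w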
As Knative uses a service mesh to manage the network, it can do more precise traffic control. Traffic splitting is useful for blue/green deployments and canary deployments. Each time we update a serverless service, a new revision is created with a version tag, and we can split traffic across different versions of the service. In the following case, the chatterbot service has two revisions, and we split 70% of the traffic to chatterbot-00001 and 30% to chatterbot-00002.
$ kubectl get revisions
NAME               CONFIG NAME   K8S SERVICE NAME   GENERATION   READY   REASON   ACTUAL REPLICAS   DESIRED REPLICAS
chatterbot-00001   chatterbot                       1            True             0                 0
chatterbot-00002   chatterbot                       2            True             0                 0

// then we can split traffic between different revisions of chatterbot
traffic:
- tag: current
  revisionName: chatterbot-00001
  percent: 70
- tag: latest
  revisionName: chatterbot-00002
  percent: 30
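The traffic block goes under spec of the Knative Service. As a sketch, one way to apply the split without editing the YAML by hand is a merge patch against the service:

$ kubectl patch ksvc chatterbot --type merge -p \
    '{"spec":{"traffic":[{"tag":"current","revisionName":"chatterbot-00001","percent":70},{"tag":"latest","revisionName":"chatterbot-00002","percent":30}]}}'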
Without Knative, developers would need a wide range of networking knowledge and a complex setup to make traffic splitting work.
Flows allow users to compose several services into a sequence, or compose several sequences into a series, in an easy way, as shown in the following images.
Here is the configuration for a sequence flow. We put three Knative services into a sequence: first-runc, second-kata, and third-wasm. The output of the first service becomes the input of the second service, and the same applies between the second and third services.
apiVersion: flows.knative.dev/v1
kind: Sequence
metadata:
  name: sequence
spec:
  channelTemplate:
    apiVersion: messaging.knative.dev/v1
    kind: InMemoryChannel
  steps:
    - ref:
        apiVersion: serving.knative.dev/v1
        kind: Service
        name: first-runc
    - ref:
        apiVersion: serving.knative.dev/v1
        kind: Service
        name: second-kata
    - ref:
        apiVersion: serving.knative.dev/v1
        kind: Service
        name: third-wasm
  reply:
    ref:
      kind: Service
      apiVersion: serving.knative.dev/v1
      name: event-display
Besides sequences, Flows also supports parallel tasks. We can split the flow into different branches, and once the flow is called, the branches run in parallel, as sketched below.
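As a sketch, a Parallel flow with two branches could look like the following; the branch service names here are hypothetical:

apiVersion: flows.knative.dev/v1
kind: Parallel
metadata:
  name: parallel-demo
spec:
  channelTemplate:
    apiVersion: messaging.knative.dev/v1
    kind: InMemoryChannel
  branches:
    # each branch receives a copy of the incoming event
    - subscriber:
        ref:
          apiVersion: serving.knative.dev/v1
          kind: Service
          name: branch-runc
    - subscriber:
        ref:
          apiVersion: serving.knative.dev/v1
          kind: Service
          name: branch-kata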
Knative Eventing provides helpful tools for creating event-driven applications by easily attaching event sources, triggers, and other options to your Knative Services.
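For example, a PingSource fires a CloudEvent on a cron schedule and delivers it to a sink. A minimal sketch, where event-display stands in for any Knative Service that consumes events:

apiVersion: sources.knative.dev/v1
kind: PingSource
metadata:
  name: ping-demo
spec:
  # fire once per minute
  schedule: "*/1 * * * *"
  contentType: "application/json"
  data: '{"message": "Hello from the Arm64 Serverless platform!"}'
  sink:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: event-display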
For more details, you can refer to https://knative.dev/docs/getting-started/getting-started-eventing/
This blog introduced Serverless and its current status. Serverless technology does facilitate software deployment and is becoming increasingly widely accepted by customers. We also showed how to deploy a Serverless platform on an Arm64 server. The platform integrates multiple container runtimes; as different runtimes have their own areas of strength, customers can choose the most appropriate runtime for their application. In the end, we showed some useful practices on the Serverless platform. Developers can easily build a Serverless framework using open-source components that are readily available on the Arm platform.
CNCF Cloud Native Survey 2020 - https://www.cncf.io/reports/cloud-native-survey-2020/
CNCF Cloud Native Survey 2021 - https://www.cncf.io/wp-content/uploads/2022/02/CNCF-AR_FINAL-edits-15.2.21.pdf
Use kubeadm to create a Kubernetes cluster - https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/
Kata Containers deployment - https://github.com/kata-containers/kata-containers/tree/main/docs/install
Knative deployment - https://knative.dev/docs/install/
Knative Eventing - https://knative.dev/docs/getting-started/getting-started-eventing/
What is Serverless - https://www.redhat.com/en/topics/cloud-native-apps/what-is-serverless