This blog introduces how we deploy a Serverless platform with multiple runtimes on an Arm64 server, and concludes with a look at the container runtimes and Wasm, including some performance evaluation. Hopefully, it gives you a general idea of Serverless and how these cloud-native projects work on Arm64 servers.
In this blog, we explain what Serverless is, how we deployed a Serverless platform on an Arm64 server, and what you can do with it in practice.
According to the CNCF (Cloud Native Computing Foundation) Cloud Native Surveys of 2020 and 2021, 30% of respondents (2020) and 39% of respondents (2021) use Serverless technologies in production. Serverless has become widely accepted by users and is a hot topic in cloud computing.
Serverless does not mean there is no server. It is more of a metaphor: server-side work, like provisioning, maintaining, and scaling the server infrastructure, is done by the cloud provider, so customers and developers never see the server.
It is based on microservices. Developers simply submit their application code, and the Serverless service takes care of the rest, such as building, deploying, and running it on demand.
The application can make use of various third-party services, that is, BaaS (Backend-as-a-Service) offerings like authentication services, cloud-accessible databases, and encryption. In addition, Serverless services usually support event-triggered models: applications are launched only as needed. When an event triggers application code to run, the cloud provider dynamically allocates resources for the code. Most cloud providers have their own Serverless service, like AWS Lambda, Azure Functions, and Google Cloud Functions.
Serverless provides some benefits, such as no server management, automatic scaling, and paying only for the resources you use.
But there are also some drawbacks of Serverless, such as cold-start latency, vendor lock-in, and limits on long-running workloads.
These pros and cons make Serverless suitable for stateless, ephemeral, asynchronous, and parallel workloads. For example, processing data at scale, running interactive web and mobile backends, and enabling powerful ML (Machine Learning) insights can all be set up as serverless services.
You can find some useful resources about Serverless in the Reference section.
We deployed a Serverless platform on an Arm64 server, and here is the basic information about the platform.
The following image shows the architecture of this Serverless platform. It contains three layers: Knative as the Serverless framework on top, Kubernetes for container orchestration in the middle, and the container runtimes at the bottom.
Let us start with the middle layer, Kubernetes. Kubernetes is an open-source system for automating the deployment, scaling, and management of containerized applications. It groups containers that make up an application into logical units for easy management and discovery. With its help, we can easily deploy a cluster that manages the resources of hundreds of servers and runs container workloads on the cluster. It can also integrate with the Container Network Interface (CNI) and Container Storage Interface (CSI) to manage networking and storage.
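Once the cluster is up, a quick way to confirm that the nodes really are Arm64 is to query the node info (a minimal check, assuming kubectl is configured against the cluster):

$ kubectl get nodes -o jsonpath='{.items[*].status.nodeInfo.architecture}'
arm64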
Knative is an open-source, enterprise-level solution to build Serverless and event-driven applications. Knative integrates Kubernetes, a service mesh, message channels and brokers, and some other extensions. It puts the serverless concept into practice. It contains two primary components: Serving and Eventing.
Besides the runc runtime that is supported by default, we integrated three more runtimes into this platform: Kata Containers, gVisor, and WasmEdge. Each runtime has its own area of strength, like lightweight applications or a secure environment. Integrating these runtimes into the serverless platform gives us more flexible and well-rounded choices when deploying our applications.
Please refer to the official guides below to install Kata Containers, Knative, and Kubernetes. Here we explain in more detail how to integrate WasmEdge with Kubernetes.
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/
https://github.com/kata-containers/kata-containers/tree/main/docs/install
https://knative.dev/docs/install/
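Once each runtime is installed and registered with Kubernetes, you can list the available RuntimeClasses. The handler names below are typical (gVisor usually registers its handler as runsc), but they depend on how you installed each runtime, so the output will vary:

$ kubectl get runtimeclass
NAME     HANDLER   AGE
crun     crun      2d
gvisor   runsc     2d
kata     kata      2d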
The crun project has WasmEdge support baked in, but we need to compile it ourselves.
# Install dependencies
$ sudo apt update
$ sudo apt install -y make git gcc build-essential pkgconf libtool \
    libsystemd-dev libprotobuf-c-dev libcap-dev libseccomp-dev libyajl-dev \
    go-md2man libtool autoconf python3 automake

# Compile crun
$ git clone https://github.com/containers/crun
$ cd crun
$ ./autogen.sh
$ ./configure --with-wasmedge
$ make
$ sudo make install
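To confirm that the build actually picked up WasmEdge, you can check the feature flags printed by the binary. The output below is abbreviated and will vary with your build; the point is the +WASM:wasmedge flag:

$ crun --version
crun version 1.x
...
+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +WASM:wasmedge +YAJL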
Add the following lines to /etc/containerd/config.toml to enable the crun runtime.
[plugins]
  ...
  [plugins.cri.containerd.runtimes.crun]
    runtime_type = "io.containerd.runc.v2"
    pod_annotations = ["*.wasm.*", "wasm.*", "module.wasm.image/*", "*.module.wasm.image", "module.wasm.image/variant.*"]
    privileged_without_host_devices = false
    [plugins.cri.containerd.runtimes.crun.options]
      BinaryName = "/usr/local/bin/crun"

# restart containerd
$ sudo systemctl restart containerd
$ cat > runtime.yaml <<EOF
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: crun
handler: crun
EOF
$ kubectl apply -f runtime.yaml

# Verify
$ kubectl run -it --rm --restart=Never wasi-demo --image=wasmedge/example-wasi:latest --annotations="module.wasm.image/variant=compat-smart" --overrides='{"kind":"Pod", "apiVersion":"v1", "spec": {"hostNetwork": true, "runtimeClassName": "crun"}}' /wasi_example_main.wasm 50000000
Random number: 1534679888
Random bytes: [88, 170, 82, 181, 231, 47, 31, 34, 195, 243, 134, 247, 211, 145, 28, 30, 162, 127, 234, 208, 213, 192, 205, 141, 83, 161, 121, 206, 214, 163, 196, 141, 158, 96, 137, 151, 49, 172, 88, 234, 195, 137, 44, 152, 7, 130, 41, 33, 85, 144, 197, 25, 104, 236, 201, 91, 210, 17, 59, 248, 80, 164, 19, 10, 46, 116, 182, 111, 112, 239, 140, 16, 6, 249, 89, 176, 55, 6, 41, 62, 236, 132, 72, 70, 170, 7, 248, 176, 209, 218, 214, 160, 110, 93, 232, 175, 124, 199, 33, 144, 2, 147, 219, 236, 255, 95, 47, 15, 95, 192, 239, 63, 157, 103, 250, 200, 85, 237, 44, 119, 98, 211, 163, 26, 157, 248, 24, 0]
Printed from wasi: This is from a main function
This is from a main function
The env vars are as follows.
The args are as follows.
/wasi_example_main.wasm
50000000
File content is This is in a file
pod "wasi-demo" deleted

# Enable the runtime class feature gate in Knative
$ kubectl patch configmap/config-features -n knative-serving --type merge --patch '{"data":{"kubernetes.podspec-runtimeclassname":"enabled"}}'
A chat bot is a good use case for the Serverless platform, as its workload is unpredictable and well suited to on-demand scaling.
Here we write a chat bot service that includes a frontend and a backend. We can choose the desired runtime to run the service. The source code can be found in the backend and frontend branches of https://github.com/zhlhahaha/flask-chatterbot.
Set up DNS and make sure we can visit the serverless services via DNS directly. Here is the official guide: https://knative.dev/docs/install/yaml-install/serving/install-serving-with-yaml/#configure-dns.
We use dnsmasq to set up DNS directly.
// add external ip for kourier
$ kubectl edit services -n kourier-system kourier
spec:
  allocateLoadBalancerNodePorts: true
  clusterIP: 10.105.58.134
  clusterIPs:
  - 10.105.58.134
  externalIPs:
  - 192.168.100.100
  externalTrafficPolicy: Cluster
  internalTrafficPolicy: Cluster

// config the dnsmasq
$ cat /etc/NetworkManager/NetworkManager.conf
[main]
plugins=ifupdown,keyfile
dns=dnsmasq

[ifupdown]
managed=false

[device]
wifi.scan-rand-mac-address=no

$ sudo rm /etc/resolv.conf ; sudo ln -s /var/run/NetworkManager/resolv.conf /etc/resolv.conf
$ echo 'address=/.knative.example.com/192.168.100.100' | sudo tee /etc/NetworkManager/dnsmasq.d/knative.example.com-wildcard.conf
$ sudo systemctl reload NetworkManager

// verify if dnsmasq works
$ dig knative.example.com +short
192.168.100.100

// set up the auto-generated serverless service url
$ kubectl patch configmap/config-domain \
  --namespace knative-serving \
  --type merge \
  --patch '{"data":{"knative.example.com":""}}'
Here we deploy three serverless services, for runc, gVisor, and Kata. Knative auto-generates three URLs, as shown below.
$ kubectl apply -f services.yaml
$ cat services.yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: chatterbot-runc
spec:
  template:
    spec:
      timeoutSeconds: 10
      containers:
      - image: zhlhahaha/flask-chatterbot:runc
        command: ['python', 'app.py']
        ports:
        - containerPort: 5000
---
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: chatterbot-gvisor
spec:
  template:
    spec:
      runtimeClassName: gvisor
      timeoutSeconds: 10
      containers:
      - image: zhlhahaha/flask-chatterbot:gvisor
        command: ['python', 'app.py']
        ports:
        - containerPort: 5000
---
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: chatterbot-kata
spec:
  template:
    spec:
      runtimeClassName: kata
      timeoutSeconds: 10
      containers:
      - image: zhlhahaha/flask-chatterbot:kata
        command: ['python', 'app.py']
        ports:
        - containerPort: 5000

$ kubectl get ksvc
NAME                URL                                                    LATESTCREATED             LATESTREADY               READY   REASON
chatterbot-gvisor   http://chatterbot-gvisor.default.knative.example.com   chatterbot-gvisor-00001   chatterbot-gvisor-00001   True
chatterbot-kata     http://chatterbot-kata.default.knative.example.com     chatterbot-kata-00001     chatterbot-kata-00001     True
chatterbot-runc     http://chatterbot-runc.default.knative.example.com     chatterbot-runc-00001     chatterbot-runc-00001     True

// verify if the services work
$ curl http://chatterbot-runc.default.knative.example.com/version
version1
As dnsmasq only works in the local environment, the frontend service and web browser are expected to run on the same machine as the Serverless backend.
// start the frontend service
$ git clone https://github.com/zhlhahaha/flask-chatterbot
$ cd flask-chatterbot
$ git checkout frontend
$ pip install -r requirements.txt
$ python app.py
 * Serving Flask app "app" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)

// use a web browser to visit the web page at http://localhost:5000
$ chromium-browser --no-sandbox
Here is a demo of the Serverless platform on Arm64:
All these features have been verified on the Arm64 Serverless platform.
Autoscaling allows the cluster to dynamically adjust the number of pods according to the load. For example, if the number of service requests grows, Knative automatically scales up the number of service pods to handle them. As Knative integrates an autoscaling service, we only need to set up the autoscaling rules in the service config file. Here is an example to show you how simple it is.
metadata:
  annotations:
    # Knative concurrency-based autoscaling (default).
    autoscaling.knative.dev/class: kpa.autoscaling.knative.dev
    autoscaling.knative.dev/metric: concurrency
    # Target 10 requests in-flight per pod.
    autoscaling.knative.dev/target: "10"
As you can see in the previous configuration, when the number of in-flight requests to a service pod exceeds 10, Knative automatically creates another pod to handle the requests. Knative also supports scaling down to zero, which means no resources are consumed when no requests come in.
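To watch the autoscaler in action, you can generate concurrent load against one of the services and observe the pods. This is a sketch using the hey load generator (installed separately); the URL is one of the auto-generated service URLs from above:

# send 50 concurrent requests for 30 seconds
$ hey -z 30s -c 50 http://chatterbot-runc.default.knative.example.com/
# in another terminal, watch the pods scale up and, once the load stops, back down to zero
$ kubectl get pods -w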
As Knative uses a service mesh to manage the network, it can do more precise traffic control. Traffic splitting is useful for blue/green deployments and canary deployments. Each time we update a serverless service, a new revision is created with a version tag, and we can split traffic across different versions of the service. In the following case, the chatterbot service has two revisions, and we split 70% of the traffic to chatterbot-00001 and 30% to chatterbot-00002.
$ kubectl get revisions
NAME               CONFIG NAME   K8S SERVICE NAME   GENERATION   READY   REASON   ACTUAL REPLICAS   DESIRED REPLICAS
chatterbot-00001   chatterbot                       1            True             0                 0
chatterbot-00002   chatterbot                       2            True             0                 0

// then we can split traffic between different revisions of chatterbot
traffic:
- tag: current
  revisionName: chatterbot-00001
  percent: 70
- tag: latest
  revisionName: chatterbot-00002
  percent: 30
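The traffic block goes under spec of the Knative Service. As a sketch, one way to apply the split without editing the YAML by hand is a merge patch against the service:

$ kubectl patch ksvc chatterbot --type merge -p \
    '{"spec":{"traffic":[{"tag":"current","revisionName":"chatterbot-00001","percent":70},{"tag":"latest","revisionName":"chatterbot-00002","percent":30}]}}'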
Without Knative, developers would need a wide range of networking knowledge and a complex setup to make traffic splitting work.
Flows allow users to compose several services into a sequence, or compose several sequences into a series, in an easy way, as shown in the following images.
Here is the configuration for a sequence flow. We put three Knative services into a sequence: first-runc, second-kata, and third-wasm. The output of the first service becomes the input of the second service, and the same applies between the second and third services.
apiVersion: flows.knative.dev/v1
kind: Sequence
metadata:
  name: sequence
spec:
  channelTemplate:
    apiVersion: messaging.knative.dev/v1
    kind: InMemoryChannel
  steps:
    - ref:
        apiVersion: serving.knative.dev/v1
        kind: Service
        name: first-runc
    - ref:
        apiVersion: serving.knative.dev/v1
        kind: Service
        name: second-kata
    - ref:
        apiVersion: serving.knative.dev/v1
        kind: Service
        name: third-wasm
  reply:
    ref:
      kind: Service
      apiVersion: serving.knative.dev/v1
      name: event-display
Besides sequences, Flows also supports parallel tasks. We can split the flow into different branches, and once the flow is called, the branches run in parallel, as sketched below.
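As a sketch, a Parallel flow with two branches could look like the following; the branch service names here are hypothetical:

apiVersion: flows.knative.dev/v1
kind: Parallel
metadata:
  name: parallel-demo
spec:
  channelTemplate:
    apiVersion: messaging.knative.dev/v1
    kind: InMemoryChannel
  branches:
    # each branch receives a copy of the incoming event
    - subscriber:
        ref:
          apiVersion: serving.knative.dev/v1
          kind: Service
          name: branch-runc
    - subscriber:
        ref:
          apiVersion: serving.knative.dev/v1
          kind: Service
          name: branch-kata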
Knative Eventing provides helpful tools for creating event-driven applications by easily attaching event sources, triggers, and other options to your Knative Services.
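For example, a PingSource fires a CloudEvent on a cron schedule and delivers it to a sink. A minimal sketch, where event-display stands in for any Knative Service that consumes events:

apiVersion: sources.knative.dev/v1
kind: PingSource
metadata:
  name: ping-demo
spec:
  # fire once per minute
  schedule: "*/1 * * * *"
  contentType: "application/json"
  data: '{"message": "Hello from the Arm64 Serverless platform!"}'
  sink:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: event-display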
For more details, you can refer to https://knative.dev/docs/getting-started/getting-started-eventing/
This blog introduced Serverless and its current status. Serverless technology does facilitate software deployment and is becoming increasingly widely accepted by customers. We also showed how to deploy a Serverless platform on an Arm64 server. The platform integrates multiple container runtimes; as different runtimes have their own areas of strength, customers can choose the most appropriate runtime for their application. In the end, we showed some useful practices on the Serverless platform. Developers can easily build a Serverless framework using open-source components that are readily available on the Arm platform.
CNCF Cloud Native Survey 2020 - https://www.cncf.io/reports/cloud-native-survey-2020/
CNCF Cloud Native Survey 2021 - https://www.cncf.io/wp-content/uploads/2022/02/CNCF-AR_FINAL-edits-15.2.21.pdf
Use kubeadm to create a Kubernetes cluster - https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/
Kata Containers deployment - https://github.com/kata-containers/kata-containers/tree/main/docs/install
Knative deployment - https://knative.dev/docs/install/
Knative Eventing - https://knative.dev/docs/getting-started/getting-started-eventing/
What is Serverless - https://www.redhat.com/en/topics/cloud-native-apps/what-is-serverless