Redis (Remote Dictionary Server) is one of the most popular in-memory key-value databases, used for distributed, high-speed data storage in use cases like data caching, session management, and Pub/Sub. Redis is lightweight (written in C), performant, and reliable, so it can be deployed on a wide range of platforms and architectures, from cloud applications to small IoT devices. The modular structure of Redis allows custom operations to be implemented over the stored data. In addition, putting computation close to the data store reduces latency and enhances performance.
RedisAI is a Redis module created by Redis Labs, designed for running deep learning graphs over stored data. The growing interest in running machine learning applications on small devices makes RedisAI an appropriate solution for edge inference. In this blog, we show an implementation of an image classification system that runs machine learning inference over a stream of data on an NVIDIA Jetson Nano device.
NVIDIA Jetson systems (Nano/TX2/Xavier NX/AGX Xavier) are examples of low-power, Arm-based devices with AI computing capabilities for edge applications. Redis runs efficiently on Arm processors, and every Redis release is tested on the Arm-based Raspberry Pi platform.
The low memory footprint of Redis makes it an ideal candidate for running on small, resource-limited (CPU/memory) devices. In addition, data collected from peripheral devices and sensors arrives as a stream and must be stored in Redis in an appropriate data structure. The Redis Streams data type was introduced to satisfy exactly this requirement, enabling Redis to store and retrieve streaming data efficiently in IoT frameworks.
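For illustration, the snippet below appends a sensor reading to a stream and reads entries back; it uses the Python redis client, and the stream and field names are arbitrary placeholders (the Go example later in this post does the same through go-redis):

import redis

r = redis.Redis(host="localhost", port=6379)

# append a sensor reading; the stream is created on the first XADD
r.xadd("sensor:temperature", {"sensor_id": "s1", "celsius": 22.5})

# read up to 10 entries from the beginning of the stream
for stream, entries in r.xread({"sensor:temperature": "0"}, count=10):
    for entry_id, fields in entries:
        print(entry_id, fields)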
The RedisAI module allows deep learning inference on data stored in Redis as tensors. In machine learning, tensors are multi-dimensional arrays; RedisAI adds them to Redis as a data type. RedisAI can load and execute machine learning models over tensors using popular machine learning libraries such as TensorFlow, TensorFlow Lite, PyTorch, and ONNX Runtime.
Figure 1: RedisAI backend libraries and inference flow
You can find detailed information on how to use the Redis command line interface to run machine learning inference in the RedisAI documentation.
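As a flavor of what those commands look like, the sketch below issues the raw RedisAI commands (AI.TENSORSET, AI.MODELRUN, AI.TENSORGET) through the generic command interface of the Python redis client; the same commands can be typed directly into redis-cli. The key names are placeholders, and my_model stands in for a model already loaded with AI.MODELSET:

import redis

r = redis.Redis(host="localhost", port=6379)

# store a 1x2 float tensor under the key 'a'
r.execute_command("AI.TENSORSET", "a", "FLOAT", 1, 2, "VALUES", 2.0, 3.0)

# run a model previously loaded with AI.MODELSET ('my_model' is a placeholder)
r.execute_command("AI.MODELRUN", "my_model", "INPUTS", "a", "OUTPUTS", "c")

# read the values of the output tensor
print(r.execute_command("AI.TENSORGET", "c", "VALUES"))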
Machine learning and deep learning applications use algorithms that can be computed in a highly parallel way (for instance, the convolution operation in convolutional neural networks). Since GPUs process in parallel by nature, they are used extensively in deep learning applications and noticeably decrease the time required for training and inference.
RedisAI's backend libraries support both CPU and GPU inference, so users can choose between them based on the capabilities of the device the model runs on and their preferences. The NVIDIA Jetson Nano is a well-suited candidate for IoT applications that need low-cost devices with GPU compute. Jetson Nano systems come with a 128-core Maxwell GPU and a quad-core Arm Cortex-A57 CPU, with a choice of 2GB or 4GB of memory.
To show the inference and resource metrics, we run multiple benchmarks against RedisAI on a Jetson Nano 4GB using the AIBench tool. AIBench runs two different benchmarks against RedisAI: a vision benchmark (MobileNetV1) and a fraud detection benchmark (a Kaggle model). We show the performance of RedisAI inference on Jetson Nano for both CPU and GPU computation using the TensorFlow and TensorFlow Lite backends. Note that on Jetson Nano, the GPU has no dedicated memory and shares the device memory with the CPU.
For the benchmarking, we use a single client with different tensor batch sizes to compare the inference rate of TensorFlow and TensorFlow Lite on CPU and GPU. Figure 2 shows inferences/sec for the vision benchmark with both TensorFlow and TensorFlow Lite; TensorFlow Lite outperforms TensorFlow at larger batch sizes for both CPU and GPU inference.
Figure 2. TensorFlow and TensorFlow Lite CPU/GPU inference
One of the major limitations of edge devices is the available memory. Hence, choosing the right inference backend can make a big difference in whether an application runs smoothly. Our benchmarking shows that the memory consumption of TensorFlow Lite is considerably lower than that of TensorFlow.
The following is an example in Go showing how to run RedisAI inference on a stream of data. Both Redis and RedisAI have Go clients (go-redis/redis and RedisAI/redisai-go) that make it straightforward to write an application that manages both streaming and inference on Redis.
The example reads the model file graph.pb, which has two input tensors a and b of type float. The model multiplies the inputs and stores the result in tensor c. The inference backend is TensorFlow, and the CPU is used as the inference device. The main function creates a goroutine that loads the model, reads records (each containing two numbers) from a Redis stream, and runs inference on the numbers. The main routine generates pairs of random numbers and inserts them into the Redis stream to be processed by the inference goroutine.
package main

import (
	"context"
	"fmt"
	"io"
	"log"
	"math/rand"
	"os"
	"strconv"
	"time"

	"github.com/RedisAI/redisai-go/redisai"
	redis "github.com/go-redis/redis/v8"
)

var redisAddress = "localhost:6379"

// model file
var modelFile = "graph.pb"
var inputs = []string{"a", "b"}
var outputs = []string{"c"}
var ctx = context.Background()
var numOfRecords = 10
var streamName = "inference_stream"
var modelName = "inference_model"
var inferenceEngine = "TF"
var device = "CPU"

func readModelFile() []byte {
	// read the model file into memory
	f, err := os.Open(modelFile)
	if err != nil {
		log.Fatalf("error opening file %s: %q", modelFile, err)
	}
	stats, _ := f.Stat()
	modelData := make([]byte, stats.Size())
	_, err = f.Read(modelData)
	if err != nil {
		if err != io.EOF {
			log.Fatalf("error reading file %s: %q", modelFile, err)
		}
	}
	f.Close()
	return modelData
}

func readStreamValues(s *redis.XStreamSliceCmd) (float32, float32) {
	val, err := s.Result()
	if err != nil {
		log.Fatal(err)
	}
	x, err := strconv.ParseFloat(val[0].Messages[0].Values[inputs[0]].(string), 32)
	if err != nil {
		log.Fatal(err)
	}
	y, err := strconv.ParseFloat(val[0].Messages[0].Values[inputs[1]].(string), 32)
	if err != nil {
		log.Fatal(err)
	}
	return float32(x), float32(y)
}

func inference(rdb *redis.Client) {
	modelData := readModelFile()
	client := redisai.Connect("redis://"+redisAddress, nil)
	// load the model
	err := client.ModelSet(modelName, inferenceEngine, device, modelData, inputs, outputs)
	if err != nil {
		log.Fatalf("error setting model: %q", err)
	}
	// set up the data stream to read
	var read = redis.XReadArgs{
		Streams: []string{streamName, "$"},
	}
	var s *redis.XStreamSliceCmd
	// tensor creation and inference are sent in a pipeline to Redis
	// in order to keep the number of requests small and the whole process fast
	for {
		// read from the stream
		s = rdb.XRead(ctx, &read)
		x, y := readStreamValues(s)
		log.Printf("%s: %f, %s: %f\n", inputs[0], x, inputs[1], y)
		// there are 4 operations that run inside the pipeline
		client.Pipeline(4)
		err = client.TensorSet(inputs[0], redisai.TypeFloat, []int64{1}, []float32{x})
		if err != nil {
			log.Fatal(err)
		}
		err = client.TensorSet(inputs[1], redisai.TypeFloat, []int64{1}, []float32{y})
		if err != nil {
			log.Fatal(err)
		}
		err = client.ModelRun(modelName, inputs, outputs)
		if err != nil {
			log.Fatal(err)
		}
		_, err = client.TensorGet(outputs[0], redisai.TensorContentTypeValues)
		if err != nil {
			log.Fatal(err)
		}
		// ignore the reply of the first TensorSet
		_, err := client.Receive()
		if err != nil {
			log.Fatal(err)
		}
		// ignore the reply of the second TensorSet
		_, err = client.Receive()
		if err != nil {
			log.Fatal(err)
		}
		// ignore the reply of ModelRun
		_, err = client.Receive()
		if err != nil {
			log.Fatal(err)
		}
		// get the result of TensorGet
		err, _, _, data := redisai.ProcessTensorGetReply(client.Receive())
		if err != nil {
			log.Fatal(err)
		}
		// disable the pipeline
		err = client.DisablePipeline()
		if err != nil {
			log.Fatalf("error disabling the pipeline: %v", err)
		}
		info, err := client.Info(modelName)
		if err != nil {
			log.Fatal(err)
		}
		log.Printf("output: %f", data)
		log.Printf("model run time: %sus\n", info["duration"])
		fmt.Println()
	}
}

func main() {
	// create a pool of connections (in case of concurrency)
	rdb := redis.NewClient(&redis.Options{
		Addr:     redisAddress,
		Password: "",
		DB:       0,
		// it is possible to set the pool size; the default is 10 per CPU
	})
	defer rdb.Close()

	go inference(rdb)

	// data stream configuration
	var data redis.XAddArgs
	data.Stream = streamName
	data.ID = "*"

	// generate different random numbers on each run
	rand.Seed(time.Now().UnixNano())
	for i := 0; i < numOfRecords; i++ {
		// create a random input pair
		data.Values = []interface{}{inputs[0], rand.Float32(), inputs[1], rand.Float32()}
		_ = rdb.XAdd(ctx, &data)
		// time gap between records
		time.Sleep(100 * time.Millisecond)
	}
	// make sure all the results are collected
	time.Sleep(1 * time.Second)
}
Redis Streams, RedisGears, and RedisAI: building an edge inference system
As the example above shows, initializing and synchronizing Redis Streams and RedisAI still requires application code; RedisGears provides a programmatic way of gluing these Redis components together. RedisGears lets users write processing pipelines in Python. Like RedisAI, RedisGears is loaded into Redis as a module and runs close to the data. Figure 3 shows the components of RedisGears and how it interconnects with Redis and internal events. You can read more about how RedisGears works and find different examples on the RedisGears page.
Figure 3: RedisGears components
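As a minimal taste of the API (unrelated to the demo), the snippet below counts all Redis keys matching a prefix; such a script is sent to Redis with the RG.PYEXECUTE command, and GB is a built-in provided by the RedisGears Python environment:

# count all keys whose name starts with 'person:' and return the result to the caller
GB().count().run('person:*')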
We will now demonstrate a practical image classification case study running on a small device (Jetson Nano), implemented with Redis Streams, RedisGears, and RedisAI.
The RedisAI team has provided several examples of practical applications. Here, we will show how to build and run the Animal Recognition Demo on Jetson Nano as an image classification system and orchestrate the deployment with K3s, a lightweight Kubernetes distribution that is optimized for Arm.
The case study uses the MobileNetV2 image classification model to detect whether an image captured by a camera contains a cat. The components of the system are: a Redis server with the RedisAI and RedisGears modules loaded; a camera application that captures frames and appends them to a Redis stream; a Gears script that runs inference on the frames and filters the cat images; two web frontends that display the 'all' and 'cats' streams; and an initialization script that loads the model and registers the Gears script.
The repository includes a docker-compose script that builds all the Docker images and starts the whole system. However, we will show how to build the images with custom versions of Redis/RedisAI/RedisGears for Arm-based Jetson boards and use Kubernetes for the deployment.
The initialization script shows the natural way of bootstrapping a system with RedisAI and RedisGears: it (1) connects to Redis, (2) uploads the model file, and (3) uploads the Gears script.
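A minimal sketch of that flow in Python is shown below; it is not the demo's actual init.py, and the model file name, key, backend/device, tensor names, and gear.py file are illustrative placeholders:

import redis

r = redis.Redis(host="localhost", port=6379)

# (2) upload the model blob to RedisAI (key, backend, device, and tensor names are illustrative)
with open("models/mobilenet_v2_1.4_224_frozen.pb", "rb") as f:
    model_blob = f.read()
r.execute_command("AI.MODELSET", "mobilenet:model", "TF", "GPU",
                  "INPUTS", "input",
                  "OUTPUTS", "MobilenetV2/Predictions/Reshape_1",
                  "BLOB", model_blob)

# (3) upload the RedisGears script that glues the camera stream to the model
with open("gear.py") as f:
    r.execute_command("RG.PYEXECUTE", f.read())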
The Gears script handles the task flow: it (1) registers on the camera stream, (2) adds all the images to the 'all' stream, (3) selects one image out of 10 to reduce the inference rate, (4) runs inference on the image, (5) filters the cat images, and (6) adds them to the 'cats' stream. A simplified sketch of this pipeline follows.
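The sketch below is not the demo's actual Gears script: the 'image' field name is illustrative, and the inference step is reduced to a placeholder, whereas the real script decodes and resizes each frame with OpenCV, feeds it to the MobileNet model through RedisAI, and checks the predicted label. GB and execute are built-ins provided by the RedisGears Python environment.

counter = {'n': 0}

def add_to_all(record):
    # (2) copy every frame into the 'all' stream for the "all images" frontend
    execute('XADD', 'all', '*', 'image', record['value']['image'])
    return record

def one_in_ten(record):
    # (3) keep only every 10th frame to reduce the inference rate
    counter['n'] += 1
    return counter['n'] % 10 == 0

def is_cat(record):
    # (4)+(5) placeholder: the real script sets the frame as a RedisAI tensor,
    # runs the MobileNet model, and returns True when the top label is a cat
    return False

def add_to_cats(record):
    # (6) publish recognized cat frames to the 'cats' stream
    execute('XADD', 'cats', '*', 'image', record['value']['image'])

# (1) register the pipeline on the camera stream
GB('StreamReader') \
    .map(add_to_all) \
    .filter(one_in_ten) \
    .filter(is_cat) \
    .foreach(add_to_cats) \
    .register('camera:0')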
Frontend HTTP servers (implemented as WebSocket servers in JavaScript) read the images from the streams ('cats' or 'all', depending on the environment variable 'STREAM') and send them to the browser.
Building the Docker images for Jetson
The following Dockerfile builds a Redis server Docker image with the RedisGears and RedisAI modules included. You can change the versions and enable or disable specific machine learning backends; for instance, this Dockerfile only enables the TensorFlow and TensorFlow Lite backends and disables the rest. The final image also includes OpenCV for image processing in the RedisGears script.
ARG REDIS_VER=6.2.1
ARG GEARS_VER=1.2.1
ARG AI_VER=1.2.2
ARG OS=L4T
ARG OSNICK=bionic
# ARCH=x64|arm64v8|arm32v7
ARG ARCH=arm64v8
ARG PACK=0
ARG TEST=0

#----------------------------------------------------------------------------------------------
FROM redisfab/redis:${REDIS_VER}-${ARCH}-${OSNICK} AS redis
FROM redisfab/jetpack:4.4.1-${ARCH}-l4t AS builder

ARG OS
ARG ARCH
ARG REDIS_VER
ARG GEARS_VER
ARG AI_VER
ARG CUDA_VER

RUN echo "Building for ${OS} for ${ARCH} [with Redis ${REDIS_VER}]"

WORKDIR /build

RUN apt-get update
RUN apt-get install -y locales python3-dev
ENV LANG en_US.UTF-8

COPY --from=redis /usr/local/ /usr/local/

# build RedisAI
RUN git clone --recursive --depth 1 --branch v${AI_VER} https://github.com/RedisAI/RedisAI.git
WORKDIR /build/RedisAI
RUN PIP=1 FORCE=1 ./opt/readies/bin/getpy3
RUN ./opt/system-setup.py
ARG DEPS_ARGS="GPU=1 WITH_PT=0 WITH_ORT=0 WITH_UNIT_TESTS=0"
RUN if [ "$DEPS_ARGS" = "" ]; then ./get_deps.sh; else env $DEPS_ARGS ./get_deps.sh; fi
ARG BUILD_ARGS="GPU=1 SHOW=1 WITH_PT=0 WITH_ORT=0 WITH_UNIT_TESTS=0"
RUN bash -c "set -e ;\
    . ./opt/readies/bin/sourced ./profile.d ;\
    make -C opt build $BUILD_ARGS"

# build RedisGears
WORKDIR /build
RUN git clone --recursive --depth 1 --branch v${GEARS_VER} https://github.com/RedisGears/RedisGears.git
WORKDIR /build/RedisGears
RUN ./deps/readies/bin/getpy2
RUN make setup && make fetch && make all

#----------------------------------------------------------------------------------------------
FROM nvcr.io/nvidia/l4t-base:r32.4.4

ARG ARCH
ARG GEARS_VER

ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility

RUN if [ ! -z $(command -v apt-get) ]; then apt-get -qq update; apt-get -q install -y libgomp1 build-essential libatlas-base-dev cmake ; fi
RUN if [ ! -z $(command -v yum) ]; then yum install -y libgomp; fi

ENV REDIS_MODULES /usr/lib/redis/modules
RUN mkdir -p $REDIS_MODULES/
RUN mkdir /artifacts

COPY --from=redis /usr/local/ /usr/local/
COPY --from=builder /build/RedisAI/install-gpu/ $REDIS_MODULES/
COPY --from=builder /build/RedisGears/bin/linux-${ARCH}-release/ $REDIS_MODULES/
COPY --from=builder /build/RedisGears/artifacts/release/ /artifacts/

RUN $REDIS_MODULES/python3_${GEARS_VER}/bin/python3 -m pip install --upgrade pip
RUN $REDIS_MODULES/python3_${GEARS_VER}/bin/python3 -m pip install setuptools==49.2.0
# build numpy from source to use the ATLAS library
RUN env LD_LIBRARY_PATH=/usr/lib/aarch64-linux-gnu:$LD_LIBRARY_PATH $REDIS_MODULES/python3_${GEARS_VER}/bin/python3 -m pip install --no-binary :all: numpy
RUN $REDIS_MODULES/python3_${GEARS_VER}/bin/python3 -m pip install opencv-python imageio

EXPOSE 6379
ENTRYPOINT ["redis-server"]

ENV GEARS_VER ${GEARS_VER}
CMD ["--loadmodule", "/usr/lib/redis/modules/redisai.so", \
     "--loadmodule", "/usr/lib/redis/modules/redisgears.so", \
     "PythonHomeDir", "/usr/lib/redis/modules/python3_$GEARS_VER/", \
     "PythonInstallationDir", "/usr/lib/redis/modules/"]
For the demo, all required Dockerfiles for Jetson devices are included in the repository (with a .jetson extension). Clone the repository and build the Docker images using the following commands:
git clone https://github.com/RedisGears/AnimalRecognitionDemo.git
cd AnimalRecognitionDemo
cd redis ; docker build -t demo/redis -f Dockerfile.jetson . ; cd ..
cd app ; docker build -t demo/app -f Dockerfile.jetson . ; cd ..
# frontend image uses the same Dockerfile for aarch64
cd frontend ; docker build -t demo/frontend . ; cd ..
cd camera ; docker build -t demo/camera -f Dockerfile.jetson . ; cd ..
To deploy the application on multiple nodes, you need to make the images accessible by pushing them to a Docker registry. This can be a private registry running on the local cluster, or a public registry such as Docker Hub or Amazon ECR.
Jetson Nano is specifically designed to run GPU-powered AI applications, so containerized applications need permission to access the available GPUs. For this, NVIDIA provides its own container runtime on the Linux4Tegra operating system (a customized Linux for Jetson), which can be used as the default Docker runtime. Make sure that the NVIDIA container runtime is set as the default in the Docker configuration file (/etc/docker/daemon.json), and then restart the Docker service (for example, with sudo systemctl restart docker):
cat /etc/docker/daemon.json
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
Kubernetes is one of the best-known container orchestration platforms. K3s (by Rancher) is a certified Kubernetes distribution designed for IoT devices and tuned for Arm processors. This section describes how to run the demo using K3s.
The installation of the K3s API server and the agent is straightforward. Natively, K3s uses containerd as its container runtime to keep its footprint small. However, it is still possible to configure K3s to use the Docker runtime on Jetson Nano:
export K3S_URL=https://<K3S-SERVER-IP>:<K3S-SERVER-PORT>
# the node token on the server is stored in:
# /var/lib/rancher/k3s/server/node-token
export K3S_TOKEN=<NODE-TOKEN>
curl -sfL https://get.k3s.io | K3S_URL=$K3S_URL K3S_TOKEN=$K3S_TOKEN sh -s - --docker
Note on server-side configuration: if the agents reach the K3s server through an external IP address (for example, a cloud instance), install the server with that address added to the TLS SANs and set as the advertise address:
curl -sfL https://get.k3s.io | sh -s - --docker --tls-san <EXTERNAL-IP-ADDRESS> --advertise-address <EXTERNAL-IP-ADDRESS>
IoT applications often run on many nodes in a cluster. For instance, a face detection application may run on all the edge devices connected to cameras around a building, or the same anomaly detection software may run on similar machinery across a manufacturing plant. Therefore, a mechanism is required to manage applications with the same specification on multiple nodes. In the Kubernetes framework, such IoT applications can run as DaemonSets: pods deployed on all the nodes with specific properties. The following YAML file describes the Animal Recognition application DaemonSet. It runs on all the nodes that carry the label 'device=jetson-nano'. The initialization script of the demo runs inside a Kubernetes Job, since it has to run once, initialize RedisAI, and then terminate.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: animal-ds
  namespace: default
spec:
  selector:
    matchLabels:
      app: animal-recognition
  template:
    metadata:
      labels:
        app: animal-recognition
    spec:
      nodeSelector:
        device: jetson-nano
      hostNetwork: true
      containers:
      - name: redis
        image: demo/redis
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 6379
      - name: weball
        image: demo/frontend
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 3000
        env:
        - name: STREAM
          value: all
        - name: REDIS_HOST
          value: localhost
      - name: webcats
        image: demo/frontend
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 3001
        env:
        - name: STREAM
          value: cats
        - name: REDIS_HOST
          value: localhost
        - name: PORT
          value: "3001"
      - name: camera
        image: demo/camera
        imagePullPolicy: IfNotPresent
        command: ["python3"]
        args: ["./read_camera_jetson.py"]
        securityContext:
          privileged: true
          allowPrivilegeEscalation: true
        env:
        - name: ANIMAL
          value: cat
        volumeMounts:
        - mountPath: /dev/video0
          name: camera
        - mountPath: /tmp/argus_socket
          name: argus
      volumes:
      - name: camera
        hostPath:
          path: /dev/video0
      - name: argus
        hostPath:
          path: /tmp/argus_socket
---
apiVersion: batch/v1
kind: Job
metadata:
  name: app
spec:
  template:
    spec:
      nodeSelector:
        device: jetson-nano
      hostNetwork: true
      containers:
      - name: app
        image: demo/app
        imagePullPolicy: IfNotPresent
        command: ["python3"]
        args: ["init.py", "--url", "redis://localhost:6379"]
      restartPolicy: Never
  backoffLimit: 10
To deploy the above DaemonSet, first make sure that the node is connected to the K3s cluster:
$ kubectl get nodes
NAME               STATUS   ROLES                  AGE     VERSION
arm-jetson         Ready    <none>                 5h27m   v1.20.4+k3s1
ip-172-31-38-192   Ready    control-plane,master   2d3h    v1.20.4+k3s1
Also make sure that you have labeled the node properly using the following command (device=jetson-nano, matching the demo DaemonSet's nodeSelector):
$ kubectl label nodes arm-jetson device=jetson-nano
Apply the DaemonSet configuration (stored in animal-recognition-demo.yaml) in order to deploy it into the cluster:
$ kubectl apply -f animal-recognition-demo.yaml
daemonset.apps/animal-ds created
job.batch/app created

$ kubectl get daemonsets
NAME        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR        AGE
animal-ds   1         1         1       1            1           device=jetson-nano   24s

$ kubectl get pods
NAME              READY   STATUS      RESTARTS   AGE
animal-ds-bgkbd   4/4     Running     0          45s
app-28hf7         0/1     Completed   0          42s
The results above show that the animal-ds-bgkbd pod with 4 containers (Redis, camera, and the two frontends) is running, and that the app-28hf7 pod has completed the initialization job successfully.
Jetson Nano 4GB has two CSI interfaces for connecting cameras (Jetson Nano 2GB has one CSI interface). A single camera becomes accessible through /dev/video0, which is also mounted inside the camera container. The camera container reads an image from the device every 0.1 seconds and adds it to the corresponding stream (camera:0), from which the Gears script reads, resizes, and runs inference on the frames; a minimal sketch of this capture loop follows.
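The sketch below illustrates the capture loop; it is not the demo's read_camera_jetson.py (which, for example, opens the CSI camera through a GStreamer pipeline), and the 'image' field name and stream length are illustrative:

import time

import cv2
import redis

r = redis.Redis(host="localhost", port=6379)
cap = cv2.VideoCapture(0)  # /dev/video0

while True:
    ok, frame = cap.read()
    if not ok:
        continue
    # JPEG-encode the frame and append it to the stream read by the Gears script
    _, jpg = cv2.imencode(".jpg", frame)
    r.xadd("camera:0", {"image": jpg.tobytes()}, maxlen=1000)
    time.sleep(0.1)  # roughly one frame every 0.1 seconds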
The containers in the pod are attached to the host network, so they can be reached just like services running directly on the machine. The two frontends can be accessed at http://<NODE-IP>:3000 (the 'all' stream) and http://<NODE-IP>:3001 (the 'cats' stream).
Figure 4 shows the two frontends when capturing a cat image versus a dog image.
RedisAI has been shown to outperform other serving platforms (such as TorchServe, TensorFlow Serving, and a common HTTP server) in throughput while offering minimal latency. Its low memory footprint also makes it suitable for edge devices that are specifically designed for machine learning applications.
For our benchmarks, we selected the Jetson Nano as a power-efficient edge device for AI applications. While TensorFlow performs well on Jetson Nano for models such as MobileNetV1, it is not designed for small devices. TensorFlow Lite is a framework built for IoT and mobile inference, and benchmarking the same model on the TensorFlow Lite backend shows that it can handle higher inference rates with lower memory consumption than TensorFlow.
The RedisAI team has published several case studies on using RedisAI, Redis Streams, and RedisGears. You can read more about the case studies, code, and deployments on the Redis website.
RedisAI Examples: https://oss.redislabs.com/redisai/examples/