The decreasing cost and power consumption of intelligent, interconnected, and interactive devices at the edge of the Internet are creating massive opportunities to instrument our cities, factories, farms, and environment to improve efficiency, safety, and productivity. Developing, debugging, deploying, and securing software for the estimated trillion connected devices presents substantial challenges. As part of the SMARTER (Secure Municipal, Agricultural, Rural, and Telco Edge Research) project, Arm has been exploring the use of cloud-native technology and methodologies in edge environments to evaluate their effectiveness at addressing these problems at scale. This blog is part of a series; read the previous blog to find out more about SMARTER.
In the past few years, decentralizing applications from data centers to machines closer to where valuable data is collected has become a catalyst for rethinking the way we manage application life cycles. One logical approach to enabling a seamless transition from cloud to edge is to take existing application orchestration models popular in the cloud, and tweak them such that they work transparently for the edge. Two of the biggest players in the cloud space, Docker and Kubernetes, make the development and deployment of highly distributed applications much less of a headache for the common developer. Given the success of these tools in the cloud, there now exists a push to use this same model to also manage applications running at the edge, making for an even more challenging distributed system problem.
In the cloud space, it can seem like everyone and their brother has a solution for APM (Application Performance Monitoring) and observability. For some perspective on the sheer number of existing solutions, I found an interesting site, OpenAPM, which gives a nice overview of popular open-source tools used for APM within the community. Given the saturation of this market, I set out to select a stack which maps well to the edge, where we make the following assumptions about the machines:
This post describes how to set up your own edge computing playground with APM and observability built in from the ground up. Currently, there exists no single cluster environment which manages both the cloud and edge portions of your system transparently, so for now we manage our cloud and edge using two logically independent control planes. We perform the following steps to set up our infrastructure:
System Architecture Overview
Before you begin setting up the infrastructure in this guide, go ahead and clone the repository by running the following:
git clone https://gitlab.com/arm-research/smarter/edge-observability-apm
The instructions in this guide assume you issue commands from the base of the cloned repository.
Create a bare-metal, single-node Kubernetes 1.17 x86 cluster using the k3s installation convenience script, with Flannel as the cluster CNI (Container Network Interface) and RBAC enabled. Setting up your own bare-metal cluster avoids having to spend money on managed Kubernetes services like Amazon EKS or Google Kubernetes Engine.
To install k3s simply run:
export THIS_HOST_IP=$(hostname -I | awk '{print $1;}')
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.17.2+k3s1 sh -s - --write-kubeconfig-mode 664 --bind-address $THIS_HOST_IP --advertise-address $THIS_HOST_IP --no-deploy servicelb --no-deploy traefik
Here I ask that you set up a dev machine which holds the kubeconfig file generated during your cluster bring-up, so that you can run kubectl commands. To do a quick check that you have done everything properly, run kubectl get all and make sure that you get a valid response back from your new cluster's API server. For k3s, you can fetch your cluster kubeconfig from /etc/rancher/k3s/k3s.yaml on your master.
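As a concrete sketch (assuming SSH access to the master; the user name and destination path below are placeholders, not part of the project), pulling the kubeconfig down to your dev machine and pointing kubectl at it could look like:
# Copy the kubeconfig generated by k3s from the master to the dev machine.
scp <user>@<YOUR_MASTER_IP>:/etc/rancher/k3s/k3s.yaml ~/.kube/cloud-k3s.yaml
# Point kubectl at the copied file and sanity-check the API server.
export KUBECONFIG=~/.kube/cloud-k3s.yaml
kubectl get all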
The "package manager" for Kubernetes, Helm makes the deployment of complex applications composed of many Kubernetes objects easier. For many of the APM and observability tools used in this guide, I opted to use Helm 2. To install Helm 2, you should follow the following instructions on your Linux dev machine:
curl -fsSL https://raw.githubusercontent.com/helm/helm/master/scripts/get -o get-helm2.sh
sudo bash get-helm2.sh
kubectl create serviceaccount --namespace kube-system tiller
kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
helm init --service-account tiller --wait
Our load balancer sits in front of our cluster and balances incoming traffic to our internal cluster services. If you use a managed Kubernetes service like Amazon EKS, it will generally handle load-balancing and the assignment of static IPs for you; in the case of our bare-metal cluster, however, we must have control over the network the nodes live in, so that we can reserve a range of IPs to be allocated to our load balancer. The tool we use to set up load-balancing is MetalLB.
To install the load-balancer, run the following from your dev machine:
kubectl apply -f https://raw.githubusercontent.com/google/metallb/v0.8.3/manifests/metallb.yaml
Now you must create a config map for MetalLB to give it control over a specific set of internal IPs. Export $HOST_IP in your dev machine environment and apply the config to your cluster by running:
export HOST_IP=<YOUR_MASTER_IP>
envsubst '${HOST_IP}' < cloud/metal-lb/metalconfig.yaml > cloud/metal-lb/metalconfig-custom.yaml
kubectl apply -f cloud/metal-lb/metalconfig-custom.yaml
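For reference, MetalLB's layer-2 configuration is just a ConfigMap in the metallb-system namespace. A minimal sketch of the shape the metalconfig.yaml template follows is shown below; the single-address pool built from ${HOST_IP} is an assumption for this single-node setup, and the real file in the repository may differ.
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      # A one-address range; envsubst fills in the master's IP.
      - ${HOST_IP}-${HOST_IP}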
Now, with MetalLB installed, we need to configure a reverse proxy server which is responsible for actually handling the ingress traffic into our cluster, whether from our edge devices or from any authorized user who wishes, for instance, to view collected data through a web UI. The responsibility of this component is to route traffic arriving at our HTTP load balancer (MetalLB) according to the Ingress API objects created by users of the cluster. To do this we install nginx-ingress.
From the root of the repository run:
helm repo update
helm install stable/nginx-ingress --name my-nginx -f cloud/nginx-ingress/nginx-values.yaml --set rbac.create=true
To encourage best practices when working with exposed cluster ingress endpoints, I have opted to include cert-manager in this example project. cert-manager makes TLS setup very easy through the custom resource definitions it provides for certificate generation. We generate self-signed certificates in this tutorial to secure our endpoints. To install it into your cluster, run:
kubectl apply --validate=false -f https://raw.githubusercontent.com/jetstack/cert-manager/v0.13.0/deploy/manifests/00-crds.yaml
kubectl create namespace cert-manager
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install \
  --wait \
  --timeout 500 \
  -f cloud/cert-manager/cert-manager-values-local.yaml \
  --name cert-manager \
  --namespace cert-manager \
  --version v0.13.0 \
  jetstack/cert-manager
kubectl apply -f cloud/cert-manager/selfsigned-issuer.yaml
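The final kubectl apply above registers a self-signed issuer with the cluster. As a rough sketch (the issuer name, hostname, and backend service below are placeholders; the actual manifests live in the repository), a self-signed issuer and an Ingress that uses it look something like:
apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: selfsigned-issuer        # placeholder; see cloud/cert-manager/selfsigned-issuer.yaml
spec:
  selfSigned: {}
---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: example-ingress
  annotations:
    kubernetes.io/ingress.class: nginx
    # Ask cert-manager to issue a certificate for this host using the issuer above.
    cert-manager.io/cluster-issuer: selfsigned-issuer
spec:
  tls:
  - hosts:
    - example-${SMARTER_DATA_DOMAIN}
    secretName: example-tls
  rules:
  - host: example-${SMARTER_DATA_DOMAIN}
    http:
      paths:
      - path: /
        backend:
          serviceName: example-service
          servicePort: 80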
In order to configure Helm chart values for your environment before deploying our apps, export the following variable on your dev machine:
export SMARTER_DATA_DOMAIN=<YOUR_MASTER_IP(dash separated)>.nip.io
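For example, using the same illustrative address that appears later in this post, a cloud master at 18.34.90.214 would give:
export SMARTER_DATA_DOMAIN=18-34-90-214.nip.io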
A very popular set of tools within the APM and observability space, the Elastic Stack (ELK) provides a data ingestion and visualization solution for the cloud. For the edge, we send our node and application data up to our cluster, where it is stored in our distributed Elasticsearch instance and visualized using Kibana.
Make sure you have Docker installed on your dev machine, then run the following to create our Elasticsearch and Kibana credentials:
docker rm -f elastic-helm-charts-certs || true
rm -f elastic-certificates.p12 elastic-certificate.pem elastic-stack-ca.p12 || true
password=$([ ! -z "$ELASTIC_PASSWORD" ] && echo $ELASTIC_PASSWORD || echo $(docker run --rm docker.elastic.co/elasticsearch/elasticsearch:7.6.1 /bin/sh -c "< /dev/urandom tr -cd '[:alnum:]' | head -c20")) && \
docker run --name elastic-helm-charts-certs -i -w /app \
  docker.elastic.co/elasticsearch/elasticsearch:7.6.1 \
  /bin/sh -c " \
    elasticsearch-certutil ca --out /app/elastic-stack-ca.p12 --pass '' && \
    elasticsearch-certutil cert --name security-master --dns security-master --ca /app/elastic-stack-ca.p12 --pass '' --ca-pass '' --out /app/elastic-certificates.p12" && \
docker cp elastic-helm-charts-certs:/app/elastic-certificates.p12 ./ && \
docker rm -f elastic-helm-charts-certs && \
openssl pkcs12 -nodes -passin pass:'' -in elastic-certificates.p12 -out elastic-certificate.pem && \
kubectl create secret generic elastic-certificates --from-file=elastic-certificates.p12 && \
kubectl create secret generic elastic-certificate-pem --from-file=elastic-certificate.pem && \
kubectl create secret generic elastic-credentials --from-literal=password=$password --from-literal=username=elastic && \
rm -f elastic-certificates.p12 elastic-certificate.pem elastic-stack-ca.p12
encryptionkey=$(echo $(docker run --rm docker.elastic.co/elasticsearch/elasticsearch:7.6.1 /bin/sh -c "< /dev/urandom tr -dc _A-Z-a-z-0-9 | head -c50"))
kubectl create secret generic kibana --from-literal=encryptionkey=$encryptionkey
To set up Elasticsearch and Kibana, run the following from the root of the repository on your dev machine:
helm repo add elastic https://helm.elastic.co
helm repo update
envsubst '${SMARTER_DATA_DOMAIN}' < cloud/elasticsearch/elasticsearch-values.yaml > cloud/elasticsearch/elasticsearch-custom.yaml
helm install --wait --timeout 500 -f cloud/elasticsearch/elasticsearch-custom.yaml --name elasticsearch elastic/elasticsearch
envsubst '${SMARTER_DATA_DOMAIN}' < cloud/kibana/kibana-values.yaml > cloud/kibana/kibana-custom.yaml
helm install -f cloud/kibana/kibana-custom.yaml --name kibana elastic/kibana
InfluxDB is a fantastic database for efficiently storing and querying time-series data at scale. Hence it is perfect for storing edge node performance data in our system.
Install it by running the following from the root of the repository on your dev machine:
helm repo add influxdata https://helm.influxdata.com/
helm repo update
helm install -f cloud/influxdb/influxdb-values.yaml --name influxdb influxdata/influxdb
Grafana is another visualization tool widely used among the APM community to view and analyze time-series data stored in the cloud. For our use case, we will be using Grafana to view the node and application metrics data stored in our InfluxDB instance installed in the previous step. To install Grafana, run the following from the root of the repository on your dev machine:
envsubst '${SMARTER_DATA_DOMAIN}' < cloud/grafana/grafana-values.yaml > cloud/grafana/grafana-custom.yaml
helm install --name grafana -f cloud/grafana/grafana-custom.yaml stable/grafana
To track the health of our nodes running in both the cloud and at the edge, we install Netdata, which is a massively popular open-source monitoring agent. Even more popular than Netdata, however, is Prometheus. While Prometheus serves a very similar purpose, it employs a pull model for metrics from all its nodes, meaning the master process running in the cloud would try to initiate a request for new metrics data at the edge, where it may be blocked by firewalls/NATs. Netdata, however, employs a push model for metrics, meaning the nodes produce performance data and attempt to send it to a master living in the cloud, making it a better choice for the edge.
The Netdata master process aggregates all the information it receives and forwards it to our InfluxDB instance installed previously for long-term storage. The Netdata UI provided by the master will only display about an hour of real-time data from the nodes, so if you would like to keep historical performance data for later analysis, you must write it out to permanent storage. We can then leverage Grafana to view and analyze this historical data.
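The forwarding itself is driven by Netdata's backend/exporting configuration. Below is only an illustrative sketch of the legacy [backend] section; the destination host and port are assumptions based on InfluxDB's OpenTSDB-compatible listener, and the real settings for this project are carried in the chart values under cloud/netdata.
# netdata.conf (sketch only -- see the chart values in cloud/netdata for the real configuration)
[backend]
    enabled = yes
    # Netdata speaks the OpenTSDB wire protocol to InfluxDB's OpenTSDB-compatible listener,
    # which is why the Grafana data source later uses a database named "opentsdb".
    type = opentsdb
    destination = influxdb:4242
    # How often buffered metrics are flushed to long-term storage.
    update every = 10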
To install the cloud components of Netdata into your cluster, run the following from the root of the repository on your dev machine:
envsubst '${SMARTER_DATA_DOMAIN}' < cloud/netdata/netdata-values.yaml > cloud/netdata/netdata-custom.yaml
git clone https://github.com/netdata/helmchart.git ~/netdata
helm install --name netdata -f cloud/netdata/netdata-custom.yaml ~/netdata/
Monitoring node health by viewing high-level performance characteristics of an application or the node itself is only one piece of the puzzle. Say we have identified that one of our applications is stalling on disk I/O unexpectedly on one of the edge nodes we manage. While knowing the source of the performance bottleneck is nice, we need to delve deeper into the application itself to find which code paths within the application are creating the disk stalls. Further, we may not even be able to remote into the node to dig around, given the firewall/NAT configurations at the time. Jaeger provides us a minimally intrusive application tracing framework which conforms to the OpenTracing standard. With Jaeger, as your application runs, it collects trace data at the function level, called spans, indicating what arguments were passed to the function as well as its execution time. From this granular span data, we can bundle correlated spans to construct execution traces, not only on a per-service basis, but also across service boundaries. Our cloud allows us to collect and store trace data for each of our nodes, and view and analyze it using web UIs.
To install the cloud components of Jaeger, run the following from the root of the repository on your dev machine:
helm repo add jaegertracing https://jaegertracing.github.io/helm-charts
envsubst '${SMARTER_DATA_DOMAIN}' < cloud/jaeger-cloud/jaeger-values.yaml > cloud/jaeger-cloud/jaeger-values-custom.yaml
helm install --name jaeger -f cloud/jaeger-cloud/jaeger-values-custom.yaml jaegertracing/jaeger
At this point in the tutorial, we have set up a bare-metal k3s cluster with data ingestion pipelines and web UIs which eagerly await interesting APM data produced by our edge nodes. To manage these nodes, we opt to use k3s once more. The beauty of k3s in many ways is that Arm devices are first-class citizens. Many popular cloud-native open-source tools today focus on x86, creating headaches for developers who may like to use these tools on their own Arm clusters.
To install k3s, provision an x86 or Arm node to serve as your master. You do not even have to worry about installing Docker, as the k3s master runs as a single binary directly against the host. To install and run the k3s master as a systemd service, run:
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="server" sh -s - --disable-agent --write-kubeconfig-mode 664 --no-deploy servicelb --no-deploy traefik --no-flannel --no-deploy coredns
Note: if you are running your master on a machine in the public cloud (e.g. an EC2 instance), pass the flag --advertise-address <PUBLIC_IP> to the above command.
For your edge nodes, let's assume they are all running 64-bit Arm Linux. If you have a Raspberry Pi, you can try out Ubuntu Server, which has images available here. Ensure that your edge node has Docker installed before continuing on to the next steps.
Before we install the k3s agent on our node, we will install a CNI built for edge computing use cases, rather than relying on Flannel, which k3s will attempt to install by default. For information on what makes this CNI tailored for edge computing, refer to the previous blog post. To install the CNI on your node, run:
git clone https://gitlab.com/arm-research/smarter/smarter-cni.git
cd smarter-cni
sudo ./install.sh
With our CNI installed, we are now ready to install the k3s agent. In the following command, K3S_URL is the address where your master node can be reached, and K3S_TOKEN can be obtained by running sudo cat /var/lib/rancher/k3s/server/node-token on your edge master node.
Now, on each edge node which you wish to include in your cluster, run (filling in the variables appropriately):
curl -sfL https://get.k3s.io | K3S_URL=https://<myserver>:6443 K3S_TOKEN=XXX sh -s - --docker --no-flannel
Fetch the kubeconfig for your k3s cluster by copying the file /etc/rancher/k3s/k3s.yaml from your edge master machine to your dev machine. You may have to open the file and replace 127.0.0.1 in the server field with the hostname of your server.
Now that we are using two clusters, we need a way to manage which cluster we target when we run our kubectl commands. Fortunately, Kubernetes provides a simple way of doing this. On your command line set the variable "KUBECONFIG" by doing the following:
export KUBECONFIG=<path to cloud kubeconfig>:<path to k3s kubeconfig>
You can open up each of these kubeconfigs respectively and modify the fields as per the following markup:
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: <redacted>
    server: https://<your-master-ip>:2520
  name: k3s-edge
contexts:
- context:
    cluster: k3s-edge
    user: default
  name: k3s-edge
current-context: k3s-edge
kind: Config
preferences: {}
users:
- name: default
  user:
    password: <redacted>
    username: admin
You may do the same for your cloud cluster's kubeconfig, changing k3s-edge to cloud in all fields besides current-context. This field determines what cluster your kubectl commands will target. Now to switch between clusters, you can simply run kubectl config use-context <k3s-edge or cloud>.
To view your current configuration, run kubectl config view; you should see all the information from each of your two kubeconfigs displayed here, with current-context set to the cluster you are currently targeting. For more information on multi-cluster configuration, you can read here.
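For example, to confirm that both control planes respond, you can switch contexts and list the nodes each one manages:
# Target the cloud cluster and list its node(s).
kubectl config use-context cloud
kubectl get nodes
# Switch to the edge cluster and list the edge master plus any joined devices.
kubectl config use-context k3s-edge
kubectl get nodes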
Register your cloud Elasticsearch credentials in your edge cluster by running:
kubectl create ns observability
kubectl create secret generic elastic-credentials --namespace=observability --from-literal=password=<YOUR ELASTIC PASSWORD> --from-literal=username=elastic
Recall that you can obtain your elastic credentials by running the following command against your cloud k3s instance:
kubectl get secrets/elastic-credentials --template={{.data.password}} | base64 -d
Now that we have our k3s cluster up and running, let's deploy the edge side of our APM/observability infrastructure. At the moment, you have an instance of the Netdata master running in the cloud, awaiting information to be streamed up from the edge. In order to run a single copy of the Netdata collector on each of our nodes, we use a Kubernetes DaemonSet.
The Netdata collector can be configured such that it acts as a headless collector of data, forwarding all metrics directly to the master living in our cloud via a TCP connection. For the edge use-case, this is exactly what we would like. In my own rough inspection, I found that the headless collector running on a Raspberry Pi 3B+ consumed about 2% CPU, 29 MB RSS, and 700 Kb/s of network bandwidth, all while the device was running close to 20 containers whose metrics were collected at 1-second intervals.
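Under the hood, this push behaviour comes from Netdata's streaming configuration. The sketch below shows the general shape of a headless collector; the destination port, API key, and memory mode are illustrative assumptions, and the real settings are delivered through the ConfigMap described later in this section.
# netdata.conf (collector side, sketch) -- keep no local history, just stream upstream
[global]
    memory mode = none
# stream.conf (collector side, sketch)
[stream]
    enabled = yes
    # Address of the Netdata master running in the cloud cluster (19999 is Netdata's default port).
    destination = ${SMARTER_CLOUD_IP}:19999
    # Shared key the master uses to accept streamed metrics; placeholder value shown here.
    api key = 11111111-2222-3333-4444-555555555555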
Ensure you have the variable SMARTER_DATA_DOMAIN set as before, and in addition export the following variables:
export SMARTER_EDGE_DOMAIN=<YOUR_EDGE_MASTER_IP>
export SMARTER_CLOUD_IP=<YOUR_CLOUD_MASTER_IP>
To deploy this app we apply the yaml to our edge cluster by doing the following (ensure your kubectl targets k3s):
envsubst < edge/netdata/netdata-configMap.yaml > edge/netdata/custom/netdata-configMap-custom.yaml
envsubst < edge/netdata/netdata-daemonSet.yaml > edge/netdata/custom/netdata-daemonSet-custom.yaml
kubectl apply -f edge/netdata/custom
This will create the Netdata collector DaemonSet as well as a ConfigMap, which is used to store key-value pairs that we can share with our entire cluster. There are a couple of things that must be done here to configure our headless collectors appropriately when running with Kubernetes. If you inspect the folder edge/netdata/custom, you will find a few interesting features:
- The DaemonSet sets a serviceAccountName under spec->template->spec, giving the Netdata collector permission to query the Kubernetes API server so it can attach pod and container metadata to the metrics it gathers.
- The Netdata configuration itself is supplied to each collector container from the ConfigMap described below.
If you open our ConfigMap for Netdata at edge/netdata/custom/netdata-configMap-custom.yaml, you will find the contents of the Netdata config file which will ultimately be used by the Netdata collector when running inside its container. If you wish to reconfigure Netdata, simply modify this ConfigMap and reapply the file, then remove and reapply the DaemonSet for the changes to be propagated through the cluster.
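To make those pieces concrete, here is a heavily trimmed sketch of a DaemonSet of this shape; every name and the image tag are placeholders, and the real manifest lives at edge/netdata/custom/netdata-daemonSet-custom.yaml.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: netdata-collector                    # placeholder name
spec:
  selector:
    matchLabels:
      app: netdata-collector
  template:
    metadata:
      labels:
        app: netdata-collector
    spec:
      # Grants the collector read access to the Kubernetes API for pod metadata (placeholder name).
      serviceAccountName: netdata
      containers:
      - name: netdata
        image: netdata/netdata:latest        # placeholder image
        volumeMounts:
        - name: config
          mountPath: /etc/netdata            # ConfigMap contents become the collector's config
      volumes:
      - name: config
        configMap:
          name: netdata-config               # placeholder; matches the ConfigMap applied above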
Fluent Bit is a light-weight stream processing engine developed by Treasure Data (now part of Arm), who also authored the popular Fluentd log collector/processor/aggregator. Fluent Bit is the lighter-weight brother of Fluentd, making it a fantastic choice for running on resource constrained devices at the edge. As an example application, we use Fluent Bit to collect and stream all the logs in our cluster back to our Elasticsearch instance in the cloud, where we can then use the Kibana UI to filter and analyze the logs.
For the same reasons as the Netdata DaemonSet, we also create a ServiceAccount for Fluent Bit, such that it can query our API server and append Kubernetes pod metadata to the logs it collects from the docker daemon. When you view the logs in Kibana, you can filter them based on their Kubernetes metadata making them very easy to digest.
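To give a feel for the pipeline this DaemonSet runs, here is a stripped-down sketch of a Fluent Bit configuration of this shape; the host, credentials, and parser are placeholders, and the actual configuration ships with the manifests under edge/fluent-bit.
# fluent-bit.conf (illustrative sketch only)
[INPUT]
    # Tail container log files written by the Docker daemon on the node.
    Name              tail
    Path              /var/log/containers/*.log
    Parser            docker
    Tag               kube.*

[FILTER]
    # Enrich each record with Kubernetes pod metadata via the API server,
    # which is why the DaemonSet needs its own ServiceAccount.
    Name              kubernetes
    Match             kube.*
    Merge_Log         On

[OUTPUT]
    # Ship the enriched logs to the Elasticsearch instance in the cloud cluster.
    Name              es
    Match             *
    Host              elasticsearch-${SMARTER_DATA_DOMAIN}
    Port              443
    tls               On
    HTTP_User         elastic
    HTTP_Passwd       <YOUR ELASTIC PASSWORD>
    Logstash_Format   On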
To begin running Fluent Bit on each node run (ensure your kubectl targets k3s):
envsubst < edge/fluent-bit/fluent-bit-ds.yaml > edge/fluent-bit/custom/fluent-bit-ds-custom.yaml
kubectl apply -f edge/fluent-bit/custom
The Jaeger Agent is a headless collector that runs on each one of our edge nodes and collects information about the spans and traces produced by each one of the applications which are instrumented with OpenTracing clients. As the applications run, the OpenTracing client will bundle up span and trace data then send it to our agent via UDP, where it will then be forwarded to the Jaeger Collector in our cloud cluster. As of January 2020, Jaeger does not explicitly have support for Arm devices, so I have taken the time to port the Jaeger Agent to Arm64 and Arm. To see the Dockerfile recipes required to build the Jaeger Agent for Arm, you can reference this repository. You may use this repository to build the Jaeger Agent for yourself, or you may use images I have prebuilt for convenience. Before deploying the Jaeger Agent, export the env variable JAEGER_AGENT_IMAGE with the value registry.gitlab.com/arm-research/smarter/jaeger-agent-arm:latest to use my image, or the image tag for the image you built yourself.
To start the Jaeger Agent on each node with my image run (ensure your kubectl targets k3s):
export JAEGER_AGENT_IMAGE=registry.gitlab.com/arm-research/smarter/jaeger-agent-arm:latest
envsubst < edge/jaeger/jaeger-agent-ds.yaml > edge/jaeger/custom/jaeger-agent-ds-custom.yaml
envsubst < edge/jaeger/jaeger-agent-configMap.yaml > edge/jaeger/custom/jaeger-agent-configMap-custom.yaml
kubectl apply -f edge/jaeger/custom
As a demonstrative example of the infrastructure we have set up in this tutorial, we will run a modified example application employing Jaeger tracing, based on a tutorial originally found here. I have forked the tutorial from GitHub, made modifications, and built Docker images for each of the three sample services. The source for the apps and their corresponding Dockerfiles in the forked repository can be found here.
Before deploying this sample application, export the env variables CLIENT_IMAGE, FORMATTER_IMAGE, and PUBLISHER_IMAGE with the proper image names. You may build your own images by referencing the forked repository, or to save time, I have gone ahead and prebuilt images for all three services, which have the names: registry.gitlab.com/arm-research/smarter/edge-jaeger-tutorial:client, registry.gitlab.com/arm-research/smarter/edge-jaeger-tutorial:formatter, and registry.gitlab.com/arm-research/smarter/edge-jaeger-tutorial:publisher respectively.
To deploy the application with my images set, simply apply the example DaemonSets I have created by running (ensure your kubectl targets k3s):
export CLIENT_IMAGE=registry.gitlab.com/arm-research/smarter/edge-jaeger-tutorial:client
export FORMATTER_IMAGE=registry.gitlab.com/arm-research/smarter/edge-jaeger-tutorial:formatter
export PUBLISHER_IMAGE=registry.gitlab.com/arm-research/smarter/edge-jaeger-tutorial:publisher
envsubst < edge/application/client/client-ds.yaml | kubectl apply -f -
envsubst < edge/application/formatter/formatter-ds.yaml | kubectl apply -f -
envsubst < edge/application/publisher/publisher-ds.yaml | kubectl apply -f -
kubectl label node <your node name> formatter=yes publisher=yes client=yes
Tracing Architecture Overview
To trigger service events on your edge nodes, run the following command from a machine on the same network as your edge node:
curl "http://<your edge node ip>:8080/hello?helloTo=josh"
Running this command makes an HTTP request to the node, which ultimately responds by saying "Hello, josh!". On the backend, the request goes to a formatting microservice to create the string, and a publisher service which logs the data returned to the user to stdout.
You may run that command targeting any one of your edge nodes as many times as you like, with any name set as the value to the "helloTo" key.
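For example, to generate a handful of traces and log lines in one go (the node address is a placeholder, as above):
# Fire several requests at one edge node, varying the helloTo value each time.
for name in josh alice bob; do
  curl "http://<your edge node ip>:8080/hello?helloTo=$name"
done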
If you navigate to http://jaeger-query-<CLOUD_MASTER_IP(dash separated)>.nip.io, you will be able to navigate through the generated trace data in an intuitive UI. For example, if your cloud IP is 18.34.90.214, your URL would be http://jaeger-query-18-34-90-214.nip.io. Here you will notice that our services are tagged with our node names prepended to the service name itself, so you can distinguish spans based on the node.
This tutorial will give you more context on what valuable application information you can extract using OpenTracing and Jaeger.
An example trace captured by the Jaeger tracing infrastructure
Logging Architecture Overview
We can also take a look at the logs being generated by each one of our services by navigating to http://kibana-<CLOUD_MASTER_IP(dash separated)>.nip.io. To log in, your username is elastic, and the password is the value you queried from your cloud cluster at the beginning of the edge setup instructions. In Kibana, to configure your logging index, go to Management->Kibana->Index Patterns->Create Index Pattern, enter the index pattern logstash*, then select time from the next prompt and continue. In the Discover tab, you will then be able to filter and view the logs in any manner you'd like. As a simple example, if you fetch the pod name for our publisher service by running kubectl get pods | grep publisher, you can filter the logs for only those generated by this publisher pod. If you do so, you should be able to see the "Hello, josh!" message along with a timestamp.
Filtered logs for a pod displayed by Kibana+Elasticsearch
Performance Architecture Overview
Using the Netdata dashboard, we can also view real-time performance data at the node and pod level by navigating to http://netdata-<CLOUD_MASTER_IP(dash separated)>.nip.io. Here we can see real-time metrics for all of our nodes, and also see any alarms that have been generated given a set of rules which we can configure. If you navigate to the 'nodes' tab, you can see real-time status for all nodes in your cluster sorted by the health of the node; if you click on an unhealthy node, you can go into its dashboard and perform further inspection.
An example of pod metrics displayed for the past hour in Netdata
Finally, for long-term metric storage, we can navigate to http://grafana-<CLOUD_MASTER_IP(dash separated)>.nip.io, where we are able to configure an example dashboard and view historical performance data from each of our nodes. To set up an example dashboard, perform the following steps:
1. Log in with username: admin, password: admin.
2. Add a new InfluxDB data source pointing at http://influxdb:8086, with the database name opentsdb.
3. Click Save & Test to confirm Grafana can reach the database.
4. Import the community dashboard with ID 2701.
This dashboard serves as a great entry-point for node metrics, but will require a few modifications to display information specific to your pods. It can be customized at a later time to fit your needs.
If you run the curl request repeatedly you will be able to see the spikes in activity in the Netdata or Grafana dashboards.
An example of node metrics queried for the last 7 days in Grafana
To summarize, we have brought up two independent clusters from scratch to manage the cloud and edge sides of our sample system, and deployed data aggregators in the cloud along with data collectors on each node at the edge. This setup provides us the ability to keep track of the three pillars we usually consider when building APM/observability systems: logs, metrics, and traces.
Each one of the collectors running on our edge nodes is designed to minimize introduced overhead, such that more compute resources can be spent extracting value from the quintillion bytes of data produced every day.
All the tools in this tutorial were designed to be used in the cloud, and don't map perfectly to the edge use-case. Moving forward there are a few areas where a system like this could be better tailored for edge computing:
If you have any questions or comments, please feel free to contact me.
Contact Josh Minor
This post is the second in a five part series. Read the other parts of the series using the links below:
Part one: SMARTER: A smarter-cni for Kubernetes on the Edge