***All content written by Sheldon Fernandez (CEO at DarwinAI) and Alexander Wong (Chief Scientist at DarwinAI)***
By now, deep learning applications have touched almost every industry. With healthcare systems becoming increasingly overburdened in the past few months, everyone in the healthcare delivery chain (medical equipment manufacturers, hospitals, clinicians, and patients) is looking for ways to deliver better health outcomes faster, while using fewer resources.
The benefits of embedded deep learning, which enables powerful and automated prediction capabilities across vast datasets, align well with the objectives of healthcare systems today. With deep learning, consumer devices can be used for earlier detection of disease, sensor data in hospitals can be used to monitor and optimize resources, and CT and MRI images from radiology equipment can be used for diagnosis at the “edge”. However, the process of developing reliable, explainable deep learning algorithms remains complex. Moreover, the infrastructure to enable deep learning is costly, given the compute-heavy nature of deep learning applications.
We recently partnered with Arm to help industries, like healthcare, quickly bring more efficient and cost-effective deep learning solutions to embedded devices. Like Arm, our hope is to enable healthcare technology that delivers faster, explainable insights and improved outcomes at a lower cost.
On March 22, we announced the open-source availability of COVID-Net—a deep neural network (DNN) which examines chest X-rays (CXR) to detect COVID-19 infections.
Our effort leveraged explainable artificial intelligence (XAI) technology to accelerate the development of COVID-Net—completely from scratch—in under a week.
COVID-Net is intended as a complementary tool to help medical professionals rapidly screen for COVID-19 infections. One of the biggest bottlenecks in triage and diagnosis is the time it takes for experts to interpret radiography images, which can be 20 minutes or more. Consequently, computer-aided diagnostic systems have the potential to help save lives and more efficiently direct scarce medical resources.
The response to our announcement was incredible (and continues!), and we cannot thank everyone enough for their contributions and interest.
Look at how COVID-Net performs quickly and reliably on Arm processors—significantly reducing the time it takes to triage and diagnose a patient:
| Configuration | Inference latency (ms) |
| --- | --- |
| TensorFlow Lite v1.14 | 5400 ~ 5600 |
| Arm NN TF Lite parser with Neon-CPU backend | 2700 ~ 3000 |
| Arm NN TF Lite parser with Neon-CPU & Mali-GPU backends | 2600 ~ 2800 |
Arm NN is the inference engine for Arm CPUs, GPUs, and NPUs. It executes ML models on-device to make predictions based on input data. Arm NN enables efficient translation of existing neural network frameworks, such as TensorFlow Lite, TensorFlow, ONNX, and Caffe. It allows these frameworks to run efficiently and without modification across Arm CPUs, Mali GPUs, and Ethos-N NPUs.
The Rock Pi 4B has a powerful 64-bit hexa-core Arm-based processor: dual Cortex-A72 cores (1.8 GHz) paired with quad Cortex-A53 cores (1.4 GHz), plus a Mali-T860MP4 GPU. The results were collected on the Debian Stretch (9.9) Desktop image.
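For readers who want to reproduce this kind of measurement, the following is a minimal latency-benchmark sketch using the TensorFlow Lite Python interpreter, optionally routed through the Arm NN TFLite delegate. The model filename, input handling, and delegate options shown here are assumptions for illustration, not the exact harness used to collect the numbers above.

```python
# Minimal latency-benchmark sketch (not the exact harness used for the table
# above). Assumes a TFLite export of COVID-Net saved as "covidnet.tflite";
# the filename, input handling, and delegate options are illustrative.
import time
import numpy as np
import tensorflow as tf

def load_interpreter(model_path, use_armnn_delegate=False):
    delegates = []
    if use_armnn_delegate:
        # The Arm NN TFLite delegate dispatches supported operators to the
        # Neon CPU ("CpuAcc") and Mali GPU ("GpuAcc") backends.
        delegates.append(tf.lite.experimental.load_delegate(
            "libarmnnDelegate.so",
            options={"backends": "GpuAcc,CpuAcc", "logging-severity": "info"}))
    return tf.lite.Interpreter(model_path=model_path,
                               experimental_delegates=delegates)

def benchmark(interpreter, runs=10):
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    dummy = np.random.rand(*inp["shape"]).astype(np.float32)
    timings = []
    for _ in range(runs):
        interpreter.set_tensor(inp["index"], dummy)
        start = time.perf_counter()
        interpreter.invoke()
        timings.append((time.perf_counter() - start) * 1000.0)  # milliseconds
    return min(timings), max(timings)

if __name__ == "__main__":
    low, high = benchmark(load_interpreter("covidnet.tflite"))
    print(f"TFLite CPU-only latency: {low:.0f} ~ {high:.0f} ms")
```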
Beyond the expected inquiries about how to contribute to the project and use it responsibly, one question was asked more than any other: how did we build such a high-performing and purpose-specific deep neural network so quickly?
Given the urgency of the COVID-19 pandemic, we needed to build COVID-Net at an accelerated pace. Bear in mind that building custom deep learning solutions for specific tasks and embedded requirements routinely takes many months, even at large enterprises.
In this case, we set a goal of “less than 7 days” so we could get COVID-Net out into the world for the global community to build on, while at the same time serving as a reference point for what can be achieved with breakthrough XAI technology.
Here is how we did it
In pursuit of our objectives, we employed a human-machine collaborative design strategy, in which we combined human-driven principled network design prototyping with machine-driven design exploration over four steps.
Rather than treating AI as simply a tool to be leveraged, this strategy treats AI as a collaborator which learns from the developers’ needs and explains how to design multiple solutions with different trade-offs. It also explains how these solutions are making their decisions, thereby enabling a rapid and iterative approach to model building.
To get started, we constructed a dataset, dubbed COVIDx, using a combination of existing publicly available sources and sources our collaborators have made publicly available. COVIDx consisted of 16,756 CXR images across 13,645 patient cases and continues to grow each day as new data arrives.
Please note that the dataset generation scripts for constructing the COVIDx dataset are available publicly for open access.
The COVIDx dataset consists of thousands of CXR images from public sources.
While COVIDx has grown significantly since inception, it nevertheless exhibits data class imbalance owing to the rarity of positive COVID-19 cases relative to other respiratory issues.
This problem is not uncommon, so model designers must remain vigilant and take steps to reduce the impact of data class biases.
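One common mitigation is to weight the loss so that under-represented classes count for more during training. The sketch below shows this with Keras class weights; the class counts, image size, and tiny stand-in network are purely illustrative and are not the actual COVIDx distribution or the COVID-Net training recipe.

```python
# Class-weighting sketch for an imbalanced three-class screening problem.
# The class counts, image size, and tiny stand-in network are made up; they
# are not the COVIDx distribution or the COVID-Net training recipe.
import numpy as np
import tensorflow as tf

counts = {"normal": 8000, "non_covid_pneumonia": 5500, "covid19": 500}
total = sum(counts.values())
# Weight each class inversely to its frequency so the rare COVID-19 positives
# contribute proportionally more to the loss.
class_weight = {i: total / (len(counts) * n)
                for i, n in enumerate(counts.values())}

# Random stand-in data and a toy model, purely so the snippet runs end to end.
x = np.random.rand(64, 224, 224, 1).astype("float32")
y = np.random.randint(0, 3, size=(64,))
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, 3, activation="relu", input_shape=(224, 224, 1)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x, y, epochs=1, class_weight=class_weight)
```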
The first stage of the human-machine collaborative design strategy is principled network design prototyping, in which we constructed a prototype based on human-driven design principles and best practices.
Essentially, this prototype provides the initial scaffolding while leaving the final macroarchitecture and microarchitecture design to machine-driven design exploration.
As our starting point, we leveraged residual architecture design principles, as they yield reliable neural network architectures that are easy to train to high performance and that allow deeper architectures to be built successfully.
To help clinicians decide not only who should be prioritized for reverse transcriptase-polymerase chain reaction (RT-PCR) testing, but also which treatment strategy to employ while RT-PCR results are pending (which can currently take days), we designed our prototype to make one of three predictions (a three-class softmax output): no infection (normal), non-COVID-19 infection (for example, viral or bacterial pneumonia), or COVID-19 infection.
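To make the prototyping stage concrete, here is a minimal sketch of a residual-style network with a three-class softmax head. It illustrates the design principles above rather than the actual COVID-Net prototype; the input resolution and layer widths are arbitrary choices for the example.

```python
# Residual-style prototype with a three-class softmax head. This illustrates
# the design principles above, not the actual COVID-Net prototype; the input
# resolution and layer widths are arbitrary.
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    """Two 3x3 convolutions wrapped by an identity (or 1x1-projected) shortcut."""
    shortcut = x
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same")(x)
    if shortcut.shape[-1] != filters:          # match channel counts if needed
        shortcut = layers.Conv2D(filters, 1, padding="same")(shortcut)
    return layers.ReLU()(layers.Add()([shortcut, x]))

inputs = tf.keras.Input(shape=(480, 480, 3))
x = layers.Conv2D(32, 7, strides=2, padding="same", activation="relu")(inputs)
x = layers.MaxPooling2D(3, strides=2, padding="same")(x)
for filters in (32, 64, 128):
    x = residual_block(x, filters)
    x = layers.MaxPooling2D(2)(x)
x = layers.GlobalAveragePooling2D()(x)
# Three-class output: normal / non-COVID-19 infection / COVID-19 infection.
outputs = layers.Dense(3, activation="softmax")(x)
prototype = tf.keras.Model(inputs, outputs)
prototype.summary()
```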
The second stage of the human-machine collaborative design strategy is machine-driven design exploration using DarwinAI’s GenSynth platform.
Rather than using brute force or manual effort to explore ad hoc combinations, we instead used GenSynth to leverage our intrinsic understanding of the domain requirements and employ a systematic, intelligent approach to design exploration.
More specifically, GenSynth used the initial network design prototype, the data, and our human-defined design requirements to guide a design exploration which learns and identifies the optimal macroarchitectures and microarchitectures with which to construct the final tailor-made DNN architecture for any device.
Such machine-driven design exploration enables much greater flexibility than is possible through manual human-driven architecture design, while still ensuring that the resulting DNN satisfies domain-specific operational requirements.
DarwinAI’s GenSynth platform makes it easy for designers to explore and generate custom AI models tailored to the data and task at hand.
For COVID-Net, our operational parameters included requiring greater than 80 percent COVID-19 sensitivity and positive predictive value (that is, the probability that patients with a positive screening result truly have COVID-19), and fewer than 2.5 billion multiply-add operations. We chose these parameters to strike an appropriate balance between accuracy, memory footprint, and inference speed. In particular, one of our key considerations was to create a DNN that can run on different hardware, including edge devices—perhaps even the actual imaging device itself. Healthcare companies are increasingly looking at Arm’s solutions for designing imaging devices purpose-built for rapid diagnosis at the edge.
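Sensitivity and positive predictive value can be read directly off a confusion matrix, which is how requirements like the ones above are checked against a candidate model. The sketch below shows the arithmetic with a made-up 3x3 matrix; the numbers are not COVID-Net’s reported results.

```python
# Per-class sensitivity and positive predictive value from a 3x3 confusion
# matrix (rows = actual class, columns = predicted class). The values below
# are made up, not COVID-Net's reported results.
import numpy as np

classes = ["normal", "non_covid_pneumonia", "covid19"]
cm = np.array([[90,  8,  2],
               [ 6, 85,  9],
               [ 1,  4, 95]])

for i, name in enumerate(classes):
    tp = cm[i, i]
    sensitivity = tp / cm[i, :].sum()  # TP / (TP + FN): fraction of real cases caught
    ppv = tp / cm[:, i].sum()          # TP / (TP + FP): fraction of flagged cases that are real
    print(f"{name:>20s}  sensitivity = {sensitivity:.1%}  PPV = {ppv:.1%}")
```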
Using the information we provided, GenSynth generated a number of different ‘ready-to-go’ models that meet our design requirements, each with different characteristics and trade-offs.
From there, we were able to analyze and develop a more detailed understanding of the design choices made by GenSynth to guide us as we explored and refined our model. GenSynth helped us not only to design new models, but also to identify key performance bottlenecks. This gave us much greater transparency into the overall make-up and performance of the network itself. This human-machine collaboration enabled hands-free creation of unique, tailored designs with different and—importantly—known trade-offs.
The GenSynth platform enabled informed design choices within generated models; this image shows different layers within different generated models, with red highlighting performance bottlenecks with edge hardware in mind.
While it is important to have a high-performance network, the output alone is not sufficient to tell you where the model is performing well and where it suffers—or, crucially, if it is performing well for the right reasons.
A lack of willingness and ability to audit designs is a major contributor to “black box” models, which are coming under increased scrutiny as AI becomes more pervasive and plays a much larger role in society and industry.
However—and perhaps counterintuitively to readers who have their own auditing horror stories to tell—investing the time to audit your model actually accelerates development: knowing where a network is doing the right thing and where there are gaps greatly increases your ability to develop quickly and prevents painful ‘debugging’ much later in the process.
Of course, it is no mystery why design audits are frequently omitted: the alternatives to XAI-based audits are cumbersome and time-consuming, often involving scripts, interpretation, and a lot of manual effort; worse, they are not very effective (especially for unusual and non-intuitive cases, like the example we will get to shortly).
Fortunately, the insights gained through explainability-driven inquiry can not only be used to generate better networks, but can also be used to show why networks come to different conclusions.
All of these lessons contribute to building trust with users and ensuring long-term efficacy.
The initial confusion matrix for COVID-Net on the COVIDx test dataset.
GenSynth allowed us to automatically group different error scenarios to get a very quick, high-level picture of how the network was performing. It pinpointed biases, gaps, and issues at a glance, and helped us understand the critical factors behind model decisions (as shown in the following images).
Example CXR images of COVID-19 cases from several different patients and their associated critical factors (highlighted in red) as identified by GenSynth.
In addition to model auditing for more responsible and transparent design, the ability to interpret and gain insights into how COVID-Net detects COVID-19 infections is also important in a clinical setting.
Verifying that a model is making the right decisions for the right reasons is an extremely important part of designing effective models which can function in hospitals.
XAI technology provides unparalleled insights in this regard, letting designers understand the critical factors that lead a model to particular conclusions, and allowing them to identify and remove false cues from the model.
In this post, we have only examined COVID-Net’s X-ray analysis, but we are also working on COVID-Net-CT to make detections based on CT scans. While auditing one of our earlier COVID-Net-CT model designs, we encountered an issue that likely would have been overlooked—or, at the least, would have been very difficult to identify.
The following figure shows 15 CT scans in which the model correctly detected a COVID-19 infection.
However, by highlighting the critical factors which led to the detections, GenSynth revealed that the detections were not even based on anything in the patients’ lungs. Instead, the critical factor was the appearance of the bed of the CT scanner.
Identifying this false cue was invaluable in improving the model; in this case, the discovery caused us to revisit how we created and processed our data and how we trained our models.
In these examples, GenSynth revealed that the critical decision factor (lighter gray areas) responsible for correct COVID-19 diagnoses based upon CT scans was the appearance of the bed of the CT scanner.
Such ‘right decision, wrong reason’ scenarios are very difficult to track down and identify without an explainability-driven auditing strategy, which highlights the value of explainability in improving the reliability of deep neural networks for clinical applications.
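GenSynth’s explainability engine is proprietary, but the general idea of surfacing the image regions that drive a prediction can be sketched with a gradient-based class-activation approach such as Grad-CAM. The snippet below is a generic illustration of that idea, not DarwinAI’s method; the layer name in the usage comment is hypothetical.

```python
# Generic Grad-CAM-style heatmap: which regions of the input most influenced
# the predicted class. This illustrates the general idea of explainability-
# driven auditing; it is not GenSynth's proprietary method.
import tensorflow as tf

def grad_cam(model, image, last_conv_layer_name):
    """Return a coarse heatmap over `image` for the model's top predicted class."""
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(last_conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[None, ...])
        top_class = int(tf.argmax(preds[0]))
        top_score = preds[:, top_class]
    grads = tape.gradient(top_score, conv_out)               # d(score)/d(feature maps)
    weights = tf.reduce_mean(grads, axis=(1, 2))             # global-average-pool the gradients
    cam = tf.reduce_sum(conv_out[0] * weights[0], axis=-1)   # weighted sum of feature maps
    cam = tf.nn.relu(cam)                                    # keep only positive evidence
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()       # normalized [0, 1] heatmap

# Usage sketch (layer name is hypothetical):
#   heatmap = grad_cam(model, ct_slice, "conv2d_5")
# Heat concentrated outside the lungs (for example, on the scanner bed) would
# flag exactly the 'right decision, wrong reason' failure described above.
```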
By combining human domain knowledge with GenSynth’s explainability, our team produced a unique, diverse model—within our 7-day target.
COVID-Net exhibits an efficient microarchitecture design composed largely of 1x1 convolutional layers and depth-wise convolutional layers. The heavy use of a projection-expansion-projection-extension (PEPX) design pattern allows for very good overall performance efficiency, while still maintaining strong COVID-19 sensitivity and PPV.
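As a rough illustration of the pattern just described, a PEPX-style module chains 1x1 projections and expansions around a 3x3 depth-wise convolution. The channel ratios below are assumptions for the sketch, not the exact COVID-Net specification.

```python
# Sketch of a PEPX-style module: 1x1 projection -> 1x1 expansion -> 3x3
# depth-wise convolution -> 1x1 projection -> 1x1 extension. Channel ratios
# are assumptions for illustration, not the exact COVID-Net specification.
import tensorflow as tf
from tensorflow.keras import layers

def pepx_block(x, out_channels):
    mid = max(out_channels // 2, 1)
    x = layers.Conv2D(mid, 1, activation="relu")(x)                       # first-stage projection
    x = layers.Conv2D(int(mid * 1.5), 1, activation="relu")(x)            # expansion
    x = layers.DepthwiseConv2D(3, padding="same", activation="relu")(x)   # depth-wise spatial mixing
    x = layers.Conv2D(mid, 1, activation="relu")(x)                       # second-stage projection
    x = layers.Conv2D(out_channels, 1, activation="relu")(x)              # extension
    return x

inputs = tf.keras.Input(shape=(112, 112, 56))
outputs = pepx_block(inputs, 56)
tf.keras.Model(inputs, outputs).summary()
```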
The model also includes selective long-range connectivity, which is unusual because residual networks typically exhibit short-range connectivity. This choice reflects a fundamental trade-off between performance and memory footprint (an important consideration when targeting Arm’s Cortex-M, Cortex-R, or Cortex-A processors). In COVID-Net’s case, being very selective and employing long-range connectivity only where necessary minimizes the overall footprint in accordance with our human-directed operational parameters.
COVID-Net, in all its glory—the design employs a diverse collection of architectural traits that result in a high-performance model purpose-built for making accurate COVID-19 detections based on chest X-ray images.
In our last update, we released three new models built using the design approach described above: COVIDNet-CXR4-A, COVIDNet-CXR4-B, and COVIDNet-CXR4-C. Each was built with our GenSynth platform under different performance and efficiency trade-offs, and all three are smaller, higher-resolution, and higher-performing than our previously released COVID-Net models. COVIDNet-CXR4-C, in particular, is the most compact model and works well in an embedded context.
In the context of the pandemic, an embedded version of COVID-Net—a model that works on small devices with limited computing capabilities—is an invaluable resource. One poignant example is rural and marginalized communities where connectivity to a central server is unreliable.
In such cases, the ability to diagnose the virus on a completely disconnected device is paramount. Luckily, Arm also provides solutions to enable this intelligence and connectivity for different edge environments.
COVID-19 testing in rural Louisiana
Photo: Louisiana National Guard
[CTAToken URL = "www.darwinai.com" target="_blank" text="Learn More about Darwin.ai" class ="green"]
[CTAToken URL = "https://www.darwinai.com/darwinai-arm-partnership.html" target="_blank" text="DarwinAI and Arm's Partnership" class ="green"]