Accelerating ML inference on X-Ray detection at edge using Raspberry Pi with PyArmNN

December 9, 2020

The COVID-19 pandemic continues to have a devastating effect on the health and well-being of the global population. While researchers around the world work on a solution, screening infected patients as early as possible has been identified as a critical step in the fight against COVID-19. One effective and easy way to do this is to use AI to classify chest X-ray images as belonging to an infected person or a healthy person. The model proposed here is developed for binary classification (COVID-19 vs. healthy patients), and in the future it could be extended to multi-class classification (COVID-19 vs. no findings vs. other diseases such as pneumonia). Please note that this blog does not claim to be a solution for COVID-19 or a medical solution for COVID-19 detection. It is only a demonstration of how AI could be used to tackle such problems in the future, and of how Arm powered embedded devices can be used to implement such AI solutions.

This idea has grown into a large global initiative, and teams across the globe have built an open-source database called COVIDx. It is an open-access benchmark dataset that is still being expanded and currently comprises 13,975 CXR (chest X-ray) images across 13,870 patient cases, with the largest number of publicly available COVID-19 positive cases.

This blog shows how to develop a simple X-ray classification model using the pre-trained VGG-16 model and then deploy it on Arm powered devices such as the Hikey-960 or Raspberry Pi 4. With some tweaks, it can also be deployed on Arm AI NPUs (Neural Processing Units) such as Hactar (Ethos-U55). In the future, Arm powered medical devices that run AI at the edge will be vital in detecting similar infectious respiratory diseases and will help us achieve a robust healthcare system.

What is Arm NN and PyArmNN?

Arm NN is an inference engine for CPUs, GPUs, and NPUs. It executes ML models on-device to make predictions based on input data. Arm NN enables efficient translation of existing neural network frameworks, such as TensorFlow Lite, TensorFlow, ONNX, and Caffe. It allows them to run efficiently and without modification across Arm Cortex-A CPUs, Arm Mali GPUs, and Arm Ethos NPUs.

PyArmNN is a newly developed Python extension for the Arm NN SDK (Software Development Kit). It is available in the Arm NN repository under the armnn/python/pyarmnn folder, and instructions on how to install it are available on the README page there.
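
To give a feel for what using PyArmNN looks like, here is a minimal sketch of a single inference pass over a TensorFlow Lite model. It follows the call sequence shown in the PyArmNN README, but the function names run_inference and top_label, the backend order, and the label names are assumptions made for illustration, not the exact code used in this project.

```python
def top_label(scores, labels):
    """Return the label with the highest score, plus that score as a float."""
    scores = [float(s) for s in scores]
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best], scores[best]


def run_inference(model_path, image, backends=("CpuAcc", "CpuRef")):
    """Run one .tflite model through Arm NN and return its outputs.

    `image` is expected to be a NumPy array that already matches the
    model's input shape and data type (preprocessing is up to the caller).
    """
    # Imported here so this module can still be loaded on machines
    # where PyArmNN has not been built and installed.
    import pyarmnn as ann

    # Parse the TensorFlow Lite file into an Arm NN network.
    parser = ann.ITfLiteParser()
    network = parser.CreateNetworkFromBinaryFile(model_path)

    # Create the runtime and optimize for the preferred backends, in order.
    runtime = ann.IRuntime(ann.CreationOptions())
    preferred = [ann.BackendId(name) for name in backends]
    opt_network, _ = ann.Optimize(
        network, preferred, runtime.GetDeviceSpec(), ann.OptimizerOptions()
    )
    net_id, _ = runtime.LoadNetwork(opt_network)

    # Bind the first input and the first output of subgraph 0.
    graph_id = 0
    input_name = parser.GetSubgraphInputTensorNames(graph_id)[0]
    output_name = parser.GetSubgraphOutputTensorNames(graph_id)[0]
    input_info = parser.GetNetworkInputBindingInfo(graph_id, input_name)
    output_info = parser.GetNetworkOutputBindingInfo(graph_id, output_name)

    input_tensors = ann.make_input_tensors([input_info], [image])
    output_tensors = ann.make_output_tensors([output_info])

    # Execute the network and convert the results to NumPy arrays.
    runtime.EnqueueWorkload(net_id, input_tensors, output_tensors)
    return ann.workload_tensors_to_ndarray(output_tensors)
```

On the Pi, the first array returned by run_inference could be flattened and passed to top_label together with a label list such as ["covid", "normal"]. That label order is hypothetical and has to match the order used when the model was trained.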

What do we need?

A Raspberry Pi. I am testing with a Raspberry Pi 4 running Raspbian 10. The Pi is powered by an Arm Cortex-A72 processor, which can harness the power of the Arm NN SDK for accelerated ML performance.
Before you proceed with the project setup, you need to check out and build Arm NN for your Raspberry Pi. Instructions are here.
PyArmNN package

Training and validation dataset and setup:

Using the COVIDx database, I trained a custom model based on VGG-16 to classify X-ray images, reaching 95%+ accuracy in distinguishing patients with COVID-19 symptoms from normal patients.

After that, I deployed the COVID model on a Raspberry Pi device, which is based on the Arm Cortex-A72 processor (a Cortex-A CPU).

The model used for this X-ray classification is VGG16:

VGG16: A diagram about how this model works.

VGG Neural Networks. While previous derivatives of AlexNet focused on smaller window sizes and strides in the first convolutional layer, VGG addresses another very important aspect of CNNs: depth. Let’s go over the architecture of VGG:

VGG takes in a 224x224 pixel RGB image. For the ImageNet competition, the authors cropped out the center 224x224 patch in each image to keep the input image size consistent.
Convolutional layers - The convolutional layers in VGG use a very small receptive field (3x3, the smallest possible size that still captures left and right, and up and down). There are also 1x1 convolution filters, which act as a linear transformation of the input and are followed by a ReLU unit. The convolution stride is fixed to 1 pixel so that the spatial resolution is preserved after convolution.
Fully connected layers - VGG has three fully connected layers: the first two have 4096 channels each and the third has 1000 channels, 1 for each class.
Hidden layers - All of VGG’s hidden layers use ReLU (a huge innovation from AlexNet that cut training time). VGG does not generally use Local Response Normalization (LRN), as LRN increases memory consumption and training time with no particular increase in accuracy.
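
The layer arithmetic described above can be checked with a few lines of Python. This is a minimal sketch that assumes the standard 16-layer VGG layout (thirteen 3x3 convolutions with 1 pixel of padding, five 2x2 max-pooling stages, and three fully connected layers). The channel widths and pooling positions come from that standard layout rather than from this post, and the helper names are made up for illustration.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window=2, stride=2):
    """Spatial output size of a max-pooling layer along one dimension."""
    return (size - window) // stride + 1


# Assumed standard VGG16 layout: numbers are output channels of a
# 3x3 convolution, "M" marks a 2x2 max-pooling stage.
VGG16_LAYOUT = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                512, 512, 512, "M", 512, 512, 512, "M"]


def vgg16_summary(input_size=224, in_channels=3, fc_sizes=(4096, 4096, 1000)):
    """Return (final feature-map size, final channels, total parameters)."""
    size, channels, params = input_size, in_channels, 0
    for layer in VGG16_LAYOUT:
        if layer == "M":
            size = pool_out(size)
        else:
            # 3x3 weights per input/output channel pair, plus one bias each.
            params += 3 * 3 * channels * layer + layer
            size = conv_out(size)
            channels = layer
    # Flatten the last feature map and feed it to the fully connected layers.
    features = size * size * channels
    for width in fc_sizes:
        params += features * width + width
        features = width
    return size, channels, params


if __name__ == "__main__":
    print(conv_out(224))    # 224: a 3x3, stride-1, padded conv keeps the size
    print(vgg16_summary())  # (7, 512, 138357544)
```

With a 224x224 input, the five pooling stages reduce the feature map to 7x7x512, which is what the first 4096-wide fully connected layer receives. Passing a different fc_sizes tuple, for example one ending in 2 for a two-class X-ray classifier, shows how a replacement head changes the parameter count; whether the model in this post uses such a head is an assumption.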

Model doing X-Ray classification:

Stage 1: Importing libraries