Society is powered by data: millions of terabytes are generated, processed, and consumed every day. Edge computing has been heralded as a new paradigm in this data-driven world. The technology aims to optimize performance and reduce costs by bringing computation closer to sources of data, such as Internet of Things (IoT) devices and local servers. Processing data at the periphery, or ‘edge’, of a network – rather than in the cloud – can improve both latency and security, in addition to reducing bandwidth usage.

Using Arm technology, Télécom Paris has been tapping into these advances by developing a System-on-Chip (SoC) to accelerate machine learning (ML) at the Edge. We spoke to Sumanta Chaudhuri, a Télécom Paris researcher specializing in SoC microelectronics and field-programmable gate arrays (FPGAs), who described the project in more detail.

ApproxiNet is a chip we are designing, mainly for accelerating deep neural networks – convolutional neural networks in particular – with ultra-low power consumption and very low latency. That's our target. It's oriented towards Edge AI. Nowadays, most AI – ChatGPT or Amazon Alexa, for example – is in the cloud. Most of the workload goes back to the cloud for processing, with only very limited tasks done on the device itself. By contrast, we want to do it all on the device, even the most complicated tasks.

The main goal of ApproxiNet is to accelerate neural networks for Edge AI. We are convinced that things will move increasingly to the Edge – that is the future. Once you can eliminate unnecessary computation, I think Edge computing will be on every device, even very small ones, like drones. I think many optimizations can be done in a neural network. Voice assistants, for example, have no reason to send data back to the cloud; they should be able to do everything on the spot. One of the biggest problems right now is privacy: whatever you say to a voice-controlled virtual assistant goes back to the cloud.
Even for drones, even for smart cameras, everything goes back to the cloud. There is no fundamental reason why all this data should be sent back to the cloud. The cloud should be reserved for big simulations and much more complicated problems, not for tasks like recognition and detection.
Our goal is to find innovations that reduce power consumption. There are two main sources of power consumption: computation – how many multiplications you are doing – and data transfer, moving data back and forth between the accelerator and memory. We are looking for innovations that reduce both. We started around 2016 and, in the beginning, mainly worked on quantization, a technique for reducing computational and memory costs. Then we moved on to memory transfer, because that is one of the biggest culprits in power consumption. What we propose with ApproxiNet is merging two layers of a neural network so that we don't have to write the intermediate data to memory. In this sector there are lots of companies, and many startups. Our differentiating innovation is what we call ‘lookahead convolution’, where we try to merge two or three layers in one go. Normally with accelerators, you process one layer, write the data to memory, read it back again, then process the second layer. With our approach we merge layers and avoid this unnecessary memory transfer. We do the computation in a manner that allows us to process the data on the fly between two layers; we don't have to write it back to memory. This gives us an edge over others.
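The ‘lookahead’ idea – evaluating the second layer as soon as enough outputs of the first layer are available, so the intermediate feature map never leaves a small local buffer – can be sketched in software as follows. This is an illustrative NumPy model of generic two-layer fusion, not the actual ApproxiNet design; the 1-D convolution setup and all function names are assumptions made for the example.

```python
import numpy as np

def conv1d(x, w):
    """Valid 1-D convolution (correlation form) used by both layers."""
    k = len(w)
    return np.array([np.dot(x[i:i + k], w) for i in range(len(x) - k + 1)])

def two_pass(x, w1, w2):
    """Baseline: layer 1 is fully computed and 'written to memory',
    then read back in its entirety for layer 2."""
    inter = conv1d(x, w1)          # full intermediate feature map
    return conv1d(inter, w2)

def fused(x, w1, w2):
    """Fused: keep only a sliding window of len(w2) layer-1 outputs in a
    small local buffer, and emit layer-2 outputs on the fly."""
    k1, k2 = len(w1), len(w2)
    buf = []                       # models a small on-chip line buffer
    out = []
    for i in range(len(x) - k1 + 1):
        buf.append(np.dot(x[i:i + k1], w1))   # one layer-1 output
        if len(buf) == k2:                    # enough lookahead for layer 2
            out.append(np.dot(buf, w2))
            buf.pop(0)                        # slide the window forward
    return np.array(out)

x  = np.arange(10, dtype=float)
w1 = np.array([1.0, 2.0, 1.0])
w2 = np.array([0.5, 0.5])
assert np.allclose(two_pass(x, w1, w2), fused(x, w1, w2))
```

The fused version holds only `len(w2)` intermediate values at any moment – the ‘line buffer’ a hardware implementation would keep on-chip – instead of writing out and re-reading the whole intermediate feature map.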
We are using Arm IP for the SoC – more specifically, Corstone-101, a reference design from Arm. The main advantage we find is that we don't have to do a lot of verification; a lot of things are already verified in there. We're just going to integrate our IP, our accelerator, into that. Working with Arm technology is much more comforting because Arm already has a well-known ecosystem, and Arm SoCs are already verified. There are not a lot of people working on this project, and doing a System-on-Chip with three or four people is complicated if you have to verify everything. Arm Academic Access has made it so much easier for academics to work with Arm IP. We are a very small group and we do not have the time to verify everything, so it's comforting to know that we are relying on something solid.
ApproxiNet is designed to accelerate ML on battery-powered devices. Our standard use case is a drone that has to be intelligent, light, and battery powered. We mainly target object detection and tracking with a drone, and I don't think you can do that without a chip like ours. For tracking you need low latency, and you need low power consumption because your drone will be flying for a long time. This is one of the key applications for which you need this kind of chip. You cannot send the data to the cloud and wait for the result: you would have to transmit large frames of HD image data, which adds hundreds of milliseconds of latency, and that is not acceptable for tracking. Applications like this will really benefit from our Edge acceleration.
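A back-of-envelope calculation illustrates why the cloud round trip is costly for tracking. All numbers below are assumptions chosen for the example, not measurements from the project:

```python
# Back-of-envelope latency for offloading one HD frame to the cloud.
# Every constant here is an illustrative assumption, not a measurement.

FRAME_BYTES   = 1920 * 1080 * 3   # raw 8-bit RGB 1080p frame (~6.2 MB)
JPEG_RATIO    = 10                # assumed compression ratio
UPLINK_BPS    = 50e6              # assumed 50 Mbit/s uplink
NET_RTT_S     = 0.040             # assumed 40 ms network round trip
CLOUD_INFER_S = 0.020             # assumed 20 ms server-side inference

upload_s = (FRAME_BYTES / JPEG_RATIO) * 8 / UPLINK_BPS
total_ms = (upload_s + NET_RTT_S + CLOUD_INFER_S) * 1000
print(f"cloud round trip ≈ {total_ms:.0f} ms per frame")   # ≈ 160 ms
```

Even with generous compression and network assumptions, one frame costs on the order of 150–200 ms round trip – far beyond the roughly 33 ms per-frame budget of a 30 fps tracking loop – whereas on-device inference avoids the transfer entirely.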
MLCommons, an industry consortium, maintains benchmarks for neural network tasks; submissions are scored on energy in microjoules and latency in milliseconds. Our goal is to improve on the state of the art for each benchmark. The other big goal for ApproxiNet is to industrialize – we call it ‘prematuration’ in French. We build the prototype to prove that it works, and then we will ask for more funding. It's not just a pure research project; we're on a path from academic research to commercialization with our accelerator.
For universities and research institutes that are planning to commercialize their research ideas, Arm Flexible Access for Startups gives access to a wide-ranging IP package and industry-leading technical support, with a $0 license fee.