Changing the batteries. A routine, commonplace task familiar to children and adults alike. It is just about manageable when it is needed once a month on household devices. But what if your devices are deployed in their thousands in remote and inaccessible locations? Suddenly a one-month battery life does not look so good. Enter DARPA’s N-ZERO program, which turns this problem into a research opportunity.
The N-ZERO program sponsors fundamental research into zero-power or low-power sensing devices. These include sensors that can operate using just the energy of the sensed signal, like an infrared radiation detecting switch (Figure 1), and low-power wake-up devices, like a 12nW acoustic classifier that listens constantly and periodically triggers a more powerful processor.
Figure 1: Image from Rinaldi, M. Sensing Infrared Without Power. Poster presented at: DARPA ERI Summit 2018; 2018 Jul 23-25; San Francisco, CA.
However, solving the sensing challenge with very little power consumption is only part of the puzzle. Such a sensor also needs a processor with a power profile matched to the sensor. This is where Arm Research comes in. In September 2017, we accepted the challenge of designing a microcontroller that could fit with DARPA N-ZERO sensors, and the M0N0 project was born.
We started by looking at Commercial Off-The-Shelf (COTS) MCUs and quickly realized that a drastic reduction in both active and sleep power was required for this project to be a success (Figure 2).
Figure 2: mW active/mW sleep COTS MCUs vs. N-ZERO sensors
Our own previous research laid the groundwork for active power reduction, and even commercial Arm-based MCUs have reached impressive levels of power efficiency. But maintaining state-of-the-art active energy efficiency while also achieving the 10nW sleep power target would be the defining challenge for the M0N0 chip.
That was not all. A sensor deployed in a remote location may be listening for infrequent events, so it cannot afford to miss one because the processor was too slow to respond. The system must guarantee a real-time response to events.
It was these specific considerations that drove the creation of the M0N0 system architecture (Figure 3).
Figure 3: M0N0 System Architecture
Drawing on our past research, M0N0 is a near/subthreshold 65nm Cortex-M MCU with custom cell libraries and data RAM to support low-voltage operation for the best possible energy efficiency. However, due to RAM leakage and power converter efficiency, the system would not be able to approach the 10nW sleep power target while retaining its Execution state (Figure 4).
Figure 4: 10nW sleep power cannot be met in retention leading to a new shutdown sleep mode
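To see why sleep power dominates the budget of a node that is idle almost all the time, here is a minimal Python sketch of the time-weighted average power. Only the 10nW sleep target comes from the text above; the 1mW active power, the 1µW comparison sleep figure, and the duty cycle are illustrative assumptions, not M0N0 measurements.

```python
def average_power_w(active_w, sleep_w, duty_cycle):
    """Time-weighted average of active and sleep power, in watts."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return duty_cycle * active_w + (1.0 - duty_cycle) * sleep_w


def sleep_share(active_w, sleep_w, duty_cycle):
    """Fraction of the average power that is spent while asleep."""
    avg = average_power_w(active_w, sleep_w, duty_cycle)
    return (1.0 - duty_cycle) * sleep_w / avg


if __name__ == "__main__":
    ACTIVE_W = 1e-3      # assumed active power (hypothetical)
    DUTY_CYCLE = 1e-5    # assumed fraction of time spent awake (hypothetical)
    for sleep_w in (1e-6, 10e-9):  # assumed 1 uW sleep vs. the 10 nW target
        avg = average_power_w(ACTIVE_W, sleep_w, DUTY_CYCLE)
        share = sleep_share(ACTIVE_W, sleep_w, DUTY_CYCLE)
        print(f"sleep={sleep_w:.1e} W  average={avg:.3e} W  sleep share={share:.1%}")
```

Under these assumed numbers the sleep term sets the average power almost entirely when sleep power is 1µW, and still accounts for about half of it at 10nW. For rarely triggered sensors, that is why the sleep target matters as much as active efficiency.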
Our solution was to split the code memory and retention memory – code memory is composed of subthreshold mask ROM, while retention memory is composed of a custom low-leakage SRAM macro, drawing on published memory research achieving fA/bit leakage.
Embedding intelligence in a sensor node through more sophisticated compute capability helps overall battery lifetime by reducing the need for power-hungry off-chip communication. A complex workload was needed to show off the capabilities of the M0N0 chip and we settled on a 10-word KeyWord Spotting (KWS) application doing 1 classification per second. While the tiny Cortex-M0+ was the obvious starting point for M0N0, early trials showed the benefits of moving to a higher-performance CPU core. With SIMD extensions as well as separate instruction and data busses for better memory bandwidth, Cortex-M33 gave a substantial performance uplift at the same frequency.
Alongside the hardware design, and following our previous low-power audio recognition research, we applied several software and algorithm optimizations to the KWS task (Figure 5) to bring it within the compute and memory footprint of a low-power MCU. ROM and RAM usage was minimized, and feature extraction was ported to fixed-point calculation. Data acquisition, feature extraction and NN flows were optimized with architecture search algorithms to reduce computation latency. This allowed the workload to run at only 2.5MHz, leading to a power consumption of just 50mW while producing 1 classification every second.
Figure 5: KWS network architecture
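As a rough illustration of what porting feature extraction to fixed point involves, the sketch below implements Q15 arithmetic in Python. The Q15 format, the helper names, and the sample values are assumptions for illustration; the text does not say which fixed-point format the M0N0 software uses.

```python
Q = 15                      # assumed format: 1 sign bit, 15 fractional bits
SCALE = 1 << Q
INT16_MIN, INT16_MAX = -32768, 32767


def saturate(v):
    """Clamp an integer to the signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, v))


def to_q15(x):
    """Convert a float in roughly [-1, 1) to a saturated Q15 integer."""
    return saturate(int(round(x * SCALE)))


def from_q15(v):
    """Convert a Q15 integer back to a float."""
    return v / SCALE


def q15_mul(a, b):
    """Multiply two Q15 values: the Q30 product is rounded and shifted back."""
    return saturate((a * b + (1 << (Q - 1))) >> Q)


def q15_dot(xs, ws):
    """Dot product with a wide accumulator and a single final rounding."""
    acc = 0
    for x, w in zip(xs, ws):
        acc += x * w        # each term is Q30; Python ints do not overflow
    return saturate((acc + (1 << (Q - 1))) >> Q)


if __name__ == "__main__":
    samples = [to_q15(s) for s in (0.5, -0.25, 0.125)]   # hypothetical inputs
    weights = [to_q15(w) for w in (0.5, 0.5, 0.5)]       # hypothetical filter taps
    print(from_q15(q15_dot(samples, weights)))
```

Accumulating in a wide register and rounding once at the end is a common choice because it avoids compounding the rounding error of every individual multiply.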
What about the challenge of deterministic response to events? While it is easy to achieve this using a fixed clock frequency, our prior work shows that this is inefficient for subthreshold systems. To address this, our prior work adapted clock frequency at a fixed voltage. This kept the system operating efficiently, but the clock frequency (and hence response time) could vary over 100× depending on temperature. So, for M0N0 we flipped this on its head, adapting voltage continuously while maintaining a fixed minimum clock frequency, guaranteeing the user a requested level of system performance. This continuous voltage variation capability also demanded a new Buck converter design, with a low-leakage power stage to hit the 10nW sleep power target.
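The idea of adapting voltage to hold a minimum clock frequency can be sketched as a simple closed loop. The Python below is a toy simulation, not the M0N0 controller: the speed model, its coefficients, the 5mV step, the hysteresis margin, and all function names are hypothetical. Only the 2.5MHz target is taken from the KWS workload mentioned earlier.

```python
import math

K_OVER_Q = 8.617e-5  # Boltzmann constant / electron charge, in volts per kelvin


def max_frequency_hz(v, temp_c):
    """Toy subthreshold speed model; every coefficient here is hypothetical."""
    thermal_v = K_OVER_Q * (temp_c + 273.15)
    v_th = 0.45 - 1.0e-3 * (temp_c - 25.0)  # threshold drops as the die warms
    return 1.0e6 * math.exp((v - v_th) / (1.5 * thermal_v))


def adapt_step(v, temp_c, f_min_hz, step_v=0.005, margin=1.25,
               v_lo=0.30, v_hi=1.20):
    """One controller update: raise V if too slow, lower it if there is slack.

    The margin is chosen larger than the speed change caused by one voltage
    step in this model, so stepping down never drops below f_min_hz.
    """
    f = max_frequency_hz(v, temp_c)
    if f < f_min_hz:
        return min(v_hi, v + step_v)
    if f > margin * f_min_hz:
        return max(v_lo, v - step_v)
    return v  # inside the hysteresis band: hold the current voltage


def settle(v, temp_c, f_min_hz, max_iters=1000):
    """Iterate the controller until the voltage stops changing."""
    for _ in range(max_iters):
        nxt = adapt_step(v, temp_c, f_min_hz)
        if nxt == v:
            break
        v = nxt
    return v


if __name__ == "__main__":
    F_MIN_HZ = 2.5e6
    for temp_c in (-20.0, 25.0, 80.0):
        v = settle(0.60, temp_c, F_MIN_HZ)
        f_mhz = max_frequency_hz(v, temp_c) / 1e6
        print(f"{temp_c:6.1f} C -> {v:.3f} V, {f_mhz:.2f} MHz")
```

In this model the loop settles at a higher voltage when cold and a lower one when hot, while the achievable frequency always stays at or above the requested minimum. That is the trade the text describes: performance is held fixed and voltage absorbs the temperature variation.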
And those are just the highlights. To realize a truly useable SoC, M0N0 also integrated:
The SoC is supported by a programming tool for the ROM, KWS demo boards, software libraries, documentation, and dev boards with swappable application “hats” (Figure 6). This makes it a hardware platform well suited to serve DARPA performers and the broader research community for ultra low-power sensor and embedded machine learning applications.
Figure 6: M0N0 development board
The M0N0 SoC was presented at Session 27.2 (IoT and Security) at ISSCC 2020 in San Francisco, accompanied by a KWS demo (Figure 7) at the same event. Please read the paper for more details. We welcome your questions and feedback in the comments below, and we are happy to answer questions by email too.
Read the full paper Contact Pranay Prabhat
Figure 7: M0N0 at the ISSCC demo session
The views, opinions, and findings expressed are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.