The fifth wave of computing is already unfolding, driven by AI, IoT and 5G. What will it take to ensure this revolution achieves its potential? Professor Massimo Alioto leads the Green IC group and is Director of the Integrated Circuits and Embedded Systems area at the National University of Singapore. Here, he shares the challenges, and the work that his group has done to address them.
Since I was a young man, I have dreamed of having electronics, intelligence, sensing abilities and communication all embedded in our objects. They should enhance our lives without requiring thought or attention from us. Though even today's devices represent tremendous progress, our objects can be a lot smarter.
At the same time, it is important that this vision does not come at the cost of the environment. This has led me to develop a keen interest in highly energy-efficient, ultra-low-power systems.
At Green IC we research the foundational technologies that will make this vision a reality. Our overall research program is composed of four main quadrants and a couple of adjacent areas.
The first ingredient to achieve this is wide energy and performance scalability, where we expand by orders of magnitude the range over which tradeoffs are possible. This is necessary to achieve extremely low power consumption most of the time, but significant performance in response to external events that trigger intensive on-chip data analytics. The scaling must be responsive, so low latency is important.
“Having access to the Arm IP has been great for us in terms of research. We have used it to develop systems with extremely wide energy performance ranges”
The voltage scaling approaches currently in use are running out of steam. The shrinkage of the supply voltage with newer process technologies reduces the gains that are possible.
An approach we have taken to address this is reconfigurable microarchitectures. When you design a microarchitecture for peak performance or nominal voltage, it will not be particularly energy efficient when you scale down the voltage. The balance between dynamic energy, leakage energy and clocking energy will be far from optimal. With reconfigurable microarchitectures, we can introduce an uncommonly high level of flexibility at the microarchitectural level while still using classical commercial EDA tools and existing design flows.
Having access to the Arm IP has been great for us in terms of research. We have used it to develop systems with extremely wide energy performance ranges.
Another approach is approximate computing, where we reduce the design margin in a controlled manner. You then detect and fix in hardware any faults that occur, instead of heedlessly adopting the worst-case design margin, which in any recent technology node is a very substantial fraction of the timing budget. We have expanded this concept into memories and networks-on-chip.
AI models are very robust against noise and occasional faults. For such systems, it is not always necessary to maintain the traditional design margins – which are very expensive in terms of performance and energy efficiency.
Instead, allowing, detecting, and adapting to runtime faults led us to develop the first energy- and quality-scalable network-on-chip, which achieves best-in-class energy efficiency of about 6 femtojoules per bit per millimeter. That is state of the art, comparable to other work in 16nm FinFET, yet we achieved it on a 28nm process, two generations older.
We apply these techniques to complete systems, not just processors. We look at sensor interfaces, system power management, and extremely low power radio frequency communications - mostly backscatter radios.
Once we have solved extreme energy scalability, the next focus is to eliminate batteries. As we scale up to trillions of devices, it will be impossible to embed a trillion batteries in them. Replacing a $20 battery in a trillion devices every five years would cost $4 trillion a year - equivalent to the fourth-largest economy globally by GDP. It would also require too many raw materials, and the disposal would have an enormous impact on the environment.
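The arithmetic behind that figure can be checked with a back-of-envelope sketch (the device count, battery price and replacement cycle are the values quoted above):

```python
# Back-of-envelope check of the battery replacement cost quoted above.
devices = 1_000_000_000_000    # one trillion devices
cost_per_battery = 20          # USD per battery
cycle_years = 5                # replacement every five years

total_per_cycle = devices * cost_per_battery   # $20 trillion per cycle
annual_cost = total_per_cycle / cycle_years    # $4 trillion per year

print(f"${annual_cost / 1e12:.0f} trillion per year")  # prints "$4 trillion per year"
```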
The key challenge in eliminating batteries is to design systems with extremely low peak power consumption. You have no energy storage so you cannot average over time. You must ensure that your peak power is within the envelope that the harvester provides. We have had to scale all the subsystems of a typical sensor node, from sensor interface all the way down to communications.
Even the memory must scale in power. We achieve this by creating a hierarchy of memories with different power needs, from latch-based memory to small SRAMs and, at the upper end of the scale, perhaps even non-volatile storage.
“It is actually part of our mission to show the world that “purely harvested” and “always on” are not necessarily mutually exclusive”
A sensor node typically needs to detect events, timestamp them, and store the data in memory. When you are power-constrained, you store this data in a volatile manner until you have enough power to move it to the non-volatile storage higher up the memory hierarchy.
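The buffering policy described above can be sketched in a few lines. This is purely illustrative; the class, method names and the energy threshold are hypothetical, not the group's implementation:

```python
# Minimal sketch of the power-aware storage policy described above.
# The energy cost per non-volatile write is an assumed, illustrative value.
FLUSH_COST_NJ = 50.0  # assumed energy cost of one non-volatile write (nJ)

class SensorNodeStore:
    def __init__(self):
        self.volatile = []      # cheap latch/SRAM buffer (lost on power loss)
        self.nonvolatile = []   # expensive but persistent storage

    def record_event(self, timestamp, data):
        # Events are always captured first in the low-power volatile tier.
        self.volatile.append((timestamp, data))

    def on_energy_update(self, available_nj):
        # Move buffered events up the hierarchy only when the harvester
        # has accumulated enough energy to pay for the writes.
        while self.volatile and available_nj >= FLUSH_COST_NJ:
            self.nonvolatile.append(self.volatile.pop(0))
            available_nj -= FLUSH_COST_NJ

store = SensorNodeStore()
store.record_event(0, "motion")
store.record_event(5, "motion")
store.on_energy_update(60.0)  # enough harvested energy for one flush only
```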
We work with harvesters that are reasonably stable - light, thermal and RF (radio frequency). Even so, the devices we have produced must scale across six orders of magnitude at the full system level, from milliwatts down to nanowatts, which is a huge challenge. That scalability allows them to continue operating in unfavorable conditions.
The nanowatt scale corresponds to the power provided by a 4mm x 4mm commercial photovoltaic cell under the light level of 1 lux, or approximately moonlight. By aggressively reducing the peak power to fall within this budget, we avoid the need for intermittent computing. The device can be always on, whether in sunlight, indoor light, or moonlight.
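A rough estimate shows why 1 lux on a 4mm x 4mm cell lands in the nanowatt range. The luminous efficacy and cell efficiency below are assumed, illustrative values, not figures from the group's work:

```python
# Rough estimate of the harvestable power from a 4mm x 4mm photovoltaic
# cell at 1 lux. Efficacy and efficiency values are assumptions.
LUX = 1.0                 # illuminance, roughly moonlight
LUMENS_PER_WATT = 105.0   # assumed luminous efficacy of the light source
PV_EFFICIENCY = 0.10      # assumed photovoltaic conversion efficiency
AREA_M2 = 4e-3 * 4e-3     # 4mm x 4mm cell area in square meters

irradiance_w_m2 = LUX / LUMENS_PER_WATT                 # ~9.5 mW/m^2
harvested_w = irradiance_w_m2 * AREA_M2 * PV_EFFICIENCY

print(f"{harvested_w * 1e9:.0f} nW")  # on the order of tens of nanowatts
```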
It is actually part of our mission to show the world that “purely harvested” and “always on” are not necessarily mutually exclusive. We have recently demonstrated systems whose peak power stays within the nanowatt range. We had to trade off performance to achieve that, but in IoT sensor nodes that is typically not a problem.
For wireless communication, we have developed radios that can communicate at low microwatt power levels. Our most recent Wi-Fi transmitter consumes less than 2.5 microwatts, and can be easily integrated in existing wireless networks, making it truly adoptable.
“The Arm IP has been instrumental for our research because we could focus on the most unconventional aspects of the chip design whilst having very solid support from software to hardware”
That is a fundamental principle that guides our research. To make a real impact, our research must be easily adoptable, without too much disruption to either the manufacturing supply chain or the existing infrastructure. We do not just want to write an interesting paper, we want to change the world.
The Arm IP has been instrumental for our research. We could focus on the most unconventional aspects of the chip design whilst having very solid support from software to hardware. This means we have made the best use of our effort, because we have been able to rely on a very solid design framework. We use that as a basis to go beyond and introduce further innovation.
Real systems need to be connected and intelligent. Connectivity is not feasible if every single edge system needs to send raw data to the cloud: the network bandwidth would be unsustainable, there are issues related to privacy and the confidentiality of data, and power consumption would be too high if you need to transmit all the time.
Our approach is to embed high levels of intelligence on chip, and this requires extremely energy-efficient accelerators for AI. We have demonstrated an AI accelerator at 30 tera-operations per watt in 40nm CMOS. Its energy efficiency and area are equivalent to the best-in-class research prototype demonstrated so far in 5nm, which is six CMOS generations ahead.
Of course, AI accelerators are not typically used all the time on edge devices. They are triggered infrequently, when an event of interest has been detected. What happens frequently is data arriving from sensors. It is stored in a buffer, then typically must be preconditioned with some level of digital filtering.
That means reading the data from the memory, processing it outside the memory, and then storing it back into the memory. This is inefficient since it does not preserve data locality.
We have demonstrated in-memory computing architectures that can perform a wide range of tasks within the same memory macro.
These tasks include typical DSP tasks that might be applied to filtering sensor data. And then, if an event has been detected, you can also do convolutional neural network acceleration in-memory. This means that the data really is stationary. You are not getting it in and out of memory so it is really end-to-end data locality. This enables high throughput and extreme energy efficiency.
Adding intelligence to devices reduces their need for wireless communication. But some communication is still necessary, and this poses a problem - you have gigantic networks that are prone to attack. So now you need to ensure that even low-cost, low-power and area-constrained systems are still able to protect themselves from cyber-attacks.
Logical attacks exploit architectural vulnerabilities, and to defend against them you need a fully-fledged root of trust on-chip. This means that every single system-on-chip must be equipped with a Physically Unclonable Function (PUF) and a True Random Number Generator (TRNG), both of cryptographic grade.
This is easily achievable with existing approaches, but requires extra area, design and integration costs. In low-end devices, this is just too expensive. A solution is to unify them with something that is already available on chip. You could embed these primitives into existing memory macros, or you could reuse existing logic.
“There is a wide range of side-channel attacks that are considered by the public to be esoteric, sophisticated attacks, but unfortunately they are actually very easy and inexpensive”
We have demonstrated reuse of the cryptographic engine itself to generate cryptographic keys. We inject metastability in the clock storage elements by replacing flip-flops with pulse latches. This leads to output that is heavily sensitive to random noise, which is exactly what you want in a True Random Number Generator.
Cryptographic keys generated in this way have extremely good entropy, and you can feed them through the same circuit to then operate as a conventional cryptographic engine. This is a tremendous reuse of resources.
Once you have made a system robust at the logical level, you next need to consider invasive, physical attacks. There is a wide range of side-channel attacks that are considered by the public to be esoteric, sophisticated attacks. Unfortunately, they are actually very easy and inexpensive.
You can attack a device by analyzing its power consumption using a $200 commercial board that even includes the software to perform such attacks. You use the board to replay inputs to the chip under test, repeating its behavior until the power consumption reveals the secret key.
To detect these attacks, you need to detect the presence of probing devices in the supply of your chip. At the VLSI Symposium, we demonstrated the detection of attacks with a 6-sigma level of confidence. The technique requires silicon area, but fits within the typical area of two supply pads - supply and ground. This means that underneath these pads, you can place the whole system for detecting the presence of probing devices without taking any extra area.
We embedded machine learning on-chip that is able to mimic the attack. It can evaluate on-chip the correlation between power consumption and the secret key that the attacker would estimate off-chip. That allows us to compensate the information-sensitive power in such a way that, although the power is consumed internally, the attacker will not see anything externally. This approach is robust to at least one billion power traces, which corresponds to many months of run time.
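The correlation the attacker estimates off-chip can be illustrated with a toy correlation power analysis under a simple Hamming-weight leakage model. Everything here - the 8-bit key, the noiseless simulated traces, the function names - is an illustrative assumption, not the group's on-chip implementation:

```python
# Toy correlation power analysis (CPA) sketch. Illustrative only:
# simulated, noiseless traces under a Hamming-weight leakage model.

def hamming_weight(x):
    return bin(x).count("1")

def pearson(xs, ys):
    # Pearson correlation coefficient between two equal-length sequences.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0

SECRET_KEY = 0x5A  # toy 8-bit key the "attacker" tries to recover

# Simulated power traces: leakage proportional to the Hamming weight
# of (plaintext XOR key), the classic CPA leakage model.
plaintexts = list(range(256))
traces = [float(hamming_weight(p ^ SECRET_KEY)) for p in plaintexts]

# The attacker correlates the traces against every key guess;
# the true key yields the strongest correlation.
best_guess = max(
    range(256),
    key=lambda k: pearson([hamming_weight(p ^ k) for p in plaintexts], traces),
)
```

The on-chip defense described above estimates this same correlation internally and injects compensating power so that, externally, it never rises above the noise.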
But the really great thing is that it allows us to address future attacks that have not been created yet. Once a new vulnerability has been exposed to the public, we can retrain the ML model to detect it in addition to all the existing vulnerabilities. Then we can wirelessly deploy that model to trillions of existing devices. This work was recognized with an Outstanding Paper award at the 2023 ISSCC conference.
Another class of attack that we have looked at is Laser Voltage Probing (LVP). A laser beam is used to hit the back side of the chip, resulting in a backscattered beam that is modulated by the carrier density. This allows an attacker to infer the actual voltage at that location.
The signal is typically very noisy, so you need to average across tens of thousands of phases. However, in a few hours you would be able to detect virtually any secret information on-chip with a very high success rate. The equipment needed to do this can be rented fairly cheaply from many labs around the world.
This is a huge problem for blockchain e-wallets, whose value is not capped the way credit cards are. It is plausible that in the future, property ownership could be encoded in e-wallets, so the security stakes are very high.
“To change the world and do something meaningful, it's all about people and the culture and the values that are shared by groups of people”
It is important that LVP attack detection should have full coverage, both in space and time. It must be always on, so it needs to be extremely power efficient. Current approaches rely on photo sensors, which must be spread across the chip with sufficient density that you are guaranteed to detect a narrow laser beam.
We have demonstrated an approach that leverages the CMOS manufacturing process to embed photo-sensitive p-n junctions that can detect light both below and above the silicon band gap wavelength. This is all within a fully automated design flow. The main challenge was finding a suitably sensitive p-n junction, and supporting recent technologies that require restricted design rules.
Designing analog systems is very labor-intensive, much more so than designing digital systems with their mature design methodologies. We have introduced methodologies that allow architectures of analog subsystems to be implemented through digital standard cells. We have demonstrated this for a wide range of blocks including analog-to-digital converters, digital-to-analog converters, amplifiers, and many others.
This approach allows us to design an analog system in a matter of hours. You write Verilog, and the digital design flows automatically generate a standard-cell-based circuit. It can also be integrated within the same logic that processes the signals sensed and digitized by the sensor interfaces.
This means that we preserve the design efficiency of digital subsystem design. The challenge is to come up with the architectures that allow us to describe and implement fully analog systems while using digital building blocks.
To change the world and do something meaningful, it is all about people and the culture and values shared by groups of people. I really believe that a few capable people in a room can make a huge impact if they are aligned and motivated. In the end, infrastructure and resources can be found - they are not the limiting factor.
The real limit is the willingness to look into problems without the fear of being wrong. Openly communicating and challenging ideas in a constructive way rather than a personal one. Setting egos aside and exploring together the limits of what is possible. Once you have groups of this type, most of the time you will find a solution. Even if it is not the one that you originally had in mind.
It is very clear that we cannot grow our ecosystem of connected devices to trillions, unless we solve the problem of how to make devices that are always on yet still purely harvested. We have shown that this is possible, by developing techniques that are compliant with existing manufacturing capabilities and design methodologies commonly used in industry. It is important that these techniques are picked up by companies who can turn this exciting vision of the future into a reality.
By taking on the mundane tasks that we must perform each day, these devices will free us up to focus on tasks that are more challenging but more rewarding. That will allow us to really leverage the cognitive abilities that are unique to humankind.
Once we have these devices everywhere, they will change our lives with seamless natural mode interfaces. Our devices will look at us, listen to us, sense our postures, and know where we are from localization. They will anticipate our needs, providing us with information for tasks that we will likely need to take on in the near future. We will move from on-demand information to an era of “beyond demand” information.
Arm makes a wide range of IP and tools available at no charge for academic research use.