Small is only a matter of perception. These days, if you zoom in far enough on a chip the size of a fingernail, you could encounter 30 miles of interconnect ‘wires’ in stacked levels – a vast nanoscale web connecting transistors and other components and transporting electrons. As technology drives transistors ever smaller, wires must get thinner. This can degrade their electrical properties and slow system performance.
Technologists continue to innovate to overcome these issues, and a team at Georgia Tech’s Nanoelectronics Research Lab has recently been using Arm IP to assess the potential impact of their work on microprocessors. They found some striking results. Here, lab director Dr Azad Naeemi and PhD student Da Eun Shim tell us more.
As devices have continued to scale down, wire dimensions within interconnects – the ‘pipeline’ that links transistors and other components and transports electrons – have become very narrow. This increases interconnect resistance in advanced nodes, with a corresponding increase in signal delay. The result is a major bottleneck when you are trying to further improve the operating frequency – and so the speed – of a chip.
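The link between narrower wires and slower signals can be sketched with the standard first-order model of a distributed RC line, in which the 50% delay is roughly 0.38 times the total wire resistance times the total wire capacitance. The snippet below is only an illustration under that assumption: the function name and the per-micron resistance and capacitance values are hypothetical placeholders, not figures from the Georgia Tech work.

```python
def distributed_rc_delay(r_per_um, c_per_um, length_um):
    """Approximate 50% delay (seconds) of an unrepeated distributed RC wire.

    Uses the textbook first-order estimate 0.38 * R_total * C_total.
    """
    total_r = r_per_um * length_um  # ohms
    total_c = c_per_um * length_um  # farads
    return 0.38 * total_r * total_c


# Hypothetical placeholder values, chosen only to show the scaling.
R_PER_UM = 100.0    # ohm per micron (assumed)
C_PER_UM = 0.2e-15  # farad per micron (assumed)

base = distributed_rc_delay(R_PER_UM, C_PER_UM, 100.0)
narrower = distributed_rc_delay(2 * R_PER_UM, C_PER_UM, 100.0)  # resistance doubled
longer = distributed_rc_delay(R_PER_UM, C_PER_UM, 200.0)        # length doubled

print(f"doubling resistance scales delay by {narrower / base:.1f}x")
print(f"doubling length scales delay by {longer / base:.1f}x")
```

In this simple model, delay grows linearly with resistance per unit length and quadratically with wire length, which is one reason rising wire resistance becomes a bottleneck on long routes.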
There is a great deal of research going on at the technology level here. People are looking for new materials and wire geometry to improve performance. Our work aims to evaluate the benefits of this research on realistic systems – identifying the most promising options by looking at the impact of these nanoscale developments at the microprocessor level.
“The semiconductor industry is now at the point where anything meaningful must look at the whole picture. Collaboration is key to making impactful contributions. This is a multi-scale, complex problem we are tackling, going from thin layers of material all the way to huge microprocessors.”
We use Technology Computer-Aided Design (TCAD) simulations to benchmark the resistance of different options for the back end of line. This is where the individual devices – including transistors, capacitors, and resistors – are connected with wiring on the wafer. These wires are made of copper, but at these new smaller scales, the copper needs a barrier to stop it leaking into the dielectric and damaging the transistor. We are looking at what happens to the circuit performance when you make the barrier liners thinner, and when you use new materials in the wires or the barriers.
We are looking at technologies where the width of the wires is 18 nanometers, and the thickness about twice that. At this width, three or four nanometers on each side is going to be taken up by the barrier and liner layers. This leaves substantially less area available for copper. So it is really important to find ways to thin down the layers in the barrier liner.
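The arithmetic behind that concern can be sketched in a few lines. This is a rough geometric estimate rather than the team's TCAD method: it assumes a rectangular trench 18 nm wide and 36 nm thick, a barrier/liner of uniform thickness on both sidewalls and the trench bottom (none on top), and a resistivity that does not change with size. The function name is hypothetical.

```python
def copper_fraction(width_nm, height_nm, liner_nm):
    """Fraction of the wire cross-section left for copper.

    Assumes a uniform barrier/liner on both sidewalls and the bottom
    of a rectangular trench, and none on the top surface.
    """
    cu_width = width_nm - 2 * liner_nm
    cu_height = height_nm - liner_nm
    if cu_width <= 0 or cu_height <= 0:
        return 0.0  # the liner fills the whole trench
    return (cu_width * cu_height) / (width_nm * height_nm)


WIDTH_NM = 18.0   # wire width quoted in the text
HEIGHT_NM = 36.0  # "about twice that" (assumed to be exactly 2x here)

for liner_nm in (4.0, 3.0, 1.0):
    frac = copper_fraction(WIDTH_NM, HEIGHT_NM, liner_nm)
    # With fixed resistivity, resistance per unit length scales as 1 / area.
    print(f"liner {liner_nm:.0f} nm: copper fills {frac:.0%} of the wire, "
          f"resistance x{1 / frac:.2f} vs. a liner-free wire")
```

Under these assumptions, a 3 nm liner leaves roughly 61% of the cross-section for copper and a 4 nm liner roughly 49%. Real wires fare worse than this estimate suggests, because copper's resistivity also rises at these dimensions.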
At the same time, materials scientists are methodically going through the periodic table, to see which elements may be a suitable replacement for copper. Ruthenium is one promising option: it does not need as thick a barrier as copper, and it is less resistive at smaller scales.
“If we are able to show the results of experiments using a high-performance Arm processor, a semiconductor manufacturer will put a lot more value on the results.”
Our research started around 2015. Early on, we did not have access to the library needed for an advanced node, so we focused on very small test cases. We crudely scaled down an older technology, built our own library, and started with some simple, open-access logic circuits that didn't have any memory.
But if we are able to show the results of experiments using a high-performance Arm processor, a semiconductor manufacturer will put a lot more value on the results. This work hinges on benchmarking with real platforms. Any improvements tend to come with some trade-offs and side effects that make other aspects worse. It takes benchmarking in a realistic scenario to judge whether the benefits truly outweigh the disadvantages.
We have had connections with Arm for a long time. One of our former PhD students, Divya Prasad, went to Arm for a one-semester internship, and joined the company after she graduated. And I knew some other people at Arm who were looking to collaborate with us. So instead of using our own crude, scaled technology, we began working very closely with Divya and Brian Cline, senior principal research engineer at Arm, to build the infrastructure to run a more realistic simulation.
About a year ago, we were ready to go to the full microprocessor level. Arm gave us the IP for Cortex-A53, and their RAM compiler for the 16-nanometer node. So, thanks to Arm IP, we could do everything at a much bigger scale, that of an entire real-world processor. The foundation for the work is ASAP7, a Process Design Kit developed by Arizona State University in collaboration with Arm, which describes the transistors and the library of logic gates. Once we had this, we could create whatever design we wanted. Da Eun has been looking at designing a Cortex-A53, using the Arm memory compiler to derive the right ASAP7 memory libraries.
One challenge lay in the fact that we had a 16-nanometer compiler for the memory. I had to find a way to scale it to seven nanometers, so it would be compatible with our design. I then had to make sure that all these designs were valid and met industry standards.
That meant going back and forth with Divya and Brian, every month or so, letting them know what we had been doing and asking whether our designs looked viable. It is key, for example, to place memory blocks at the right points to make the design efficient. If you need to access data from a cache, and you have put that memory far away, signals end up having to travel a long way to retrieve that data, so your performance can drop significantly.
“We did not expect that if we made the wire perform 50% better at the circuit level, we would also get a 50% improvement in performance across the board. These results just show how important interconnects have become to modern processing.”
Our first efforts are almost complete, and we are about to publish the results in journal papers. The impact we have seen has been huge. The improvement in operating clock frequency of Cortex-A53 has been as high as 50% – just by making the wires better. That is such a massive improvement.
We did not expect it to be that significant. We did not expect that if we made the wire perform 50% better at the circuit level, we would also get a 50% improvement in performance across the board, because interconnects are just one part of these very large systems. Plus, we are dealing with highly complex optimization tools, where every time anything changes, the whole design changes. So if you make one component a certain amount better, it does not necessarily translate across the whole system. These results just show how important interconnects have become to modern processing.
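A simple Amdahl-style estimate illustrates why a smaller gain would be the natural expectation. The sketch below is not the team's methodology: it assumes a fixed design whose critical-path delay splits into a logic part and a wire part, with only the wire part sped up. The function name and the wire-share values are hypothetical, and the model deliberately ignores the re-optimization by the design tools mentioned above.

```python
def frequency_gain(wire_share, wire_speedup):
    """Relative clock-frequency gain when only wire delay improves.

    wire_share:   fraction of the original critical-path delay due to wires (0..1)
    wire_speedup: factor by which wire delay shrinks (1.5 = wires 50% faster)
    """
    new_delay = (1.0 - wire_share) + wire_share / wire_speedup
    return 1.0 / new_delay - 1.0


# Hypothetical wire shares, chosen only to show the trend.
for share in (0.3, 0.6, 1.0):
    gain = frequency_gain(share, 1.5)
    print(f"wires are {share:.0%} of path delay -> frequency gain {gain:.1%}")
```

In this fixed-design model, the full 50% gain appears only when wires account for essentially all of the critical-path delay, which is why a result close to that figure is striking.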
Arm has been really helpful. The dialogue with the team is very rewarding, and it is fun to work with them, because they teach us a lot. For example, Divya pointed out that we were using a small number of metal levels in our benchmarking, and that it might be more in line with industry best practice to try more. We then saw some quite different results, which highlighted another interesting trade-off.
“Once we finish off the analysis of a single core variant of Cortex-A53, we want to see the effect on an even bigger circuit – perhaps using four cores or a bigger variant of Cortex-A53.”
The semiconductor industry is now at the point where anything meaningful must look at the whole picture, so collaboration is key to making impactful contributions. This is a multi-scale, complex problem we are tackling, going from thin layers of material all the way to huge microprocessors. Our work should serve as a motivation for the technologists who are working on these very tough problems.
Once we say what will happen if we are able to thin down the barrier by a certain amount, it is their job to do it in a reliable fashion, one that is manufacturable and can be done on billions of transistors and trillions of chips.
But, of course, they need to know that it is worthwhile, that it is going to have an impact. So this collaboration with Arm has been incredibly productive.
This work is very satisfying. I certainly feel like I have found the right field. Hopefully, this type of work will continue and I will be able to make many more contributions in the future. Once we finish off the analysis of a single core variant of Cortex-A53, we want to see the effect on an even bigger circuit – perhaps using four cores or a bigger variant of Cortex-A53.
Eventually I am sure we will want to look at future technology, or at different types of back end of line setup and the impact on memory. As long as we are doing these big core designs, we'll continue to use the Arm compiler. And as long as we are benchmarking, we will definitely continue to use Cortex-A53, to ensure our designs are valid and meet industry standards. We hope that other Arm processors will soon become accessible as well, so that this collaborative work can continue into the future.
Arm offers free access to a wide range of commercially-proven Arm IP, tools, and other resources – to enable you to do your best research work, on your own terms. We would love to hear more about what you are trying to achieve, and how we can help. To find out more, and to discuss which solution is right for you, get in touch.