What is the connection between rugby football, interconnect and performance analysis kits?
There is a seemingly never-ending march towards smaller, cheaper and more efficient designs in complex chip design, and every component of the modern SoC is being squeezed for more with each new generation. Improvements are subject to diminishing returns, so designers need to be creative in finding new ways to eke out those extra bits of performance that ultimately make the difference across the entire chip. The World Cup-winning rugby coach Sir Clive Woodward famously stated that excellence was best achieved by improving 100 factors by 1%, and this theory certainly holds true for a lot of the SoCs being designed these days. Staying on the theme of rugby for the moment, the interconnect is like a scrum half (or a quarterback for those of you who live west of the Atlantic!), as it acts as the go-between for each component and marshals them effectively to make the chip greater than the sum of its parts. A scrum half’s performance is measured by the speed and efficiency with which he passes the ball to his teammates, thus enabling them to do their job more effectively, which is much how you would want your system interconnect to function.
This role increases in importance as massive growth in system integration places on-chip communication at the centre of system performance. The ARM CoreLink NIC-400 is a very powerful and highly configurable interconnect with many high-end features to manage the traffic passing through it. It is in fact so configurable that it is regularly one of the most popular IP models created and downloaded on Carbon Design Systems’ IP Exchange portal for virtual prototyping (found here). This configurability allows a single user to create dozens of models for the system interconnect, and reflects the importance that users place on having accurate models for the components that have a great influence on overall system performance. With so many parameters in play, the ability to test the interconnect within the system prior to tapeout is clearly of great value. Just setting all parameters to maximum performance is rarely a sensible option, as power and cost budgets demand that less silicon is used to achieve the same levels of performance. Full system modelling allows refinement to save silicon area and reduce the number of wires without compromising performance goals.
While the configurability of the interconnect is an inherent and indeed crucial part of its effectiveness, the vast number of choices available also means that users often do not fully optimise the interconnect for their individual system. This is where virtual prototyping tools come into the equation: they help designers to avoid arbitration problems, detect system bottlenecks and get a better picture of how to manage PPA requirements. This ability to foresee and avoid potential issues before they become a problem is invaluable in an age where the pressure to get designs right first time and on time is a concern of every system architect. Additionally, the depth of analysis that the Carbon tool can undertake provides fast and meaningful feedback that can help you measurably improve your design. Last year I co-wrote a white paper on this subject with billneifert, titled “Getting the most out of the ARM CoreLink NIC-400”, which is available to download.
In the example shown here, a simple NIC-400 is configured with two masters and two slaves. The masters are set up to mimic the data loads from a CPU and a DMA controller, and the dummy targets are an Ethernet MAC and a DDR3 memory controller. Of course, since the traffic generators are quite configurable, it’s possible to model any number of different sources or targets, and we’ll get more into that in a bit. Note though that we can analyse traffic on any of the connections. The graphs shown here track the latency on the CPU interface and the queues in the DDR controller. The exact metrics that matter for a given system will of course vary with its design targets. It’s also beneficial to correlate data across multiple analysis windows, and indeed even across multiple runs.
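To make the idea of tracking latency and queue depth concrete, here is a minimal Python sketch of a toy, cycle-based model. It is not the NIC-400 and not the Carbon tooling: the function name `simulate`, its parameters, the fixed request periods and the single shared slave port are all assumptions made for illustration. Two masters (labelled "cpu" and "dma") issue requests at fixed intervals, an arbiter picks one per service slot, and the model reports mean latency per master and the peak number of queued requests.

```python
from collections import deque


def simulate(cycles, cpu_period, dma_period, service_cycles,
             arbitration="priority"):
    """Toy model: two masters share one slave port (hypothetical).

    Each master issues one request every `*_period` cycles and the slave
    needs `service_cycles` to complete a request. `arbitration` is either
    "priority" (CPU always wins) or "round_robin".
    Returns (mean latency per master, peak number of queued requests).
    """
    queues = {"cpu": deque(), "dma": deque()}
    periods = {"cpu": cpu_period, "dma": dma_period}
    latencies = {"cpu": [], "dma": []}
    busy_until = 0      # cycle at which the slave becomes free again
    peak_depth = 0
    last_served = "dma"  # so that round-robin starts with the CPU

    for now in range(cycles):
        # Masters enqueue new requests, stamped with their issue cycle.
        for name, period in periods.items():
            if now % period == 0:
                queues[name].append(now)
        depth = len(queues["cpu"]) + len(queues["dma"])
        peak_depth = max(peak_depth, depth)

        if now >= busy_until:
            if arbitration == "priority" or last_served == "dma":
                order = ("cpu", "dma")
            else:
                order = ("dma", "cpu")
            for name in order:
                if queues[name]:
                    issued = queues[name].popleft()
                    busy_until = now + service_cycles
                    # Latency = issue cycle to completion cycle.
                    latencies[name].append(busy_until - issued)
                    last_served = name
                    break

    means = {name: (sum(vals) / len(vals) if vals else None)
             for name, vals in latencies.items()}
    return means, peak_depth


if __name__ == "__main__":
    means, peak = simulate(cycles=100, cpu_period=10, dma_period=10,
                           service_cycles=2)
    print(means, peak)
```

A real analysis would of course use cycle-accurate models and two distinct targets; the point here is only the shape of the measurement, i.e. timestamp each request at issue, record the completion time, and watch how the queue builds up as the traffic or the arbitration scheme changes.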
The important thing we’ve done here is establish a framework to begin gathering quantitative data on the performance of the NIC-400, so we can track how well it meets the requirements. Analysing the results will likely lead to reconfiguration, recompilation and re-simulation. It’s not unheard of to iterate through hundreds of design possibilities with only slight changes in parameters. It’s important to vary the traffic parameters as well as the NIC parameters, however, since the true performance metric of the NIC-400, and really of all interconnect IP, is how it impacts the behavioural characteristics of the entire system.
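The iterate-and-compare loop described above can be sketched as a small parameter sweep. Everything in this Python example is hypothetical: `run_once` is a placeholder for a real simulation run, its latency formula is a crude stand-in rather than a model of the NIC-400, and the parameter names (`width_bits`, `burst_bytes`, `utilisation`) are chosen for illustration only. The sweep varies one interconnect parameter (data width) against several traffic scenarios and keeps the narrowest width whose worst-case latency still meets the target.

```python
from itertools import product
from math import ceil


def run_once(width_bits, burst_bytes, utilisation):
    """Placeholder for one simulation run (assumed formula, not real data).

    Number of beats needed to move the burst, inflated by a simple
    congestion factor that grows as the link utilisation approaches 1.
    """
    beats = ceil(burst_bytes * 8 / width_bits)
    return beats / (1.0 - utilisation)


def sweep(widths, bursts, utilisations, target_cycles):
    """Return (narrowest width meeting the target, worst latency per width)."""
    worst_case = {}
    for width in widths:
        # Evaluate this configuration against every traffic scenario.
        worst_case[width] = max(
            run_once(width, burst, util)
            for burst, util in product(bursts, utilisations)
        )
    passing = [w for w, worst in worst_case.items() if worst <= target_cycles]
    best = min(passing) if passing else None
    return best, worst_case


if __name__ == "__main__":
    best, worst_case = sweep(
        widths=(32, 64, 128),
        bursts=(16, 64),
        utilisations=(0.25, 0.5),
        target_cycles=20,
    )
    print("narrowest width meeting target:", best)
    print("worst-case latency per width:", worst_case)
```

Judging each configuration by its worst case across all traffic scenarios, rather than by a single run, reflects the point that the interconnect has to be evaluated against the behaviour of the whole system.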
I will be going into more detail on all of this on Thursday at 18:00 BST (1:00 pm EDT, 10:00 am PDT) in a webinar titled “Pre-silicon optimisation of system designs using the ARM CoreLink NIC-400 Interconnect” with esondhi, a corporate applications engineer at Carbon Design Systems. You can register for the webinar here, and make sure to attend live to ensure that your questions are answered immediately.
A funny start for this article, but it is really interesting after a full read. Thanks!