By William Orme, Strategic Marketing Manager, Arm, and Nick Heaton, Distinguished Engineer, Cadence
This is Part 3 of a four-part series. Links to the other parts are below.
In the previous two parts we introduced the challenges facing designers of complex SoCs and the idea of Performance Characterization as the first step of a systematic approach to the problem.
Once performance characterization has been completed satisfactorily, the second step of the process is to build more realistic use cases that map closely to the explicit scenarios expected to run on the platform. The objective is not to match the exact traffic that will run on the platform, but to match specific traits of each scenario so that similar corner cases can be explored. One of the most valuable pieces of information available at this stage is the performance headroom that remains while an extreme scenario is running.
Assessing the risk that the system may not perform adequately is very valuable, and use-case analysis can provide insight into whether the system has 5%, 10% or 20% spare capacity while a worst-case scenario is running. This may also indicate whether an SoC has been over-designed, i.e. whether it has considerable spare bandwidth that may never be used, in which case power consumption can be reduced or other savings made.
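As a simple illustration of the arithmetic involved (assuming the theoretical peak bandwidth of the interface under study is known), headroom is just the unused fraction of peak capacity. A minimal sketch in SystemVerilog; the helper name and the numbers are illustrative:

```systemverilog
// Spare capacity as a percentage of theoretical peak bandwidth.
// Illustrative helper only; not part of any tool or VIP.
function automatic real headroom_pct(real measured_gbps, real peak_gbps);
  return (peak_gbps - measured_gbps) / peak_gbps * 100.0;
endfunction

// Example: headroom_pct(10.2, 12.8) returns ~20.3, i.e. roughly
// 20% of the interface's capacity is unused in the worst case.
```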
When running use cases, it is important to understand the different classes of IP that a typical mobile SoC might include and their corresponding traffic types.
Figure 4: Different Master Performance Requirements
Figure 4 shows the different performance requirements of key IP functions in a mobile SoC. CPUs are typically most sensitive to latency, so CPU latency is a major system performance requirement: the responsiveness of any system is dramatically affected by the CPU's ability to quickly fetch interrupt service routines. Graphics processing units (GPUs), on the other hand, tend to consume considerable bandwidth, but as long as they receive at least a certain minimum level, system operation is not critically affected. Real-time IP such as the display controller is critical to the operation of the platform and tends to have a minimum bandwidth requirement within a certain time window. If it receives that minimum it operates successfully; if it receives even slightly less, the result can be catastrophic system failure, such as a flickering display.
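One way to make these distinctions concrete in a testbench is to parameterize each master with its traffic class and requirements. The following is a minimal sketch; the type and field names are hypothetical, not drawn from any Arm or Cadence API:

```systemverilog
// Illustrative per-master traffic profile; the type and field
// names are hypothetical, not drawn from any specific VIP.
typedef enum { LATENCY_SENSITIVE,  // e.g. CPU
               BANDWIDTH_HUNGRY,   // e.g. GPU
               REAL_TIME           // e.g. display controller
             } traffic_class_e;

class master_profile;
  traffic_class_e kind;
  real            min_bw_gbps;    // hard floor for REAL_TIME masters
  real            target_bw_gbps; // sustained demand for BANDWIDTH_HUNGRY masters
  int unsigned    max_latency_ns; // budget for LATENCY_SENSITIVE masters
  time            rt_window;      // window over which min_bw_gbps must be met
endclass
```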
Figure 5: Layering Traffic Generation on top of the Performance Testbench
To explore use cases, it is essential that these different types of traffic can be readily generated, constrained and sequenced to model the specific use case in question. Verification IP is generally architected to deliver constrained-random traffic and therefore needs additional features for use-case performance exploration. Figure 5 shows how specific traffic-generation layers can be added on top of the VIP to deliver a use case that more closely models the real system.
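As a sketch of what such a layer might look like, the following SystemVerilog models a burst-shaping sequence that turns a target burst size and period into a stream of transfers. The bus_item class is a hypothetical stand-in for a real VIP sequence item; none of these names come from an actual VIP:

```systemverilog
import uvm_pkg::*;
`include "uvm_macros.svh"

// Hypothetical stand-in for a VIP transfer; a real VIP supplies
// its own, much richer, sequence item.
class bus_item extends uvm_sequence_item;
  rand bit [31:0]   addr;
  rand int unsigned len_bytes;
  `uvm_object_utils(bus_item)
  function new(string name = "bus_item"); super.new(name); endfunction
endclass

// Traffic-generation layer: shapes raw transfers into bursts that
// approximate a target bandwidth (burst_bytes per period_ns).
class shaped_traffic_seq extends uvm_sequence #(bus_item);
  `uvm_object_utils(shaped_traffic_seq)
  int unsigned burst_bytes = 256;   // bytes sent per burst
  int unsigned period_ns   = 1000;  // nominal gap between bursts
  int unsigned num_bursts  = 100;

  function new(string name = "shaped_traffic_seq"); super.new(name); endfunction

  task body();
    repeat (num_bursts) begin
      bus_item item = bus_item::type_id::create("item");
      start_item(item);
      if (!item.randomize() with { len_bytes == burst_bytes; })
        `uvm_error("RAND", "randomization failed")
      finish_item(item);
      #(period_ns * 1ns);  // the idle gap sets the average bandwidth
    end
  endtask
endclass
```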
In addition to traffic generators, the various bursty traffic types must also be sequenced relative to one another. Because the testbench is built using the Universal Verification Methodology (UVM) and the SystemVerilog language, creating sequences that start and stop traffic is easily realized, as the sketch below shows.
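Continuing the sketch above, a use-case sequence might fork one shaped stream per master, each with its own rate and start-time skew. The scenario name is illustrative, and the sequencer handles are assumed to be wired up by the test:

```systemverilog
// Use-case sequence: phases the per-master traffic layers in time.
// The scenario name is illustrative; the sequencer handles are
// assumed to be assigned by the test before the sequence starts.
class camera_preview_use_case extends uvm_sequence #(bus_item);
  `uvm_object_utils(camera_preview_use_case)
  uvm_sequencer #(bus_item) cpu_sqr, gpu_sqr, disp_sqr;

  function new(string name = "camera_preview_use_case"); super.new(name); endfunction

  task body();
    shaped_traffic_seq disp = shaped_traffic_seq::type_id::create("disp");
    shaped_traffic_seq gpu  = shaped_traffic_seq::type_id::create("gpu");
    shaped_traffic_seq cpu  = shaped_traffic_seq::type_id::create("cpu");
    disp.burst_bytes = 4096; disp.period_ns = 16_000; // real-time frame slices
    gpu.burst_bytes  = 1024; gpu.period_ns  = 2_000;  // heavy, bursty bandwidth
    cpu.burst_bytes  = 64;   cpu.period_ns  = 500;    // short latency-critical bursts
    fork
      disp.start(disp_sqr);                 // display runs throughout
      begin #20us; gpu.start(gpu_sqr); end  // GPU joins mid-scenario (skew)
      cpu.start(cpu_sqr);
    join
  endtask
endclass
```

Because each branch is an ordinary UVM sequence, traffic is stopped simply by letting a branch complete, or started late by delaying it, which is all the start/stop control most use cases need.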
Once a use case has been configured, simulations can be run to measure bandwidth and latency metrics for the system. An analysis tool that visualizes these metrics is invaluable for confirming that the scenario has been created and configured correctly. Figure 6 shows an example of a realistic use case with bursty bandwidth at different levels on different masters.
Figure 6: Use Case Read Bandwidth Example
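Underneath such a plot, the testbench needs a component that accumulates traffic into per-window bandwidth figures. A minimal sketch, reusing the hypothetical bus_item from earlier and assuming a monitor broadcasts each completed transfer through an analysis port:

```systemverilog
// Windowed bandwidth meter: subscribes to a monitor's analysis
// port and reports bytes moved per fixed window (illustrative).
class bw_meter extends uvm_subscriber #(bus_item);
  `uvm_component_utils(bw_meter)
  time    window = 1us;   // measurement window
  longint bytes_in_window;

  function new(string name, uvm_component parent);
    super.new(name, parent);
  endfunction

  function void write(bus_item t);
    bytes_in_window += t.len_bytes;
  endfunction

  task run_phase(uvm_phase phase);
    forever begin
      #window;
      // 1 byte/ns == 1 GB/s, so bytes over the window in ns gives GB/s
      `uvm_info("BW", $sformatf("%0.2f GB/s",
                real'(bytes_in_window) / real'(window / 1ns)), UVM_LOW)
      bytes_in_window = 0;
    end
  endtask
endclass
```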
By building use-case scenarios on top of a UVM testbench, it is possible to build random variability into the scenarios, so that multiple runs generate subtly different simulations with jitter and skew in the traffic. Applied to performance analysis, this standard verification technique can help trap corner cases. Combined with a graphical analysis tool that can handle results from multiple simulations, it provides a very powerful way to achieve substantial coverage, and hence confidence that the system will perform as required.
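One simple way to inject that jitter is to re-randomize the gap between bursts on every iteration, so that each seed produces a slightly different alignment of the masters' traffic. A sketch extending the earlier hypothetical shaped_traffic_seq:

```systemverilog
// Adds per-burst jitter: the gap is re-randomized on every burst,
// so different seeds shift how the masters' traffic aligns.
class jittered_traffic_seq extends shaped_traffic_seq;
  `uvm_object_utils(jittered_traffic_seq)
  rand int unsigned gap_ns;
  constraint c_gap { gap_ns inside {[period_ns/2 : period_ns*2]}; }

  function new(string name = "jittered_traffic_seq"); super.new(name); endfunction

  task body();
    repeat (num_bursts) begin
      bus_item item = bus_item::type_id::create("item");
      start_item(item);
      if (!item.randomize() with { len_bytes == burst_bytes; })
        `uvm_error("RAND", "randomization failed")
      finish_item(item);
      void'(this.randomize(gap_ns));  // fresh jittered gap each time
      #(gap_ns * 1ns);
    end
  endtask
endclass
```

Running the same use case with different seeds then yields the family of subtly different simulations described above.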
In Part 4 of this series we will introduce a new productivity tool built to simplify the adoption of this two-step performance analysis process.
[CTAToken URL = "https://community.arm.com/soc/b/blog/posts/how-to-measure-and-optimize-the-system-performance-of-a-smartphone-rtl-design" target="_blank" text="How to Measure and Optimize the System Performance of a Smartphone RTL Design - Part 1" class ="green"]
[CTAToken URL = "https://community.arm.com/soc/b/blog/posts/how-to-measure-and-optimize-the-system-performance-of-a-smartphone-rtl-design---part-2" target="_blank" text="How to Measure and Optimize the System Performance of a Smartphone RTL Design - Part 2" class ="green"]
[CTAToken URL = "https://community.arm.com/soc/b/blog/posts/how-to-measure-and-optimize-the-system-performance-of-a-smartphone-rtl-design---part-4" target="_blank" text="How to Measure and Optimize the System Performance of a Smartphone RTL Design - Part 4" class ="green"]
[CTAToken URL = "/docs/DOC-7291" target="_blank" text="Cadence System Design and Verification" class ="green"]