By William Orme, Strategic Marketing Manager, Arm, and Nick Heaton, Distinguished Engineer, Cadence
This is Part 3 of a four-part series. Links to the other parts are below.
In the previous two parts we introduced the challenges facing designers of complex SoCs and the idea of Performance Characterization as the first step of a systematic approach to the problem.
Once performance characterization has been completed satisfactorily, the second step of the process is to build more realistic use cases that map closely to the explicit scenarios expected to run on the platform. The objective is not to match the exact traffic that will run on the platform, but to match specific traits of each scenario so that similar corner cases can be explored. One of the most valuable pieces of information available at this stage is the performance headroom that remains while an extreme scenario is running.
Assessing the risk that the system may not perform adequately is very valuable, and use-case analysis can provide insight into whether the system has 5%, 10% or 20% spare capacity while a worst-case scenario is running. This may also indicate whether an SoC has been over-designed, i.e. whether it has considerable spare bandwidth that may never be used, in which case power consumption can be reduced or other savings made.
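As a simple illustration of the arithmetic involved (assuming the theoretical peak bandwidth of the interface under study is known), headroom is just the unused fraction of peak capacity. A minimal sketch in SystemVerilog; the helper name and the numbers are illustrative:

```systemverilog
// Spare capacity as a percentage of theoretical peak bandwidth.
// Illustrative helper only; not part of any tool or VIP.
function automatic real headroom_pct(real measured_gbps, real peak_gbps);
  return (peak_gbps - measured_gbps) / peak_gbps * 100.0;
endfunction

// Example: headroom_pct(10.2, 12.8) returns ~20.3, i.e. roughly
// 20% of the interface's capacity is unused in the worst case.
```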
When running use cases, it is important to understand the different classes of IP that a typical mobile SoC might include and their corresponding traffic types.
Figure 4: Different Master Performance Requirements
Figure 4 shows the different performance requirements of key IP functions in a mobile SoC. CPUs are typically most sensitive to latency, so CPU latency is a major system performance requirement: the responsiveness of any system is dramatically affected by the CPU's ability to quickly fetch interrupt service routines. Graphics processing units (GPUs), on the other hand, tend to consume considerable bandwidth, but as long as they receive at least a certain minimum level, system operation is not critically affected. Real-time IP such as the display controller is critical to the operation of the platform and tends to have a minimum bandwidth requirement within a certain time window. If it receives that minimum it operates successfully; if it receives even slightly less, the result can be catastrophic system failure, such as a flickering display.
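One way to make these distinctions concrete in a testbench is to parameterize each master with its traffic class and requirements. The following is a minimal sketch; the type and field names are hypothetical, not drawn from any Arm or Cadence API:

```systemverilog
// Illustrative per-master traffic profile; the type and field
// names are hypothetical, not drawn from any specific VIP.
typedef enum { LATENCY_SENSITIVE,  // e.g. CPU
               BANDWIDTH_HUNGRY,   // e.g. GPU
               REAL_TIME           // e.g. display controller
             } traffic_class_e;

class master_profile;
  traffic_class_e kind;
  real            min_bw_gbps;    // hard floor for REAL_TIME masters
  real            target_bw_gbps; // sustained demand for BANDWIDTH_HUNGRY masters
  int unsigned    max_latency_ns; // budget for LATENCY_SENSITIVE masters
  time            rt_window;      // window over which min_bw_gbps must be met
endclass
```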
Figure 5: Layering Traffic Generation on top of the Performance Testbench
To explore use cases, it is essential that these different types of traffic can be readily generated, constrained and sequenced to model the specific use case in question. Verification IP is generally architected to deliver constrained-random traffic and therefore needs additional features for use-case performance exploration. Figure 5 shows how specific traffic-generation layers can be added on top of the VIP to deliver a use case that more closely models the real system.
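As a sketch of what such a layer might look like, the following SystemVerilog models a burst-shaping sequence that turns a target burst size and period into a stream of transfers. The bus_item class is a hypothetical stand-in for a real VIP sequence item; none of these names come from an actual VIP:

```systemverilog
import uvm_pkg::*;
`include "uvm_macros.svh"

// Hypothetical stand-in for a VIP transfer; a real VIP supplies
// its own, much richer, sequence item.
class bus_item extends uvm_sequence_item;
  rand bit [31:0]   addr;
  rand int unsigned len_bytes;
  `uvm_object_utils(bus_item)
  function new(string name = "bus_item"); super.new(name); endfunction
endclass

// Traffic-generation layer: shapes raw transfers into bursts that
// approximate a target bandwidth (burst_bytes per period_ns).
class shaped_traffic_seq extends uvm_sequence #(bus_item);
  `uvm_object_utils(shaped_traffic_seq)
  int unsigned burst_bytes = 256;   // bytes sent per burst
  int unsigned period_ns   = 1000;  // nominal gap between bursts
  int unsigned num_bursts  = 100;

  function new(string name = "shaped_traffic_seq"); super.new(name); endfunction

  task body();
    repeat (num_bursts) begin
      bus_item item = bus_item::type_id::create("item");
      start_item(item);
      if (!item.randomize() with { len_bytes == burst_bytes; })
        `uvm_error("RAND", "randomization failed")
      finish_item(item);
      #(period_ns * 1ns);  // the idle gap sets the average bandwidth
    end
  endtask
endclass
```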
In addition to traffic generators, the various bursty traffic types must also be sequenced relative to one another. Because the testbench is built using the Universal Verification Methodology (UVM) and the SystemVerilog language, creating sequences that start and stop traffic is easily realized, as the sketch below shows.
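Continuing the sketch above, a use-case sequence might fork one shaped stream per master, each with its own rate and start-time skew. The scenario name is illustrative, and the sequencer handles are assumed to be wired up by the test:

```systemverilog
// Use-case sequence: phases the per-master traffic layers in time.
// The scenario name is illustrative; the sequencer handles are
// assumed to be assigned by the test before the sequence starts.
class camera_preview_use_case extends uvm_sequence #(bus_item);
  `uvm_object_utils(camera_preview_use_case)
  uvm_sequencer #(bus_item) cpu_sqr, gpu_sqr, disp_sqr;

  function new(string name = "camera_preview_use_case"); super.new(name); endfunction

  task body();
    shaped_traffic_seq disp = shaped_traffic_seq::type_id::create("disp");
    shaped_traffic_seq gpu  = shaped_traffic_seq::type_id::create("gpu");
    shaped_traffic_seq cpu  = shaped_traffic_seq::type_id::create("cpu");
    disp.burst_bytes = 4096; disp.period_ns = 16_000; // real-time frame slices
    gpu.burst_bytes  = 1024; gpu.period_ns  = 2_000;  // heavy, bursty bandwidth
    cpu.burst_bytes  = 64;   cpu.period_ns  = 500;    // short latency-critical bursts
    fork
      disp.start(disp_sqr);                 // display runs throughout
      begin #20us; gpu.start(gpu_sqr); end  // GPU joins mid-scenario (skew)
      cpu.start(cpu_sqr);
    join
  endtask
endclass
```

Because each branch is an ordinary UVM sequence, traffic is stopped simply by letting a branch complete, or started late by delaying it, which is all the start/stop control most use cases need.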
Once a use case has been configured, simulations can be run to measure bandwidth and latency metrics for the system. An analysis tool that visualizes these metrics is invaluable for confirming that the scenario has been created and configured correctly. Figure 6 shows an example of a realistic use case with bursty bandwidth at different levels on different masters.
Figure 6: Use Case Read Bandwidth Example
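Underneath such a plot, the testbench needs a component that accumulates traffic into per-window bandwidth figures. A minimal sketch, reusing the hypothetical bus_item from earlier and assuming a monitor broadcasts each completed transfer through an analysis port:

```systemverilog
// Windowed bandwidth meter: subscribes to a monitor's analysis
// port and reports bytes moved per fixed window (illustrative).
class bw_meter extends uvm_subscriber #(bus_item);
  `uvm_component_utils(bw_meter)
  time    window = 1us;   // measurement window
  longint bytes_in_window;

  function new(string name, uvm_component parent);
    super.new(name, parent);
  endfunction

  function void write(bus_item t);
    bytes_in_window += t.len_bytes;
  endfunction

  task run_phase(uvm_phase phase);
    forever begin
      #window;
      // 1 byte/ns == 1 GB/s, so bytes over the window in ns gives GB/s
      `uvm_info("BW", $sformatf("%0.2f GB/s",
                real'(bytes_in_window) / real'(window / 1ns)), UVM_LOW)
      bytes_in_window = 0;
    end
  endtask
endclass
```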
By building use-case scenarios on top of a UVM testbench, it is possible to build random variability into the scenarios, so that multiple runs generate subtly different simulations with jitter and skew in the traffic. Applied to performance analysis, this standard verification technique can help trap corner cases. Combined with a graphical analysis tool that can handle results from multiple simulations, it provides a very powerful way to achieve substantial coverage, and hence confidence that the system will perform as required.
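One simple way to inject that jitter is to re-randomize the gap between bursts on every iteration, so that each seed produces a slightly different alignment of the masters' traffic. A sketch extending the earlier hypothetical shaped_traffic_seq:

```systemverilog
// Adds per-burst jitter: the gap is re-randomized on every burst,
// so different seeds shift how the masters' traffic aligns.
class jittered_traffic_seq extends shaped_traffic_seq;
  `uvm_object_utils(jittered_traffic_seq)
  rand int unsigned gap_ns;
  constraint c_gap { gap_ns inside {[period_ns/2 : period_ns*2]}; }

  function new(string name = "jittered_traffic_seq"); super.new(name); endfunction

  task body();
    repeat (num_bursts) begin
      bus_item item = bus_item::type_id::create("item");
      start_item(item);
      if (!item.randomize() with { len_bytes == burst_bytes; })
        `uvm_error("RAND", "randomization failed")
      finish_item(item);
      void'(this.randomize(gap_ns));  // fresh jittered gap each time
      #(gap_ns * 1ns);
    end
  endtask
endclass
```

Running the same use case with different seeds then yields the family of subtly different simulations described above.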
In Part 4 of this series we will introduce a new productivity tool built to simplify the adoption of this two-step performance analysis process.
[CTAToken URL = "https://community.arm.com/soc/b/blog/posts/how-to-measure-and-optimize-the-system-performance-of-a-smartphone-rtl-design" target="_blank" text="How to Measure and Optimize the System Performance of a Smartphone RTL Design - Part 1" class ="green"]
[CTAToken URL = "https://community.arm.com/soc/b/blog/posts/how-to-measure-and-optimize-the-system-performance-of-a-smartphone-rtl-design---part-2" target="_blank" text="How to Measure and Optimize the System Performance of a Smartphone RTL Design - Part 2" class ="green"]
[CTAToken URL = "https://community.arm.com/soc/b/blog/posts/how-to-measure-and-optimize-the-system-performance-of-a-smartphone-rtl-design---part-4" target="_blank" text="How to Measure and Optimize the System Performance of a Smartphone RTL Design - Part 4" class ="green"]
[CTAToken URL = "/docs/DOC-7291" target="_blank" text="Cadence System Design and Verification" class ="green"]