Arm Neoverse CPUs provide cloud workloads with a high-performance, energy-efficient computing platform. To maximize application performance, you can tune your application software for the underlying hardware. To do this, you need high-quality performance data from the hardware, and performance analysis tooling to capture and interpret it.
The Streamline CLI Tools are a set of new free-of-charge command-line tools that can profile and analyze workloads directly on an Arm Neoverse server running Linux. The tools implement the Arm CPU top-down profiling methodology, giving clear and actionable performance data with minimal user configuration. You can use this data to guide the optimization of the heavily used functions in your application software.
A simple formula for understanding the performance of a software application is:
Delivered performance = Utilization × Efficiency
In this equation:
To get the best software performance you must implement an effective software algorithm, and then achieve high processor utilization and execution efficiency when running it.
The top-down profiling methodology represents the processing core of a modern Arm CPU as an abstract model consisting of three major phases.
The top-down methodology defines the maximum available processing capacity as the total number of micro-operation (micro-op) issue slots multiplied by the number of cycles in the measurement window. Arm CPU performance counters are used by the tools to compute metrics that can attribute slot capacity to specific behaviors in the core.
These four basic metrics (Retiring, Bad speculation, Frontend bound, and Backend bound) provide the root nodes of the top-down tree, giving visibility of the fundamental “utilization” and “efficiency” metrics that are so important for high-performance software. Additional levels of metrics hierarchy below each basic node provide a more detailed breakdown for causal analysis.
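The slot accounting described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not how the Streamline CLI Tools are implemented: the function names are hypothetical, and the issue width, cycle count, and per-category slot counts are made-up numbers rather than measurements from any Neoverse core.

```python
def slot_capacity(issue_slots_per_cycle: int, cycles: int) -> int:
    """Total micro-op issue slots available in the measurement window."""
    return issue_slots_per_cycle * cycles


def attribute_slots(slot_counts: dict[str, int], capacity: int) -> dict[str, float]:
    """Express each category's slot count as a percentage of total capacity."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return {name: 100.0 * count / capacity for name, count in slot_counts.items()}


# Hypothetical values, chosen only to make the arithmetic easy to follow.
capacity = slot_capacity(issue_slots_per_cycle=8, cycles=1_000_000)
breakdown = attribute_slots(
    {
        "Retiring": 3_200_000,
        "Bad speculation": 2_400_000,
        "Frontend bound": 800_000,
        "Backend bound": 1_600_000,
    },
    capacity,
)

for name, percent in breakdown.items():
    print(f"{name:16s} {percent:5.1f}%")
```

In a real profile the slot counts come from hardware performance counters, and the tools perform this attribution for you.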
This hierarchical approach, with clear causal metrics, provides an intuitive way to find and understand the microarchitecture-sensitive performance issues in your software. Using this information, you can target the problem with specific corrective actions to improve the performance.
The Streamline CLI Tools are a set of Arm-native command-line tools that are designed to profile and analyze entirely on the server. The workflow generates summary spreadsheets that you download from the server to visualize the results, but your bulk profiling data never leaves the cloud environment.
The top-down metrics provide a systematic approach to identifying performance problems in your software, but this is only actionable feedback if the metrics are associated with a specific location in the running program. The new feature we are introducing with the Streamline CLI Tools is function-by-function attribution of the top-down metrics.
In the example below, we profiled the Arm ASTC texture compressor running on a Neoverse V1. This profile shows that the compute_avgs_and_dirs_3_comp_rgb() function is the most significant hotspot in the compressor. This function has a low Retire rate and a high level of Bad speculation. The level two metrics for Bad speculation indicate that a high percentage of branches are mispredicted, and that there are a significant number of branch misses per thousand instructions (MPKI). This easy-to-read profiling report gives us a clear indicator of both where to optimize, and what needs changing.
Figure: top-down profile for compute_avgs_and_dirs_3_comp_rgb()
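To make the two Bad speculation metrics mentioned above concrete, here is a minimal sketch of how a misprediction rate and a branch MPKI figure are derived from raw counts. The function names and the input counts are hypothetical, and the article does not state which hardware counter events the tools read, so treat this as the generic definition rather than the tools' exact calculation.

```python
def branch_mispredict_rate(mispredicted: int, branches: int) -> float:
    """Percentage of executed branches that were mispredicted."""
    if branches <= 0:
        raise ValueError("branches must be positive")
    return 100.0 * mispredicted / branches


def branch_mpki(mispredicted: int, instructions: int) -> float:
    """Branch misses per thousand instructions (MPKI)."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return 1000.0 * mispredicted / instructions


# Hypothetical counts for one function; not taken from the profile in this article.
mispredicted = 50_000
branches = 400_000
instructions = 2_000_000

rate = branch_mispredict_rate(mispredicted, branches)
mpki = branch_mpki(mispredicted, instructions)
print(f"Mispredict rate: {rate:.1f}%  Branch MPKI: {mpki:.1f}")
```

The two metrics answer different questions: the rate shows how unpredictable the branches are, while MPKI shows how much those misses matter relative to the total work the function does.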
After optimizing this function by replacing unpredictable data-driven branches with NEON conditional selects, we repeated the profile. This confirmed the impact of the change: a very nice 10% performance uplift, with the Bad speculation metric for this function dropping back to a typical baseline level. Mission complete.
Read our tutorial to get started with Streamline CLI Tools. It’s easy to download and install the tools directly onto your Arm server using wget, or you can download the package from our website.
Please let us know about any features you would like to see in a future release, or any issues encountered using the tools. You can contact us through the Streamline GitHub issue tracker, or you can email the product team at performancestudio@arm.com.