Arm Community
Software Optimization: Four real-life Streamline use cases (Part 2)

Guilherme Marshall
September 11, 2013
3 minute read time.

Smart software profiling

Following up on my previous post (Software Optimization: Four real-life Streamline use cases (Part 1): Timeline analysis), I will talk about the most basic use of ARM DS-5™ Streamline: software profiling. Streamline samples the program counters at periodic intervals and traces context switches in the Linux kernel to generate profiling reports. These reports cover both kernel and user space, and also give process-level information for Java code running on virtual machines on top of Linux (for example, Android apps).


Average is not enough

Many of you will be using a profiling tool in your development flow today, and chances are that this tool provides some sort of tabular view that correlates functions to CPU time. This seems to be the norm across the industry. However, this type of visualization alone may lead you in the wrong direction or hide short-lived events. For instance, when optimizing your system, knowing that your application uses 100% of the CPU for one third of the analysis time is more useful than learning only that it uses 33% of the CPU on average across the entire analysis window.
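The arithmetic behind this point can be sketched in a few lines of Python. This is only an illustration with made-up data, not anything Streamline exposes: the `samples` list and both helper functions are hypothetical, and each sample is simply 1 if the application was running and 0 otherwise.

```python
def utilization(busy_flags):
    """Fraction of samples in which the application was running."""
    return sum(busy_flags) / len(busy_flags)


def windowed_utilization(busy_flags, window):
    """Utilization of each consecutive, non-overlapping window of samples."""
    return [
        utilization(busy_flags[i:i + window])
        for i in range(0, len(busy_flags), window)
    ]


# Hypothetical capture: 300 samples, application busy only in the middle third.
samples = [0] * 100 + [1] * 100 + [0] * 100

average = utilization(samples)
per_window = windowed_utilization(samples, 100)

print(f"average over capture: {average:.0%}")
print("per window:", [f"{u:.0%}" for u in per_window])
```

The single average hides the fact that the middle window is fully saturated, which is exactly the detail a time-filtered report brings out.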

In Streamline you can select a period of time in the Timeline View using the blue callipers at the top, and then click on the Call Paths tab to get a high-level report of where the processor was spending its time in the selected period only.

If you select the Xorg process or any of its threads, you get a function-level profiling report in the bottom pane. By default this report shows the time spent in each application or shared library. Typically, when a developer finds that an unexpectedly large amount of time is spent in an application or shared library, they can add its debug symbols to Streamline to generate a new report that shows processor time per function. This way you can dynamically refocus the analysis on areas of interest.
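To see why debug symbols turn a per-library figure into a per-function report, consider a minimal sketch of symbol resolution. It is not how Streamline is implemented; the symbol table, addresses, and function names below are all hypothetical. Each sampled program counter is mapped to the function whose address range contains it, and the samples are then counted per function.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical symbol table: (start address, function name), sorted by address.
SYMBOLS = [
    (0x1000, "draw_frame"),
    (0x1400, "blend_pixels"),
    (0x1A00, "flush_queue"),
]
END_OF_TEXT = 0x2000  # hypothetical end of the code section


def resolve(pc):
    """Return the function containing pc, or None if pc is outside the table."""
    if pc < SYMBOLS[0][0] or pc >= END_OF_TEXT:
        return None
    starts = [start for start, _ in SYMBOLS]
    index = bisect_right(starts, pc) - 1  # last symbol starting at or before pc
    return SYMBOLS[index][1]


def samples_per_function(pcs):
    """Count program counter samples per function."""
    return Counter(resolve(pc) or "<unknown>" for pc in pcs)


# Hypothetical sampled program counters.
pcs = [0x1004, 0x1410, 0x1414, 0x1418, 0x1A20, 0x3000]
report = samples_per_function(pcs)

for name, count in report.most_common():
    print(f"{name:14s} {count}")
```

Without a symbol table, every one of these samples could only be attributed to the binary as a whole.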

By clicking on a function you can open a code view showing processor time (measured as the number of program counter samples) falling on different lines of code or disassembly. The disassembly is important, as it shows how your C/C++ code has been compiled and hints at why the generated instructions may run slowly on an ARM processor (for example, memory accesses are typically slower than arithmetic instructions).



This type of profiling report is used in the following way. You run Streamline to find out which applications, threads, functions and lines of code are taking processor time. Once you know them, you focus your optimization efforts on this "hot code". This could mean setting the compiler to its highest performance optimization level (with a certain disregard for the impact on code size), or handcrafting the code. The idea is that optimizing code that runs often delivers a better return on investment than optimizing code that runs infrequently.
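The "hot code" idea can be sketched as a simple ranking over sample counts. The function names, counts, and the 80% coverage threshold below are hypothetical choices for illustration, not values taken from Streamline.

```python
def hot_functions(sample_counts, coverage=0.8):
    """Return the top functions that together reach `coverage` of all samples.

    Each entry is (function name, share of total samples).
    """
    total = sum(sample_counts.values())
    ranked = sorted(sample_counts.items(), key=lambda kv: kv[1], reverse=True)
    hot, covered = [], 0
    for name, count in ranked:
        if covered >= coverage * total:
            break
        hot.append((name, count / total))
        covered += count
    return hot


# Hypothetical per-function sample counts from a profiling report.
counts = {
    "blend_pixels": 620,
    "draw_frame": 210,
    "flush_queue": 90,
    "parse_config": 80,
}

hot = hot_functions(counts)
for name, share in hot:
    print(f"{name:14s} {share:.0%}")
```

In this made-up data set, two of the four functions account for more than 80% of the samples, so they are where optimization effort is most likely to pay off.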

  Streamline is particularly good at profiling because:

  • You can profile as many applications as you wish. Processing is done on a host computer, where you can fit pretty much as many processors and as much memory as you wish. Other profiling tools that run on the target are very limited in terms of the size of the application's debug symbols.
  • You can filter the report over time easily and dynamically. The profiling tools mentioned above require that you instrument your application to start and stop the data capture, which means that you need to know what you want to profile before running the code. In Streamline you can see what happened during the execution before focusing the analysis on the specific period of time that seems "interesting".
  • The navigation from the high-level Timeline View down to disassembly is fast and straightforward, and the tool is really responsive, so you can focus on improving the code rather than dealing with the tool or the performance analysis process.

  Next week I will discuss the use of Streamline for benchmarking.
 
Blogs in this series

  • Software Optimization: Four real-life Streamline use cases (Part 1) - Timeline analysis
  • Software Optimization: Four real-life Streamline use cases (Part 2) - Software Profiling
  • Software Optimization: Four real-life Streamline use cases (Part 3) - Benchmarking