Following up from my previous post (Software Optimization: Four real-life Streamline use cases (Part 1): Timeline analysis), I will talk about the most basic use of ARM DS-5™ Streamline: software profiling. Streamline captures samples of the program counters at periodic intervals and traces context switches in the Linux kernel to generate profiling reports. These reports cover both kernel and user space, and also give process-level information for Java code running on virtual machines on top of Linux (for example, Android apps).
Many of you will be using a profiling tool in your development flow today, and chances are that this tool provides you with some sort of tabular view that correlates functions to CPU time. This seems to be the norm across the industry. However, this type of visualization alone may lead you in the wrong direction or obscure short-lived, sporadic events. For instance, when optimizing your system, knowing that your application uses 100% of the CPU for one third of the analysis time is far more useful than learning that it averages 33% of the CPU time across the entire analysis window.
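To make that difference concrete, here is a small illustrative sketch (the sample data and window size are invented for this example, not taken from Streamline output). Bucketing CPU-busy samples into time windows exposes a burst at 100% utilization that a single overall average would flatten to 33%:

```c
/* Illustrative only: bucket per-sample CPU-busy flags into windows to
 * expose bursty behaviour that an overall average would hide.
 * The trace data and window size are hypothetical. */
#include <stdio.h>

#define NUM_SAMPLES 9
#define WINDOW      3

int main(void)
{
    /* 1 = CPU busy at that sample, 0 = idle (made-up trace) */
    int busy[NUM_SAMPLES] = { 1, 1, 1, 0, 0, 0, 0, 0, 0 };

    int total = 0;
    for (int w = 0; w < NUM_SAMPLES / WINDOW; w++) {
        int in_window = 0;
        for (int i = 0; i < WINDOW; i++)
            in_window += busy[w * WINDOW + i];
        total += in_window;
        printf("window %d: %d%% busy\n", w, 100 * in_window / WINDOW);
    }
    /* Prints 100% for the first window, 0% for the rest, 33% overall. */
    printf("overall: %d%% busy\n", 100 * total / NUM_SAMPLES);
    return 0;
}
```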
In Streamline you can select a period of time in the Timeline View using the blue callipers at the top, and then click on the Call Paths tab to get a high-level report of where the processor was spending time in the selected period only.
If you select the Xorg process, or any of its threads, you get a profiling report for it in the bottom pane. By default this report shows the time spent in each application or shared library. Typically, when a developer finds that an unexpectedly large amount of time is spent in an application or shared library, they can add its debug symbols to Streamline to generate a new report that breaks the processor time down per function. In this way you can dynamically refocus the analysis on areas of interest.
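As an aside, a profiler can only map samples back to function names and source lines if the binary was built with debug information. The build command in the comment below is an assumption for illustration (a hypothetical shared library built with GCC); adapt it to your own toolchain and build system:

```c
/* hot_path.c -- hypothetical shared-library code you might profile.
 * Build it with debug info so the profiler can resolve functions and
 * source lines, e.g. (assumed command, adjust for your toolchain):
 *     gcc -g -O2 -fPIC -shared hot_path.c -o libhotpath.so
 * The -g flag adds debug symbols without changing the generated
 * instructions, so the profile remains representative. */

/* A deliberately heavy function that would show up as "hot". */
double accumulate(const double *data, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += data[i];
    return sum;
}
```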
By clicking on a function you can open a code view showing processor time (as a count of program counter samples) attributed to individual lines of code or disassembly. The disassembly information is important, as it tells you how your C/C++ code has been compiled, and gives you a hint as to why the generated instructions may run slowly on an ARM processor (for example, memory accesses are typically slower than arithmetic instructions).
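To illustrate the kind of thing the code and disassembly views reveal, here is a sketch with two simple loops. The instruction names in the comments are indicative only; the real disassembly depends on your compiler, flags and target processor:

```c
/* Sketch: two loops doing a similar amount of "work", but the first is
 * dominated by memory accesses while the second is pure arithmetic on
 * registers. On a typical ARM core the loads in the first loop (LDR
 * instructions in the disassembly) can miss in the cache and stall, so
 * its source lines tend to collect far more program counter samples. */

long sum_array(const long *p, int n)      /* memory-bound: loads + adds */
{
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += p[i];                      /* reads from memory each step */
    return sum;
}

long sum_series(int n)                    /* compute-bound: adds only */
{
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += i;                         /* works entirely in registers */
    return sum;
}
```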
This type of profiling report is used in the following way: you run Streamline to find out which applications, threads, functions and lines of code are taking up processor time, and then concentrate your optimization effort on that "hot code". This could mean building it with the compiler's highest performance optimization settings (accepting some impact on code size), or hand-tuning the code yourself. The idea is that optimizing code that runs often delivers a better return on investment than optimizing code that runs infrequently. Streamline is particularly good at profiling because:
Next week I will discuss the use of Streamline for benchmarking.