Using Streamline Instruction Executed counter to measure MIPS

Hello,

We are adding some extra sound effects to Android's mediaserver. We're using DS-5's Streamline on a Nexus 5 phone to measure the performance of an active thread that implements one of these sound effects. The phone's CPU has four cores, which Streamline detects correctly. We build the entire Android AOSP platform for Android 5/6 using the prebuilt toolchain supplied by Android. The shared library that contains the code for the thread being measured was compiled using gcc.

I use DS-5/Streamline by playing a media file for one minute while Streamline captures the CPU activity. I've done the following:

- compiled all code that implements the thread using the flags -g -fno-inline -fno-omit-frame-pointer, as described in

   Streamline User Guide | Recommended compiler options | ARM DS-5 Development Studio

- pushed the compiled shared library (with symbols) to the phone

- in Streamline's Capture & Analysis Options, selected "High Resolution Timeline", and added the location of the shared library with symbols to "Program Images"

After the test, I expand the Cross Section Marker to cover a time period of one minute. The Instructions Executed counter then displays the total number of instructions executed during that period.

I filter all counters for the process I want to measure, and divide the filtered Instructions Executed count by 60 (the length of the capture in seconds) to get the MIPS figure, averaged for the four CPU cores.
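
To make the arithmetic explicit, here is a minimal sketch of the calculation I'm doing by hand. The function name `average_mips` is just my own for illustration and is not part of Streamline; the two inputs are the totals I quote in question 3 below, assuming they are the filtered counts for the one process over the 60-second window.

```python
def average_mips(instructions_executed, elapsed_seconds):
    """Average millions of instructions per second over a capture window.

    instructions_executed: total count read from the counter (an assumption:
    the value already filtered to the process of interest).
    elapsed_seconds: width of the selected window in seconds.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    instructions_per_second = instructions_executed / elapsed_seconds
    return instructions_per_second / 1e6  # scale to millions


if __name__ == "__main__":
    window = 60  # one-minute capture
    # Build with -fno-inline -fno-omit-frame-pointer: about 40 Ginstruction
    print("with flags:    %.1f MIPS" % average_mips(40e9, window))
    # Build without those flags: about 26 Ginstruction
    print("without flags: %.1f MIPS" % average_mips(26e9, window))
```

This only restates the division; it says nothing about whether the counter itself is the right thing to read, which is what question 1 is asking.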

My questions are:

1. Is this the best way to measure the MIPS, using DS-5?

2. Using these compile-time options seems counter-intuitive when taking profiling measurements. For example, the whole point of inlining is to improve performance, and -fno-inline turns it off. Are these flags really meant to be used when taking profiling measurements?

3. When I don't use the compile-time flags -fno-inline -fno-omit-frame-pointer listed above, the total Instructions Executed count is about 35% lower (26 Ginstruction vs 40 Ginstruction). However, the indicated CPU activity for the same thread, averaged over the four cores, is only about 15% lower (11.3% vs 13.4%). Using or omitting the -g flag makes no difference, which also seems counter-intuitive.
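
For reference, this is how I worked out the two percentages in question 3, as a small sketch (the helper name `percent_less` is my own, purely illustrative):

```python
def percent_less(reduced, baseline):
    """How much smaller `reduced` is than `baseline`, as a percentage."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (baseline - reduced) / baseline * 100.0


if __name__ == "__main__":
    # Instructions executed: 26 Ginstruction (no flags) vs 40 Ginstruction (flags)
    print("instructions: %.1f%% less" % percent_less(26e9, 40e9))
    # CPU activity: 11.3% (no flags) vs 13.4% (flags)
    print("cpu activity: %.1f%% less" % percent_less(11.3, 13.4))
```

Both figures are relative to the build that uses the flags, so the gap between the two reductions is what I'm trying to understand.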

Many thanks,

Paul