This discussion has been locked.
You can no longer post new replies to this discussion. If you have a question, you can start a new discussion.

What is considered "good" and "average" pipeline utilization?

We are using an ARM Cortex-M4 in our application. Recently I was optimizing performance-critical DSP code. The code is written in C and compiled for the target using ARM Compiler 6 (armclang). When testing, I get a cycle count which is considerably higher than expected. Looking at the disassembly, the generated code appears quite good, which leads me to assume the difference comes from pipeline stalls.

Assuming a small function is performed in an interrupt-free environment, what would be considered good utilization of the pipeline that I can expect from compiled code?

As a baseline - in our case, the C code compiles to ~3200 instructions (mostly LDRSH and SMLABB), while the measured execution time is ~5000 cycles.

While we are at it - using the ARM Developer Studio environment and a DSTREAM debugger, is it possible to observe individual pipeline stages and see where bubbles form?

  • Hello,

    The Cortex-M4 TRM has a table of instruction timings, where you can see that cycles per instruction vary from 1 to more than 10 in some cases. Memory operations may also be subject to memory latency, if any. Therefore, giving a general estimate for your specific case is probably not appropriate.
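    As a hedged back-of-envelope illustration (not a statement about your exact code): if the ~3200 instructions you mention were an even mix of LDRSH/SMLABB pairs, and we assume the typical Cortex-M4 TRM timings of 1 cycle for SMLABB and 2 cycles for an isolated LDRSH, the raw instruction timings alone already land near your measurement, without invoking pipeline stalls:

    ```c
    #include <stdio.h>

    int main(void) {
        /* Hypothetical split of the ~3200 instructions into pairs;
         * the 2-cycle LDRSH and 1-cycle SMLABB figures are assumed
         * from the Cortex-M4 TRM timing table. */
        unsigned ldrsh  = 1600;
        unsigned smlabb = 1600;
        unsigned estimate = ldrsh * 2u + smlabb * 1u;
        printf("estimated cycles: %u\n", estimate); /* 4800, near the observed ~5000 */
        return 0;
    }
    ```

    Note that back-to-back loads can pipeline on the Cortex-M4, so the real figure depends on instruction ordering; this only shows that the gap between instruction count and cycle count need not indicate poor pipeline utilization.
    
    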

    • If your code is small enough, doing an estimation manually might be feasible.
    • Modifying your code to read the DWT cycle counter to perform measurements might be an option, too.
    • A cycle model of the Cortex-M4 is also available on ipexplorer, which may allow you to simulate your code precisely.
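    The second bullet above can be sketched as follows. This is a minimal, illustrative harness, not a definitive implementation: the register addresses are taken from the ARMv7-M architecture (DEMCR at 0xE000EDFC, DWT_CTRL at 0xE0001000, DWT_CYCCNT at 0xE0001004), and a host-build stub is included purely so the harness itself can be exercised off-target:

    ```c
    #include <stdint.h>
    #include <stdio.h>

    #if defined(__arm__)
    /* On a Cortex-M4, the DWT cycle counter is memory-mapped. */
    #define DEMCR       (*(volatile uint32_t *)0xE000EDFCu) /* Debug Exception and Monitor Control */
    #define DWT_CTRL    (*(volatile uint32_t *)0xE0001000u) /* DWT control register */
    #define DWT_CYCCNT  (*(volatile uint32_t *)0xE0001004u) /* free-running cycle counter */

    static void cycle_counter_init(void) {
        DEMCR |= (1u << 24);   /* TRCENA: enable the DWT unit */
        DWT_CYCCNT = 0u;       /* reset the counter */
        DWT_CTRL |= 1u;        /* CYCCNTENA: start counting cycles */
    }

    static uint32_t cycle_counter_read(void) {
        return DWT_CYCCNT;
    }
    #else
    /* Host stub, purely illustrative: a fake counter that advances
     * by a fixed amount per read so the harness runs off-target. */
    static uint32_t fake_cycles;
    static void cycle_counter_init(void)    { fake_cycles = 0u; }
    static uint32_t cycle_counter_read(void) { return fake_cycles += 100u; }
    #endif

    int main(void) {
        cycle_counter_init();
        uint32_t start = cycle_counter_read();
        /* ... place the DSP kernel under test here ... */
        uint32_t elapsed = cycle_counter_read() - start;
        printf("elapsed cycles: %lu\n", (unsigned long)elapsed);
        return 0;
    }
    ```

    On the target, the difference of two CYCCNT reads gives the cycle cost of the code between them (including the small overhead of the reads themselves); the counter wraps at 2^32, so unsigned subtraction still gives the right answer across a single wrap.
    
    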

    Best regards,

    Vincent.