We are using an ARM Cortex-M4 in our application. Recently I was optimizing performance-critical DSP code. The code is written in C and compiled for the target with ARM Compiler 6 (armclang). When testing, I get a cycle count that is considerably higher than expected. Looking at the disassembly, the generated code seems pretty good, which leads me to assume that the difference comes from pipeline stalls.
Assuming a small function executes in an interrupt-free environment, what pipeline utilization would be considered good, i.e. what can I realistically expect from compiled code?
As a baseline: in our case, the C code is translated to ~3200 instructions (LDRSH and SMLABB), while the execution time is ~5000 cycles.
While we're at it: using the Arm Development Studio environment and a DSTREAM debugger, is it possible to observe individual pipeline stages and see where bubbles form?
Dear yaniv.sapir,
The Cortex-M4 TRM has a table of instruction timings, which shows that the cycles per instruction range from 1 to more than 10 in some cases. Memory operations might also be subject to memory latency (wait states), if any. Therefore, an estimate for your specific case may not be meaningful.
Best regards,
Vincent.