In my presentation at ARM TechCon 2013 last week, "ARM Cortex-M for Beginners", someone asked: “Is there any advice for getting the best performance? Which C compiler to use?”
This is a very interesting question. One page of my slides showed the Dhrystone (version 2.1) and CoreMark results for various Cortex-M processors, assuming zero wait state memory. However, the actual performance when running these benchmarks on specific microcontroller devices depends on the compiler and compilation options used, memory system wait states, etc. Depending on the microcontroller setup and tool chain used for the compilation, you can get different results.
To make things more complicated, the results of running these benchmarks might have no correlation to the performance of running your own application. These types of generic benchmarks are often focused on general data processing, and your applications might have completely different types of operations such as I/O controls, OS context switching, etc. As a result, the exact performance of your applications on different processors might not match the relative performance of running CoreMark or Dhrystone.
If your application requires high performance (admittedly a vague term), then in an ideal world you should benchmark different devices and tools using your own application code. Bear in mind that even if a microcontroller can provide the required processing performance, you still need to leave additional performance headroom in case some processing time is consumed by a large number of interrupt requests arriving in a short time. If you need to spend lots of time optimizing the code to get the performance to the required level, you might have chosen the wrong device. And when you compile for maximum performance, the code size could increase significantly due to optimizations such as loop unrolling. In many cases, it is easier and safer to get a higher performance microcontroller at similar cost, which also provides the extra performance headroom.
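As a rough sketch of the "benchmark your own code and check the headroom" idea, the C fragment below times a workload and compares it against a time budget. Everything named here (`app_workload`, `measure_seconds`, `utilization`, `has_headroom`, `report_headroom`) is hypothetical and only for illustration. It uses the standard C library `clock()` so it can be tried on a desktop; on a real microcontroller you would more likely replace that with a hardware timer or cycle counter, and the workload with your actual application code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for one iteration of your own application code. */
uint32_t app_workload(uint32_t n)
{
    uint32_t acc = 0;
    for (uint32_t i = 0; i < n; i++) {
        acc = acc * 31u + i;   /* arbitrary arithmetic, not a real benchmark */
    }
    return acc;
}

/* Time one call of the workload, in seconds, using clock().
   Assumption: on a target board a hardware timer would replace clock(). */
double measure_seconds(uint32_t (*work)(uint32_t), uint32_t n)
{
    clock_t start = clock();
    volatile uint32_t sink = work(n);  /* volatile store keeps the result from being discarded */
    clock_t end = clock();
    (void)sink;
    return (double)(end - start) / (double)CLOCKS_PER_SEC;
}

/* Fraction of the available time budget that the workload consumes. */
double utilization(double busy_seconds, double period_seconds)
{
    assert(period_seconds > 0.0);
    return busy_seconds / period_seconds;
}

/* Returns 1 if at least `reserve` (e.g. 0.3 for 30%) of the budget is
   still free for interrupt bursts and future features, 0 otherwise. */
int has_headroom(double util, double reserve)
{
    return util <= 1.0 - reserve;
}

/* Measure the workload and print whether the chosen reserve is met. */
void report_headroom(double period_seconds, double reserve)
{
    double busy = measure_seconds(app_workload, 100000u);
    double util = utilization(busy, period_seconds);
    printf("busy %.6f s of %.6f s (%.1f%%): %s\n",
           busy, period_seconds, util * 100.0,
           has_headroom(util, reserve) ? "headroom OK" : "too little headroom");
}
```

Calling `report_headroom(0.010, 0.3)` from `main()` would, for example, check whether the workload fits in a 10 ms period while leaving 30% spare. Both numbers are placeholders; the right reserve depends on your interrupt load. Building the same file with each candidate compiler and optimization level, and running it on the same board, gives a like-for-like comparison.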
Back to the choice of C compilers, there is no golden answer. Compiler A might be better in CoreMark, and compiler B might be better in Dhrystone, and the results can change over time as new versions of the toolchains are released. Don’t forget that the performance of the generated code is only one aspect of choosing your development tools. There are many other factors to consider: license cost, debug features, device support, reliability, technical support, code size, code editor and even middleware bundles.
On the brighter side, you can download the evaluation versions of most of the popular toolchains and try them out before you decide which one to buy. There are a number of low cost evaluation boards available which can work with multiple toolchains, so you can easily try out the same program, on the same board, with different C compilers.
Some of the C compiler vendors have application notes or documents that explain how to select the right compilation options to get the best performance. For example, Keil Microcontroller Development Kit (MDK) has application note 202: MDK-ARM Compiler Optimization (http://www.keil.com/appnotes/docs/apnt_202.asp).
In summary, when choosing your development toolchain, remember that performance is only one of many factors. And if your application could fail unless you compile the code with the maximum speed optimizations, think again about whether you are using the right microcontroller device.
cbeckmann, I'm sorry to hear that. As there can be subtle differences between toolchain versions, the best place to report these issues depends on where you get your toolchain from.
Posting to these communities allows the bug to be seen by anyone in that community, which can often speed up the development of a fix and helps to correctly triage the issue. In the short term you may find it useful to upgrade to a newer version of GCC, as the ARM code generation is continually improving!
jgreenhalgh, we need to get linked. Our company optimized the CMSIS library for the Cortex-M4 and is getting really efficient code from C intrinsics. But we are finding that the Cortex-A does not work quite so well. We are currently carefully profiling the Cortex-A8, Cortex-A9, and Cortex-A15 to get a better understanding of why C intrinsics aren't doing a better job, since they should map to the NEON instruction set pretty closely. We think it might be the code around them, all the looping and everything, that slows things down. Glad to hear that ARM is improving GCC, because it would be very impractical to have to maintain libraries in assembly code.
The team I work in is focused on improving GCC for the Cortex-A processors. We work with the GCC ARM Embedded team, but their release compiler is quite different in terms of engineering focus, reflecting the different code generation strategies one might apply in the embedded world.
For information about the GCC ARM Embedded project the best place to look for updates and documentation on improvements is the Launchpad page linked above.
James Greenhalgh, isn't that the subject you wished to write about? (CC:philipperobin)
Actually I wasn't aware that ARM was behind this. So now I can upgrade from 4.8.1 to 4.7 (!!) with confidence. =)
And the document from Chris looks really good. I'll have to study it in detail.
Thank you for this valuable information.