During my presentation "ARM Cortex-M for Beginners" at ARM TechCon 2013 last week, someone asked: "Is there any advice for getting the best performance? Which C compiler should I use?"
This is a very interesting question. One of my slides showed Dhrystone (version 2.1) and CoreMark results for various Cortex-M processors, assuming zero-wait-state memory. However, the actual results you get when running these benchmarks on a specific microcontroller device depend on the compiler and compilation options used, the memory system wait states, and so on, so different device setups and toolchains can produce quite different numbers.
To make things more complicated, the results of these benchmarks might have little correlation with the performance of your own application. Generic benchmarks of this kind focus on general data processing, whereas your application might be dominated by completely different operations such as I/O control, OS context switching, and so on. As a result, the relative performance of your application on different processors might not match the relative CoreMark or Dhrystone scores.
If your application requires high performance (again, a vague term), then in an ideal world you should benchmark different devices and tools using your own application code. Bear in mind that even if a microcontroller can provide the required processing performance, you still need to leave additional performance headroom in case some processing time is consumed by a large number of interrupt requests arriving in a short time. If you need to spend a lot of time optimizing the code to reach the required performance level, you might have chosen the wrong device. And when you compile for maximum performance, the code size can increase significantly due to loop unrolling. In many cases it is easier and safer to choose a higher-performance microcontroller at a similar cost, which also provides the extra performance headroom.
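For example, on Cortex-M3/M4/M7 devices you can time a representative piece of your own code with the DWT cycle counter (this counter is not available on Cortex-M0/M0+). The sketch below assumes a CMSIS device header, shown here as a hypothetical "device.h", and a made-up my_application_task() standing in for your real workload:

#include <stdint.h>
#include "device.h"                    /* placeholder for your vendor's CMSIS device header */

extern void my_application_task(void); /* the code you actually care about timing */

uint32_t benchmark_cycles(void)
{
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;  /* enable the DWT unit     */
    DWT->CYCCNT = 0u;                                /* reset the cycle counter */
    DWT->CTRL  |= DWT_CTRL_CYCCNTENA_Msk;            /* start counting          */

    my_application_task();

    return DWT->CYCCNT;   /* elapsed core clock cycles, wait states included */
}

Running the same function on each candidate device, built with each candidate toolchain, gives you numbers that reflect your own workload rather than a generic benchmark.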
Back to the choice of C compilers: there is no golden answer. Compiler A might be better at CoreMark, compiler B might be better at Dhrystone, and the results can change over time as new versions of the toolchains are released. Don't forget that the performance of the generated code is only one aspect of choosing your development tools. There are many other factors to consider: license cost, debug features, device support, reliability, technical support, code size, the code editor and even middleware bundles.
On the brighter side, you can download evaluation versions of most of the popular toolchains and try them out before you decide which one to buy. There are a number of low-cost evaluation boards available that work with multiple toolchains, so you can easily try the same program, on the same board, with different C compilers.
Some of the C compiler vendors have application notes or documents that explain how to select the right compilation options to get the best performance. For example, Keil Microcontroller Development Kit (MDK) has Application Note 202: MDK-ARM Compiler Optimization (http://www.keil.com/appnotes/docs/apnt_202.asp).
In summary, when choosing your development toolchain, remember that performance is only one of many factors. And if your application could fail unless you compile the code with maximum speed optimizations, think again about whether you are using the right microcontroller device.
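As a small illustration of the kind of option tuning these application notes cover: if you use GCC or a GCC-based toolchain, the "optimize" function attribute lets you compile individual hot functions for speed while the rest of the project is built for size with -Os. The function below is a made-up example, not code from any of the documents above:

#include <stdint.h>

/* Hypothetical hot function: request -O3 for this function even if the
 * rest of the project is compiled with -Os. */
__attribute__((optimize("O3")))
void process_samples(int32_t *buf, uint32_t len)
{
    for (uint32_t i = 0u; i < len; i++) {
        buf[i] = -buf[i];    /* placeholder processing loop */
    }
}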
Thanks, Jens.
On the topic of writing optimized C code, I forgot to mention that chrisshore has written some papers on this topic, for example "Efficient C code for ARM Devices".
Since you use gcc, I think it might be worth mentioning this (you might know it already): ARM has a software team working on optimizing gcc for ARM embedded processors. You can download the latest version of their gcc from GCC ARM Embedded on Launchpad.
regards,
Joseph
This is one of my favorite hobbies: optimizing. Very good article.
In my world, I often write my code in assembler rather than C. This is because I tend to write time-critical code, where a single clock cycle is often very important.
But when not writing assembler code, I use gcc, which is very impressive (well, it has to be good anyway, because it's been improved over many, many years now). I would expect that ARM's own compiler will produce the best code, though.
I cannot use ARM's compiler, as it's not built for my development platform, which is why I use gcc.
I agree 100% with what you write above, especially about leaving as much free CPU time as you can.
After a few years, you might find yourself adopting 'speed-optimizing' habits in the way you write your code.
Some of the optimizing depends on the compiler; some of it depends on how the code is written.
For instance:
volatile int32_t a;
if(a & 1){ a = ~a; }
produces code that resembles this:
ldr r1,=a          /* load the address of a */
ldr r0,[r1]        /* first volatile read of a */
lsls r0,r0,#31     /* shift bit 0 up into the flags (N set if bit 0 was 1) */
bpl skip           /* bit 0 was clear: nothing to do */
ldr r0,[r1]        /* second volatile read; the compiler does not know that this can be omitted. */
mvn r0,r0          /* invert the value */
str r0,[r1]        /* write the result back to a */
skip:
changing the code to:
volatile uint32_t a;
uint32_t temp;
temp = a;
if(temp & 1){ a = ~temp; }
which produces something like this instead:
ldr r1,=a          /* load the address of a */
ldr r0,[r1]        /* single volatile read into temp */
lsls r2,r0,#31     /* shift bit 0 up into the flags, using r2 as scratch */
itt mi             /* Thumb-2: make the next two instructions conditional */
mvnmi r2,r0        /* suddenly it might pay to spend 2 clock cycles instead of using a branch */
strmi r2,[r1]      /* write ~temp back only if bit 0 was set */
It saves a memory (or peripheral) access, even though the code is written in C.
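For completeness, here is the same idea as a self-contained C function, applied to a hypothetical memory-mapped status register (the 0x40000000 address and the names below are made up purely for illustration):

#include <stdint.h>

#define STATUS_REG  (*(volatile uint32_t *)0x40000000u)  /* hypothetical peripheral register */

void toggle_if_odd(void)
{
    uint32_t temp = STATUS_REG;     /* single volatile read */
    if (temp & 1u) {
        STATUS_REG = ~temp;         /* write back the inverted copy */
    }
}

Whether the compiler emits a branch or conditional instructions, only one read of the volatile location is performed.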