Shrink Your MCU code size with GCC ARM Embedded 4.7

September 11, 2013

Version 4.7 of GNU Tools for ARM Embedded Processors (GCC ARM Embedded for short) is now available.

The previous release, version 4.6, was downloaded more than 30,000 times. Alongside new features such as Mac OS X hosting, GDB enhancements, and other optimizations, the most exciting feature of version 4.7 is the reduction in generated code size.

Why code size?

As most MCU software developers already know, the reason lies in the extreme resource constraints and cost sensitivity of MCU programming. For those who haven't experienced this, here are some quotes from our users:

"Please please please remember that we are seeing more and more memory limited parts in this world - for example, 4KB flash, 1KB RAM - and every word of "stack space" used, never mind the flash size consumed by code."

"If the total code size exceeds the internal flash memory of the MCU (as in my case) I must ..."

GCC ARM Embedded 4.7 reduces code size by optimizing the compiler and associated libraries

Optimizing the compiler for code size is nothing new: GCC at optimization level -Os already generates smaller code. However, most active GCC development currently focuses on performance, which leaves room for size optimization to catch up. GCC ARM Embedded 4.7 includes the latest code size optimizations committed by the ARM compiler team.

Among the many code size optimizations are basic block reordering for size, which reorders basic blocks to reduce long jumps, and enhanced hoisting, which extracts as many common expressions as possible into a common predecessor block while keeping register pressure reasonably low. Other optimizations include more hard register copying and less use of the upper eight core registers, r8-r15 (refer to the ARMv6-M Architecture Reference Manual). Measured with code size benchmarks on an ARM Cortex-M0 processor, version 4.7 at -Os generates 2% less code than previous versions.
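To make the hoisting idea concrete, here is a source-level illustration. GCC performs this transformation on its internal representation, not on your C code, and whether it fires in a given case depends on the compiler's cost model; the function names are hypothetical. The "before" version computes the same expression in both arms of a branch, and the "after" version computes it once in the block that precedes the branch.

```c
/* Before: both arms compute base + index * 4, so the instructions
   for that expression appear twice in the generated code. */
int pick_before(int flag, int base, int index)
{
    if (flag)
        return (base + index * 4) + 1;
    else
        return (base + index * 4) - 1;
}

/* After hoisting: the common expression is evaluated once in the
   common predecessor, and each arm only finishes the job. */
int pick_after(int flag, int base, int index)
{
    int common = base + index * 4;   /* hoisted out of both arms */
    if (flag)
        return common + 1;
    return common - 1;
}
```

The trade-off the compiler weighs is that `common` must stay live in a register across the branch, which is why hoisting is limited to keep register pressure low.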

The diet plan for libraries

Libraries need optimizing too, because the libraries included in GCC ARM Embedded were not designed for MCU programming. Newlib, the C library in the toolchain, implements printf functions so complex that a simple hello-world program requires about 37 KB of flash and 5 KB of RAM. That is far too much for an MCU, where printf is often wanted only for debugging and logging. The good news is that the libraries contain plenty of unnecessary "fat" that can be cut.
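For reference, a hello-world program of the kind described needs nothing more than one printf call; the exact program measured is not given here, so this is only a minimal sketch, wrapped in a hypothetical say_hello function. That single call is enough to pull newlib's full formatted-output machinery into the image, even though the format string contains no conversions at all.

```c
#include <stdio.h>

/* One printf call: the linker must bring in the whole printf
   implementation, whatever the format string actually uses. */
int say_hello(void)
{
    return printf("Hello, world!\n");   /* returns the number of chars written */
}
```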

The diet plan for libraries is to cut unnecessary features, re-implement features with simpler logic, and build with size optimization. The result is a new set of libraries called newlib-nano: based on newlib, but much smaller.

Newlib-nano cuts some features added after C89 that are believed to be rarely used in MCU programming. Limiting the format conversions to the C89 standard greatly reduces the format-string processing code in printf. Removing the iov buffering significantly reduces the size of all I/O functions. Removing wide-character support from non-wide-character functions further shrinks the string I/O logic. Newlib-nano also makes extensive use of weak symbols to exclude features that normal MCU programs rarely use. For example, referencing floating-point I/O and atexit as weak functions dramatically cuts the size of printf() and exit().
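A minimal sketch of the weak-reference technique, assuming GCC's `__attribute__((weak))`. The function names are hypothetical and are not newlib-nano's internal symbols. Because the formatter is only weakly referenced, a link in which nothing defines it succeeds anyway: the reference resolves to a null address, the caller takes the fallback path, and no float-formatting code enters the image.

```c
#include <stddef.h>

/* Hypothetical float formatter, declared as a weak reference (GCC
   extension). If no object file in the link defines it, the linker
   resolves its address to NULL instead of reporting an error. */
extern size_t hypothetical_format_float(char *buf, size_t len, double v)
    __attribute__((weak));

/* Hypothetical cut-down formatting entry point. */
size_t format_double(char *buf, size_t len, double v)
{
    if (hypothetical_format_float != NULL)
        return hypothetical_format_float(buf, len, v);

    /* Float support was not linked in: produce an empty string. */
    if (len > 0)
        buf[0] = '\0';
    return 0;
}
```

Linking in an object file that defines the formatter switches float support back on without touching the caller. In newlib-nano itself, floating-point printf support is requested explicitly with the linker option -u _printf_float.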

Newlib-nano also re-implements the memory allocation functions. The originals have better overall performance, but their complex logic increases code size. The so-called nano-allocator uses simple, naive algorithms for allocation, de-allocation, and defragmentation. It works well when the total amount of memory that can be allocated is small. More importantly, it is only about one sixth of the original's size.
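The sketch below shows what "simple and naive" can look like: a first-fit free-list allocator over a small static arena. It is an illustration with hypothetical names (nano_malloc, nano_free, ARENA_SIZE), not the nano-allocator's actual code, and it is not thread-safe. Free chunks are kept in address order so that a freed chunk can be merged with adjacent free neighbours, which is how this sketch handles defragmentation.

```c
#include <stddef.h>

/* Free-chunk header; allocations are rounded up to multiples of its size. */
typedef struct chunk {
    size_t size;          /* chunk size in bytes, header included */
    struct chunk *next;   /* next free chunk, kept in address order */
} chunk_t;

#define HDR sizeof(chunk_t)
#define ARENA_SIZE 1024   /* hypothetical heap size for the sketch */

static chunk_t arena[ARENA_SIZE / sizeof(chunk_t)];  /* aligned for chunk_t */
static chunk_t *free_list;
static int initialised;

void *nano_malloc(size_t n)
{
    if (!initialised) {               /* one free chunk spanning the arena */
        free_list = arena;
        free_list->size = sizeof arena;
        free_list->next = NULL;
        initialised = 1;
    }
    if (n == 0 || n > sizeof arena)
        return NULL;

    size_t need = (n + HDR - 1) / HDR * HDR + HDR;   /* payload + header */
    chunk_t **link = &free_list;
    for (chunk_t *c = free_list; c != NULL; link = &c->next, c = c->next) {
        if (c->size < need)
            continue;                 /* first fit: skip chunks too small */
        if (c->size - need >= 2 * HDR) {
            /* Split: the tail stays on the free list. */
            chunk_t *rest = (chunk_t *)((char *)c + need);
            rest->size = c->size - need;
            rest->next = c->next;
            *link = rest;
            c->size = need;
        } else {
            *link = c->next;          /* hand out the whole chunk */
        }
        return c + 1;                 /* payload starts after the header */
    }
    return NULL;
}

void nano_free(void *p)
{
    if (p == NULL)
        return;
    chunk_t *c = (chunk_t *)p - 1;

    /* Find the insertion point that keeps the list in address order. */
    chunk_t *prev = NULL, *cur = free_list;
    while (cur != NULL && cur < c) {
        prev = cur;
        cur = cur->next;
    }
    c->next = cur;
    if (prev != NULL)
        prev->next = c;
    else
        free_list = c;

    /* Defragment: merge with the next and previous chunks if adjacent. */
    if (cur != NULL && (char *)c + c->size == (char *)cur) {
        c->size += cur->size;
        c->next = cur->next;
    }
    if (prev != NULL && (char *)prev + prev->size == (char *)c) {
        prev->size += c->size;
        prev->next = c->next;
    }
}
```

Linear first-fit search is slow on a large heap, but with only a few kilobytes to manage it stays cheap, and the whole allocator compiles to very little code.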

Newlib-nano is built at optimization level -Os. This yields smaller memcpy and memset, because newlib selects simple versions of these functions when it detects that it is being built with -Os. The build also drops some large optimizations in the C++ libraries. Newlib-nano is additionally built with -fno-exceptions, which disables exception handling in the libraries; some MCU C++ developers have confirmed that this is acceptable.
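GCC predefines the macro __OPTIMIZE_SIZE__ when compiling with -Os, and newlib's generic C string routines test for it (together with PREFER_SIZE_OVER_SPEED) to fall back to a plain byte loop. The sketch below imitates that pattern; sketch_memcpy is a hypothetical name, and its word-copy path is simplified compared with newlib's.

```c
#include <stddef.h>
#include <stdint.h>

/* GCC extension: lets a word pointer alias bytes of any type. */
typedef unsigned long __attribute__((__may_alias__)) word_t;

void *sketch_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

#if !defined(__OPTIMIZE_SIZE__)
    /* Speed-oriented path: copy whole words while both pointers are aligned. */
    if ((uintptr_t)d % sizeof(word_t) == 0 && (uintptr_t)s % sizeof(word_t) == 0) {
        while (n >= sizeof(word_t)) {
            *(word_t *)d = *(const word_t *)s;
            d += sizeof(word_t);
            s += sizeof(word_t);
            n -= sizeof(word_t);
        }
    }
#endif

    /* Size-oriented path (and the tail of the word loop): byte by byte. */
    while (n-- > 0)
        *d++ = *s++;
    return dst;
}
```

Built with -Os, the preprocessor removes the word loop entirely, leaving only the short byte loop.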

Conclusion

In summary, newlib-nano can cut the size of a hello-world program by around 80%. In extreme cases, for C++ programs, the reduction can exceed 90%.

Using newlib-nano in real projects with GCC ARM Embedded 4.7 is easy. Normally, only one additional linker option is needed, --specs=nano.specs; the driver spec file shipped with the toolchain then links against the newlib-nano libraries instead of the standard ones.

Patches included in this release are either already in mainline or on their way there. Upstreaming the more aggressive newlib-nano changes will take some time.

Overall, GCC ARM Embedded 4.7 represents a big leap for open-source Cortex-M development toolchains. Why not try it out yourself via the link below?

GNU ARM Embedded Toolchain 4.7

Azzo over 5 years ago

Hi, I'm using the Newlib-nano library on an Infineon XMC4500 with the DAVE4 Eclipse IDE and have a problem with the sscanf function.

DAVE 4 does not interpret the uppercase specifier "%X", but it does interpret the lowercase specifier "%x" for both lowercase AND uppercase hex input.

Here are some examples...

unsigned int Data;

// The following are OK, they return a Data value of 15
sscanf( "0xf", "%x", &Data );
sscanf( "0xF","%x", &Data );
sscanf( "F", "%x", &Data );
sscanf( "f", "%x", &Data );

// The following are NOT OK, the Data value does not change
sscanf( "0xF", "%X", &Data );
sscanf( "0xf", "%X", &Data );
sscanf( "F", "%X", &Data );
sscanf( "f", "%X", &Data );

Could you tell me whether this behaviour is expected in the Newlib-nano library, please?
Thank you very much
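Editor's note: in ISO C, %X in a scanf format is specified to behave the same as %x, so the behaviour reported above looks like a library limitation rather than intended C89 behaviour; that is an observation, not a confirmed diagnosis. Until it is resolved, one possible workaround is to use %x, which the examples above show working for both cases, or to bypass scanf-style conversions with strtoul. The sketch below takes the second route; parse_hex is a hypothetical helper, not a library function.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parse "0xF", "0xf", "F" or "f" style input. strtoul with base 16
   accepts an optional 0x/0X prefix and both letter cases, so no %x/%X
   conversion is involved. Returns false if the text is not a valid
   hexadecimal number that fits in an unsigned int. */
bool parse_hex(const char *text, unsigned int *out)
{
    char *end;
    errno = 0;
    unsigned long value = strtoul(text, &end, 16);
    if (end == text || *end != '\0' || errno != 0 || value > UINT_MAX)
        return false;
    *out = (unsigned int)value;
    return true;
}
```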
Christopher Morgan over 12 years ago

Hello.

Are you guys planning to feed your changes back to the newlib developers, so that people using newlib will be able to enable these space-saving options? I think I've seen some patches so far for malloc() but didn't spot any of the printf-related changes, and I know from experience with other toolchains that a lot of code space can be saved with more limited implementations of the xprintf family of functions.

Chris
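To illustrate the kind of limited implementation Chris describes, here is a hypothetical tiny formatter (tiny_format is an illustrative name, not a newlib or newlib-nano function). It supports only %c, %s, %d, %u, %x and %%, with no flags, field widths, length modifiers, or floating point; dropping those features is what keeps such routines small. It makes no claim about how newlib-nano's printf is actually written.

```c
#include <stdarg.h>
#include <stddef.h>

/* Store one character if there is room (leaving space for the NUL);
   always count it so the caller learns the untruncated length. */
static void put(char *buf, size_t cap, size_t *pos, char c)
{
    if (*pos + 1 < cap)
        buf[*pos] = c;
    (*pos)++;
}

/* Emit an unsigned value in base 10 or 16 (lowercase digits). */
static void put_unsigned(char *buf, size_t cap, size_t *pos, unsigned v, unsigned base)
{
    char tmp[sizeof(unsigned) * 8];
    int n = 0;
    do {
        tmp[n++] = "0123456789abcdef"[v % base];
        v /= base;
    } while (v != 0);
    while (n > 0)
        put(buf, cap, pos, tmp[--n]);
}

/* Returns the length the full output would have, like snprintf. */
size_t tiny_format(char *buf, size_t cap, const char *fmt, ...)
{
    va_list ap;
    size_t pos = 0;
    va_start(ap, fmt);
    for (; *fmt != '\0'; fmt++) {
        if (*fmt != '%') {
            put(buf, cap, &pos, *fmt);
            continue;
        }
        switch (*++fmt) {
        case 'c': put(buf, cap, &pos, (char)va_arg(ap, int)); break;
        case 's': {
            const char *s = va_arg(ap, const char *);
            while (*s != '\0')
                put(buf, cap, &pos, *s++);
            break;
        }
        case 'd': {
            int d = va_arg(ap, int);
            unsigned u = (unsigned)d;
            if (d < 0) {
                put(buf, cap, &pos, '-');
                u = 0u - u;           /* magnitude, safe for INT_MIN */
            }
            put_unsigned(buf, cap, &pos, u, 10);
            break;
        }
        case 'u': put_unsigned(buf, cap, &pos, va_arg(ap, unsigned), 10); break;
        case 'x': put_unsigned(buf, cap, &pos, va_arg(ap, unsigned), 16); break;
        case '%': put(buf, cap, &pos, '%'); break;
        case '\0': fmt--; break;      /* lone trailing '%': stop cleanly */
        default:                      /* unsupported: copy it through */
            put(buf, cap, &pos, '%');
            put(buf, cap, &pos, *fmt);
            break;
        }
    }
    va_end(ap);
    if (cap > 0)
        buf[pos < cap ? pos : cap - 1] = '\0';
    return pos;
}
```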
Bastian Schick over 12 years ago

Hi

Joey Ye wrote: "Thanks for sharing your feedback. We are working on basic Cortex-A support, which will be in a future release. As for R4 big endian, we haven't seen a requirement for it yet. If you have a specific requirement, please let us know."

Regarding big-endian/TMS570, there has been a discussion on Launchpad.
Another question is whether the Launchpad GCC qualifies for safety-critical systems. Perhaps it could, if the results of the regression tests were available (or if one ran them oneself).

I modified gcc version 4.6.2 20120316 (release) [ARM/embedded-4_6-branch revision 185452]:
> arm-none-eabi-gcc -print-multi-lib
thumb;@mthumb
fpu;@mfloat-abi=hard
armv7-r;@march=armv7-r
armv7-r/vfpv3;@mfloat-abi=hard@march=armv7-r@mfpu=vfpv3-d16
armv7-m;@mthumb@march=armv7-m
armv6-m;@mthumb@march=armv6s-m
armv7e-m;@mthumb@march=armv7e-m
armv7-r/thumb2;@mthumb@march=armv7-r
armv7e-m/fpv4;@mthumb@mfloat-abi=hard@march=armv7e-m@mfpu=fpv4-sp-d16
armv7-r/thumb2/vfpv3;@mthumb@mfloat-abi=hard@march=armv7-r@mfpu=vfpv3-d16
armv7-r/be;@mbig-endian@march=armv7-r
armv7-r/vfpv3/be;@mbig-endian@mfloat-abi=hard@march=armv7-r@mfpu=vfpv3-d16
armv7-r/thumb2/be;@mbig-endian@mthumb@march=armv7-r
armv7-r/thumb2/vfpv3/be;@mbig-endian@mthumb@mfloat-abi=hard@march=armv7-r@mfpu=vfpv3-d16

Cheers,
42Bastian
Joey Ye over 12 years ago

Quoting an earlier comment: "Still no Cortex-R4 big-endian support. And still no Cortex-A support (yes, there are people using Cortex-A9 bare-metal)."

Thanks for sharing your feedback. We are working on basic Cortex-A support, which will be in a future release. As for R4 big endian, we haven't seen a requirement for it yet. If you have a specific requirement, please let us know.