
Software Development Tools


Join us for a webinar on 11th February on using STM32CubeMX with Keil MDK.



STMicroelectronics' STM32CubeMX is a powerful graphical software configuration tool that enables users to generate C initialization code using a wizard interface. In this webinar, you can learn how to use it together with Keil MDK to set up and maintain projects for the STM32 microcontroller families.


Speaker Biography

This webinar will be presented by Matthias Hertel, Product Specialist MCU Development Tools. Matthias has deep knowledge of Keil MDK, ARM's development suite for Cortex-M based microcontrollers. With 15 years of experience in the microcontroller tools market, he knows a solution for almost every development requirement.



Register here. You can join the session by using a Mac, PC or a mobile device.


  • Start Time:

    11-Feb-2016 16:00 GMT (Europe/Dublin)
  • End Time:

    11-Feb-2016 16:45 GMT (Europe/Dublin)


Using STM32CubeMX with Keil MDK

Join us for a webinar on 16th February on advanced debug and trace on NXP TWR-K64F120M using ULINKpro.



NXP’s Kinetis K64 MCU Tower System Module features an ARM Cortex-M4 based low-power MCU with 1 MB Flash, 256 KB SRAM, USB and Ethernet MAC. In this webinar, you will learn how to use ULINKpro to debug and trace embedded applications on this powerful development board.




Speaker Biography

This webinar is presented by Christopher Seidl, Technical Marketing Manager at ARM. Christopher has over ten years of experience in ASIC design and ARM cores and is now a member of the technical marketing team for Keil MDK, ARM's leading software development environment for Cortex-M based devices.



Register here. You can join the session by using a Mac, PC or a mobile device.


  • Start Time:

    16-Feb-2016 16:00 GMT (Europe/Dublin)
  • End Time:

    16-Feb-2016 16:45 GMT (Europe/Dublin)


Advanced Debug and Trace on NXP TWR-K64F120M using ULINKpro



The GNU ARM Eclipse project includes a set of open source Eclipse plug-ins and tools to create/build/debug/manage ARM (32-bit) and AArch64 (64-bit) applications and static/shared libraries, using the latest GNU ARM GCC toolchains.


Eclipse Marketplace


Aiming to further improve the user experience while installing/updating the plug-ins, the GNU ARM Eclipse project was registered to Eclipse Marketplace:




The main advantage of using the Eclipse Marketplace is a simplified install procedure: you no longer need to enter the update site address manually.


The Install button


The Eclipse Marketplace not only provides a centralised index to locate projects, it goes one step further and provides a drag-and-drop browser button.


As seen above, the button can be included in any web page.


To use it, just drag-and-drop the button to a running Eclipse, and the plug-ins to install/update will be automatically identified:



If, for any reason, this does not work, it is always possible to search the Eclipse Marketplace manually, from within Eclipse menus:



More info


For more details about the GNU ARM Eclipse project, please refer to the project site



Link Time Optimization (LTO) is a form of interprocedural optimization which, as the name suggests, is performed at the time of linking a program. This is particularly useful when building an image from multiple source files that are compiled separately. The compiler does not have complete visibility across all compilation units while compiling an individual source file, so it misses many optimization opportunities that it would have had if the entire code had been part of a single file.


Other interprocedural optimizations, such as multi-file compilation and Whole Program Optimization, help address the lack of visibility across compilation units by enabling the compiler to perform cross-file inlining and removal of unused functions. However, LLVM is also capable of performing idle-time and run-time optimizations along with whole program analysis and aggressive restructuring transformations. LTO leverages this infrastructure to achieve higher levels of optimization, as described in the implementation section of this article.


Design and Implementation


The key to implementing LTO is the generation of bitcode (also known as bytecode) files which are used to describe an intermediate representation of the source code. Bitcode contains more information about the source file(s) than an ELF object which enables the linker to generate a more optimized image.

When armclang is invoked with the -flto option, it generates bitcode files for each of the source files being compiled with this option and passes them to the linker. The linker then processes the bitcode files to emit an optimized ELF object which can be linked with the library objects.


When LTO is enabled, the compiler and linker perform the following steps:

  1. The compiler translates source code into an intermediate representation called bitcode. This also contains module dependency information.
  2. The linker processes these bitcode files along with other ELF object files and extracts the module dependency information from them before passing them to the link time optimizer (llvm-lto) utility.
  3. The dependency information of the modules allows the link time optimizer to retain all the necessary modules and remove the rest, therefore creating a highly optimized ELF object file.
  4. The link time optimized object file is linked with other ELF object files and pre-compiled libraries to generate the final executable image.



    Figure 1: This block diagram is a visual representation of the steps involved in Link Time Optimization




This example is derived from the example code available on the LLVM website. Other relevant information about the build process is given below:

  • The compilation toolchain used to build this example is ARM Compiler 6.3.
  • It was built on a 64-bit Windows platform (the results are platform independent).
  • The examples are targeted at the ARMv8-M architecture.


Consider the following C source files:

/* ------------ lto.c ------------ */

int fn1(void);
#define VINT volatile int

VINT *msg_buffer = (VINT*)0x32000000;

/* Writes a marker word to the message buffer. */
void fn4(void) {
  *(msg_buffer++) = 0x000C0DE4;
}

int lto() {
  return fn1();
}

/* ------------ foo.c ------------ */

void fn2(void);
static int fn3(void);   /* static, to match the definition below */
void fn4(void);
static signed int i = 0;

/* The only function that can make i negative. */
void fn2(void) {
  i = -1;
}

static int fn3(void) {
  fn4();                /* fn3() calls fn4(), as noted in the observations below */
  return 10;
}

int fn1(void) {
  int ret_val = 0;

  if (i < 0)
    ret_val = fn3();

  ret_val = ret_val + 64;
  return (ret_val);
}


The source code above can be represented using the following diagram:

    Figure 2: Expected Program Flow


By analysing the example code we can make the following observations:

  • Function fn2() is not referenced by any function in the source code.
  • Function fn3() calls fn4().
  • Function fn1() conditionally calls fn3().
  • Function lto() calls fn1().
  • fn3() is only called by fn1() if value of i<0.
  • Calling fn2() would be the only way to make the value of i<0.
  • Variable i is defined as static, so it can only be modified by functions within the same translation unit.
  • Because fn2() is never called, the condition under which fn3() is executed is never satisfied.
    • This means fn3() will never be called in fn1().
    • This implies fn4() will never be called as it is called by fn3().
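
The observations above can be checked on a host machine with a single-file sketch of the example. This is an illustration under stated assumptions, not part of the original build: both source files are merged into one translation unit, the fixed address 0x32000000 is replaced by an ordinary array (host_buffer, a name invented here) so the code can run on a PC, and check_observations() is a hypothetical helper.

```c
#include <assert.h>

#define VINT volatile int

/* Assumption: a plain array stands in for the device memory at 0x32000000
   so that the sketch can run on a host machine. */
static VINT host_buffer[4];
static VINT *msg_buffer = host_buffer;

static signed int i = 0;

void fn4(void) {
  *(msg_buffer++) = 0x000C0DE4;
}

/* Never called anywhere: the only code that could make i negative. */
void fn2(void) {
  i = -1;
}

static int fn3(void) {
  fn4();
  return 10;
}

int fn1(void) {
  int ret_val = 0;

  if (i < 0)            /* never true, because fn2() is never called */
    ret_val = fn3();

  ret_val = ret_val + 64;
  return ret_val;
}

int lto(void) {
  return fn1();
}

/* Hypothetical helper: confirms the reasoning in the list above. */
void check_observations(void) {
  assert(lto() == 64);                /* fn3() never added its 10 */
  assert(msg_buffer == host_buffer);  /* fn4() never advanced the pointer */
  assert(host_buffer[0] == 0);        /* fn4() never wrote the marker */
}
```

Calling check_observations() from a small test harness passes only because nothing ever calls fn2(); adding a call to fn2() before lto() would change the result to 74 and make the first check fail.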



Keeping this in mind we will use the example to compare code generated with and without LTO in the following ways:

  1. Without Link Time Optimization (using ARM Compiler 6.3)
  2. With selective Link Time Optimization (using ARM Compiler 6.3)
  3. With full Link Time Optimization (using ARM Compiler 6.3)
  4. With all available Inter-procedural optimizations in ARM Compiler 5


This will help us better understand the implementation and benefits of LTO in ARM Compiler 6.


Before we move ahead, it would be beneficial to acquaint yourself with some commonly used optimization techniques and terminology by reading the following knowledge article:



Compiling without Link Time Optimization


In this example LTO is not enabled for any of the source files. This means that the compiler does not generate bitcode files and no link time optimizations are performed. The compiler directly generates object files that are linked by armlink to generate an executable image. It’s important to note that both source files in this case have been compiled with -O2 to keep the comparison as close as possible to the compilation with LTO enabled. When LTO is enabled, the default optimization level selected is -O2.



Build Commands:

armclang --target=arm-arm-none-eabi -c foo.c -o foo.o -O2 -march=armv8-m.main 

armclang --target=arm-arm-none-eabi -c lto.c -o lto.o -O2 -march=armv8-m.main

armlink  --lto foo.o lto.o -o lto.axf --entry=lto --cpu=8-M.main

fromelf -cd lto.axf -o nolto_ac6.s


Generated Assembly code:


At -O2 the compiler performs the following optimizations:

  • Function fn3() has been inlined into its caller, fn1().
  • Tail-call optimization has been applied to lto() for its call to fn1().


Compiling with selective Link Time Optimization


In this example one of the two files (foo.c) is compiled with LTO enabled. This means that a bitcode file is generated only for foo.c, so llvm-lto can apply its optimizations to only part of the source code.



Build Commands:

armclang --target=arm-arm-none-eabi -flto -c foo.c -o foo.bc -march=armv8-m.main

armclang --target=arm-arm-none-eabi -c lto.c -o lto.o -O2 -march=armv8-m.main

armlink --lto foo.bc lto.o -o lto.axf --entry=lto --cpu=8-M.Main

fromelf -cd lto.axf -o lto_sel_ac6.s



Generated Assembly code:


Besides the optimizations enabled by compiling at optimization level -O2, enabling LTO in only foo.c leads to the following additional optimizations:

  • The compiler removes fn2() as it is not called by any of the other functions in the source files.
  • llvm-lto can determine that the value of i in fn1() will never be less than 0 and removes the call to fn3().
  • This means that the value of ret_val is not modified by fn3() and the function fn1() can be reduced to just return the fixed value of 0x40 or 64.
  • The compiler removes fn3() but misses the opportunity to remove fn4(), which was called only by the removed function fn3(). This is because lto.c was not compiled with LTO enabled.


Compiling with full Link Time Optimization


In this example all the input source files are compiled with LTO enabled.


Build Commands:

armclang --target=arm-arm-none-eabi -flto -c foo.c -o foo.bc -march=armv8-m.main

armclang --target=arm-arm-none-eabi -flto -c lto.c -o lto.bc -march=armv8-m.main

armlink --lto foo.bc lto.bc -o lto.axf --entry=lto --cpu=8-M.Main

fromelf -cd lto.axf -o lto_full_ac6.s


Generated Assembly Code:



Along with optimizations mentioned earlier (in the selective link time optimization section), ARM Compiler 6 is able to perform additional interprocedural optimizations when LTO is enabled for all source files:

  • The function fn1() is inlined into lto() even though fn1() is defined in a different compilation unit.
  • Similarly, the compiler can determine that since fn3() will not be called by fn1(), it can remove the definition of fn4() (this was not possible earlier as fn3() and fn4() are defined in different files).
  • This means the compiler can now reduce the entire source code to a single lto() function, resulting in extremely small and efficient code as shown above.
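
As a source-level illustration (hand-written C, not compiler output), the effect of the selective and full LTO builds can be approximated as follows. The names with the _selective and _full suffixes are invented for this sketch, since the real images keep the original symbol names, and the fixed address is again replaced by a host array as an assumption so that the code can run on a PC.

```c
#include <assert.h>

/* Assumption: host array in place of the device memory at 0x32000000. */
static volatile int host_buffer[4];
static volatile int *msg_buffer = host_buffer;

/* Selective LTO: foo.c collapses to a constant, but lto.c was compiled to a
   normal ELF object and still carries fn4() and the call to fn1(). */
void fn4_selective(void) {     /* retained although nothing calls it */
  *(msg_buffer++) = 0x000C0DE4;
}

int fn1_selective(void) {
  return 64;                   /* fn2(), fn3() and the test on i are gone */
}

int lto_selective(void) {
  return fn1_selective();
}

/* Full LTO: fn1() is inlined across files and fn4() is removed as well. */
int lto_full(void) {
  return 64;
}

/* Hypothetical helper: both approximations return the same value. */
void check_equivalence(void) {
  assert(lto_selective() == 64);
  assert(lto_full() == lto_selective());
}
```

The two variants return the same value; what differs is how much code is left in the image, which is where the size benefit of full LTO comes from.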



Interprocedural optimizations using ARM Compiler 5


At this point it’s worth comparing the improvement in interprocedural optimizations in ARM Compiler 6 as compared to ARM Compiler 5.

The example below shows the code generated using all the interprocedural optimizations available in ARM Compiler 5.


Build Commands:

armcc -c -O3 -OSpace --split_sections --multifile --whole_program --feedback fb.txt --cpu=Cortex-M7 foo.c lto.c -o lto_mf.o

armlink lto_mf.o --list fbout.txt --feedback fb.txt -o lto_mf.axf --cpu=Cortex-M7 --entry=lto

armcc -c -O3 -OSpace --split_sections --multifile --whole_program --feedback fb.txt --cpu=Cortex-M7 foo.c lto.c -o lto_mf.o

armlink lto_mf.o --list fbout2.txt -o lto_mf.axf --cpu=Cortex-M7 --entry=lto

fromelf -cdv lto_mf.axf -o lto_AC5.s


The compile and link commands listed above are run twice: the first pass generates the feedback file that contains function usage information, and the second pass uses that feedback file to remove the unused functions and sections identified by the first pass.



Generated Assembly Code:



In this compilation the compiler has been able to perform only the following three optimizations:

  • Removing the unused function fn2().
  • Inlining fn3() into fn1().
  • Tail call optimization of call to function fn1().



LTO Current Restrictions and Limitations in ARM Compiler 6


As of now, armclang in ARM Compiler 6 uses the armlink linker, as LLVM Clang doesn’t have its own integrated linker. LLVM provides separate tools instead: llvm-link for bitcode files and lld for linking standard object files. Using armlink as the linker makes it easier to link objects built with ARM Compiler 5 and ARM Compiler 6 together while still leveraging all the benefits that armclang brings. Currently there are a few limitations on how LTO can be used, which will be overcome as the toolchain matures.

  • LTO cannot be performed on static libraries, as neither armar nor armclang can generate bitcode files for libraries.
  • Partial linking is not supported with LTO, as it only works with ELF objects, not bitcode files.
  • You might get linking errors if your library code calls a function that was defined in the source code but removed by the link time optimizer.
  • Scatter-loading of LTO objects is supported but it’s recommended for code and data that doesn’t have a strict placement requirement.
  • Bitcode objects are not guaranteed to be compatible across compiler versions. This means that you should ensure all your bitcode files are built using the same version of the compiler when linking with LTO.
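
For the linking-error case in the list above, one possible mitigation is sketched below. It is an assumption, not something this article prescribes: functions that only pre-built library code calls are marked with the GCC/Clang used attribute, which asks the compiler to keep the definition even when it sees no caller. The names board_tick_hook, board_tick_count and simulate_library_call are hypothetical, and whether the attribute alone is sufficient in the armlink LTO flow should be checked against the ARM Compiler documentation.

```c
#include <assert.h>

static int tick_count = 0;

/* Hypothetical hook that only a pre-built library would call. The used
   attribute asks GCC/Clang-family compilers to emit the definition even
   if no caller is visible to the optimizer. */
__attribute__((used)) void board_tick_hook(void) {
  tick_count++;
}

int board_tick_count(void) {
  return tick_count;
}

/* Stand-in for the library call site, which the optimizer cannot see
   in the real scenario. */
void simulate_library_call(void) {
  board_tick_hook();
  assert(board_tick_count() > 0);
}
```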





Link Time Optimization is a very promising optimization technique that is achieved by having tighter integration between the ARM compiler and linker. It currently has a few limitations which will be overcome in the future, but even in its present state it is extremely powerful and can generate code that’s highly optimized for size, which can also improve performance. This example shows the maximum benefits in code size and performance that can be achieved with LTO. It is important to keep in mind that the mileage you get with LTO may vary based on the nature of the source code it is applied to.


I will try and publish a more comprehensive code size and performance comparison of using LTO with industry standard benchmarks in the future. In the meantime I strongly encourage you to experiment and use this at your end and if possible provide feedback based on your results.



We have just released DS-5 5.23 with significant enhancements to Streamline. In this blog, I will highlight the major changes in the latest version.  For a more detailed list of enhancements and fixes, please see the changelog.



In 5.23, we have added a new feature called templates. With templates, you can now create a custom configuration of charts, save it to disk as a template, and apply that configuration to any existing capture. This is best explained with an example. Here, I have created a Streamline capture with support for 3 charts: CPU Activity (User Activity and System Activity counters), Clock (Frequency counter) and Scheduler (Switch counter). When I apply my custom templates, CPU_And_Clock (only the CPU Activity and Clock charts) and Only_CPU (CPU Activity only), the view changes according to the template.



Pre-configured Templates


Modern SoCs support complex performance counters that are not always easy to understand and use. To make this easier for Mali GPU users, we have included some pre-configured templates in Streamline 5.23. These templates include charts with information that is easy to understand. One such chart is Mali External Bandwidth, which plots the more understandable number of external bus read bytes rather than the underlying $MaliL2CacheExtReadsExternalReadBeats counter.




All the pre-configured templates included in the release can be seen in the below image.




Versatile Templates

Templates can be used in other useful ways.

  • Capture only the required counters. This is useful when debugging an issue that is isolated to one part of the system. For example, using a GPU template while debugging GPU performance reduces the overhead of capturing CPU counters.
  • Combine the charts of two templates to see a joined-up view. This is useful when debugging an issue that spans multiple parts of the system. For example, for a problem that involves the CPU and GPU, you can combine CPU-specific and GPU-specific templates to see the overall picture.
  • Create a template from one capture and use it on another. This is useful when analyzing multiple captures for the same problem. For example, if you are analyzing cache performance across different use cases, you can create a cache-analysis template once and use it to analyze the captures for each use case.
  • Share the templates with others. Templates can be a great mechanism for sharing knowledge. For example, an expert who understands the underlying counters can create a template and share it with others, allowing non-experts to get started quickly.

Standalone application

Streamline is now a standalone application, independent of Eclipse for DS-5, making it easy to launch from the Start menu.  Note that you can continue to launch it from within DS-5 using the Show Views menu item.


Faster UI response

In 5.23, we have significantly improved UI responsiveness, leading to faster zooming and quicker scrolling, among other things. We undertook a major overhaul of the Streamline code, allowing us to make it simpler and more responsive.



DS-5 v5.23 comes with an enhanced Streamline with new features like templates and an improved UI response. Streamline is now a standalone application and can be launched independent of DS-5. You can download the DS-5 5.23 version and explore the new features.

Last week at TechCon 2015 in Santa Clara (California), ARM announced a new architecture and a new A-class low-power processor:

  • ARMv8-M architecture: By offering security, enhanced scalability, and improved debug, the ARMv8-M architecture makes it easier for developers to meet the needs of next generation embedded devices. Read more here.
  • Cortex-A35 processor: the most efficient Cortex-A class CPU ever designed by ARM. The Cortex-A35 consumes about 33 percent less power per core and occupies 25 percent less silicon area, relative to the Cortex-A53. Read more here.


Today we are happy to introduce ARM Compiler 6.3, available now to download standalone or integrated in DS-5 5.23.

ARM Compiler is always at the leading edge for supporting new architectures and new cores so it should not come as a surprise that ARM Compiler 6.3 already supports both the new ARMv8-M architecture and the new Cortex-A35 processor.


Let’s explore some of the new features and improvements made in ARM Compiler 6.3.



Security is a fundamental aspect of the digital world and ARM is committed to making sure every ARM-based device is secure by default. With the introduction of the TrustZone technology for ARMv8-M, ARM has brought security to low power devices based on Cortex-M processors to ensure developers have a reliable and efficient way of protecting embedded or Internet of Things devices.

TrustZone splits the execution of code between Secure state and Non-Secure state: fine-grained control of memory access and special instructions allow secure code to be protected while, at the same time, providing guarded entry points from the Non-Secure state. TrustZone for ARMv8-M has been designed to keep interrupt latency and code complexity to a minimum, making it an ideal technology even for the smallest microcontrollers.



ARM Compiler 6.3 already supports the new architecture with the necessary macros, intrinsics and keywords for simplifying software development targeting TrustZone for ARMv8-M. ARM Infocenter is a great resource for more information on how to make use of TrustZone for ARMv8-M. You can also find more information in the blog post Whitepaper - ARMv8-M Architecture Technical Overview.
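
As a rough sketch of what such a guarded entry point can look like in C, the example below uses the names defined by ARM's C language extensions for the ARMv8-M Security Extensions: the __ARM_FEATURE_CMSE macro, the arm_cmse.h header and the cmse_nonsecure_entry attribute. Treat it as an illustration under assumptions rather than code taken from the ARM Compiler 6.3 documentation: the function secure_counter_add, the SECURE_ENTRY macro and the input limit of 16 are invented here, and the fallback branch exists only so that the same file also builds on a host compiler.

```c
#include <assert.h>

/* __ARM_FEATURE_CMSE == 3 is how ACLE-conforming compilers signal that
   secure-state code generation is enabled (for example with -mcmse). */
#if defined(__ARM_FEATURE_CMSE) && (__ARM_FEATURE_CMSE == 3)
#include <arm_cmse.h>   /* declares the CMSE intrinsics (not used below) */
#define SECURE_ENTRY __attribute__((cmse_nonsecure_entry))
#else
#define SECURE_ENTRY    /* host or non-secure build: ordinary function */
#endif

/* State that lives in secure memory and is never exposed directly. */
static int secure_counter = 0;

/* Hypothetical guarded entry point callable from the Non-Secure state. */
SECURE_ENTRY int secure_counter_add(int step) {
  if (step < 0 || step > 16)   /* validate untrusted input first */
    return -1;
  secure_counter += step;
  return secure_counter;
}

/* Hypothetical host-side demonstration of the entry point's behaviour. */
void secure_entry_demo(void) {
  assert(secure_counter_add(100) == -1);  /* out-of-range request rejected */
  assert(secure_counter_add(2) == 2);     /* valid request accepted */
}
```

The point of the design is that Non-Secure code can only reach the secure state through functions marked in this way, so every such function should validate its arguments before touching secure data.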



When we started to work on ARM Compiler 6 we knew that, in order to be successful, we had to bring the performance of the compiler to very high standards. Leveraging the LLVM infrastructure is now paying dividends, and the performance reached by ARM Compiler 6.3 confirms our expectations:




The benchmarks show not only that we have reached performance similar to ARM Compiler 5, but also the rapid pace at which these performance improvements are being delivered.


Where can I find ARM Compiler 6.3?

ARM Compiler 6.3 is available to download as a standalone product. Alternatively, ARM Compiler 6.3 is integrated in the latest release of DS-5, version 5.23, which can be downloaded here.


Did you evaluate DS-5 already and you don’t know how to get a license again? Claim your evaluation serial number here as explained by Michelle in this blog post.


Do you have any questions? Feel free to reply to this blog post or send me an email. Any feedback is very welcome and helps us keep ARM Compiler 6 the best compiler for the ARM architecture.





The GNU ARM Eclipse project includes a set of open source Eclipse plug-ins and tools to create/build/debug/manage ARM (32-bit) and AArch64 (64-bit) applications and static/shared libraries, using the latest GNU ARM GCC toolchains.


ARM family and FPU type


Starting with GNU ARM Eclipse version 2.10.2, released in November 2015, full Cortex-M7 support was added to the C/C++ Build > Settings > Tool Settings page; it is now possible not only to select the cortex-m7 ARM family, but also to select the new specific FPU type:



The Hello World Cortex-M C/C++ Project wizard


The project wizard was updated to create generic Cortex-M7 projects.


The STM32F7xx C/C++ Project wizard


And last, but probably the most useful, a new template to create STM32F7 projects was added.


The wizard currently supports STM32F745xx, STM32F746xx, STM32F756xx, and can create blinky projects for the STM32F746_EVAL and STM32F746_DISCOVERY boards.


The created projects not only pass the build, but are ready to run on the selected boards.


More info


For more details about the GNU ARM Eclipse project, please refer to the project site

At TechCon, the mbed OS Technology Preview was announced publicly. My colleague Matthias Hertel has written an application note that explains how to import mbed OS projects to Keil MDK Version 5.


mbed OS uses yotta as a build tool which also downloads software components that the project depends on. Each yotta component of the mbed OS project is represented by a single MDK project. The complete mbed OS project is imported as multi-project workspace to give you seamless access to the entire code base of the mbed OS application.


For more information, see application note 282.


Embedded Internet of Things

Heterogeneous Software Development

By Stephen Theobald

ARM-based platforms come in a variety of processor configurations, and these platforms now often have more than one ARM processor.  These multi-core platforms have usually been “Symmetric Multi-Processing” (SMP) systems, where a cluster of identical CPUs work together co-operatively with a common memory map.  More recently, heterogeneous Asymmetric Multi-Processing (AMP) and AMP+SMP systems that have different CPUs with different profiles are becoming available too.  An effective combination is ARM Cortex-A and Cortex-M family cores in a single package.  The Cortex-M core offers low interrupt latency for good real-time response, and low power consumption.  Cortex-A cores offer higher performance but consume more power.  Having both classes of core in a single package enables the system designer to partition a system optimally, for the best balance between low power and low latency versus heavy application workloads.  For example, an AMP+SMP system might have one or more cores running an OS such as Linux in SMP mode, and an additional core running an RTOS or bare-metal application.


Examples of AMP+SMP devices include Freescale’s i.MX7 Dual (2 x Cortex-A7 + Cortex-M4), Texas Instruments OMAP5432 (2 x Cortex-A15 + 2 x Cortex-M4), and Xilinx UltraScale MPSoC (4 x Cortex-A53 + 2 x Cortex-R5). AMP devices are also available, such as Freescale’s i.MX7 Solo (Cortex-A7 + Cortex-M4) and the Vybrid™-series such as VF6xx (Cortex-A5 + Cortex-M4). ARM’s own “Juno” development platform contains 2 x Cortex-A57 + 4 x Cortex-A53 cores, plus a Cortex-M3 System Control Processor for power control.


DS-5 allows you to compile code for both classes of core, and then debug them both together.  DS-5’s code development environment provides C/C++ compilers for ARMv7 and ARMv8 embedded code (both ARM Compiler 5 and the new ARM Compiler 6), and a Linaro GCC compiler for Linux applications, Linux kernel and kernel modules. 


Use ARM Compiler 5 to build your embedded/RTOS code for Cortex-M, -R or (32-bit) A-class devices.  ARM Compiler 5 is now TÜV SÜD certified and can be used for safety-related software development, together with the ARM Compiler Qualification Kit.  ARM Compiler 6 is the next-generation C/C++ compilation toolchain targeting embedded software development.  ARM Compiler 6 supports all the latest ARM processors, including 64-bit ARMv8.


DS-5 Debugger is able to debug both SMP and AMP system designs.  Linux-based targets can be debugged via gdbserver over Ethernet.  Bare-metal and RTOS targets can be debugged either by traditional JTAG-based debug hardware such as DSTREAM, or via CMSIS-DAP over USB.  DS-5 Debugger allows simultaneous connection to multiple cores, so for example, you can be debugging the Linux SMP kernel on the A-class cores and then switch seamlessly to debugging an RTOS on the M-class core.  The screenshot below shows simultaneous debugging of the Linux kernel booting on 2 x Cortex-A7, and an RTOS on Cortex-M3.  There are two disassembly views (bottom center), one showing the Linux kernel stopped at a breakpoint at “start_kernel”, and the other showing the RTOS sitting on a WFI (Thumb-2) instruction.  The source code of the Linux kernel is shown (bottom left), along with a trace of its instruction execution history (bottom right).




The Compiler and Debugger within DS-5 support multiple cores and OSs well, meeting the needs of heterogeneous architectures today. ARM’s software development tools have come a long way within the past 25 years, and to celebrate ARM’s 25th birthday we’re giving the first 20,000 customers the chance to try the latest version of DS-5 again. Get your free serial number »



On 27th November ARM turns 25. To celebrate, we’re giving the first 20,000 customers the chance to try the latest version of DS-5 Ultimate Edition again. To see for yourself how far DS-5 has come, claim your serial number before 30th November.


Get your free serial number »


Why should I try DS-5 again?


  • DS-5 supports the latest ARM processors. In DS-5 Ultimate Edition, you’ll get everything you need for all ARM software development, including 64-bit ARMv8. This also packages the LLVM-based ARM Compiler 6 and ARMv8 FVP simulation model.
  • SoC bring-up is now easier than ever before when using the Platform Configuration Editor (PCE) in DS-5. The PCE will autodetect your underlying system architecture, saving you time and effort when bringing up a new SoC.
  • The ARM Compiler 5.04 is now TÜV SÜD certified and can be used for safety-related software development. The ARM Compiler Qualification Kit can also be used to provide evidence for justifying toolchain selection.
  • ARM Compiler 6 is the next-generation C/C++ compilation toolchain targeting embedded software development. ARM Compiler 6 supports all the latest ARM processors, including 64-bit ARMv8.


Get your free serial number »

CMSIS Version 4.5.0 Released


CMSIS 4.5.0 is now available. For detailed information about the changes, refer to the revision history.


CMSIS-Driver Validation Suite Version 1.0 Released


A Software Pack for CMSIS-Driver Validation is now available. The CMSIS-Driver Validation Suite tests and verifies the API interface, correct data communication using loopback modes, and the timing of the data communication. Refer to the CMSIS-Driver Validation User's Guide for more information.


CMSIS-Pack Management for Eclipse Version 1.0 Released


Our open source Eclipse plug-ins for CMSIS-Pack management are feature complete. The release is available under the Eclipse Public License. Download the source code from: (pre-built plug-ins


Visit us at the Eclipse Conference Europe 2015 in Ludwigsburg, Germany, 3-5 November 2015 to get detailed information in the session "Enhanced Project Management for Embedded C/C++ Programming using Software Components".

Hi, I wanted to share a recent release on using the Cortex-M Prototyping System (MPS2) to provide a software development platform for ARM's first IoT subsystem for Cortex-M. We've taken the deliverables from the subsystem, including the Cortex-M3 processor, added extra peripherals as a user would, and implemented this in FPGA on our hardware platform. The platform is supported on mbed, so it has all the drivers for the peripherals on the board and you can evaluate the subsystem. The picture below shows how it has been implemented.


Using the FPGA to prototype the IoT subsystem was quite interesting: you can develop drivers and prove them out by connecting to real devices ahead of silicon. We also used the platform to test the boot flow, connect to external BLE radios and generate a number of demos. It's worth taking a look to see if using the IoT subsystem for Cortex-M could accelerate your IoT development.





The GNU ARM Eclipse project includes a set of open source Eclipse plug-ins and tools to create/build/debug/manage ARM (32-bit) and AArch64 (64-bit) applications and static/shared libraries, using the latest GNU ARM GCC toolchains.


New look


Starting with September 2015, the GNU ARM Eclipse web site has a completely new look:


Apart from the aspect (definitely cool!), the main functional change is the addition of the right sidebar, to facilitate access to the project documentation.


The new site no longer uses WordPress; instead, it is entirely static and was generated with Jekyll.


New project home on GitHub


With GitHub gaining more and more traction, the GNU ARM Eclipse project was migrated from SourceForge to GitHub.




The migration of repositories was easy, each project was pushed into its own repository.


The current project repositories are:



Binary files as Releases


The migration of binary files was a bit more complicated, and, due to current GitHub limitations, is incomplete. The main problem was raised by the two Eclipse update sites, which require a certain folder structure, and since GitHub currently does not support adding folders to releases, the Eclipse update sites will remain hosted on SourceForge (at


Except the Eclipse update sites, all future binary files will be published as GitHub Releases, attached to the respective project repositories.


The archive of past releases was also migrated from SourceForge to GitHub.


Issues trackers


The SourceForge trackers were replaced by the GitHub Issues trackers, one for each project.


It is planned to preserve the content of the old SourceForge trackers, even if now they are locked and new tickets cannot be created there.


Notifications via watched projects


For those interested in receiving notifications, the recommended way is to subscribe to the GitHub projects by clicking the Watch button and selecting Watching.


In addition to the gnuarmeclipse/plug-ins project, it is also recommended to subscribe to the gnuarmeclipse/ project, to receive notifications for new Web posts.


More info


For more details about the GNU ARM Eclipse project, please refer to the project site

ARM Compiler 6's main focus has always been bare-metal applications running on ARM processors. Although ARM Compiler doesn't officially support building Linux applications, the high compatibility between armclang and GCC makes it much easier to build them now. In this blog I will explain how to set up ARM Compiler 6 to build a Linux Hello World from scratch.


This tutorial covers the build and debug of a basic Hello World C program running on Linaro on an ARMv8 model using ARM DS-5 Development Studio. In particular, it shows how to:

  • Download and set up GCC
  • Write a simple “Hello World” application in ARM DS-5 Development Studio
  • Build the application using ARM Compiler 6
  • Set up a debug session in ARM DS-5 Development Studio
  • Run it on a model of an ARMv8 system

To complete this tutorial, you'll need DS-5 Ultimate Edition: Download the 30-day trial »

Included in DS-5 Professional is the ARMv8-A Fixed Virtual Platform (FVP) model, giving you a platform to develop code on in advance of hardware availability.

Download Linaro GCC and Linaro image

If you do not have Linux already running on ARMv8, you can download a ready-to-use Linaro image from the Linaro website.
You need to download the kernel binary img.axf and the file system image vexpress64-openembedded_lamp-armv8-gcc-4.9_*.img.gz. Make sure you download the lamp image, because the minimal image does not include gdbserver, which is needed to debug the application from DS-5.


Even if it seems counterintuitive, it’s necessary to have GCC in order to build a Linux application with ARM Compiler 6: ARM Compiler 6 does not include the Linux libraries, so it needs to use glibc from the GCC toolchain.


For our example, we will use the Linaro toolchain for Cortex-A, which can again be downloaded from the Linaro website.

Download Linaro-toolchain-binaries 4.9 (AArch64 little-endian) and extract it locally.



Add the new toolchain to DS-5

DS-5 includes three default toolchains but it’s also possible to add new ones as explained by ronans in his blog post: Improved support for multiple ARM Compilers in DS-5 5.20 and beyond.

Open DS-5 settings by clicking on the menu Window and then Preferences. On the left hand side you can find a list of categories: select Toolchains under DS-5.


The available toolchains are listed on the right hand side of the window. Proceed to add the downloaded GCC toolchain by clicking on the Add… button. Select the bin path of the toolchain you want to add and click on the Next > button.


DS-5 should be able to automatically detect the type of toolchain selected and other information like the version and the binaries. Click Finish if you want to complete the procedure and keep the default values (suggested). By clicking Next > you would be able to amend some of the information DS-5 already filled with values.


Create a new project

Create a new project in DS-5 by clicking on File > New > Project. Select C Project under the C/C++ menu and click Next.


DS-5 shows the list of available toolchains. Give the project a name, select the GCC toolchain we added in the previous section (make sure you select the aarch64 one and not the DS-5 built-in one) and click on the Finish button.


In order to use ARM Compiler 6 we need to change the project build settings to use armclang as a compiler and leave GCC for all the other tools. In particular, we want to make sure GCC linker is used instead of armlink.


Right click on the project and select Properties from the menu. In the C/C++ Build section we need to change the compiler in Tool Chain Editor. Click on Select tools and a window should appear with the list of all the available tools on the left hand side and the tools used for the project on the right hand side. All you need to do is select ARM C Compiler 6 from the list on the left: DS-5 will automatically pick the corresponding tool among those currently used (GCC C Compiler) and, by clicking on the << - Replace ->> button, we replace it with ARM Compiler 6.


The Select tools window should now have the following Used tools:


Once completed you can click OK and go to the Settings section of C/C++ Build.


In this section we need to configure armclang to compile for the ARMv8 target. Because armclang is not in the PATH if the project uses GCC, we need to specify in the Command textbox the full path as shown below (for example "C:\Program Files\DS-5\sw\ARMCompiler6.00u2\bin\armclang").


In the Target page it is necessary to specify aarch64-linux-gnu.


Add to Included Path the full path of the include directory in the ARM Compiler 6 directory (for example C:\Program Files\DS-5\sw\ARMCompiler6.00u2\include).


Finally, we need to add a few extra compiler options in the Miscellaneous section; specifically, we need to indicate the root path of the GCC compiler with the option --gcc-toolchain and the path to the libc libraries with --sysroot. For example:


--gcc-toolchain="$PATH_TO_GCC_COMPILER$" --sysroot="$PATH_TO_GCC_COMPILER$\aarch64-linux-gnu\libc"


You can now press OK to save the new settings.


Building the project

Now that the project has been set up we need to write the code for the Hello World. Right click on the project and select New > Source File. Choose a name for the new file and click Finish.

A new source editor window should open in DS-5 to edit the file. For this tutorial we will just add the following code:


#include <stdio.h>

int main(void) {
       printf("Hello v8 World!\n");
       return 0;
}

Save the file and build the project by selecting Build Project from the project menu.


The project should build without any errors. If not, check the output of the build in the Console tab and verify that all the settings have been correctly passed to the compiler/linker.
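One way to confirm that the binary really came from armclang targeting AArch64, rather than from the GCC compiler, is to report the compiler's predefined macros at run time. The sketch below is not part of the original tutorial: the helper names compiler_name, target_arch and format_build_info are hypothetical, and it assumes only that armclang, being Clang-based, defines __clang__ and that compilers targeting AArch64 define __aarch64__.

```c
#include <stdio.h>

/* Hypothetical helper: names the compiler family from predefined macros.
 * __clang__ is tested first because Clang-based compilers also define __GNUC__. */
static const char *compiler_name(void) {
#if defined(__clang__)
    return "Clang-based";
#elif defined(__GNUC__)
    return "GCC";
#else
    return "unknown";
#endif
}

/* Hypothetical helper: names the target architecture from predefined macros. */
static const char *target_arch(void) {
#if defined(__aarch64__)
    return "AArch64";
#elif defined(__arm__)
    return "AArch32";
#else
    return "not ARM";
#endif
}

/* Writes a one-line build summary into buf and returns the snprintf result. */
int format_build_info(char *buf, size_t len) {
    return snprintf(buf, len, "compiler: %s, target: %s",
                    compiler_name(), target_arch());
}
```

Calling format_build_info from main and printing the buffer next to the Hello World message should report a Clang-based compiler and AArch64 if the project settings described above took effect.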


Start the ARMv8 model within DS-5

Our hello world application is ready but we still don’t have an environment in which to test it. DS-5 Ultimate Edition includes multiple platform models of an ARMv8 processor that we can use to boot Linux and debug our application. Again, see ronans' blog post for more details: Booting Linux on the ARMv8-A model provided with DS-5 Ultimate Edition.


We can start the model directly from DS-5 by creating a new DS-5 Debugger configuration in Debug Configurations. Create a new Debug configuration and select AEMv8x4 under the ARM RTSM list (typing AEMv8 in Filter platforms will help with the selection).


Paste the following parameters in the Model parameters text box:

-a "[LINARO_PATH]\\img.axf"
 --parameter motherboard.mmc.p_mmc_file="[LINARO_PATH]\\vexpress64-openembedded_lamp-armv8-gcc-4.9_20150123-708.img"
 --parameter motherboard.mmc.card_type=eMMC
 --parameter motherboard.smsc_91c111.enabled=true
 --parameter motherboard.hostbridge.userNetworking=true
 --parameter motherboard.hostbridge.userNetPorts="5555=5555,8080=8080,22=22"


Where [LINARO_PATH] is the path where you saved the kernel image and the Linux image downloaded from the Linaro website previously. The last parameter userNetPorts is important later to allow the connection of the debugger to the gdbserver port opened on the model.


In the Debugger tab make sure the radio button Connect only is selected. You can now Apply the modifications and click on Debug to start the model.


Once loaded, press the Continue button (green arrow) to run the model and boot Linux.


Debug via gdbserver

Once Linux has finished booting (it shows the command line prompt), it’s possible to access the file system and processes running on the model through a Remote System connection in DS-5. To create a new connection, select the Remote Systems tab in the DS-5 Debug perspective. Click the new connection button as indicated in the image below:

Select Linux as System type and press Next. The model is running locally so we can specify LOCALHOST as hostname. Give a name to the connection and an optional description. Finally click Finish to complete the creation of the connection.


The new connection should appear in the list and you should get access to files and processes. In case DS-5 asks for login details, use root as the username and leave the password empty (or use the one you specified if you changed it in the Linaro image running in the ARMv8 model).


We now have access to the Linux system running on the model, and you should be able to access the file system and view the running processes directly from the Remote Systems view.


Now that we have established a successful connection, we can create the debug configuration for our Hello World and run the application on the model.


Open the Debug Configurations dialog again and create a new connection this time selecting Linux Application Debug – Application Debug – Connection via AArch64 gdbserver – Download and debug application.


Make sure you set the port to 5555 as we specified in the list of parameters when launching the model.


Switch to the Files tab and select the binary built in the previous step. Set /home/root for both Target download directory and Target working directory. In the Debugger tab make sure the radio button Debug from symbol is selected with main as symbol.


If all the settings are correct, the Debug button should be enabled and you can start a debug session simply by clicking on it. The debugger will connect to the target, upload the binary and stop at the beginning of the main function as we specified. The Debug Control view should appear similar to the following:


Press the green Continue arrow to resume the program after the breakpoint in the main function. The application should terminate successfully, and the App Console tab should show the output of the printf call: “Hello v8 World!”.


Congratulations! You’ve just built a Linux application with ARM Compiler 6 running on an ARMv8 model!


In summary, in this tutorial we used DS-5 to create a Linux application built via ARM Compiler 6 and we debugged the application on an ARMv8 Fixed Virtual Platform Fast Model. The advanced code generation technology available in ARM Compiler 6 can be used to build Linux applications running on the latest ARM IP.


Did you find this blog useful? Do you think this would be a valuable supported feature? We would like to hear from you so please don't hesitate to comment or send an email (stefano[dot]cadario[at]arm[dot]com) to discuss this!



We’ve just released ARM DS-5 Development Studio v5.22 and we have made Streamline more powerful and user-friendly. In this blog, I will highlight the major changes in the latest version. For a more detailed list of enhancements and fixes, please see the changelog.


Android trace events alongside extensive list of standard system events


Android supports trace events, which are written to a system trace buffer. The Systrace tool, provided by Android, can collect and visualize these events. In the DS-5 v5.22 release, we have enhanced Streamline to support Android trace events, so performance counters and charts such as CPU and GPU activity can now be viewed alongside standard Android trace events.


Figure 1 Streamline showing Android trace events


For example, in the capture above, you can inspect a frame by looking at various Android SurfaceFlinger events such as onDraw and eglSwapBuffers.


Profile Mali-T400 Series GPUs without having kernel source


Streamline requires an agent called gator to be installed and running on the ARM Linux target. Gator can operate in two modes:

(a) kernel space gator – using a kernel module called gator.ko.
(b) user space gator – without the kernel module.

As user space gator is restricted to using user space APIs, it does not support all the features that kernel space gator supports. However, user space gator is easier to use, as you do not need the target’s Linux kernel source to build the kernel module. Given the ease of use, we are working towards enhancing the features supported by user space gator. With this release, we are happy to announce that user space gator now supports the Mali-T400 series of GPUs. Note that you will need a recent version of the Mali DDK, which exports system events to user space. Going forward, you can expect us to add support for more Mali graphics processors.


Automatic fetch of symbol and other information from files on the target


Streamline needs symbol information to correlate the captured events with the code being run. In the past, we had to provide this image information manually. This can be tricky if the image is available only on the target and not on the host. In the v5.22 release, we have introduced a feature that automatically transfers the image from the target to handle this situation.


Figure 2 New textbox to select processes for automatic fetching of images from the target


This is best shown with an example. In my case, I want to run the dhrystone executable on my Nexus 9 and see the function profile. As a first step, I run the program via adb, and start the Streamline session. During the session, I can now see a new box at the bottom, as seen in the above picture. Here, I can type a pattern (“dhr” in my case) to select the list of processes. Streamline will automatically fetch symbol information for these selected processes from the target. In my case, I can see that Streamline shows function profile for dhrystone, as seen in the below picture, without having to provide image manually.


Figure 3 Streamline showing function profile for the dhrystone process




Streamline snippet during the live capture


Streamline snippet is now available during live capture. As you might recall, Streamline snippet is a powerful feature where users can track complex counters, derived from a combination of more basic counters. For example, as seen in the below picture, you can track ClockPerInstruction (CPI) using $ClockCycles and $InstructionExecuted counters.


Figure 4 CPI snippet
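The arithmetic behind such a derived counter is simple division. Streamline evaluates snippet expressions itself, so the C function below is only an illustration of the calculation, not Streamline's implementation; the function name, the parameter names and the choice to return 0.0 when no instructions were executed are assumptions made for this sketch.

```c
#include <stdint.h>

/* Cycles per instruction from two raw counter values sampled over the
 * same interval. Returns 0.0 when no instructions were executed, to
 * avoid dividing by zero (an illustrative choice, not Streamline's). */
double cycles_per_instruction(uint64_t clock_cycles,
                              uint64_t instructions_executed) {
    if (instructions_executed == 0) {
        return 0.0;
    }
    return (double)clock_cycles / (double)instructions_executed;
}
```

For instance, 300 clock cycles spent on 200 executed instructions gives a CPI of 1.5.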




DS-5 v5.22 comes with an enhanced Streamline with useful features such as support for Android trace events, automatic symbol loading from the target, and profiling of Mali-T400 series GPUs with the user space gator library, amongst others. You can get all these features and more by downloading DS-5 v5.22 from here. Sign up to the DS-5 newsletter and get updates, blogs and tutorials delivered to your inbox.
