
Software Development Tools


We have just released DS-5 5.23 with significant enhancements to Streamline. In this blog, I will highlight the major changes in the latest version.  For a more detailed list of enhancements and fixes, please see the changelog.



In 5.23, we have added a new feature called templates. With templates, you can now create a custom configuration of charts, save it to disk as a template, and apply that configuration to any existing capture. This is best explained with an example. Here, I have created a Streamline capture with support for 3 charts: CPU Activity (User Activity and System Activity counters), Clock (Frequency counter) and Scheduler (Switch counter). When I apply my custom templates, CPU_And_Clock (only the CPU Activity and Clock charts) and Only_CPU (CPU Activity only), the view changes according to the template.



Pre-configured Templates


Modern SoCs support complex performance counters that are not always easy to understand and use. To make this easier for Mali GPU users, we have included some pre-configured templates in Streamline 5.23.  These templates include charts with information that is easy to understand. One such chart is Mali External Bandwidth, which plots the more understandable number of external bus read bytes rather than the underlying $MaliL2CacheExtReadsExternalReadBeats counter.
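To illustrate why a derived chart is easier to read than the raw counter: a beat count only becomes a byte count once it is multiplied by the width of the bus. The C sketch below shows that arithmetic. The 16-byte beat width and the function name are assumptions made for the example, not how Streamline itself computes the chart.

```c
#include <stdint.h>

/* Assumed width of one external bus beat, in bytes (a 128-bit bus).
   Illustration only; check the documentation of your SoC. */
#define ASSUMED_BYTES_PER_BEAT 16u

/* Convert a raw read-beat count into bytes read over the external bus. */
static uint64_t beats_to_bytes(uint64_t read_beats, uint32_t bytes_per_beat)
{
    return read_beats * (uint64_t)bytes_per_beat;
}
```

With these assumptions, 1000 read beats correspond to 16000 bytes of external read traffic.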




All the pre-configured templates included in the release can be seen in the image below.




Versatile Templates

Templates can be used in other useful ways.

  • Capture only the required counters. This is useful when debugging an issue that is isolated to one part of the system. For example, using a GPU template while debugging GPU performance avoids the overhead of capturing CPU counters.
  • Combine the charts of two templates to see a joined-up view. This is useful when debugging an issue that spans multiple parts of the system. For example, for a problem that involves both CPU and GPU, you can combine CPU-specific and GPU-specific templates to see the overall picture.
  • Create a template from one capture and use it on another. This is useful when analyzing multiple captures for the same problem. For example, if you are analyzing cache performance across different use-cases, you can create a cache-analysis template once and use it to analyze the captures for each use-case.
  • Share the templates with others.  Templates can be a great mechanism for sharing knowledge. For example, an expert who understands the underlying counters can create a template and share it with others, allowing non-experts to get started quickly.
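One way to picture how templates behave is to treat each one as a set of charts. The C sketch below models a set as a bitmask: combining two templates is a union, and applying a template to a capture is an intersection with the charts that were actually captured. This is a conceptual model only. The chart identifiers and function names are hypothetical and say nothing about Streamline's real template file format.

```c
#include <stdint.h>

/* Hypothetical chart identifiers, one bit per chart. */
enum {
    CHART_CPU_ACTIVITY = 1u << 0,
    CHART_CLOCK        = 1u << 1,
    CHART_SCHEDULER    = 1u << 2
};

typedef uint32_t chart_set;

/* Combining two templates shows the charts of both. */
static chart_set combine_templates(chart_set a, chart_set b)
{
    return a | b;
}

/* Applying a template can only show charts present in the capture. */
static chart_set apply_template(chart_set capture, chart_set tmpl)
{
    return capture & tmpl;
}
```

In this model, applying an Only_CPU template to the three-chart capture from the earlier example leaves just the CPU Activity chart visible.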

Standalone application

Streamline is now a standalone application, independent of Eclipse for DS-5, making it easy to launch from the Start menu.  Note that you can still launch it from within DS-5 using the Show Views menu item.


Faster UI response

In 5.23, we have significantly improved UI responsiveness, with faster zooming and quicker scrolling among other gains. We undertook a major overhaul of the Streamline code, which allowed us to make it simpler and more responsive.



DS-5 v5.23 comes with an enhanced Streamline with new features like templates and an improved UI response. Streamline is now a standalone application and can be launched independent of DS-5. You can download the DS-5 5.23 version and explore the new features.

Last week at TechCon 2015 in Santa Clara (California), ARM announced a new architecture and a new A-class low-power processor:

  • ARMv8-M architecture: By offering security, enhanced scalability, and improved debug, the ARMv8-M architecture makes it easier for developers to meet the needs of next generation embedded devices. Read more here.
  • Cortex-A35 processor: the most efficient Cortex-A class CPU ever designed by ARM. The Cortex-A35 consumes about 33 percent less power per core and occupies 25 percent less silicon area, relative to the Cortex-A53. Read more here.


Today we are happy to introduce ARM Compiler 6.3, available now to download standalone or integrated in DS-5 5.23.

ARM Compiler is always at the leading edge for supporting new architectures and new cores so it should not come as a surprise that ARM Compiler 6.3 already supports both the new ARMv8-M architecture and the new Cortex-A35 processor.


Let’s explore some of the new features and improvements made in ARM Compiler 6.3.



Security is a fundamental aspect of the digital world and ARM is committed to making sure every ARM-based device is secure by default. With the introduction of TrustZone technology for ARMv8-M, ARM has brought security to low-power devices based on Cortex-M processors, so that developers have a reliable and efficient way of protecting embedded or Internet of Things devices.

TrustZone splits the execution of code between Secure state and Non-Secure state: fine-grained control of memory access and special instructions allow secure code to be protected and, at the same time, to provide guarded entry points from the Non-Secure state. TrustZone for ARMv8-M has been designed to keep interrupt latency low and code complexity to a minimum, making it an ideal technology even for the smallest microcontrollers.



ARM Compiler 6.3 already supports the new architecture with the necessary macros, intrinsics and keywords for simplifying software development targeting TrustZone for ARMv8-M. ARM Infocenter is a great resource for more information on how to make use of TrustZone for ARMv8-M. You can also find more information in the blog post Whitepaper - ARMv8-M Architecture Technical Overview.
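As a rough idea of what a guarded entry point looks like in source code, here is a minimal C sketch. It uses the `__ARM_FEATURE_CMSE` macro and the `cmse_nonsecure_entry` attribute as described in the ARMv8-M Security Extensions part of the ARM C Language Extensions. The key store, the function name and the masking constant are hypothetical, and you should check the ARM Compiler documentation for the exact options your version expects.

```c
#include <stdint.h>

#if defined(__ARM_FEATURE_CMSE) && (__ARM_FEATURE_CMSE == 3)
/* Secure-state build for ARMv8-M (compiled with -mcmse). */
#include <arm_cmse.h>
#define SECURE_ENTRY __attribute__((cmse_nonsecure_entry))
#else
/* Any other build: the attribute is dropped so the sketch still compiles. */
#define SECURE_ENTRY
#endif

#define KEY_SLOTS 4u

/* Hypothetical secret data that must stay in Secure memory. */
static uint32_t key_store[KEY_SLOTS] = { 0x11u, 0x22u, 0x33u, 0x44u };

/* Guarded entry point: Non-Secure code may call this function, but it only
   receives a derived value for a validated slot, never key_store itself. */
SECURE_ENTRY uint32_t secure_get_key_tag(uint32_t slot)
{
    if (slot >= KEY_SLOTS) {
        return 0u;                      /* reject out-of-range requests */
    }
    return key_store[slot] ^ 0xA5A5A5A5u;
}
```

The point of the design is that argument validation happens on the Secure side, so a Non-Secure caller cannot use the entry point to read outside the data it is meant to see.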



When we started to work on ARM Compiler 6 we knew that, in order to be successful, we had to bring the performance of the compiler up to very high standards. Leveraging the LLVM infrastructure is now paying dividends, and the performance reached by ARM Compiler 6.3 confirms our expectations:




The benchmarks show not only that we have reached performance similar to ARM Compiler 5, but also the rapid pace at which we are delivering these performance improvements.


Where can I find ARM Compiler 6.3?

ARM Compiler 6.3 is available to download as a standalone product. Alternatively, ARM Compiler 6.3 is integrated in the latest release of DS-5, version 5.23, which can be downloaded here.


Did you evaluate DS-5 already and you don’t know how to get a license again? Claim your evaluation serial number here as explained by Michelle in this blog post.


Do you have any questions? Feel free to reply to this blog post or send me an email. Any feedback is very welcome and helps us keep ARM Compiler 6 the best compiler for the ARM architecture.





The GNU ARM Eclipse project includes a set of open source Eclipse plug-ins and tools to create/build/debug/manage ARM (32-bit) and AArch64 (64-bit) applications and static/shared libraries, using the latest GNU ARM GCC toolchains.


ARM family and FPU type


Starting with GNU ARM Eclipse version 2.10.2, from November 2015, full Cortex-M7 support was added to the C/C++ Build → Settings → Tool Settings page; it is now possible not only to select the ARM family (cortex-m7), but also to select the new specific FPU type:
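If you want to confirm from inside the code which floating-point hardware the selected FPU type enables, the ACLE macro `__ARM_FP` can help. The following C sketch is only an illustration: the function name is made up, and on a non-ARM host compiler the macro is simply undefined.

```c
/* Report the floating-point hardware the compiler was told to target,
   using the ACLE macro __ARM_FP (bit 2 = single, bit 3 = double precision). */
static const char *target_fpu_precision(void)
{
#if defined(__ARM_FP) && (__ARM_FP & 0x8)
    return "double";
#elif defined(__ARM_FP) && (__ARM_FP & 0x4)
    return "single";
#else
    return "none";   /* no hardware FP selected, or not an ARM target */
#endif
}
```

Printing the result early in a Cortex-M7 project is a quick sanity check that the FPU type chosen in the settings page actually reached the compiler.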



The Hello World Cortex-M C/C++ Project wizard


The project wizard was updated to create generic Cortex-M7 projects.


The STM32F7xx C/C++ Project wizard


And last, but probably the most useful, a new template to create STM32F7 projects was added.


The wizard currently supports STM32F745xx, STM32F746xx, STM32F756xx, and can create blinky projects for the STM32F746_EVAL and STM32F746_DISCOVERY boards.


The created projects not only pass the build, but are ready to run on the selected boards.


More info


For more details about the GNU ARM Eclipse project, please refer to the project site

At TechCon, the mbed OS Technology Preview was announced publicly. My colleague Matthias Hertel has written an application note that explains how to import mbed OS projects to Keil MDK Version 5.


mbed OS uses yotta as a build tool, which also downloads the software components that the project depends on. Each yotta component of the mbed OS project is represented by a single MDK project. The complete mbed OS project is imported as a multi-project workspace to give you seamless access to the entire code base of the mbed OS application.


For more information, check the application note 282.


Embedded Internet of Things

Heterogeneous Software Development

By Stephen Theobald

ARM-based platforms come in a variety of processor configurations, and these platforms now often have more than one ARM processor.  These multi-core platforms have usually been “Symmetric Multi-Processing” (SMP) systems, where a cluster of identical CPUs works together co-operatively with a common memory map.  More recently, heterogeneous “Asymmetric Multi-Processing” (AMP) and AMP+SMP systems, which combine CPUs with different profiles, have become available too.  An effective combination is ARM Cortex-A and Cortex-M family cores in a single package. The Cortex-M core offers low interrupt latency for good real-time response, together with low power consumption.  Cortex-A cores offer higher performance but consume more power.  Having both classes of core in a single package enables the system designer to partition a system optimally, for the best balance between low power and low latency on one side and heavy application workloads on the other.  For example, an AMP+SMP system might have one or more cores running an OS such as Linux in SMP mode, and an additional core running an RTOS or bare-metal application.


Examples of AMP+SMP devices include Freescale’s i.MX7 Dual (2 x Cortex-A7 + Cortex-M4), Texas Instruments OMAP5432 (2 x Cortex-A15 + 2 x Cortex-M4), and Xilinx UltraScale MPSoC (4 x Cortex-A53 + 2 x Cortex-R5). AMP devices are also available, such as Freescale’s i.MX7 Solo (Cortex-A7 + Cortex-M4) and the Vybrid™-series such as VF6xx (Cortex-A5 + Cortex-M4). ARM’s own “Juno” development platform contains 2 x Cortex-A57 + 4 x Cortex-A53 cores, plus a Cortex-M3 System Control Processor for power control.


DS-5 allows you to compile code for both classes of core, and then debug them both together.  DS-5’s code development environment provides C/C++ compilers for ARMv7 and ARMv8 embedded code (both ARM Compiler 5 and the new ARM Compiler 6), and a Linaro GCC compiler for Linux applications, Linux kernel and kernel modules. 


Use ARM Compiler 5 to build your embedded/RTOS code for Cortex-M, -R or (32-bit) A-class devices.  ARM Compiler 5 is now TÜV SÜD certified and can be used for safety-related software development, together with the ARM Compiler Qualification Kit.  ARM Compiler 6 is the next-generation C/C++ compilation toolchain targeting embedded software development.  ARM Compiler 6 supports all the latest ARM processors, including 64-bit ARMv8.


DS-5 Debugger is able to debug both SMP and AMP system designs.  Linux-based targets can be debugged via gdbserver over Ethernet.  Bare-metal and RTOS targets can be debugged either by traditional JTAG-based debug hardware such as DSTREAM, or via CMSIS-DAP over USB.  DS-5 Debugger allows simultaneous connection to multiple cores, so, for example, you can be debugging the Linux SMP kernel on the A-class cores and then switch seamlessly to debugging an RTOS on the M-class core.  The screenshot below shows simultaneous debugging of the Linux kernel booting on a dual Cortex-A7, and an RTOS on a Cortex-M3. There are two disassembly views (bottom center), one showing the Linux kernel stopped at a breakpoint at “start_kernel”, and the other showing the RTOS sitting on a WFI (Thumb2) instruction.  The source code of the Linux kernel is shown (bottom left), along with a trace of its instruction execution history (bottom right).




The Compiler and Debugger within DS-5 support multiple cores and OSs well, meeting the needs of heterogeneous architectures today. ARM’s software development tools have come a long way within the past 25 years, and to celebrate ARM’s 25th birthday we’re giving the first 20,000 customers the chance to try the latest version of DS-5 again. Get your free serial number »



On 27th November ARM turns 25. To celebrate, we’re giving the first 20,000 customers the chance to try the latest version of DS-5 Ultimate Edition again. To see for yourself how far DS-5 has come, claim your serial number before 30th November.


Get your free serial number »


Why should I try DS-5 again?


  • DS-5 supports the latest ARM processors. In DS-5 Ultimate Edition, you’ll get everything you need for all ARM software development, including 64-bit ARMv8. It also includes the LLVM-based ARM Compiler 6 and the ARMv8 FVP simulation model.
  • SoC bring-up is now easier than ever before when using the Platform Configuration Editor (PCE) in DS-5. The PCE will autodetect your underlying system architecture, saving you time and effort when bringing up a new SoC.
  • ARM Compiler 5.04 is now TÜV SÜD certified and can be used for safety-related software development. The ARM Compiler Qualification Kit can also be used to provide evidence for justifying toolchain selection.
  • ARM Compiler 6 is the next-generation C/C++ compilation toolchain targeting embedded software development. ARM Compiler 6 supports all the latest ARM processors, including 64-bit ARMv8.


Get your free serial number »

CMSIS Version 4.5.0 Released


CMSIS 4.5.0 is now available. For detailed information about the changes, refer to the revision history.


CMSIS-Driver Validation Suite Version 1.0 Released


A Software Pack for CMSIS-Driver Validation is available. The CMSIS-Driver Validation suite tests and verifies the API interface, correct data communication using loopback modes, and the timing of the data communication. Refer to the CMSIS-Driver Validation User's Guide for more information.
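To give a feel for what a loopback check does, here is a small host-side C sketch. It is not the CMSIS-Driver API and not code from the validation suite: the `loopback_driver` struct and the fake transfer function are hypothetical stand-ins for a peripheral wired in loopback mode.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical driver interface used only for this sketch; the real
   CMSIS-Driver access structs differ per peripheral. */
typedef struct {
    int (*transfer)(const uint8_t *tx, uint8_t *rx, size_t len);
} loopback_driver;

/* Stand-in for hardware wired in loopback mode: rx mirrors tx. */
static int fake_loopback_transfer(const uint8_t *tx, uint8_t *rx, size_t len)
{
    memcpy(rx, tx, len);
    return 0;
}

/* Returns 1 when the data received matches the data sent, 0 otherwise. */
static int loopback_test(const loopback_driver *drv)
{
    uint8_t tx[16];
    uint8_t rx[16];

    for (size_t i = 0; i < sizeof tx; i++) {
        tx[i] = (uint8_t)(i * 7u + 1u);   /* recognisable test pattern */
    }
    memset(rx, 0, sizeof rx);

    if (drv->transfer(tx, rx, sizeof tx) != 0) {
        return 0;                          /* driver reported an error */
    }
    return memcmp(tx, rx, sizeof tx) == 0;
}
```

A real validation run would additionally check the API return codes and measure transfer timing, as the description above states.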


CMSIS-Pack Management for Eclipse Version 1.0 Released


Our open source Eclipse plug-ins for CMSIS-Pack management are feature complete. The release is available under the Eclipse Public License. Download the source code from: (pre-built plug-ins


Visit us at the Eclipse Conference Europe 2015 in Ludwigsburg, Germany, 3-5 November 2015 to get detailed information in the session "Enhanced Project Management for Embedded C/C++ Programming using Software Components".

Hi, I wanted to share with you a recent release on using the Cortex-M Prototyping System (MPS2) to provide a software development platform for ARM's first IoT subsystem for Cortex-M. We've taken the deliverables from the subsystem, including the Cortex-M3 processor, added extra peripherals as a user would, and implemented this in FPGA on our hardware platform. The platform is supported on mbed, so it has all the drivers for the peripherals on the board and you can evaluate the subsystem. The picture below shows how it has been implemented.


Using the FPGA to prototype the IoT subsystem was quite interesting: you can develop drivers and prove them out by connecting to real devices ahead of silicon. We also used the platform to test the boot flow, connect to external BLE radios and generate a number of demos. It's worth taking a look to see if using the IoT subsystem for Cortex-M could accelerate your IoT development.





The GNU ARM Eclipse project includes a set of open source Eclipse plug-ins and tools to create/build/debug/manage ARM (32-bit) and AArch64 (64-bit) applications and static/shared libraries, using the latest GNU ARM GCC toolchains.


New look


Starting with September 2015, the GNU ARM Eclipse web site has a completely new look:


Apart from the aspect (definitely cool!), the main functional change is the addition of the right sidebar, to facilitate access to the project documentation.


The new site no longer uses WordPress; instead, it is entirely static and was generated with Jekyll.


New project home on GitHub


With GitHub gaining more and more traction, the GNU ARM Eclipse project was migrated from SourceForge to GitHub.




The migration of repositories was easy, each project was pushed into its own repository.


The current project repositories are:



Binary files as Releases


The migration of binary files was a bit more complicated, and, due to current GitHub limitations, is incomplete. The main problem was raised by the two Eclipse update sites, which require a certain folder structure, and since GitHub currently does not support adding folders to releases, the Eclipse update sites will remain hosted on SourceForge (at


Except the Eclipse update sites, all future binary files will be published as GitHub Releases, attached to the respective project repositories.


The archive of past releases was also migrated from SourceForge to GitHub.


Issues trackers


The SourceForge trackers were replaced by the GitHub Issues trackers, one for each project.


It is planned to preserve the content of the old SourceForge trackers, even if now they are locked and new tickets cannot be created there.


Notifications via watched projects


For those interested in receiving notifications, the recommended way is to subscribe to the GitHub projects, by clicking the Watch button and selecting Watching.


In addition to the gnuarmeclipse/plug-ins project, it is also recommended to subscribe to the gnuarmeclipse/ project, to receive notifications for new Web posts.


More info


For more details about the GNU ARM Eclipse project, please refer to the project site

The main focus of ARM Compiler 6 has always been bare-metal applications running on ARM processors. Even though ARM Compiler doesn't officially support building Linux applications, the high compatibility between armclang and GCC now makes it much easier to build them. In this blog I will explain how to set up ARM Compiler 6 to build a Linux Hello World from scratch.


This tutorial covers building and debugging a basic Hello World C program running on Linaro on an ARMv8 model using ARM DS-5 Development Studio. In particular, it shows how to:

  • Download and set up GCC
  • Write a simple “Hello World” application in ARM DS-5 Development Studio
  • Build the application using ARM Compiler 6
  • Set up a debug session in ARM DS-5 Development Studio
  • Run it on a model of an ARMv8 system

To complete this tutorial, you'll need DS-5 Ultimate Edition: Download the 30-day trial »

Included in DS-5 Ultimate Edition is the ARMv8-A Fixed Virtual Platform (FVP) model, giving you a platform to develop code on in advance of hardware availability.

Download Linaro GCC and Linaro image

If you do not have Linux already running on ARMv8, you can download a ready-to-use Linaro image from the Linaro website.
You need to download the kernel binary img.axf and the file system image vexpress64-openembedded_lamp-armv8-gcc-4.9_*.img.gz. Make sure you download the lamp image, because the minimal image does not include gdbserver, which is needed to debug the application from DS-5.


Even if it seems counterintuitive, it’s necessary to have GCC in order to build a Linux application with ARM Compiler 6: ARM Compiler 6 does not include the Linux libraries, so it needs to use the glibc shipped with the GCC toolchain.


For our example, we will use the Linaro toolchain for Cortex-A, which can again be downloaded from the Linaro website.

Download Linaro-toolchain-binaries 4.9 (AArch64 little-endian) and extract it locally.



Add the new toolchain to DS-5

DS-5 includes three default toolchains but it’s also possible to add new ones as explained by ronans in his blog post: Improved support for multiple ARM Compilers in DS-5 5.20 and beyond.

Open DS-5 settings by clicking on the menu Window and then Preferences. On the left hand side you can find a list of categories: select Toolchains under DS-5.


The available toolchains are listed on the right hand side of the window. Add the downloaded GCC toolchain by clicking on the Add… button. Select the bin path of the toolchain you want to add and click on the Next > button.


DS-5 should be able to automatically detect the type of toolchain selected, along with other information such as the version and the binaries. Click Finish to complete the procedure and keep the default values (suggested). Clicking Next > instead lets you amend some of the information DS-5 has already filled in.


Create a new project

Create a new project in DS-5 by clicking on File → New → Project. Select C Project under the C/C++ menu and click Next.


DS-5 shows the list of available toolchains. We need to give the project a name, select the GCC toolchain we added in the previous section (make sure you select the aarch64 one and not the one built into DS-5) and click on the Finish button.


In order to use ARM Compiler 6 we need to change the project build settings to use armclang as the compiler and leave GCC for all the other tools. In particular, we want to make sure the GCC linker is used instead of armlink.


Right click on the project and select Properties from the menu. In the C/C++ Build section we need to change the compiler in Tool Chain Editor. Click on Select tools and a window should appear with the list of all the available tools on the left hand side and the tools used for the project on the right hand side. All you need to do is select ARM C Compiler 6 from the list on the left: DS-5 will automatically pick out the corresponding tool among the currently used tools (GCC C Compiler) and, by clicking on the << - Replace ->> button, we replace it with ARM Compiler 6.


The Select tools window should now show the following Used tools:


Once this is done, click OK and go to the Settings section of C/C++ Build.


In this section we need to configure armclang to compile for the ARMv8 target. Because armclang is not in the PATH if the project uses GCC, we need to specify in the Command textbox the full path as shown below (for example "C:\Program Files\DS-5\sw\ARMCompiler6.00u2\bin\armclang").


In the Target page it is necessary to specify aarch64-linux-gnu.


Add to Included Path the full path of the include directory in the ARM Compiler 6 directory (for example C:\Program Files\DS-5\sw\ARMCompiler6.00u2\include).


Finally, we need to add a few extra compiler options in the Miscellaneous section; specifically, we need to indicate the root path of the GCC compiler with the option --gcc-toolchain and the path to the libc libraries with --sysroot. For example:


--gcc-toolchain="$PATH_TO_GCC_COMPILER$" --sysroot="$PATH_TO_GCC_COMPILER$\aarch64-linux-gnu\libc"


You can now press OK to save the new settings.


Building the project

Now that the project has been set up we need to write the code for the Hello World. Right click on the project and select New → Source File. Select a name for the new file and click Finish.

A new source editor window should open in DS-5 to edit the file. For this tutorial we will just add the following code:


#include <stdio.h>

int main(void) {
       printf("Hello v8 World!\n");
       return 0;
}

Save the file and build the project by selecting Build Project from the project menu.


The project should build without any errors. If not, check the output of the build in the Console tab and verify that all the settings have been correctly passed to the compiler/linker.
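If the build succeeds but you are not sure whether armclang or GCC actually compiled the file, or which target was selected, the predefined compiler macros can tell you. The helper functions below are a small C sketch with made-up names; they rely only on the widely available `__clang__`, `__GNUC__`, `__aarch64__` and `__arm__` macros.

```c
/* Name of the compiler front end, based on predefined macros.
   Clang also defines __GNUC__, so __clang__ must be tested first. */
static const char *build_compiler(void)
{
#if defined(__clang__)
    return "clang-based (for example armclang)";
#elif defined(__GNUC__)
    return "GCC";
#else
    return "unknown";
#endif
}

/* Target architecture the code was compiled for. */
static const char *build_target(void)
{
#if defined(__aarch64__)
    return "AArch64";
#elif defined(__arm__)
    return "ARM (32-bit)";
#else
    return "other";
#endif
}
```

Printing both strings from main next to the Hello World message makes it obvious on the target console whether the project settings took effect.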


Start the ARMv8 model within DS-5

Our Hello World application is ready but we still don’t have an environment in which to test it. DS-5 Ultimate Edition includes multiple platform models of an ARMv8 processor that we can use to boot Linux and debug our application. Again, take a look at ronans' blog post for more details: Booting Linux on the ARMv8-A model provided with DS-5 Ultimate Edition.


We can start the model directly from DS-5 by creating a new DS-5 Debugger configuration in Debug Configurations. Create a new Debug configuration and select AEMv8x4 under the ARM RTSM list (typing AEMv8 in Filter platforms will help with the selection).


Paste the following parameters in the Model parameters text box:

-a "[LINARO_PATH]\\img.axf"
 --parameter motherboard.mmc.p_mmc_file="[LINARO_PATH]\\vexpress64-openembedded_lamp-armv8-gcc-4.9_20150123-708.img"
 --parameter motherboard.mmc.card_type=eMMC 
 --parameter motherboard.smsc_91c111.enabled=true
 --parameter motherboard.hostbridge.userNetworking=true
 --parameter motherboard.hostbridge.userNetPorts="5555=5555,8080=8080,22=22"


Where [LINARO_PATH] is the path where you saved the kernel image and the Linux image downloaded from the Linaro website previously. The last parameter userNetPorts is important later to allow the connection of the debugger to the gdbserver port opened on the model.
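The userNetPorts value is a comma-separated list of port pairs. To make the format concrete, here is a small C sketch that looks up one port in such a list. The function name is hypothetical, and reading each pair as host port on the left and model port on the right is an assumption for the example; check the model documentation for the exact meaning.

```c
#include <stdio.h>

/* Return the port paired with host_port in a "a=b,a=b,..." list,
   or -1 if host_port is not listed or the list is malformed.
   Assumption: the left side of each pair is the host port. */
static int mapped_port(const char *mapping, int host_port)
{
    const char *p = mapping;

    while (*p != '\0') {
        int host = 0, target = 0, used = 0;

        /* %n stores how many characters this pair consumed. */
        if (sscanf(p, "%d=%d%n", &host, &target, &used) != 2) {
            return -1;
        }
        if (host == host_port) {
            return target;
        }
        p += used;
        if (*p == ',') {
            p++;                 /* skip the separator */
        } else if (*p != '\0') {
            return -1;           /* unexpected character */
        }
    }
    return -1;
}
```

With the parameter used above, looking up 5555 finds the gdbserver port, while a port that is not in the list, such as 80, is reported as unmapped.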


In the Debugger tab make sure the radio button Connect only is selected. You can now Apply the modifications and click on Debug to start the model.


Once loaded, press the Continue button (green arrow) to run the model and boot Linux.


Debug via gdbserver

Once Linux has finished booting (it shows the command line prompt), it’s possible to access the file system and the processes running on the model through a Remote System connection in DS-5. To create a new connection, select the Remote Systems tab in the DS-5 Debug perspective. Click the new connection button as indicated in the image below:

Select Linux as System type and press Next. The model is running locally so we can specify LOCALHOST as hostname. Give the connection a name and an optional description. Finally click Finish to complete the creation of the connection.


The new connection should appear in the list and you should get access to files and processes. If DS-5 asks for login details, use root as the username and leave the password empty (or use the one you specified if you changed it in the Linaro image running on the ARMv8 model).


We now have access to the Linux system running on the model, and you should be able to browse the file system and view the running processes directly from the Remote Systems view.


Now that we have established a successful connection, we can create the debug configuration for our Hello World and run the application on the model.


Open the Debug Configurations dialog again and create a new connection this time selecting Linux Application Debug – Application Debug – Connection via AArch64 gdbserver – Download and debug application.


Make sure you set the port to 5555 as we specified in the list of parameters when launching the model.


Switch to the Files tab and select the binary built in the previous step. Set /home/root for both Target download directory and Target working directory. In the Debugger tab make sure the radio button Debug from symbol is selected with main as symbol.


If all the settings are correct, the Debug button should be enabled and you can start a debug session simply by clicking on it. The debugger will connect to the target, upload the binary and stop at the beginning of the main function as we specified. The Debug Control view should appear similar to the following:


Press the green Continue arrow to run the program past the breakpoint in the main function. The application should terminate successfully and you should see the console output from the printf call, “Hello v8 World!”, in the App Console tab.


Congratulations! You’ve just built a Linux application with ARM Compiler 6 and run it on an ARMv8 model!


In summary, in this tutorial we used DS-5 to create a Linux application built with ARM Compiler 6, and we debugged the application on an ARMv8 Fixed Virtual Platform Fast Model. The advanced code generation technology available in ARM Compiler 6 can be used to build Linux applications running on the latest ARM IP.


Did you find this blog useful? Do you think this would be a valuable supported feature? We would like to hear from you so please don't hesitate to comment or send an email (stefano[dot]cadario[at]arm[dot]com) to discuss this!



We've just released ARM DS-5 Development Studio v5.22 and we have made Streamline more powerful and user-friendly. In this blog, I will highlight the major changes in the latest version.  For a more detailed list of enhancements and fixes, please see the changelog.


Android trace events alongside extensive list of standard system events


Android supports trace events, and these events are written to a system trace buffer. We can use the Systrace tool, provided by Android, to collect and visualize these events. In the DS-5 v5.22 release, we have enhanced Streamline to support Android trace events. We can now see the performance counters and charts like CPU and GPU activity alongside standard Android trace events.


Figure 1 Streamline showing Android trace events


For example, in the above capture, you can inspect the frame by looking at various Android Surfaceflinger events like onDraw and eglSwapBuffers.


Profile Mali-T400 Series GPUs without having kernel source


Streamline requires an agent called gator to be installed and running on the ARM Linux target. Gator can operate in two modes:

(a) kernel space gator – using a kernel module called gator.ko.
(b) user space gator – without the kernel module.

As user space gator is restricted to using user space APIs, it does not support all the features that kernel space gator supports. However, user space gator is easier to use, as you do not need the target’s Linux kernel source to build the kernel module. Given the ease of use, we are working towards enhancing the features supported by user space gator. With this release, we are happy to announce that user space gator now supports the Mali-T400 series of GPUs.  Note that you will need a recent version of the Mali DDK, which exports system events to user space. Going forward, you can expect us to add support for more Mali graphics processors.


Automatic fetch of symbol and other information from files on the target


Streamline needs symbol information to correlate the captured events with the code being run. In the past, we had to provide this image information manually. This can be tricky if the image is available only on the target and not on the host. In the v5.22 release, we have introduced automatic image transfer from the target to handle this situation.


Figure 2 New textbox to select processes for automatic fetching of images from the target


This is best shown with an example. In my case, I want to run the dhrystone executable on my Nexus 9 and see the function profile. As a first step, I run the program via adb and start the Streamline session. During the session, I can now see a new box at the bottom, as seen in the picture above. Here, I can type a pattern (“dhr” in my case) to select the list of processes. Streamline will automatically fetch symbol information for the selected processes from the target. In my case, Streamline shows the function profile for dhrystone, as seen in the picture below, without my having to provide the image manually.


Figure 3 Streamline showing function profile for the dhrystone process




Streamline snippet during the live capture


Streamline snippets are now available during live capture. As you might recall, snippets are a powerful feature that lets users track complex counters derived from a combination of more basic counters. For example, as seen in the picture below, you can track ClockPerInstruction (CPI) using the $ClockCycles and $InstructionExecuted counters.


Figure 4 CPI snippet
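The arithmetic behind a CPI snippet is a simple ratio of the two counters. As a sketch in C (the function name is made up, and returning 0.0 for an empty sample window is a choice made for this example rather than documented Streamline behaviour):

```c
#include <stdint.h>

/* Clock cycles per instruction for one sample window.
   Returns 0.0 when no instructions were executed, to avoid dividing by zero. */
static double cycles_per_instruction(uint64_t clock_cycles,
                                     uint64_t instructions_executed)
{
    if (instructions_executed == 0u) {
        return 0.0;
    }
    return (double)clock_cycles / (double)instructions_executed;
}
```

For instance, 3000 clock cycles spent executing 2000 instructions gives a CPI of 1.5.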




DS-5 v5.22 comes with an enhanced Streamline with useful features like support for Android trace events, automatic symbol loading from the target, and profiling of Mali-T400 series GPUs with user space gator, amongst others.  You can get all these features and more by downloading DS-5 v5.22 from here. Sign up to the DS-5 newsletter and get updates, blogs and tutorials delivered to your inbox.

Several ARM partners such as Clarinox, Micrium, Oryx-Embedded, wolfSSL and YOGITECH are using Software Packs to deliver middleware. This simplifies installation, usage, and project maintenance of software components. We have created a new Partner Pack website that gives you an overview of the currently available Packs, covering a wide range of use cases:

  • Functional safety
  • Real-time operating systems
  • Security/encryption
  • TCP/IP networking and
  • Wireless stacks

Use Pack Installer to install one of these Packs automatically in µVision:


You may know that there is a team at the University of Szeged that is keen to move the web forward, especially on embedded systems. Several months ago an interesting question was put to us that sounded simple but was hard to answer right away: how can one build a functional web browser?


If you are interested, check out my colleague's post at our blog site.

Any comments, feedback or even contributions are welcome! (Comments can be left either here or on our blog.)

It seems that just yesterday we released ARM Compiler 6.01 and it’s already time for a new major release of the most advanced compiler from ARM.

Let’s see the major highlights for this release:

  • Update of C++ libraries
  • Performance improvements
  • Enhanced support for ARMv7-M, ARMv6-M cores


Update of C++ libraries

Previous versions of ARM Compiler included only the Rogue Wave C++ libraries, which haven’t been updated from the C++03 standard. In ARM Compiler 6.02, we are moving closer to the leading edge by incorporating libc++ from the LLVM project, having passed our extensive internal validation suites.

The new libraries support the C++11 and C++14 standards and, in conjunction with the LLVM clang front-end, ARM Compiler 6.02 is the most modern and advanced toolchain to develop software for your ARM-based device. Look at some of the advantages of the new C++ standards in my recent blog post on C++11/14 features.

If you want to use the old libraries you can still do it by using the --stdlib=legacy_cpplib command line option.

Performance improvements

Performance is an important aspect of a toolchain and benchmarks are a convenient way (although not perfect) to evaluate the quality of the optimizations performed by the compiler.

Over the last few months, ARM engineers have worked on identifying and implementing optimization opportunities in the LLVM backend for ARM. The results are shown in the following graph.


As you can see, the improvements between ARM Compiler 6.01 and ARM Compiler 6.02 are significant and show we are working in the right direction. Even though your code base is different from a synthetic benchmark, you may see a boost there as well: give it a try!

Enhanced support for ARMv7-M and ARMv6-M

clang is often used to build high-performance code for Cortex-A cores, and it plays a fundamental role in this area. Embedded ARM microcontrollers have been less of a focus for the LLVM community, and ARM is now filling the gaps by making ARM Compiler 6 a toolchain able to build efficient code across the full range of ARM processors, from the smallest Cortex-M0+ to the latest Cortex-A72 64-bit processor.

ARM engineers have focused on Cortex-M processors and we are now confident enough to change the support level for Cortex-M family cores from alpha to beta: this means that the code generated for the ARMv7-M and ARMv6-M architectures has reached a good quality level and has been sufficiently tested by ARM (there is still work to do, hence the beta moniker). We expect to complete support for ARMv7-M and ARMv6-M in the next release of ARM Compiler at the end of this year.

If you want to know all the changes in this release of the compiler you can take a look at the release notes on ARM infocenter.

This version of the compiler will be included in the next version of DS-5 (5.22) but if you can’t wait, you can get the standalone version from and add it to DS-5 (if you have DS-5.20 or greater) as shown in this tutorial.

As always, feel free to post any comment or question here or send me an email.

Any feedback is welcome and it helps us to continue delivering the most advanced toolchain for ARM from ARM.



Dear Friends,



Here are a few lines of GCC assembly code that make your interrupts on Cortex-M4 fully
reentrant. Please read the notes from Sippey before proceeding to the implementation details
on this page.


NOTE1: The code uses a large amount of stack (32 or 136 bytes for each reentrant call, depending
on whether floating-point operations are used), so be careful not to overuse re-entrancy, and remember to size the stack appropriately. When you use this code within MATLAB/Simulink, you need at least 136 more bytes for each sampling rate in the Simulink schematic.


NOTE2: This code is inspired by, and optimized from, the work of other authors who know ARM assembly and the Cortex architecture better than I do.



NOTE3: The re-entrant code assumes that, at interrupt exit, the processor returns to task space (whether on PSP or MSP). Hence, to avoid corrupting the stack, the preemption function should only be called from the lowest-priority interrupt in the program.



Function description:

RIPCrun:

    - First pushes a dummy stack frame (only 32 bytes) on the stack and returns from the interrupt.

    - The return address programmed in the dummy frame is inside the same function, so
      that the rest of the code executes in thread mode (instead of at
      the interrupt priority).

    - Once back in thread mode, the code calls the function FUNCTION. This is a normal
      function call (e.g. the stack is saved again by the processor mechanism).

    - On return, it generates a software-triggered interrupt (SVC) to restore the stack.

SVC_Handler:

    - Determines which SVC code was called.

    - For any other code, the traditional handler (SVC_Orig_Handler) is executed.

    - Otherwise it calls RIPCrestoreSP, which cleans up the original interrupt stack.
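
How the SVC handler finds out which SVC code was called can be sketched on a host machine. The stacked PC (byte offset 24 in the exception frame) points just past the 16-bit Thumb `SVC` instruction, and the SVC number is the low byte of that instruction, which is why the listing reads it with `LDRB.W R0, [R1, #-2]`. The sketch below is an illustration only: the function name `svc_number_from_frame` is hypothetical, and code memory is simulated with a byte array so that the stacked PC becomes an offset into that array rather than a real address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Word index of the stacked PC in the 8-word basic frame
 * (R0, R1, R2, R3, R12, LR, PC, xPSR), i.e. byte offset 24. */
enum { STACKED_PC_INDEX = 6 };

/* Hypothetical host-side model:
 *   code      simulated code memory
 *   code_size length of that memory in bytes
 *   frame     the eight words stacked on exception entry
 * Returns the immediate of the SVC instruction that raised the exception. */
uint8_t svc_number_from_frame(const uint8_t *code, size_t code_size,
                              const uint32_t frame[8])
{
    uint32_t return_pc = frame[STACKED_PC_INDEX];

    /* The SVC instruction occupies the two bytes below the return address. */
    assert(return_pc >= 2 && return_pc <= code_size);

    /* Thumb SVC is the halfword 0xDFnn stored little-endian, so the
     * immediate nn is the byte at (return address - 2). */
    return code[return_pc - 2];
}
```

With a simulated code image containing `NOP; SVC #5; NOP` and a stacked PC of 4, the function returns 5.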



NOTE: Why do we restore the stack in the SVC handler instead of in RIPCrun? The Cortex CPU supports two threading models, using one or two different stacks (PSP/MSP) depending on the mode. The original frame is therefore saved on a stack that depends on the threading model. The SVC call ensures that the processor recovers the stack appropriately.
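
Which stack and which frame layout the processor uses on exception return is encoded in the EXC_RETURN value held in LR. As a minimal sketch (the type `exc_return_info` and the function `decode_exc_return` are hypothetical names used only for illustration), the relevant bits can be decoded like this: bit 4 clear means an extended floating-point frame, bit 3 set means return to thread mode, bit 2 set means the frame is on PSP, and bit 9 of the stacked xPSR marks an extra alignment word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of what an EXC_RETURN value implies. */
typedef struct {
    bool     extended_fp_frame;  /* FP registers S0-S15 and FPSCR were stacked */
    bool     returns_to_thread;  /* thread mode rather than another exception */
    bool     uses_psp;           /* frame lives on the process stack */
    uint32_t frame_bytes;        /* bytes to unstack, including padding */
} exc_return_info;

exc_return_info decode_exc_return(uint32_t exc_return, uint32_t stacked_xpsr)
{
    exc_return_info info;

    /* All Cortex-M4 EXC_RETURN values have the form 0xFFFFFFEx or 0xFFFFFFFx. */
    assert((exc_return & 0xFFFFFFE0u) == 0xFFFFFFE0u);

    info.extended_fp_frame = (exc_return & 0x10u) == 0;  /* bit 4 clear */
    info.returns_to_thread = (exc_return & 0x08u) != 0;  /* bit 3 set   */
    info.uses_psp          = (exc_return & 0x04u) != 0;  /* bit 2 set   */

    /* 8 words for the basic frame, 26 words for the extended frame. */
    info.frame_bytes = info.extended_fp_frame ? 104u : 32u;

    /* Bit 9 of the stacked xPSR marks one extra word of alignment padding. */
    if (stacked_xpsr & (1u << 9)) {
        info.frame_bytes += 4u;
    }
    return info;
}
```

For example, `decode_exc_return(0xFFFFFFF9, 0x01000000)` describes a 32-byte basic frame that returns to thread mode on MSP, which matches the dummy frame that RIPCrun builds by hand.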



MAJOR differences from Sippey's code:

    1st: defines are used to select the priority levels and callback procedures;

    2nd: the whole implementation uses GCC inline assembly;

    3rd: naked "C" functions are used to limit function-call overhead;

    4th: the RIPCrun function encodes the return address locally
        (ADDW  R0, PC,16 ; SKIP 8 Instruction from here)
        to simplify the code.



/*
 * Reentrant Interrupt Procedure Call (RIPC)
 * ARM-GCC code to implement REENTRANT interrupt procedures.
 * Source of inspiration:
 *     - "The Definitive Guide to ARM Cortex-M3 and Cortex-M4 Processors" 3rd ed.
 *     - Sippey code for KEIL: Sippey (
 * CORTEX M4 Register Structure
 *   - CPU 16 Register (R0-R15) + PSR
 *   - FPU 32 Register (S0-S31) + FPSCR
 *   - R0-R3, R12, LR, and PSR are called “caller-saved registers.”
 *   - R4-R11                  are called “callee-saved registers.”
 *   - S0-S15 + FPSCR are “caller-saved registers.”
 *   - S16-S31        are “callee-saved registers.”
 * Typical Calling Layout
 *         R0/R1 is Return Result Value if any
 *         R0-R3 are parameter value (with the above exception)
 *         R12 is a scratch register
 *         R13 used to store SP
 *         R14 link register (return address)
 *         R15 is Program Counter
 * Stack Structure (growing from TOP to LOW memory)
 *     BEWARE for efficiency Stack is manipulated aligned to 8 bytes always
 *            in case of ODD number of registers it gets padded with white space
 * Cases:      NOFPU               FPU
 *         32  (pad align 8)      (pad align 8)    if PADDING present xPSR bit9 == 1
 *            28  xPSR           96    FPSCR
 *            24  ReturnAddr     92    S15
 *            20  LR             88    S14
 *            16  R12            84    S13
 *            12  R3             80    S12
 *             8  R2             76    S11
 *             4  R1             72    S10
 *             0  R0*            68    S9                NO FP Stack pointer here
 *          ==============       64    S8
 *             8 REGs            60    S7
 *                               56    S6
 *      (total 8x4=32bytes)      52    S5
 *                               48    S4
 *                               44    S3
 *                               40    S2
 *                               36    S1
 *                               32    S0
 *                               28    xPSR
 *                               24    ReturnAddr
 *                               20    LR
 *                               16    R12
 *                               12    R3
 *                                8    R2
 *                                4    R1
 *                                0    R0*               FP Stack pointer here
 *                           ====================
 *                            8+17 = 25 REGS PADDED to 26   (Total 26*4=104bytes)
 *         The return address is the stacked PC
 *         While Stacked LR was previous return address
 *         BX LR is return from subroutine
 *         if LR starts with 0xFxxxxxxx then it is interpreted as Return from Interrupt (Exception Return)
 *         Possible Exception return values are:
 *         if FPU was used before interrupt call
 *         0xFFFFFFE1 Return to another exception using MSP (Master)
 *         0xFFFFFFE9 Return to thread using MSP (Master)  stack pointer
 *         0xFFFFFFED Return to thread using PSP (process) stack pointer
 *         if FPU was not used before CALL
 *         0xFFFFFFF1 Return to another exception using MSP (Master)
 *         0xFFFFFFF9 Return to thread using MSP (Master)  stack pointer
 *         0xFFFFFFFD Return to thread using PSP (process) stack pointer
 */

#include <misc.h>
#include <stm32f4xx.h>

// Lazy using strings to pass parameter to Assembly code
#define SVC_CALL_NUMBER       "0"     // SVC_CALL_NUMBER being used
#define PRI_LEVEL_LOCK        "240"   // Level 15 for STM32F4

static void RIPCrun( void (*fcn)(void) ) __attribute__ (( naked, used ));
static void RIPCrestoreSP( void ) __attribute__ (( naked,used ));

/*
 * This is the NEW default handler for standard SVC calls, if used. Override it
 * if required, as usual on CM4.
 */
__attribute__(( weak,used )) void SVC_Orig_Handler()
{
    while(1); // No other default service! Catch or return?
}

/*
 * \brief RIPCrun makes the interrupt reentrant. It pushes a dummy
 * stack frame, loads a fake return address depending on the FPU and call type,
 * and returns. The address to jump to is given as the parameter.
 * Usage example
 *     void SysTickHandler()
 *     {
 *            // NON REENT CODE BEFORE
 *         RIPCrun(reentrant_Handler); // Call to reentrant code
 *     }
 * To avoid undesired preemption, the call is made in two stages:
 * first we call/return to the RIPCService stub, which in turn calls the desired
 * handler.
 * Note that the interrupt being made reentrant should have the lowest
 * priority.
 */
static void RIPCrun( void (*fcn)(void) )
{
                                                // R0 at entry contains the jumping address
    __asm volatile(
#ifdef __FPU_USED
            " TST LR, #0x10                  \n" /* Test bit 4 to check usage of FPU register */
            " IT EQ                          \n"
            " VMOVEQ.F32 S0, S0              \n" /* Mark FPU used for Lazy stacking operation  */
#endif
            " MRS  R1, xPSR                  \n" // Should be xPSR ??
            " PUSH {R1, LR}                  \n" /* Push PSR and LR on the stack*/
            " SUB  SP, #0x20                 \n" /* Reserve additional 8 words for a complete dummy stack return*/
            " STR  R0, [SP]                  \n" // Pass the R0 to Callee in return
            " ADDW  R0, PC,16                \n" // RIPCservice  (SKIP 8 Instruction from here)
            " STR  R0, [SP, #24]             \n" // Handler Launcher in thread (Temp return addr)
            " MOV  R0, #0x01000000           \n" // Generate a fresh new PSR
            " STR  R0, [SP, #28]             \n" // and store it (PSR) in proper offset
            " MOV  R0, #0xFFFFFFF9           \n" // Create a return value for ISR return to MSP no FP (8 Word frame)
            " MOV  LR, R0                    \n" // and place it to LR to emulate standard ISR return
            " BX   LR                        \n" // The return here will use our dummy stack

            /* RIPCService
             * Now we have exited the interrupt and enter immediately here (SP+24 holds this address).
             * At return the R0 register is populated from the dummy stack with the parameter passed
             * to RIPCrun (ex R0) and we jump there immediately.
             * Note that this procedure call is handled on the MSP stack, whatever the original
             * THREAD stack was (PSP or MSP).
             */
            " BLX  R0                        \n" // RIPCService: call the desired function
            " MOVS  R0, #" PRI_LEVEL_LOCK "  \n" // Rearrange PRIORITY level to
            " MSR  BASEPRI, R0               \n" // Block further trigger on our base interrupt
            " ISB                            \n" // ISB required to wait for BASEPRI effect (avoid further preemption)
            " SVC  #" SVC_CALL_NUMBER "      \n" // Replace here with desired syscall number
//            " BL   RIPCerror                 \n" // SVC will reset stack, we should not return here
    );
    while(1); // We should never get here, otherwise stack was messed up!
}

/*
 * \brief Control logic is the following
 *         if (SVC number == SVC_CALL_NUMBER)
 *                  RIPCrestoreSP();
 *         else
 *                  SVC_Orig_Handler();
 * This handler and the RIPCrestoreSP function restore the stack and hence should be protected against
 * further reentrant interrupts of the same kind; otherwise the stack can be corrupted.
 * The SVC handler always executes on the MSP stack, but the frame stacked by the SVC call can be on
 * MSP or PSP. Hence the initial test serves to properly extract the SVC number.
 */
__attribute__(( naked )) void SVC_Handler()
{
    __asm volatile(
            " TST    LR, #0x04               \n" /* Test EXC return bit 2 (MSP or PSP?)*/
            " ITE    EQ                      \n" // if 0
            " MRSEQ  R0, MSP                 \n" // Get SP from MSP
            " MRSNE  R0, PSP                 \n" // else use PSP
            " LDR    R1, [R0,#24]            \n" // This is offset of stacked PC
            " LDRB.W R0, [R1, #-2]           \n" // Check SVC calling service
            " CMP    R0, #" SVC_CALL_NUMBER "\n" // Replace here with desired syscall number
            " BEQ    RIPCrestoreSP           \n" // use our modified SVC handler
            " B      SVC_Orig_Handler        \n" // else jump to the original handler
    );
    while(1); // We should never get here, otherwise stack was messed up!
}

/*
 * \brief This function is called after the SVC handler has identified that we are
 * returning from a reentrant interrupt.
 *   -  We restore BASEPRI, which was set to avoid nesting of SVC_Handler (which produces a fault).
 *   -  We remove the stack frame pushed by the SVC_Handler call.
 *   -  We recover PSR and LR as originally stored by RIPCrun.
 *   -  We return from this SVC using the stack frame pushed for RIPCrun.
 * DOUBT: Why trigger lazy stacking here? Does it copy values into a dummy frame which
 *  is discarded a couple of instructions later?
 */
static void RIPCrestoreSP( void )
{
    __asm volatile(
            " MOVS R0, #0           \n" /* Use the lowest priority level*/
            " MSR  BASEPRI, R0      \n" // to re-enable the interrupt
            " ISB                   \n" // Ensure synchronization
#ifdef __FPU_USED
            " TST LR, #0x10         \n" /* Test bit 4 to check usage of FPU register */
            " IT EQ                 \n"
            " VMOVEQ.F32 S0, S0     \n" /* Mark FPU use for Lazy stacking operation  */
#endif
            " TST LR, #0x10         \n" /* Test bit 4 to check usage of FPU register */
            " ITE EQ                \n"
            " ADDEQ SP, SP, #104    \n" // Restore stack properly
            " ADDNE SP, SP, #32     \n"
            " POP {R0, R1}          \n" /* Pop the saved PSR and LR from the stack*/
            " MSR APSR_nzcvq,R0     \n" // Should be xPSR ??
            " BX   R1               \n" // Finally jump to R1
    );
    while(1); // We should never get here, otherwise stack was messed up!
}

#define TEST_REENT
#ifdef  TEST_REENT

#define NESTLEVEL 20

static int pass = 1;
float NPI[20];
unsigned int stackIN[NESTLEVEL];
unsigned int stackOUT[NESTLEVEL];
unsigned int nesting = 0;

/*
 * \brief Executes some FP operations. Marks the stack at entrance and exit and
 * waits in the middle for a number of nested recursions.
 * Note that the stack consumption is about 72 bytes for non-FP reentrancy
 * and 144 bytes for FP reentrancy. This is due to the double procedure
 * call that is set up at each interrupt (e.g. the original stack frame
 * is preserved until the end + one procedure call goes through the BLX).
 * We have 8 more local bytes on the stack,
 * which makes 32+8+32 (two complete frames + 8 bytes for temporary PSR&LR),
 * or 104 + 32 + 8 = 144 in case of an FP call stack.
 * The byte overhead w.r.t. the standard mechanism is hence 40 bytes.
 * Beware to have a large enough stack for reentrancy.
 */
void ReentTickTest()
    register unsigned int *stackref;
    int a=0,lev;

    __asm__ ("mov %0, sp" : "=g" (stackref) : );
    NPI[lev] = 3.1415926535f*lev;

        // Wait for Reentrancy
    __asm__ ("mov %0, sp" : "=g" (stackref) : );
    if (lev==0) pass=2;

void SysTick_Handler()

int main(void)
    float jj,kk;
    jj = 3.14;
    kk = jj*2;

    // The chosen IRQn should be the lowest in the system so that we are
    // sure that when this interrupt is exited we will return to thread
    // mode with a well-known stack recovery mechanism.
    // The alternative is to disable the interrupt in the code, but this
    // violates the rule of MAX 12 cycles for interrupt latency, which is
    // one of the best features of Cortex

    for (;;)
            if (pass==2) break;
        nesting = 0;

