
ARM and Keil Tools


Recently I have been preparing one of the demos for ARM's booth at Embedded World 2015 (which occurred last week 24th - 26th February) to showcase ARM Development Studio 5’s (DS-5) debug and trace functionality. This demo makes use of Freescale's SABRE board based on their newly announced i.MX 6SoloX Applications Processor, containing an ARM Cortex-A9 and ARM Cortex-M4. The image below shows the board with an attached LCD screen and connected to an ARM DSTREAM - ARM’s high-performance debug and trace unit.



The i.MX 6SoloX is a multimedia-focused processor aimed at a wide variety of applications - achieved by combining the high performance of a Cortex-A9 and low power of a Cortex-M4 to maximise power efficiency.


Thanks to the co-operation between ARM and Freescale I was able to bring up the board very quickly, allowing me to get Linux and Android both booting on the board within a day. This is especially impressive given the board was pre-production at that point.


Once I had Linux successfully booting I then went about getting the DS-5 Debugger connected to it using the DSTREAM via the board's JTAG-20 connector. DS-5 Debugger support for the board was also added (via the configuration database) during the board's pre-production stages, allowing full DS-5 support from release. This makes connecting the debugger as simple as selecting the platform in the Debug Configuration editor, choosing the appropriate connection to use and configuring it. This also enables a smooth, rapid transition from receiving the board to debugging or profiling a target application from the product's launch.


As the board has both a Cortex-A9 and a Cortex-M4, it was a good candidate to demonstrate DS-5’s multicore debugging. These cores use Asymmetric Multiprocessing (AMP), as opposed to Symmetric Multiprocessing (SMP), meaning they run completely independently (rather than under a single Operating System). I used DS-5 to connect to and debug Linux on the Cortex-A9 while simultaneously using a separate connection to load a small Freescale MQX RTOS based example onto the Cortex-M4.



DS-5 Functionality

When debugging, we have access to DS-5's full suite of tools including, among others:



DS-5 allows multicore debug of targets (in both SMP and AMP configurations) using hardware and software breakpoints, with tools to provide a wide range of additional functionality from MMU page table views to custom peripheral register maps.



Instruction Trace

Collecting trace from the target using the DSTREAM allows non-intrusive instruction-level trace which can be used to help with: debug, especially when halting the core is undesirable; post-crash analysis; and profiling the target.



RTOS OS Awareness

DS-5 offers additional information views (e.g. memory use, running tasks etc.) for a number of the most popular RTOS (pictured below is one of the views for MQX).



Linux OS Awareness

Specialised connections are available to Linux targets to allow debug and trace of the Linux Kernel and Kernel Modules as well as visualisations of the threads, processes and resources.



Linux Application debug

Applications may be debugged by downloading the target application and compatible gdbserver to the target for debugging using the DS-5 interface (concurrently with Bare Metal or OS-level debug and/or trace as desired).




Streamline allows visualisation of the target’s performance for profiling and optimizing target code. Requiring no additional hardware or debug connections, Streamline operates only via TCP/IP (or optionally an ADB connection in the case of Android) and can profile many aspects of a CPU/GPU/interconnect in high detail at varying resolutions, including power analysis (via external or on-chip energy probes). The image below shows a Streamline trace from Linux on the i.MX 6SoloX under heavy load.





Thanks to a stable pre-production platform and Freescale's assistance adding the new board to the debug configuration database the whole bring up experience was seamless. For information on creating debug configurations for new platforms, see the "New Platform Bring-Up with DS-5" post. ARM encourages platform owners to submit any debug configurations for their platforms back to the DS-5 team so we can include them by default in subsequent releases - thus allowing the same, fluid out-of-box experience for end users.

February has been an exciting period for ARM: from the announcement of new products for the Premium Mobile segment to the release of the new DS-5 v5.20.2.

DS-5 v5.20.2 includes ARM Compiler 6.01, the latest LLVM based ARM Compiler. The main highlights of ARM Compiler 6.01 are:

  • Support for the latest ARM CPUs, including Cortex-A72
  • Extended support for the Cortex family of processors
  • Support for bare-metal Position Independent Executables (PIE)
  • Support for link time optimization (LTO)


Support for new CPUs

This release brings to the market the most advanced compilation technology in ARM Compiler 6.01 for the new ARMv8-A Cortex-A72 (-mcpu=cortex-a72) processor. With support for the Cortex-A72, ARM Compiler continues to provide early support for new cores so that our customers can start developing as soon as possible and reduce their time-to-market.


Extended support for the Cortex family of processors

The new release of ARM Compiler adds support for ARMv7 Cortex-A class processors, enabling customers to take advantage of the new compiler for a wider range of products.

ARM Compiler 6.01 brings the advanced code generation used to build ARMv8 code to the more established 32-bit world, speeding up the adoption of the new compiler for companies not yet using ARMv8.

There’s more: ARM Compiler 6.01 adds initial (alpha quality) support for both Cortex-M and Cortex-R families to let engineers familiarise themselves with the new features and start the evaluation as soon as possible.

For more details see the release notes for ARM Compiler 6.01.


Bare-metal PIE support

Security has become a crucial aspect of applications, especially when connected to the network or available on the internet. One of the most common attacks to gain privilege on a system is through buffer overflows: this anomaly could potentially lead to the execution of malicious code, jeopardizing the security of the entire system through code injection.

Different techniques have been created to make a hacker's life harder; one of the most commonly used to reduce the risk of attacks is address space layout randomization (ASLR). This technique is widely used in several high-level Operating Systems like Android, iOS, Linux and Windows. With ARM Compiler 6.01 it’s possible to extend this protection to bare-metal applications as well.

ARM Compiler 6.01 allows the creation of bare-metal Position Independent Executables (PIE), which can be loaded anywhere in memory, with the code automatically recalculating the new addresses. More details are on infocenter about the compiler command line option -fbare-metal-pie and the linker command line option --fpic.


Link Time Optimization

ARM Compiler is able to optimise code generated from each source file to get the best performance out of ARM processors. But what about optimising the different modules? ARM Compiler 6.01 introduces initial support for Link Time Optimization (LTO), which extends the capability of the compiler and the linker to perform optimizations by looking at the whole program and not just a single compilation unit, giving an extra performance boost!

To enable Link Time Optimization in ARM Compiler 6.01, take a look at the documentation in infocenter about the linker command line option --lto.


I hope that you found this information useful and you are ready to use ARM Compiler 6.01! If you still don’t have DS-5, download a free 30-day evaluation.

Feel free to post any questions or comments below.





4 Years On - Fast Models 9.2

Posted by robkaye Feb 23, 2015

At the end of last week (20/Feb/2015) we released Fast Models 9.2. This year we are moving to a quarterly release cycle. The more frequent releases keep pace with the accelerating rate at which we are developing and deploying new Fast Models, and the speed at which partners pick them up and put them to use. The main focus for Fast Models 9.2 was the release of the Cortex-A72 and CCI-500 models following the announcement of the IP earlier in the month. Lead partners had been developing virtual prototypes with these models for several months already. We also included critical fixes for partners and completed the next stage of performance improvement work. For the latter, our emphasis this release cycle has been on how the models behave when used in SystemC simulations with many Fast Model components: something that is important to many of our partners.


It's been just over 4 years since I joined the Fast Models team. I've just been working on a summary of how the solution has evolved in that time for an internal conference and thought it would be interesting to look at how things have moved ahead in those four years.


Firstly, we have seen a rapid growth in usage: more and more partners are leveraging virtual prototypes as part of their SoC development process.   We are also seeing the models used in many more different ways.   Early software development remains front and center in our thoughts, but we have seen increasing use of the models in software driven validation of the hardware, in performance estimation and device compliance validation.


In 2011 we were working on the first models for ARMv8 cores. That year we introduced models for four new cores at either beta or release status. In 2015 it will be close to treble that number. On the System IP front it's the same story: approximately three times as many models will be rolled out this year compared to 2011. Fast Models for Media IP (GPU, video and display processors) were just a concept on a road map in 2011, but this year we have several in the works along with a range of platform models that combine the media models with CPUs and system IP. These platforms are aligned with the availability of IP and software stacks from sister teams in ARM to provide partners with a complete solution.


The underlying tools that support these models must move forward with the model deliveries. I've already mentioned the burgeoning use cases; combine that with the increasing complexity of the platforms being designed and the advance of host workstation operating systems. To support this we have a comprehensive road map of feature support (such as checkpointing and timing annotation) to complement the continuous improvements in performance, quality and OS/tools support.


It's definitely an exciting and challenging part of the ARM story to be involved with.  I'm looking forward to 2015 and beyond with great anticipation.


Figure: Two ends of the spectrum: Virtual Prototypes for an ARMv8 big.LITTLE mobile platform and a Cortex-M7 MCU.

Performance and power optimization are critical considerations for new Linux and Android™ products. This blog explores the most widely used performance and power profiling methodologies, and their application to the different stages in the product design.


The need for efficiency

In the highly competitive market for smartphones, tablets and mobile Internet devices, the success of new products depends strongly on high performance, responsive software and long battery life.


In the PC era it was acceptable to achieve high performance by clocking the hardware at faster frequencies. However, this does not work in a world in which users expect to always stay connected. The only way to deliver high performance while keeping a long battery life is to make the product more efficient.


On the hardware side the need for efficiency has pushed the use of lower silicon geometries and SoC integration. On the software side performance analysis needs to become an integral part of the design flow.


Processor instruction trace

Most Linux-capable ARM® processor-based chipsets include either a CoreSight Embedded Trace Macrocell (ETM) or a Program Trace Macrocell (PTM).


The ETM and PTM generate a compressed trace of every instruction executed by the processor, which is stored on an on-chip Embedded Trace Buffer (ETB) or an external trace port analyzer. Software debuggers can import this trace to reconstruct a list of instructions and create a profiling report. For example, DS-5 Development Studio Debugger can collect 4GB of instruction trace via the ARM DSTREAM target connection unit and display a time-based function heat map.



Figure 1: Instruction trace generation, collection and display


Instruction trace is potentially very useful for performance analysis, as it is 100% non-intrusive and provides information at the finest possible granularity. For instance, with instruction trace you can measure accurately the time lag between two instructions. Unfortunately, trace has some practical limitations.


The first limitation is commercial. The number of processors on a single SoC is growing and they are clocked at increasingly high frequencies, which results in higher bandwidth requirements on the CoreSight trace system and wider, more expensive, off-chip trace ports. The only sustainable solution for systems running at full speed is to trace to an internal buffer, which limits the capture to less than 1ms. This is not enough to generate profiling data for a full software task such as a phone call.
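To see why an on-chip buffer fills so quickly, here is a rough back-of-the-envelope sketch. Every number in it (buffer size, core count, clock speed, instructions per cycle and compressed trace bits per instruction) is an illustrative assumption, not a figure from a specific chipset.

```python
def capture_window_seconds(buffer_bytes, cores, clock_hz,
                           instructions_per_cycle, trace_bits_per_instruction):
    """Return how long a trace buffer lasts before it is full."""
    bits_per_second = (cores * clock_hz * instructions_per_cycle
                       * trace_bits_per_instruction)
    return (buffer_bytes * 8) / bits_per_second


# Illustrative values only (assumptions, not measurements).
window = capture_window_seconds(
    buffer_bytes=64 * 1024,          # assumed on-chip buffer size
    cores=4,                         # assumed number of traced cores
    clock_hz=1_000_000_000,          # assumed 1 GHz clock
    instructions_per_cycle=1.0,      # assumed average throughput
    trace_bits_per_instruction=1.0,  # assumed compression ratio
)
print(f"buffer fills in about {window * 1e6:.0f} microseconds")
```

With these assumed values the buffer lasts well under a millisecond, which is in line with the limit described above. Changing any of the inputs moves the result, so treat it as an order-of-magnitude estimate only.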


The second limitation is practical. Linux and Android are complex multi-layered systems, and it is difficult to find events of interest in an instruction trace stream. Trace search utilities help in this area, but navigating 4GB of compressed data is still very time-consuming.


The third limitation is technical. The debugger needs to know which application is running on the target and at which address it is loaded in order to decompress the trace stream. Today’s devices do not have the infrastructure to synchronize the trace stream with kernel context-switch information, which means that it is not possible to capture and decompress non-intrusively a full trace stream through context switches.


Sample-based profiling

For performance analysis over long periods of time sample-based analysis offers a very good compromise of low intrusiveness, low price and accuracy. A popular Linux sample-based profiling tool is perf.


Sample-based tools make use of a timer interrupt to stop the processor at regular intervals and capture the current value of the program counter in order to generate profiling reports. For example, perf can use this information to display the processor time spent on each process, thread, function or line of source code. This enables developers to easily spot hot areas of code.
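The idea can be sketched in a few lines of Python. This is not how perf is implemented; it only shows how a list of sampled program counter values can be turned into a per-function report. The symbol table, the addresses and the function names are all made up for illustration.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical symbol table: (start address, function name), sorted by address.
SYMBOLS = [(0x8000, "main"), (0x8100, "decode_frame"), (0x8400, "memcpy")]
STARTS = [start for start, _ in SYMBOLS]


def function_for(pc):
    """Map a sampled program counter to the function that contains it."""
    index = bisect_right(STARTS, pc) - 1
    return SYMBOLS[index][1] if index >= 0 else "<unknown>"


def profile(pc_samples):
    """Return the fraction of samples that landed in each function."""
    total = len(pc_samples)
    if total == 0:
        return {}
    counts = Counter(function_for(pc) for pc in pc_samples)
    return {name: hits / total for name, hits in counts.items()}


# Hypothetical program counter values captured on successive timer ticks.
samples = [0x8104, 0x8230, 0x8010, 0x83FC, 0x8404, 0x8108, 0x81A0, 0x8420]
report = profile(samples)
for name, share in sorted(report.items(), key=lambda item: -item[1]):
    print(f"{name:14s} {share:.1%}")
```

The function with the largest share of samples is the hot spot. A real profiler does the same kind of address-to-symbol lookup using the debug information of the binaries involved.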


At a slightly higher level of intrusiveness, sample-based profilers can also unwind the call stack at every sample to generate a call-path report. This report shows how much time the processor has spent on each call path, enabling different optimizations such as manual function inlining.
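A call-path report is essentially a count of identical call stacks. The following sketch assumes each sample has already been unwound into a tuple of function names, outermost caller first; the stacks themselves are invented for the example.

```python
from collections import Counter


def call_path_report(stack_samples):
    """Return (call path, share of samples) pairs, most frequent first."""
    paths = Counter(" > ".join(stack) for stack in stack_samples)
    total = sum(paths.values())
    return [(path, hits / total) for path, hits in paths.most_common()]


# Hypothetical unwound stacks, one per sample.
stacks = [
    ("main", "render", "blend"),
    ("main", "render", "blend"),
    ("main", "load", "memcpy"),
    ("main", "render", "blend"),
]
path_report = call_path_report(stacks)
for path, share in path_report:
    print(f"{share:6.1%}  {path}")
```

Seeing that most of the time sits under one particular caller is what suggests optimizations such as inlining the callee into that caller.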


Sample-based profilers do not require a JTAG debug probe or a trace port analyzer, and are therefore much lower cost than instruction trace-based profilers. On the downside they cause a target slow-down of between 5 and 10% depending on how much information is captured on every sample.


It is important to note that sample-based profilers do not deliver “perfect data” but “statistically relevant data”, as the profiler works on samples instead of on every single instruction. Because of this, profiling data for hot functions is very accurate, but profiling data for the rest of the code is not accurate. This is not normally an issue, as developers are mostly interested in the hot code.
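A simple statistical model helps to show why. If we assume each sample lands in a given function independently with probability p (a simplifying assumption, not a property of any particular profiler), the relative standard error of the estimated share after n samples is sqrt((1 - p) / (n * p)). The sample count and the two shares below are arbitrary illustrative values.

```python
import math


def relative_standard_error(true_share, sample_count):
    """Relative error of an estimated time share under a binomial model."""
    return math.sqrt((1.0 - true_share) / (sample_count * true_share))


sample_count = 10_000  # assumed number of samples in the capture
hot_error = relative_standard_error(0.30, sample_count)    # function using 30% of the time
cold_error = relative_standard_error(0.001, sample_count)  # function using 0.1% of the time
print(f"hot function:  +/- {hot_error:.1%} of its measured share")
print(f"cold function: +/- {cold_error:.1%} of its measured share")
```

Under these assumptions the hot function's share is known to within a couple of percent of its value, while the cold function's share could easily be off by a third.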


A final limitation of sample-based profilers is related to the analysis of short, critical sequences of code. The profiler will tell you how much processor time is spent on that code. However, only instruction trace can provide the detail on the sequence in which instructions are executed and how much time each instruction requires.


Logging and kernel traces

Logging or annotation is a traditional way to analyze the performance of a system. In its simplest form, logging relies on the developer adding print statements in different places in the code, each with a timestamp. The resulting log file shows how long each piece of code took to execute.
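As a minimal sketch of this approach, the snippet below records timestamped messages around a piece of work and computes the elapsed time from the log. The message names and the workload are placeholders; in embedded C code the equivalent would typically be print statements with a timer value.

```python
import time


def log(message, log_lines):
    """Append a (timestamp, message) pair to the log."""
    log_lines.append((time.perf_counter(), message))


def elapsed_between(log_lines, start_message, end_message):
    """Return the time in seconds between two logged messages."""
    stamps = {message: stamp for stamp, message in log_lines}
    return stamps[end_message] - stamps[start_message]


log_lines = []
log("decode start", log_lines)
total = sum(i * i for i in range(100_000))  # placeholder for the code being measured
log("decode end", log_lines)

duration = elapsed_between(log_lines, "decode start", "decode end")
print(f"decode took {duration * 1e3:.3f} ms")
```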


This methodology is simple and cheap. Its major drawback is that in order to measure a different part of the code you need to instrument it and rebuild it. Depending on the size of the application this can be very time consuming. For example, many companies only rebuild their software stacks overnight.


The Linux kernel provides the infrastructure for a more advanced form of logging called “tracing”. Tracing is used to automatically record a high number of system-level events such as IRQs, system calls, scheduling and even application-specific events. Lately, the kernel has been extended to also provide access to the processor’s performance counters, which contain hardware-related information such as cache usage or number of instructions executed by the processor.


Kernel trace enables you to analyze performance in two ways. First, you can use it to check whether some events are happening more often than expected. For example, it can be used to detect that an application is making the same system call several times when only one is required.  Secondly, it can be used to measure the latency between two events and compare it with your expectations or previous runs.
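Both kinds of analysis are easy to picture on a list of timestamped events. The sketch below uses a made-up record format and made-up event names; it does not parse the output of perf, ftrace or LTTng.

```python
from collections import Counter

# Hypothetical records: (timestamp in microseconds, event name).
events = [
    (100, "irq_entry"),
    (104, "sys_open"),
    (109, "sched_wakeup"),
    (130, "sys_open"),
    (131, "irq_entry"),
    (138, "sched_wakeup"),
    (150, "sys_open"),
]


def event_counts(records):
    """Count how often each event occurs, to spot unexpected repeats."""
    return Counter(name for _, name in records)


def latencies(records, first, second):
    """Return the time from each `first` event to the next `second` event."""
    results, pending = [], None
    for stamp, name in records:
        if name == first:
            pending = stamp
        elif name == second and pending is not None:
            results.append(stamp - pending)
            pending = None
    return results


counts = event_counts(events)
wakeup_latency = latencies(events, "irq_entry", "sched_wakeup")
print(counts.most_common())
print(wakeup_latency)
```

Here the count shows the same system call being made three times, and the latency list can be compared against expectations or against a previous run.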


Since kernel trace is implemented in a fairly non-intrusive way, it is very widely used by the Linux community, using tools such as perf, ftrace or LTTng. A new Linux development will enable events to be “printed” to a CoreSight Instrumentation Trace Macrocell (ITM) or System Trace Macrocell (STM) in order to reduce intrusiveness further and provide a better synchronization of events with instruction trace.


Combining sampling with kernel trace

Open source tools such as perf and commercial tools such as the ARM DS-5 Streamline performance analyzer combine the functionality of a sample-based profiler with kernel trace data and processor performance counters, providing high-level visibility of how applications make use of the kernel and system-level resources.


For example, Streamline can display processor and kernel counters over time, synchronized to threads, processes and the samples collected, all in a single timeline view. This information can be used to quickly spot which application is thrashing the cache memories or creating a burst in network usage.



Figure 2: Streamline Timeline View


Instrumentation-based profiling

Instrumentation completes the picture of performance analysis methodologies. Instrumented software can log every function – or potentially every instruction - entry and exit to generate profiling or code coverage reports. This is achieved by instrumenting, or automatically modifying, the software itself.
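To give a feel for what instrumentation does, here is a small Python sketch that wraps functions so that every entry and exit is logged. A real tool would usually insert these hooks automatically at build time or by rewriting the binary; the decorator and the two example functions below are only a hand-written stand-in.

```python
import functools
import time

call_log = []  # (event, function name, timestamp) records


def instrumented(func):
    """Wrap a function so that every entry and exit is recorded."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        call_log.append(("enter", func.__name__, time.perf_counter()))
        try:
            return func(*args, **kwargs)
        finally:
            call_log.append(("exit", func.__name__, time.perf_counter()))
    return wrapper


@instrumented
def checksum(data):
    return sum(data) % 256


@instrumented
def process(blocks):
    return [checksum(block) for block in blocks]


result = process([[1, 2, 3], [250, 10]])
for event, name, stamp in call_log:
    print(f"{stamp:.6f} {event:5s} {name}")
```

Every call shows up in the log, not just a sample of them, but each call now pays for two extra log writes. That overhead is the intrusiveness this methodology is known for.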


The advantage of instrumentation over sample-based profiling is that it gives information about every function call instead of only a sample of them. Its disadvantage is that it is very intrusive, and may cause substantial slow-down.


Using the right tool for the job

All of the techniques described so far may apply to all stages of a typical software design cycle. However, some are more appropriate than others at each stage.



[Table 1 did not survive extraction. Recoverable column headings: Low Cost, Low Intrusiveness, System Visibility. Recoverable row labels: Kernel trace, Instruction trace.]


Table 1: Comparison of methodologies


Instruction trace is mostly useful for kernel and driver development, but has limited use for Linux application and Android native development, and virtually no use for Android Java application development.


First, performance improvements in kernel space are often in time-critical code handling the interaction between kernel, threads and peripherals. Improving this code requires the high accuracy and granularity, and the low intrusiveness, of instruction trace.


Secondly, kernel developers have enough control of the whole system to act on what the trace shows. For example, they can slow down the processors to transmit trace over a narrow trace port, or they can hand-craft the complete software stack for a fast peripheral. However, as you move into application space, developers do not need the accuracy and granularity of instruction trace, as the performance increase achieved by software tweaks can easily be lost to random kernel and driver behaviour totally outside their control.


In the application space, engineering efficiency and system visibility are much more useful than perfect profiling information. The developer needs to find quickly which bits of code to optimize, and measure accurately the time between events, but can accept a 5% slow-down in the code.


System visibility is extremely important in both kernel and application space, as it enables developers to quickly find and kill the elephant in the room. Example system-related performance issues include misuse of cache memories, processors and peripherals not being turned off, inefficient access to the file system or deadlocks between threads or applications. Solving a system-related issue has the potential to increase the total performance of the system ten times more than spending days or weeks writing optimal code for an application in isolation. Because of this, analysis tools combining sample-based profiling and kernel trace will continue to dominate Linux performance analysis, especially at application level.


Instrumentation-based profiling is the weakest performance analysis technique because of its high level of intrusiveness. Android Java applications are more likely to be optimized successfully with manual logging than with open-source tools.


High-performance Android systems

Most Android applications are developed at Java level in order to achieve platform portability. Unfortunately, the performance of the Java code has a random component, as it is affected by the JIT compiler. This makes both performance analysis and optimization difficult.


In any case, the only way to guarantee that an Android application will be fast and power-efficient is to write it - or at least parts of it - in native C/C++ code.  Research shows that native applications run between 5 and 20 times faster than equivalent Java applications. In fact, most popular Android apps for gaming, video or audio are written in C/C++.


For Android native development on ARM processor-based systems Android provides the Native Development Kit (NDK). ARM offers DS-5 as its professional software tool-chain for both Linux and Android native development.


By Javier Orensanz, Director of Product Management - Tools at ARM

Here at ARM we’re very excited about the launch of the ARM MCU Design Contest. In cooperation with Elektor Magazine, we’re calling all engineers, hobbyists and enthusiasts to create impressive, fun and sophisticated MCU applications. Enter the competition and you could be in with a chance to win one of our cash prizes ($5,000, $3,000, $1,000, 2x $500).


Courtesy of the participating ARM partners Freescale, Infineon, NXP and STMicroelectronics, we are providing a total of 400 ARM Cortex-M4 development boards. Together with a free 6-month license of Keil MDK-Professional, we’ll equip you with all that you need so that you’re ready to dive into your project right away.


The focus of this contest is on CMSIS software components and middleware. CMSIS, the Cortex Microcontroller Software Interface Standard, has recently been expanded with the CMSIS-Pack and CMSIS-Driver specifications to simplify development, management and deployment of software components for embedded applications. We want you to make use of the existing components, expand them or write your own from scratch.


Here are the hardware platforms you can choose from:


All of these boards have different capabilities, sensors and peripherals, making them suitable for a wide range of applications. We provide several example projects on the ARM MCU Design Contest webpage that show you how to use the drivers and middleware together.


How can I participate?

In three easy steps:

  1. Submit your project proposal via the Elektor ARM MCU Design Contest website
  2. Once accepted, receive your free board and MDK-Pro license
  3. Get developing!


Proposals are accepted until 1st of April, but we expect high demand for the boards, so the sooner you sign up, the better.

The winners will be selected by a panel of ARM engineers and Elektor editors and will be announced in the Elektor September issue and online.


Have any questions about the contest?

Comment on this blog, or join the ARM MCU Design Contest Community forum.


Good luck!


We can now offer a detailed MDK workshop called:


"USB Host Application with File System and Graphical User Interface"

Please visit our website at:



If you have any questions, please contact me at:



Have fun, best regards,



Ralf Kopsch

Senior Applications Engineer

ARM Germany GmbH

Chinese Version 中文版: Cocos Code IDE 1.1.0:集成ARM DS-5,高效调试C++

As an important product of the Cocos Developer Platform, Cocos Code IDE has reached version 1.1.0, which integrates ARM DS-5 and enables efficient C++ debugging. With this version, Cocos Code IDE is authorized by ARM® to distribute ARM Development Studio 5 (DS-5™) Community Edition, aiming to further smooth the development process and enhance the user experience.





DS-5 is a powerful tool chain that integrates many ARM-exclusive features into the Eclipse platform. Based on the Eclipse development environment, DS-5 offers superior window management, project management and C/C++ source code editing tools, and supports C++ development and debugging on Android devices.


In version 1.1.0, with ARM's authorization, Cocos Code IDE provides DS-5 Community Edition to developers for free.




Cocos Code IDE is a cross-platform IDE based on Eclipse, designed especially for Cocos2d-x Lua and JavaScript developers. With the IDE, developers can easily create game projects, compile code and debug on different platforms. Moreover, developers can check the results in real time and, in the end, publish a ready-to-go package.


ARM DS-5 Community Edition is now fully integrated into Cocos Code IDE, including its C++ debugging on Android devices, which means you can debug game logic written in script languages or C++ in the same environment. DS-5 also provides a C++ development environment for Cocos Code IDE, so developers can now write key or sensitive logic in C++, and then compile, package and debug it on Android devices.




The DS-5 Community Edition toolkit, based on the Professional Edition, provides the debugging and system analysis needed to create reliable and highly optimized applications for ARM processor-based devices, without the complexity and inefficiency often found in scattered open source tools. With DS-5, Cocos Code IDE gains powerful C++ debugging on the Android platform, supporting Android devices based on ARM9/ARM11 and Cortex®-A (ARMv7-A) processor architectures. Efficient debugging of C++ logic will greatly accelerate the Android application development process.




Android platform performance analysis is another highlight of Cocos Code IDE 1.1.0. Developers just need to click “Start Capturing” in the data view of ARM Streamline to collect information from the target, and click “Stop” to see the Performance Analysis Report once the capture is finished. A simple GPU/CPU analysis lists key information such as the most time-consuming code segments and functions, helping developers lower the load on the GPU and CPU and improve the user experience.


The world of mobile Internet is changing dramatically every day, and so is the mobile development technology. Cocos Code IDE focuses on script game development based on Cocos2d-x engine, and is motivated to enable a smoother and faster development process and a better development experience; Cocos Code IDE also actively embraces opportunities to combine advanced features from others to help developers gain a favorable position in this rapidly changing market.


Chinese Version 中文版: 使用DS-5从FVP中收集Trace数据

One of the new features for ARM DS-5 Development Studio in v5.20 is instruction trace for our Fixed Virtual Platform (FVP) simulation models. This enables you to capture a trace of program execution in the models that are included by default in DS-5: an ARMv7 FVP (in Professional Edition and Ultimate Edition) and ARMv8 FVP (in Ultimate Edition). If you want to try it out, you can download DS-5 Ultimate Edition and generate a 30-day eval license.


What is trace and why is it useful?


Trace is the continuous collection of information which represents the execution of software on a system. In real hardware, it is non-invasive, meaning it doesn't slow a processor down. The raw trace data is highly compressed and must be decompressed to be useful. In the case of DS-5, it helps us to see the proportion of time spent in a function, along with the machine instructions which were executed at any point in the trace capture. ARM's debug and trace infrastructure is called CoreSight (which is not modeled directly in our Fast Models and FVPs).


It is used at every stage of a design process, from modelling the system through to in-field failure analysis. Trace is particularly useful for bare-metal or Linux kernel debugging.


Since ARM Fast Models and FVPs are instruction accurate, collecting instruction trace is a natural extension to the debug functionality that DS-5 already provides for models, making the experience as close as possible to writing software for a real device. It's important to remember that Fast Models and FVPs are not cycle accurate, so the actual time it takes to execute a function won't correlate with the time it would take on the real silicon (though the proportion of time spent in that function would remain fairly consistent). Now that debug configurations have been added for the three previously mentioned FVPs, it’s easy to use it in practice. Currently, there is no support for adding trace to other models in DS-5 or for collection of data trace (where address and register values are also recorded from load and store instructions). Edit - if you have your own Fast Model based platform and would like to see trace supported on it in DS-5, then please contact ARM.


Making use of model trace


You will notice in the DS-5 Debug Configurations panel that model trace is only available for bare-metal and Linux kernel debug connections. Trace isn't the right solution for debugging Linux applications, as the extra level of complexity that an OS adds would mean sifting through an unmanageable amount of trace data.




The best way to test out model trace is to import an example. There are bare-metal examples for everything from simple Hello World programs to more complex RTOS programs. In the example below, I've imported the "traffic lights" program which runs on the Keil RTX RTOS.


All of our FVP bare-metal examples have been reconfigured to collect model trace, but you can also use your own images. Once trace is configured, it behaves exactly like trace on real hardware, using the same trace view in DS-5.


Our trace is being collected into a circular buffer. This is fairly common in ARM SoCs, which use an Embedded Trace Buffer (ETB) to collect a record of software execution, which is constantly overwritten and refreshed (or alternatively, just filled once).
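The difference between the two buffer modes can be illustrated with a short sketch. The class name, the capacity and the instruction labels are invented for the example, and this is only a software analogy for what an ETB does in hardware.

```python
from collections import deque


class TraceBuffer:
    """Toy trace buffer: circular (overwrite oldest) or fill-once (stop when full)."""

    def __init__(self, capacity, circular=True):
        self.capacity = capacity
        self.circular = circular
        # A bounded deque silently drops the oldest entry when it is full.
        self.records = deque(maxlen=capacity if circular else None)

    def record(self, instruction):
        if self.circular or len(self.records) < self.capacity:
            self.records.append(instruction)


executed = [f"insn_{n}" for n in range(10)]  # hypothetical executed instructions
ring = TraceBuffer(capacity=4)
fill_once = TraceBuffer(capacity=4, circular=False)
for insn in executed:
    ring.record(insn)
    fill_once.record(insn)

print(list(ring.records))       # most recent history
print(list(fill_once.records))  # earliest history
```

A circular buffer always holds the instructions leading up to the point where you stop, which is what you want for post-crash analysis. A fill-once buffer keeps the instructions that followed the trace start point instead.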


Trace will start automatically whenever you run through a debug session, unless you set trace start and stop points manually. This can be useful for just tracing the function that you’re interested in. In the screenshot below, you can see the trace collected on start-up of the RTX Traffic Lights example (which was set to debug from main):




It’s important to note that DS-5 doesn't automatically overwrite the contents of the trace view each time you collect more trace. If you've set trace start and stop points, clearing the trace view before running will show only the trace between these two points. In this example, I've started collecting trace when the traffic light timer is between the defined start and end times that the user sets in the program.
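Conceptually, trace start and stop points act as a gate on the stream of executed instructions. The Python sketch below is a simplified model of that idea under stated assumptions: the addresses are hypothetical, and treating both the start and the stop instruction as part of the captured window is a choice made for this illustration, not a statement about DS-5's behaviour.

```python
def trace_between(executed, start_addr, stop_addr):
    """Return only the instructions executed between a start and a stop point."""
    captured = []
    tracing = False
    for addr in executed:
        if addr == start_addr:
            tracing = True       # start point reached: open the gate
        if tracing:
            captured.append(addr)
        if addr == stop_addr:
            tracing = False      # stop point reached: close the gate
    return captured


# Hypothetical stream of executed instruction addresses.
executed = [0x100, 0x104, 0x200, 0x204, 0x208, 0x20C, 0x108]

window = trace_between(executed, start_addr=0x200, stop_addr=0x208)
print([hex(a) for a in window])
```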




How does model trace work?


Model trace isn't a direct model of CoreSight. Instead, it collects model events (instructions executed and exceptions) directly. DS-5 then interprets these events and displays them in the same trace view used for CoreSight trace. Note that model trace slows down the execution of the model on your host machine, unlike a real-life system, where the CoreSight infrastructure reports trace non-invasively.

My family are American, and this is a time of year when their thoughts turn towards the family and friends in the USA who are celebrating the Thanksgiving holiday.  For me, late November also happens to coincide with the biannual release of Fast Models.


It has been a busy six months for the team, who have been working on a wide and varied range of models, as well as new functionality and product enhancements. As you can infer from the title, there has been a big emphasis on Cortex-M class models in this release cycle. Although the majority of Fast Model licensees are deploying them into platforms utilizing the Cortex-A models, there is a sizable contingent of users for the Cortex-M family as well. Cortex-M7 is the recently announced high-end micro-controller, whereas Cortex-M0 and Cortex-M0+ are more mature cores with very small footprints. From a modelling point of view, they all leverage the new Cortex-M architecture model, which will also form the basis of other new models to be announced in 2015.


There are several other new models being made available alongside this release, for Media and System IP products. These are available to lead partners, and will be introduced to the standard portfolio in due course.


Outside of the new models, the main focus for work in this release has been around performance improvements.  There have been three aspects to this: improvements to the underlying simulation engine, improvements in the bridges to SystemC and improvements in the way that Fast Models interacts with the host workstation keyboard and mouse.  The results of this work - and it's an ongoing task to maintain performance as the systems being simulated become more complex - will yield benefits for most partners and most applications.


We also added support for Visual Studio 2013 and gcc 4.7.2 as tool chains for building the simulation platforms. Leveraging newer compilers also provides performance improvements, as they generate better-optimized code.


Another area of work has been the link with the ARM DS-5 tool suite. The latest release of DS-5 (5.20) provides support for viewing trace information generated by a Virtual Platform with Fast Models.


2014 was a busy year, both in product development and in supporting the rapidly growing adoption of Virtual Prototypes as part of the SoC development process. Our ecosystem partners have continued to integrate Fast Models into their solutions in a variety of ways. One that has generated a lot of interest in 2014 has been "Hybrid" virtual platforms, otherwise known as co-emulation. In these, a processor subsystem running in the virtual prototype is bridged to an emulator, which is used to simulate other parts of the system. A typical scenario would be for platforms that have a GPU. The hybrid approach has yielded impressive performance gains for simulating these complex systems.


You can get an overview of what we are talking about here (a joint presentation with Cadence at the ARM TechCon last month): Reducing Time to Point of Interest With Accelerated OS Boot


We now have a moment or two to draw breath before diving into the development cycle for the 2015 releases. We have a full road map of new products to model, a focus on providing more hooks in the models for profiling the software running on them and, of course, the ongoing performance work.


Happy Thanksgiving!

R for Real-time

We are very excited to have partnered with Renesas Electronics to introduce support for the recently announced Renesas RZ/T1 product series in ARM DS-5 Development Studio. The new device family comprises ten ARM Powered® products aimed at industrial applications that require both high performance and real-time predictability. Based on an ARM® Cortex®-R4 processor operating at up to 600 MHz, the product line also includes configurations that feature a Cortex-M3 core to enable highly integrated asymmetrical multi-processing (AMP) applications. Visit the Renesas website (English/Japanese) if you want to learn more about the RZ/T1 series.


Renesas RZ/T1 in DS-5

ARM DS-5 is a complete software development tools solution for RZ/T1 users. It includes efficient C/C++ code generation for both ARM cores and full support for synchronous and asynchronous AMP debug.


Some benefits of DS-5 for RZ/T1:

  • ARM Compiler 5: industry reference C/C++ compiler for Cortex-R4 and Cortex-M3 processors, compatible with the widest range of RTOS, middleware and third-party tools
  • Simultaneous debug connections to both ARM processors
  • Collection, decode, synchronization and visualization of trace data from ARM CoreSight™ ETM (Embedded Trace Macrocell) and ITM (Instrumentation Trace Macrocell) units for faster bug finding
  • TÜV SÜD certified compiler and compiler qualification documentation for functional safety certification
  • Built-in OS awareness for leading commercial real time operating systems (RTOS)


Target connections for your every need

Depending on the stage of your software development project and your budget, you can choose between different technologies to connect DS-5 to your RZ/T1 target. The summary of target connection options below should help you pick the right one for your needs.


| Target connection type | Best for | Trace capture |
|---|---|---|
| DSTREAM | Board bring-up, high-performance debug and ETM trace-based analysis | off-chip (4 GB DSTREAM), on-chip (4 KB ETB) |
| ULINKpro | Fast software debug with on-chip trace (Note: ULINKpro trace is not supported in DS-5) | on-chip (4 KB ETB) |
| ULINKpro D | Fast software debug with on-chip trace | on-chip (4 KB ETB) |
| ULINK2 | Basic software debug | on-chip (4 KB ETB) |
| CMSIS-DAP | Silicon evaluation on development boards (USB connection to board, no debug hardware required) | on-chip (4 KB ETB) |




The RZ/T1 platform configuration file is available to DS-5 version 5.20 users upon request. If you require access to it now, get in touch.

Hi, it would be really useful for the team here at ARM if you could spare a few moments to complete a short survey of your experience with Juno, ARM's Development Platform for ARMv8. I can feed this back into the requirements for future platforms and also try to address any issues you encountered with either the software or the hardware. Click here to complete the survey




ARM releases updates to DS-5 approximately quarterly, and as I write this, we have just released v5.20 of the tool. With each release we update the debugger and compiler(s) to the latest available versions. This can cause difficulty for the compiler use case, as typically the version of a compiler used for a given project becomes fixed at some point in time, and moving from this version becomes costly from a validation and/or qualification point of view. ARM has always made the ARM Compiler package available as a separate installation for those users that do not wish to use the default version. However, until now, it has been precisely that... a separate installation, detached from the rest of your DS-5. As you've likely guessed from the title of this article, this is a situation that we have improved within DS-5 v5.20.


You will first need to download the appropriate compiler package and install it. I recommend installing to the DS-5 sw directory where the default tools are, though you can use any location. You then need to make DS-5 aware of this installation. To do this, launch the Eclipse GUI, and navigate through the Window menu to Preferences. There, drill down the left-hand side to select DS-5, and then Toolchains:




Click Add... and navigate to the bin (or bin64) directory of the compiler you wish to add:




Click on Next and the tools will verify the location:




Click on Finish, and you will be asked to restart the GUI. The newly added compiler version will now be available to use in your projects:




For those that build from makefiles or equivalent, these changes are reflected in the DS-5 Command Prompt. Now, when you launch this, you will first see the following:



We have created an add_toolchain script that behaves in a similar manner to the above. Use the path to the bin directory and it should work just as above.


We provide two scripts (select_toolchain and select_default_toolchain) to allow the user to set the appropriate compiler version that they wish to use. As the names suggest, using select_default_toolchain is a do-once operation to set the default compiler version used each time the user launches the command prompt. If necessary, you can then override the compiler version used for a particular compilation session using select_toolchain. The use of either of these options is simple. Call the appropriate script, and you will see a list of available compiler versions. Select the version you wish to use, and the environment will instantly be set to point to that version.



Join us for a live webinar on Monday 10th November 10:00 am PST (1:00 pm EST, 6:00 pm GMT) to learn about the new features of CMSIS v4. This will be hosted by Christopher Seidl, Keil MDK Technical Marketing Manager.


The Cortex-M Microcontroller Software Interface Standard (CMSIS) is a vendor-independent standard for hardware manufacturers and tool vendors. It provides common software layers and interfaces for all microcontrollers based on ARM Cortex-M processors.


In this webinar you’ll learn about the recently released CMSIS 4, which includes several major improvements that will affect embedded developers in the near future. It adds a standardized driver interface for Middleware and user applications (CMSIS-Driver) and Software Packs for device and board information as well as the delivery of software components.


Take away from this webinar a deeper understanding of CMSIS 4 and how it can be used to speed up the software development process. All parts of CMSIS will be covered and code examples will be used to demonstrate ease-of-use.


Register for your place »



For the last TechCon, my colleague Bob Boys created a comprehensive application note that tells you how to create a middleware application using CMSIS-Drivers and the middleware that comes with Keil MDK-ARM Professional. You'll learn how to start a project from scratch, adding features as you go. The application note explains how to set up, debug and run a CMSIS-compliant application using CMSIS-RTOS RTX, USB and graphic display capabilities. It uses the STMicroelectronics STM32F429IDISCOVERY kit, but works similarly with all ARM Cortex-M based microcontrollers and development boards that have CMSIS-Drivers available (for example the Keil MCB1800, based on the NXP LPC1857).


Check out the application note on the Keil website: Application Note 268: Creating a CMSIS Middleware Application

Happy Friday!! This week is over and that is a beautiful thing. I am going to celebrate finishing my operating systems midterm by answering an RTX question today.

This week's question is from the Connected Community:

"Hi Experts,


There is code pieces on Full context and reduced context in the RTX code.


What this is actually intended for ?"


Full context Vs Reduced Context  techguyz



There are 2 types of task switch in RTX library for ARM7 and ARM9: reduced context switch and full context switch.

A reduced context switch does not store all registers to the stack on a task switch. The advantage is efficiency: the switch is faster and requires less space on the stack. Some functions that use this type are os_tsk_pass, os_dly_wait, os_sem_wait, etc.

Then there is the full context switch, which stores all registers to the stack. This task switch is slower and needs more space on the stack, and is ideal for switches with timeouts (i.e. round robin). Some of the functions that use the full context are os_dly_wait and isr_evt_set.

Since the rate of the context switch can make or break a real-time operating system's ability to process efficiently, RTX takes advantage of the reduced context to save time wherever possible and full context in functions where the register contents are too critical to omit.

It's also worth noting that this is not relevant for Cortex-M processors, which use the CMSIS-RTOS compliant version of RTX. 



