
ARM Processors

217 posts

Are you a designer who is too busy to attend ARM TechCon in Santa Clara later this week? Then think again: you might well save far more time than the day you spend attending. You'll get the chance to learn about, and see a demo of, the latest and truly greatest tools for automating IP design. We are previewing a new Socrates design environment, recently acquired with ARM's purchase of Duolog Technologies, on ARM booth 300.


What I want to highlight in this blog is how ARM has used the versatile Socrates platform to create an easy-to-use tool that combines years of engineering experience. That's literally decades of experience encapsulated in hundreds of rules and algorithms about what to do and what not to do when creating either a CoreSight debug and trace system or a CoreLink interconnect system.


System IP configurability is a key aspect of designing the very best SoCs, but it brings increased complexity in design choices, system integration and verification. System IP configurability has evolved from simple hardware parameterization to highly configurable architectures and IP boundaries. We will look at how configurability is modeled in design flows and try to understand where current design flows are limited.


A defining feature of any system interconnect or debug and trace solution is its configurability, which I touched upon in my last blog. This configurability is vital to its function and makes it adaptable to specific user requirements. The simple fact is that no two SoCs are the same, and the system IP needs to adapt to match. However, configurability also raises a number of design decisions.


You begin to ask yourself:

  “Do I need this feature?

  What is the best value to set this parameter to?

  How do I pick the most appropriate option for my SoC?”

These questions can often mount up, along with the nagging doubt that "Maybe I haven’t configured all my components to fit together correctly. I really don’t want a deadlock situation." Much like tuning a Formula 1 car engine, there are so many variables that can be tweaked that it can be difficult to settle on a setup that maximises the performance for a specific use while making sure that there is balance across the system.


What ARM has done with the Socrates design environment is create a tool that instantiates all of these connections through the use of rules and algorithms, ensuring that your system is correct-by-construction and unleashing the full potential of your CoreLink or CoreSight IP. In essence, it combines the intelligence and experience of our tech leads, system architects and support engineers in one toolbox that allows you to package interfaces, build micro-architectures and test connectivity in an easy-to-use design environment. This allows you to cut through the noise of all the connection options and choose the one that works best for you. At ARM TechCon we are previewing two tools with this built-in intelligence. They take your design intent, the high-level spec of what you need the CoreLink or CoreSight system to do. They then automate the configuration and connection of all the necessary IP blocks to create the required sub-system. They also verify its correctness, give estimates of its area and performance, and generate all the output you need to take the design forward into the implementation stage. The new design environment delivers productivity in two main ways:


  1. It automates the mechanical and repetitive tasks for you, cutting out the risk of errors
  2. It assists you in the real design choices only the designer can make


Some mechanical aspects are ripe for automation, and automating them saves months of error-prone donkey work. Three examples: identifying the exact interface definitions you need to connect to via IP-XACT descriptions and matching your system interface to them; generating precise and easily shareable documentation of your design; and generating testbenches and test code around your system. Of course, some design tasks are more subtle and require a combination of intelligent algorithms and the designer's input to create a system that fits the user requirements. This mixture of necessary functions and new features is how architects can really differentiate their SoCs and add value for customers. Here, instant feedback on the area and performance of the design guides those trade-off decisions, leading to the most appropriate design. And anything you have modified manually still gets the automatic checking. This all adds up to a more optimised SoC that is designed faster and with less risk.


So to all those overworked engineers, I hope after reading this you’ll make the time to learn how to save a lot of time and come and visit us on booth #300 this Wednesday or Thursday or catch Simon Rance's technical session Friday 3:30pm.

So you are excited about the release of a new ARM-powered smartphone or tablet device – and why shouldn't you be! You've made your way to your preferred tech review website, where you discover that the device is running big.LITTLE™ technology – sweet! But hang on, it's also running the latest version of ARM big.LITTLE software, "big.LITTLE MP". So what additional benefits does big.LITTLE MP bring compared to its predecessor? In this blog, I attempt to answer this frequently asked question.


The mobile analytics firm, Flurry, carried out an analysis on smartphone users in the US and made some interesting findings. The study found that mobile users spend most of their time on the following mobile activities:

  1. Web browsing and Facebook;
  2. Gaming;
  3. Audio, video and utility.


Calculated on a daily basis, web browsing and 'facebooking' accounted for 38% of a mobile user's smartphone interaction time, gaming for 32%, and audio, video and utility services came third at 16%. In total, the top three activities account for a staggering 86% of the time we spend on smartphones. This shows how far mobile use has come since the days when phones were used only for voice calls and text messaging.


But how do these use cases affect power consumption? Looking at the power profile (i.e. power vs. time) of each activity, three very distinct patterns begin to emerge.


Mobile web browsing

For the web browser analysis, we used the BBench browser benchmark provided by the University of Michigan. BBench simulates browsing popular websites of varying richness and complexity, and allows key parameters to be configured. To obtain reliable results, we ran the workload in a clean environment. We also automated the execution of the workloads and the related measurements to maximize reproducibility. The following graph shows the power profile produced from a run on a Symmetric Multi-Processing (SMP) system consisting of a quad-core Cortex-A7 CPU subsystem.


Graph 1: Power profile of web browsing use case


The first thing you will notice about the power profile (Graph 1) is the spikes in power. These typically occur when launching an application, loading content or scrolling through webpages. In other words, they occur when the system requires a short burst of performance to respond to a user interaction. Responsiveness is a type of user experience metric and therefore the better your mobile system is at handling such workloads, the better the overall mobile user experience.

Mobile gaming

For the mobile gaming workload, we ran the popular game CastleMaster. Through workload analysis, we selected a period of gameplay that produced a high-intensity performance load, and automated it to ensure reproducibility. The following graph shows the power profile produced by this workload on an SMP system consisting of a quad-core Cortex-A7 CPU subsystem.


Graph 2: Power profile of mobile gaming use case

This workload draws a more constant level of power, which is common in intensive gaming applications. In these applications the CPU cores must process a large amount of multi-threaded data for the GPU cores. In workloads like these, as you can imagine, power efficiency within the thermal budget of the system is vital.

MP3 audio playback

To demonstrate MP3 audio playback, we played a freely available MP3 audio sample on the default Android music player. The following graph shows the power profile that we produced from this workload from a run on an SMP system consisting of a Quad-core Cortex-A7 CPU subsystem.


Graph 3: Power profile of MP3 audio playback use case

Workloads such as audio and video playback are known as low-intensity workloads and tend to run for long periods. Saving power is therefore essential for longer battery life.


Analysing the patterns in the power profile from each of the mobile applications above, we are able to identify three main building blocks, each present with a high degree of prominence across the workloads:

  1. Bursts of high-intensity workloads
  2. Sustained performance workloads
  3. Long-use, low-intensity workloads



Graph 4: Power profiles of the building blocks in the top three mobile use cases

Graph 4 brings these three categories together. It shows the wide range of power and performance requirements in today's mobile applications, particularly in the three classes of mobile activity we spend most of our time on. In real life, a mobile user is often listening to MP3 audio while surfing the web, or watching an embedded video while using Facebook. In such cases, we would expect a combination of these three classes of workload. To handle such a mix of workloads efficiently, a single mobile system needs a combination of high-performance and high-efficiency cores working seamlessly together.

This is where big.LITTLE Technology comes in. big.LITTLE Technology is a power optimization technology that, through the combination of high performance "big" cores and high efficiency "LITTLE" cores, along with big.LITTLE MP software, ensures the right task is run on the right core. This delivers increased levels of power efficiency, battery life and user experience. Graph 5 shows a comparison of the degree of improvement on average that big.LITTLE MP delivers when compared to its predecessor, Cluster Migration.


Graph 5: big.LITTLE MP improvement over big.LITTLE Cluster Migration
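The idea of running "the right task on the right core" can be illustrated with a minimal sketch. It assumes each task has a tracked load scaled to 0–1024, plus a pair of up/down migration thresholds with hysteresis, which is broadly how global task scheduling is commonly described. The threshold values and the names `place_task`, `UP_THRESHOLD` and `DOWN_THRESHOLD` are hypothetical and not taken from the big.LITTLE MP software.

```c
#include <assert.h>

/* Hypothetical load scale: a task that runs 100% of the time has load 1024. */
#define LOAD_SCALE      1024u
#define UP_THRESHOLD     700u  /* assumed: above this, move the task to a big core */
#define DOWN_THRESHOLD   256u  /* assumed: below this, move it back to a LITTLE core */

typedef enum { CORE_LITTLE, CORE_BIG } core_type;

/* Decide which kind of core a task should run on next, given where it runs
 * now and its tracked load. The gap between the two thresholds provides
 * hysteresis, so a task whose load hovers near one threshold does not
 * ping-pong between the clusters. */
static core_type place_task(core_type current, unsigned load)
{
    assert(load <= LOAD_SCALE);
    if (current == CORE_LITTLE && load > UP_THRESHOLD)
        return CORE_BIG;      /* bursty or sustained demand: up-migrate */
    if (current == CORE_BIG && load < DOWN_THRESHOLD)
        return CORE_LITTLE;   /* light, long-running work: down-migrate */
    return current;           /* otherwise leave the task where it is */
}
```

In this sketch, a burst such as a page load pushes a browser thread's load above the up threshold and moves it to a big core. An MP3 decoder thread with a light, steady load stays on a LITTLE core. A production scheduler weighs more factors than these two thresholds.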

If you are keen to find out more about how big.LITTLE MP is able to achieve these improvements, I will be delving into this topic in my "big.LITTLE Unleashed" presentation at this year's ARM TechCon event, held next week (October 1st-3rd). If you have not registered for it yet, be sure to register for TechCon now.


If you are unable to make it, however, then fear not! In my next blog, I will dive deeper into the details of how big.LITTLE MP is able to achieve these improvements and show how it enables you to enjoy a higher quality mobile experience.

By Bee Hayes-Thakore and Thomas Ensergueix


Pervasive connectivity, largely spurred by mobile and tablet use, is transforming the way we consume content and interact with each other through the cloud. The Internet of Things will extend this interaction to a multitude of connected devices, influencing the connected city, the home, healthcare and all aspects of embedded intelligence. This future demands embedded intelligence that is always-on, always-aware and always-connected. It also demands more performance for local data pre-processing, voice and image processing and access to richer content, particularly high Digital Signal Processing (DSP) performance, along with increased system reliability and fault tolerance.



It is with this future of embedded intelligence in mind that we announced today the new ARM Cortex-M7 processor, bringing a host of new features and capabilities to the Cortex-M family of low-power, 32-bit processors. Cortex-M7 introduces a number of micro-architectural features which enable our partners to build chips that can reach much higher levels of performance than existing microcontroller cores in terms of general-purpose code, DSP code and floating point code.


Three lead licensees, Atmel, Freescale and STMicroelectronics, have been working with ARM since the very early stages of development of the Cortex-M7 processor. They will be bringing exciting new products to market over the coming months. The ARM Cortex-M7 processor is targeted at demanding embedded applications used in next-generation vehicles, connected devices, and smart homes and factories. Through these products, the benefits of the Cortex-M7 processor will become apparent to users in our increasingly connected world.


For example, domestic appliances (often referred to as white goods) previously had simple user interfaces and were controlled by simple processors. The next generation of devices is getting smarter in order to operate more efficiently, using minimal energy and resources. Next-generation products are moving to more sophisticated displays, advanced touch-screen panels and advanced motor control, including field-oriented control (FOC) algorithms in their motor drivers. Some of them also need to run communication software stacks to interface with other appliances and with the outside world, providing billing, power usage and maintenance information.


All of these requirements demand more performance from the microcontroller, which lies at the heart of the appliance – Cortex-M7 based MCUs will deliver that performance. In addition to excellent performance, not only does the Cortex-M7 processor extend the low power DNA inherent in the Cortex-M family but it also provides the same C-friendly programmer's model and is binary compatible with existing Cortex-M processors. Ecosystem and software compatibility enables simple migration from any existing Cortex-M core to the new Cortex-M7 core. System designers can therefore take advantage of extensive code reuse which in turn offers lower development and maintenance costs. You can find more information on Cortex-M7 on arm.com.
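Field-oriented control, mentioned above, is an example of the kind of C-friendly floating-point DSP code such appliances run. Below is a minimal illustrative sketch, not production motor-control code, of the Clarke and Park transforms at the heart of FOC. They convert measured phase currents into a frame rotating with the rotor, where the torque- and flux-producing components can be controlled separately. The type and function names are hypothetical. The sketch assumes balanced currents and assumes the sine and cosine of the rotor angle come from elsewhere (for example a lookup table or position sensor).

```c
#include <assert.h>

/* 1/sqrt(3), precomputed so the sketch needs no math library at run time. */
#define INV_SQRT3 0.57735026918962576f

typedef struct { float alpha, beta; } ab_frame;  /* stationary two-axis frame */
typedef struct { float d, q; } dq_frame;         /* frame rotating with the rotor */

/* Clarke transform (amplitude-invariant form). For balanced three-phase
 * currents ia + ib + ic = 0, so two measured phases are enough. */
static ab_frame clarke(float ia, float ib)
{
    ab_frame r = { ia, (ia + 2.0f * ib) * INV_SQRT3 };
    return r;
}

/* Park transform: rotate the stationary frame by the rotor angle theta. */
static dq_frame park(ab_frame ab, float sin_theta, float cos_theta)
{
    dq_frame r = {
         ab.alpha * cos_theta + ab.beta * sin_theta,   /* d: flux-producing */
        -ab.alpha * sin_theta + ab.beta * cos_theta    /* q: torque-producing */
    };
    return r;
}
```

Take balanced currents ia = 1 and ib = ic = -0.5 at rotor angle 0. The sketch gives d = 1 and q = 0, as expected for a current vector aligned with the rotor.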


ARM TechCon, the largest meeting of the ARM Partnership, is taking place in Santa Clara in just a few days. Dr Ian Johnson, Product Manager for the Cortex-M7, will talk in greater depth about the features of this new processor in the "The Future Direction of the ARM Cortex-M Processor Family" session (2pm-3.50pm, October 1st), along with invited speakers from lead licensees and additional guests. Free ARM Expo passes are available with the ARMExp100 code.

But why wait, you can start discussing Cortex-M7 processors with embedded experts here today!






The Cortex-M7 has launched; you can read a detailed introduction on AnandTech:

AnandTech | Cortex-M7 Launches: Embedded, IoT and Wearables

You can also find more information on the official ARM website:

Cortex-M7 Processor - ARM

Yesterday we released version 3.10.0 of Valgrind, a GPL'd framework for building simulation-based debugging and profiling tools.  3.10.0 is the first official release to support 64-bit ARMv8.  The port is available from http://www.valgrind.org, and the release notes are available at http://www.valgrind.org/docs/manual/dist.news.html.


Porting the framework to the 64-bit ARM instruction set has been relatively straightforward.  The main challenge has been the large number of SIMD instructions, some of which involve significant arithmetical complexity: saturation, rounding, doubling and lane-width changes.  On the whole, the 64-bit instruction set is easier to simulate efficiently than the 32-bit ARMv7 instruction set. It lacks dynamically conditionalised instructions (à la Thumb) and partial condition code updates, both of which hinder fast simulation.  As the port matures I expect it to attain performance comparable with other Valgrind-supported architectures.
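To give a flavour of that arithmetic, here is a minimal C sketch of the per-lane semantics of one such instruction, SQDMULH (signed saturating doubling multiply returning high half), for 16-bit lanes. A simulator has to reproduce this behaviour exactly for every lane. The function names are hypothetical. In hardware, saturation sets the cumulative saturation bit FPSR.QC; the sketch only reports it through a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One 16-bit lane: (2 * a * b) >> 16, saturated to the int16_t range.
 * Only a == b == INT16_MIN overflows: 2 * 2^30 = 2^31, whose high half is 32768. */
static int16_t sqdmulh_lane16(int16_t a, int16_t b, bool *saturated)
{
    int64_t product = 2 * (int64_t)a * (int64_t)b;  /* doubling; exact in 64 bits */
    int64_t high = product >> 16;  /* assumes arithmetic shift for negatives (GCC/Clang) */
    if (high > INT16_MAX) { *saturated = true; return INT16_MAX; }
    if (high < INT16_MIN) { *saturated = true; return INT16_MIN; }
    return (int16_t)high;
}

/* Full-width form: apply the lane operation to all eight 16-bit lanes of a
 * 128-bit vector. Returns true if any lane saturated (would set FPSR.QC). */
static bool sqdmulh_8h(int16_t d[8], const int16_t n[8], const int16_t m[8])
{
    bool sat = false;
    for (int i = 0; i < 8; i++)
        d[i] = sqdmulh_lane16(n[i], m[i], &sat);
    return sat;
}
```

For example, with Q15 inputs 0.5 × 0.5 (16384 × 16384), the lane result is 8192 (0.25). Only INT16_MIN × INT16_MIN saturates, to INT16_MAX.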


Porting the tools based on the framework was almost no effort, because the framework is specifically designed to insulate tools from the details of underlying instruction sets.  Currently the following tools work well enough for serious use: Memcheck (memory checking), Helgrind, DRD (thread checking), Cachegrind and Massif (time and space profiling).


Initial development was done using cross-compilation and running on the ARM Foundation model, which proved to be a reliable starting point.  Further development was done on an ARM Juno board running a Fedora snapshot.  The Juno board made a big difference, as it facilitated building Valgrind "natively" and can build and run regression tests in a reasonable time frame.


We look forward to feedback from developers using the port to debug/profile serious workloads, on the order of millions to tens of millions of lines of C++.

Embedded processors are frequently compared through the results of Power, Performance and Area (PPA) implementation analysis. Jatin Mistry and I have created a whitepaper that describes the specific details of the PPA analyses performed on the Cortex-R Series processors.


High-level figures are often quoted for processors. For example, the "Performance" tab of http://www.arm.com/products/processors/cortex-r/cortex-r5.php shows top-level details of the Cortex-R5. The configuration uses a mainstream low-power process technology (40nm LP), high-density standard-performance cell libraries, a 32KB instruction cache and a 32KB data cache, and the total area is 0.45mm².

However, behind the top-level power, performance and area results there are many variables and details that can dramatically alter these figures. Different implementations target different configurations, for example the cache sizes or the inclusion of the Floating Point Unit (FPU). They also target different goals, for example the highest possible frequency or the lowest possible area. The process and libraries used have a dramatic effect. The attached whitepaper describes the process we use to perform a PPA analysis for the Cortex-R Series processors.


The goal of the whitepaper is to explain, for readers without deep processor implementation knowledge, the many variables that must be understood to get real value from any PPA data. Understanding them helps you estimate the real PPA of your own proposed processor implementation. It also helps you make fair comparisons between processors, whether from a single IP vendor or from different processor IP vendors.


PPA data is of very little value without an understanding of the details behind it. We hope that you find the whitepaper informative.

What is the connection between rugby football, interconnect and performance analysis kits?


There is a seemingly never-ending march towards smaller, cheaper and more efficient complex chip designs, and every component of the modern SoC is squeezed for more with each new design. Improvements yield diminishing returns, so designers need to be creative to find new ways to eke out the extra bits of performance that ultimately make the difference across the entire chip. The World Cup-winning rugby coach Sir Clive Woodward famously stated that excellence was best achieved by improving 100 factors by 1%, and this theory certainly holds true for many of the SoCs being designed these days. Staying with rugby for a moment, the interconnect is like a scrum half (or a quarterback for those of you who live west of the Atlantic!). It acts as the go-between for each component and marshals them effectively to make the chip greater than the sum of its parts. A scrum half's performance is measured by the speed and efficiency with which he passes the ball to his teammates, enabling them to do their job more effectively. That is exactly how you would want your system interconnect to function.


This role grows in importance as massive growth in system integration places on-chip communication at the centre of system performance. The ARM CoreLink NIC-400 is a very powerful and highly configurable interconnect with many high-end features to manage the traffic passing through it. In fact, it is so configurable that it is regularly one of the most popular IP models created and downloaded on Carbon Design Systems' IP exchange portal for virtual prototyping (found here). This configurability allows a single user to create dozens of models of the system interconnect. It also reflects the importance users place on having accurate models of the components that most influence overall system performance. With so many parameters in play, the ability to test the interconnect within the system before tapeout is clearly of great value. Simply setting all parameters to maximum performance is rarely sensible, because power and cost budgets demand that less silicon be used to achieve the same performance. Full-system modelling allows the configuration to be refined to save silicon area and reduce the number of wires without compromising performance goals.


While the configurability of the interconnect is an inherent and indeed crucial part of its effectiveness, the vast amount of choices available also means that users often do not fully optimise the interconnect to their individual system. This is where virtual prototyping tools come into the equation, and help designers to avoid arbitration problems, detect system bottlenecks and give a better picture of how to manage PPA requirements. This ability to foresee and avoid potential issues before they become a problem is invaluable in an age where the pressure to get designs right first-time and on time is a concern of every system architect. Additionally, the depth of analysis that the Carbon tool can undertake provides fast and meaningful feedback that can help you measurably improve your design. Last year I co-wrote a white paper on this subject with Bill Neifert, titled “Getting the most out of the ARM CoreLink NIC-400”, which is available to download.

In the example shown here, a simple NIC-400 is configured with two masters and two slaves. The masters are set up to mimic the data loads from a CPU and a DMA controller, and the dummy targets are an Ethernet MAC and a DDR3 memory controller. Of course, since the traffic generators are quite configurable, it's possible to model any number of different sources or targets, and we'll get more into that in a bit. Note, though, that traffic can be analysed on any of the connections. The graphs shown here track the latency on the CPU interface and the queues in the DDR controller. The exact metrics for the system in question will, of course, vary with its design targets. It's also beneficial to correlate data across multiple analysis windows and even across multiple runs.


The important thing we've done here is establish a framework for gathering quantitative data on the performance of the NIC-400, so we can track how well it meets the requirements. Analysing the results will likely lead to reconfiguration, recompilation and re-simulation. It's not unheard of to iterate through hundreds of design possibilities with only slight changes in parameters. However, it's important to vary the traffic parameters as well as the NIC parameters. The true performance metric of the NIC-400, and indeed of all interconnect IP, is how it affects the behaviour of the entire system.
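Here is a deliberately tiny, cycle-stepped C model of the arrangement described above, with CPU and DMA masters sharing one DDR slave. It shows why arbitration and traffic parameters matter. It is a hypothetical sketch, not the Carbon model or the NIC-400's actual arbitration. Each master issues a request every `period` cycles, and the slave serves one request per `service` cycles. The arbiter is either fixed-priority (CPU first) or round-robin. All names and parameters are assumptions.

```c
#include <assert.h>
#include <string.h>

#define MAX_Q 1024u   /* per-master request FIFO depth (assumed) */

enum { CPU = 0, DMA = 1, N_MASTERS = 2 };
typedef enum { ARB_FIXED_CPU_FIRST, ARB_ROUND_ROBIN } arb_policy;

typedef struct {
    unsigned period;             /* issue a request every `period` cycles; 0 = idle */
    unsigned issued_at[MAX_Q];   /* ring buffer of issue times of waiting requests */
    unsigned head, count;
    unsigned long latency_sum;   /* sum of (completion - issue) over served requests */
    unsigned completed;
} master;

/* Choose which master's oldest request the slave serves next; -1 if none. */
static int arbitrate(const master m[], arb_policy policy, int last_winner)
{
    for (int i = 0; i < N_MASTERS; i++) {
        /* Fixed priority always scans from the CPU; round-robin starts
         * with the master after the previous winner. */
        int c = (policy == ARB_FIXED_CPU_FIRST) ? i : (last_winner + 1 + i) % N_MASTERS;
        if (m[c].count > 0)
            return c;
    }
    return -1;
}

/* Average CPU latency, in cycles, over a run of `cycles` cycles. */
static double avg_cpu_latency(unsigned cpu_period, unsigned dma_period,
                              unsigned service, arb_policy policy, unsigned cycles)
{
    master m[N_MASTERS];
    memset(m, 0, sizeof m);
    m[CPU].period = cpu_period;
    m[DMA].period = dma_period;
    assert(service > 0);

    unsigned busy_until = 0;     /* cycle at which the slave can accept a new request */
    int last_winner = DMA;       /* so round-robin considers the CPU first */

    for (unsigned t = 0; t < cycles; t++) {
        /* 1. Each master may issue a request (dropped if its FIFO is full). */
        for (int i = 0; i < N_MASTERS; i++) {
            master *x = &m[i];
            if (x->period != 0 && t % x->period == 0 && x->count < MAX_Q) {
                x->issued_at[(x->head + x->count) % MAX_Q] = t;
                x->count++;
            }
        }
        /* 2. If the slave is idle, grant one waiting request. */
        if (t >= busy_until) {
            int g = arbitrate(m, policy, last_winner);
            if (g >= 0) {
                master *x = &m[g];
                unsigned issued = x->issued_at[x->head];
                x->head = (x->head + 1) % MAX_Q;
                x->count--;
                busy_until = t + service;          /* the request completes here */
                x->latency_sum += busy_until - issued;
                x->completed++;
                last_winner = g;
            }
        }
    }
    return m[CPU].completed ? (double)m[CPU].latency_sum / m[CPU].completed : 0.0;
}
```

Consider a CPU issuing every 3 cycles, DMA every cycle and a 3-cycle service time. Fixed priority keeps the CPU latency at exactly 3 cycles but never serves the DMA master. Round-robin serves both, and the CPU latency grows as its requests queue up. Sweeping the periods, service time and policy over many runs mirrors, in miniature, the reconfigure-and-re-simulate loop described above.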


I will be going into more detail on all of this on Thursday at 18:00 BST (1:00 pm EDT, 10:00 am PDT) in a webinar titled “Pre-silicon optimisation of system designs using the ARM CoreLink NIC-400 Interconnect” with Eric Sondhi, a corporate applications engineer at Carbon Design Systems. You can register for the webinar here, and make sure to attend live to ensure that your questions are answered immediately.

The ARM® Cortex®-R family is perhaps the unsung hero of the ARM powered world, quietly running infrastructure from Hard Disk Drive and Solid State Drive controllers, through to mobile phone baseband processing and even automotive ABS controllers. While not having the all-out performance of the Cortex-A series application processors, the Cortex-R family of processors provide several key benefits for systems requiring hard, real-time performance.


The main differences between application processors and real-time processors are:

  • Deterministic timing - A system is said to be real-time if the total correctness of an operation depends not only on its logical correctness, but also on the time in which it is performed.
  • Latency - There are time constraints to respond to external events. A car braking system must consistently respond within a certain time. The ARM Real-time (R) profile defines an architecture aimed at systems that require deterministic timing and low interrupt latency.
  • Safety and reliability - For embedded applications requiring high performance combined with high reliability, Cortex-R series processors provide features such as soft and hard error management, redundant dual-core systems using two cores in lock-step, and Error Correcting Codes (ECC) on all external buses.


The new ARM Cortex-R Series Programmer’s Guide extends the software development series of programming guides available from ARM by covering Cortex-R series processors conforming to the ARMv7-R architecture.


The Cortex-R Series Programmer’s Guide describes the following areas which differ between the Cortex-R series and the Cortex-A and Cortex-M series:

  • Floating-point support is available as an option on most Cortex-R series processors to provide computation functionality compliant with the IEEE 754 standard.
  • Unlike most other ARM processors, Cortex-R processors typically have some memory that is tightly coupled to the processor core to minimize access time and guarantee latency for critical routines.
  • The Cortex-R processors use an MPU instead of an MMU. The MPU enables you to partition memory into regions and set individual protection attributes for each region.
  • Fast and consistent interrupt response is a key feature of the Cortex-R processors.
  • Fault detection and control can be provided by lock-step processors, ECC on buses and memory, and watchdog timers.
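Each MPU region is described by a base address and a size that must be a power of two, with the base aligned to the size. Below is a minimal sketch based on the ARMv7-R PMSAv7 register layout of the Data Region Size and Enable Register (DRSR). In that layout, bit 0 enables the region and the RSize field in bits [5:1] encodes a size of 2^(RSize+1) bytes. The helper computes the value software would write to DRSR. The function name is hypothetical, the 32-byte minimum region size is an assumption to check against your processor's TRM, and the CP15 write itself is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: compute the DRSR value for an enabled region of
 * `size` bytes starting at `base`. Returns false if `size` is not a power
 * of two between 32 bytes and 4GB (assumed range) or if `base` is not
 * aligned to `size`. Subregion-disable bits [15:8] are left at 0. */
static bool mpu_drsr_value(uint32_t base, uint64_t size, uint32_t *drsr)
{
    assert(drsr != NULL);
    if (size < 32 || size > (UINT64_C(1) << 32) || (size & (size - 1)) != 0)
        return false;                        /* not a supported power of two */
    if ((base & (uint32_t)(size - 1)) != 0)
        return false;                        /* base must be size-aligned */

    unsigned n = 0;                          /* n = log2(size) */
    while ((UINT64_C(1) << n) < size)
        n++;

    *drsr = ((uint32_t)(n - 1) << 1)         /* RSize, bits [5:1]: size = 2^(RSize+1) */
          | 1u;                              /* En, bit 0: region enabled */
    return true;
}
```

For example, a 64KB region at 0x20000000 gives a DRSR value of 0x1F. The base address itself goes in the Data Region Base Address Register (DRBAR), and the access permissions go in the Data Region Access Control Register (DRACR).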


This guide is aimed at anyone writing software for the Cortex-R family of processors, and complements, rather than replaces, the existing documentation for the Cortex-R family.

If you’re new to using Cortex-R processors and looking to understand where to begin writing bare-metal programs, or you’re an experienced applications designer wanting to understand how to make the most of the underlying processor, then this guide is a good introduction to the Cortex-R family.


The document is only available to registered ARM customers. See Cortex-R Series Programmer's Guide.

Current specifications for Rayeager PX2 enhanced board:

SoC – Rockchip PX2: dual-core ARM Cortex-A9 up to 1.4GHz, Mali-400MP4 quad-core GPU up to 400MHz

System Memory – 2GB/1GB DDR3

Storage – 8GB eMMC flash + micro SD slot

Video I/O – HDMI 1080p, VGA 1080p, LCD (selectable)

Audio Output / Input – HDMI, optical S/PDIF, headphone, and built-in MIC

Connectivity – Gigabit Ethernet, dual band 802.11 b/g/n Wi-Fi with external antenna, and Bluetooth

USB – 3x USB 2.0 host ports, 1x micro USB OTG

Expansion Headers – YCBCR_IN x1, CVBS_IN x1, Keys x5, G-sensor x1, Compass x1, RTC x1, UART-to-USB debug port x1

Power Supply – DC 5V @ 2.0A with HDD support; Li-battery / PMIC TPS659102

Dimensions – 150 x 97 mm


The Rayeager PX2 enhanced development board is 100% open-source hardware, including the hardware schematics, component placement and component datasheets.

The Rayeager PX2 enhanced development board supports Android 4.4.2 and Ubuntu, and the SDKs, tutorials and hardware files will all be available from ChipSpark.com.

Processors and microcontrollers built on the ARM architecture are identified by the architecture version they adopt, their profile and their variants.

So far, 7 versions of the ARM architecture have been defined, of which only 4 are currently in use, identified by the prefix ARMv: ARMv4, ARMv5, ARMv6 and ARMv7.

The most recent, ARMv7, defines three usage profiles: ARMv7-A, ARMv7-R and ARMv7-M. ARMv7-A is for general-purpose application processors. ARMv7-R is for processors and microcontrollers in critical applications that require real-time response. ARMv7-M is for general-purpose microcontrollers.
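Code can check which profile it was built for. Compilers that implement the ARM C Language Extensions (ACLE), such as GCC and Clang, predefine `__ARM_ARCH` (the architecture version) and `__ARM_ARCH_PROFILE` (the character 'A', 'R' or 'M') when targeting ARM. A minimal sketch of reporting them follows. The function names are hypothetical, and on a non-ARM host the macros are simply absent.

```c
#include <assert.h>
#include <string.h>

/* Map an ACLE profile character to the usage classes described above. */
static const char *profile_name(char profile)
{
    switch (profile) {
    case 'A': return "application";
    case 'R': return "real-time";
    case 'M': return "microcontroller";
    default:  return "unknown";
    }
}

/* Report the profile this file was compiled for. */
static const char *build_profile(void)
{
#if defined(__ARM_ARCH_PROFILE)
    return profile_name(__ARM_ARCH_PROFILE);
#elif defined(__ARM_ARCH)
    return "ARM, profile not reported";
#else
    return "not ARM";
#endif
}

/* Architecture version, e.g. 7 for ARMv7; 0 when not compiling for ARM. */
static int build_arch_version(void)
{
#if defined(__ARM_ARCH)
    return __ARM_ARCH;
#else
    return 0;
#endif
}
```

Built for a Cortex-R5 target, for instance, `build_profile()` would be expected to return "real-time" and `build_arch_version()` 7. On a PC they return "not ARM" and 0.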


Variants are identified by letters appended to the version. At present, the following exist:

  • ARMv4
    a variant that includes only the standard ARM instruction set.
  • ARMv4T
    adds the Thumb instruction set.
  • ARMv5T
    improvements to interworking and ARM instructions; adds the Count Leading Zeros (CLZ) and Software Breakpoint (BKPT) instructions.
  • ARMv5TE
    improved arithmetic support for digital signal processing (DSP) algorithms; adds Preload Data (PLD), Load Register Dual (LDRD), Store Register Dual (STRD), and instructions for 64-bit transfers to and from coprocessor registers (MCRR, MRRC).
  • ARMv5TEJ
    adds the BXJ instruction and other support for the Jazelle® architecture extension.
  • ARMv6
    adds new instructions to the standard ARM instruction set, and formalizes and revises the memory model and the Debug architecture.
  • ARMv6K
    adds instructions supporting multiprocessing to the standard instruction set, and some extra features to the memory model.
  • ARMv6T2
    introduces Thumb-2 technology, which greatly expands the Thumb instruction set to provide a level of functionality similar to the standard ARM instruction set.

There are also optional extensions that a manufacturer may add. The extensions are divided into groups; some of them are listed below:

  • Instruction set extensions
    • Jazelle, an extension that turns the ARMv5TE architecture variant into ARMv5TEJ.
    • Virtualization Extensions.
    • ThumbEE, an extension that provides an instruction set based on the standard Thumb set and suited to dynamically generated code. In architecture version ARMv7 it is mandatory in the ARMv7-A profile and optional in the ARMv7-R profile.
    • Floating-point extension, an extension for a floating-point coprocessor, historically called the VFP extension.
    • Advanced SIMD, an instruction set extension that adds Single Instruction Multiple Data (SIMD) instructions for vector operations on integer and single-precision floating-point data types, using doubleword and quadword registers.
  • Architecture extensions
    • Security Extensions.
    • Multiprocessing Extensions.
    • Large Physical Address Extension.
    • Virtualization Extensions.

I proposed this summary for Wikipedia at: Arquitetura ARM – Wikipédia, a enciclopédia livre

On a Cortex-M0, there are two registers for enabling and disabling an interrupt. This approach is the best way to avoid race conditions, whether or not you are in a multitasking environment, and it also reduces the number of assembly instructions. The same approach is used to generate an interrupt from software.


When you use multitasking on a microcontroller (which is not very common on 8-bit microcontrollers), you need to follow certain procedures to avoid problems.


In a multitasking environment, two or more tasks, or even a single interrupt handler alongside the main program, may access the same register, changing its bits to enable or disable an interrupt, or even to simulate an external interrupt from code. To avoid race conditions, that is, contention over the register, while keeping the number of steps small, Cortex-M microcontrollers use two registers for each resource: two to enable and disable, respectively, and two to put an interrupt into the pending state or remove that state.


Putting an interrupt into the pending state is like triggering that interrupt, simulating the external event at its source. You can also remove this event by clearing the pending state before it is serviced.


There are two registers to enable and disable an interrupt, called SETENA and CLRENA ("Set Enable" and "Clear Enable"). They belong to the set of registers of the NVIC (Nested Vectored Interrupt Controller), a unit outside the processor core that manages interrupts and exceptions. The figure below, taken from Joseph Yiu's book [1], shows the memory map through which the NVIC registers are accessed and configured. These registers lie in the system region from 0xE0000000 to 0xFFFFFFFF, more precisely in the System Control Space (SCS), 0xE000E000 to 0xE000EFFF, which in turn lies inside the Private Peripheral Bus (PPB).

Figure: Cortex-M0 memory map showing where the NVIC registers are located (from [1]).

The CMSIS package offers extensive support, through functions and macros, for managing these registers, but we will focus on C and assembly code to understand the architectural benefits ARM gives us.


The SETENA register mentioned above is accessed at address 0xE000E100. It is readable and writable, and its value after reset is 0x00000000. Each bit represents the state of one interrupt: bit 0 is interrupt #0, i.e. exception #16; bit 2 is interrupt #2, i.e. exception #18; and so on.
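To make the numbering concrete, here is a small sketch of the mapping from an interrupt number to its bit in SETENA and to its exception number. The helper names `irq_mask` and `irq_to_exception` are my own, hypothetical, and not part of CMSIS; the only facts used are the ones above (external interrupts start at exception #16, one bit per interrupt).

```c
#include <stdint.h>

/* External interrupts start at exception number 16,
   so IRQ #n corresponds to exception #(16 + n). */
#define IRQ_EXCEPTION_OFFSET 16u

/* Bit mask that selects IRQ #irq in a 32-bit NVIC register such as SETENA.
   Cortex-M0 supports at most 32 external interrupts (IRQ #0 to #31). */
static inline uint32_t irq_mask(unsigned irq)
{
    return (uint32_t)1u << irq;
}

/* Exception number that corresponds to IRQ #irq. */
static inline unsigned irq_to_exception(unsigned irq)
{
    return irq + IRQ_EXCEPTION_OFFSET;
}
```

For example, `irq_mask(2)` gives 0x4, the value used in the C and assembly examples further on.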


The second register, which pairs with SETENA and is used to clear the states it sets, is CLRENA, accessed at address 0xE000E180.


These two registers are therefore used to enable and disable interrupts. As mentioned, there are other registers that represent an interrupt raised externally and that can be used to simulate that event in software: SETPEND, at address 0xE000E200, which marks an interrupt as pending, and CLRPEND, at memory address 0xE000E280. We will see more details further on.


As already mentioned, the SETENA register enables interrupts: just set to 1 the bit of the interrupt you want to enable. Nothing happens when you write 0 to a bit, since this register only serves to enable an interrupt and/or to check whether it is enabled.


To disable an interrupt you must use the register that pairs with SETENA, called CLRENA. Interestingly, this register does not hold the inverse of SETENA; only its write function is the opposite. To find out which interrupts are disabled, check which bits of SETENA read as 0. To disable an interrupt, write 1 to its bit; writing 0 to this register has no effect.
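The behaviour of this pair can be sketched as a small host-side model. This is only an illustration under my own assumptions (the struct and function names are hypothetical, and it does not touch real hardware): both registers share one hidden enable state, writes act only on the bits set to 1, and reading either register returns the enable state rather than its inverse.

```c
#include <stdint.h>

/* Host-side model (not real hardware) of the SETENA/CLRENA pair. */
typedef struct {
    uint32_t enabled;  /* bit n set = IRQ #n enabled */
} nvic_enable_model;

/* Write to SETENA: 1 bits enable, 0 bits are ignored. */
static void model_write_setena(nvic_enable_model *m, uint32_t value)
{
    m->enabled |= value;
}

/* Write to CLRENA: 1 bits disable, 0 bits are ignored. */
static void model_write_clrena(nvic_enable_model *m, uint32_t value)
{
    m->enabled &= ~value;
}

/* Reading either register returns the enable state, not its inverse. */
static uint32_t model_read_setena(const nvic_enable_model *m) { return m->enabled; }
static uint32_t model_read_clrena(const nvic_enable_model *m) { return m->enabled; }
```

Because a write of 0 is ignored, one task can enable IRQ #2 while another enables IRQ #5 without either write undoing the other.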

The other two registers, SETPEND and CLRPEND, identify pending interrupts. They work like SETENA and CLRENA, but they report which interrupts are pending and waiting to be serviced: reading SETPEND tells you which interrupts are waiting by which bits are set, using the same bit order as SETENA. As mentioned, you can also simulate an interrupt by writing 1 to its bit; as soon as you do, the interrupt is raised and, provided it is enabled and its priority allows, it is serviced by its handler (vector). Now suppose you are inside another interrupt handler and, while handling some peripheral, that peripheral accidentally raises an interrupt whose pending state you want to remove: just write 1 to that interrupt's bit in CLRPEND and it is no longer pending. As with the previous pair, writing 0 to either register has no effect.
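The scenario above (trigger one interrupt from software, cancel an accidental one before it runs) can be traced with another small host-side model. The names are hypothetical and the "service" step is deliberately simplified: it treats every interrupt that is both pending and enabled as serviced and ignores priorities, so it illustrates the register semantics, not the real NVIC arbitration.

```c
#include <stdint.h>

/* Host-side model (not hardware) of the pending state. */
typedef struct {
    uint32_t enabled;   /* as set through SETENA/CLRENA */
    uint32_t pending;   /* bit n set = IRQ #n pending */
    uint32_t serviced;  /* record of IRQs whose handler "ran" */
} nvic_pend_model;

/* Write to SETPEND: 1 bits make the interrupt pending, 0 bits are ignored. */
static void model_write_setpend(nvic_pend_model *m, uint32_t v) { m->pending |= v; }

/* Write to CLRPEND: 1 bits remove the pending state, 0 bits are ignored. */
static void model_write_clrpend(nvic_pend_model *m, uint32_t v) { m->pending &= ~v; }

/* Simplified dispatch: every pending interrupt that is also enabled is
   serviced and its pending bit cleared. Priorities are ignored here. */
static void model_service(nvic_pend_model *m)
{
    uint32_t ready = m->pending & m->enabled;
    m->serviced |= ready;
    m->pending &= ~ready;
}
```

With IRQ #2 and IRQ #5 enabled, pending both and then clearing IRQ #5 before the service step leaves only IRQ #2 serviced.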


Register contention (race condition)


As for register contention, a very common problem in multitasking systems, the ARM architecture uses two registers with opposite functions precisely to avoid having to read the register before changing its state. This way there are no contention problems and no lost state.


Besides this contention problem, where two processes may touch the same register and one loses the change made by the other, there is also the number of steps needed to make the change. With this approach we do not need to read the register's current state in order to rewrite it: we just write the desired bit and no other state is lost, since writing 0 is ignored. In other words, the opposite state cannot be changed through the same register.


See the C code below, which writes the value 0x4 (binary 00000100) to the SETENA register.

*((volatile unsigned long *) (0xE000E100)) = 0x4; // Enable interrupt #2


This write affects only the bits set to 1; bits written as 0 are ignored. This strategy avoids having to read the register to merge the bits and set the desired one. Below is the same code in assembly:


LDR    R0, =0xE000E100    ; load the address of the SETENA register into R0
MOVS   R1, #0x04          ; move 0x4 into R1, binary 00000100:
                          ; bit 2 is interrupt #2 (exception #18)
STR    R1, [R0]           ; write the contents of R1 to the address held in R0

Note that only three assembly instructions are needed to enable a given interrupt without affecting the state of the others.

We used the instructions LDR, MOVS and STR, and the registers R0 and R1.

As you can see, there is no need to read the register before changing it, since it only acts on bits written as 1 and ignores bits written as 0. This way you cannot accidentally undo changes made by other processes.
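The same single-store pattern can be wrapped in small helpers for all four registers. This is a sketch under my own naming (the macros and functions below are hypothetical; CMSIS provides `NVIC_EnableIRQ`, `NVIC_DisableIRQ`, `NVIC_SetPendingIRQ` and `NVIC_ClearPendingIRQ` for the same purpose). The addresses are the ones given above for the Cortex-M0 NVIC.

```c
#include <stdint.h>

/* Cortex-M0 NVIC register addresses, as given in the text. */
#define NVIC_SETENA  ((volatile uint32_t *)(uintptr_t)0xE000E100u)
#define NVIC_CLRENA  ((volatile uint32_t *)(uintptr_t)0xE000E180u)
#define NVIC_SETPEND ((volatile uint32_t *)(uintptr_t)0xE000E200u)
#define NVIC_CLRPEND ((volatile uint32_t *)(uintptr_t)0xE000E280u)

/* One store of a one-bit mask: no read-modify-write, so the hardware
   leaves the other interrupts (bits written as 0) untouched. */
static inline void nvic_write_bit(volatile uint32_t *reg, unsigned irq)
{
    *reg = (uint32_t)1u << irq;
}

/* Target-only wrappers: these addresses are only valid on the Cortex-M0. */
static inline void nvic_enable(unsigned irq)        { nvic_write_bit(NVIC_SETENA, irq); }
static inline void nvic_disable(unsigned irq)       { nvic_write_bit(NVIC_CLRENA, irq); }
static inline void nvic_set_pending(unsigned irq)   { nvic_write_bit(NVIC_SETPEND, irq); }
static inline void nvic_clear_pending(unsigned irq) { nvic_write_bit(NVIC_CLRPEND, irq); }
```

On a PC, passing the address of a plain variable to `nvic_write_bit` simply stores the mask; a plain variable does not reproduce the hardware's "ignore the 0 bits" behaviour, which only the real register provides.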


For didactic purposes, let's look at the conventional approach, that is, a single register to enable and disable an interrupt. We use the same address as SETENA here, but this does not reflect the real hardware.


*((volatile unsigned long *) (0xE000E100)) = *((volatile unsigned long *) (0xE000E100)) | 0x4; // Enable interrupt #2 (read-modify-write)


As you can see, in the traditional procedure you first read the register, modify the value obtained and write it back to the same register. But what would happen if a second process took over execution at that instant and also changed the register? The first process would be holding a stale value, and when it writes that value back, the change made by the second process is lost. See the same code in assembly below; note that it takes more instructions, which widens the window for contention.


MOVS     R2, #0x04        ; bit mask: only bit 2 is set
LDR      R0, =0xE000E100  ; load the address of SETENA into R0
LDR      R1, [R0]         ; read the current value of the register
ORRS     R1, R1, R2       ; OR in bit 2
STR      R1, [R0]         ; write the value back to the register

As you can see, enabling an interrupt this way takes two extra instructions. Moreover, between instruction 3 (LDR R1, [R0]) and instruction 5 (STR), the value of SETENA may be changed by someone else, leaving the value held in R1 stale.

In this example we used the instructions MOVS, LDR, ORRS and STR, and the registers R0, R1 and R2.
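The lost update described above can be reproduced deterministically on a PC by splitting the read-modify-write into separate steps and forcing one bad interleaving. This is a simulation of my own (the register is just a struct field, and the two "tasks" are sequential statements), comparing it with a write-1-to-set register where each task performs a single store.

```c
#include <stdint.h>

/* Simulated register contents (host model, not real hardware). */
typedef struct { uint32_t value; } sim_reg;

/* Conventional register: read and write are separate steps,
   so another task can run in between. */
static uint32_t rmw_read(const sim_reg *r) { return r->value; }
static void rmw_write(sim_reg *r, uint32_t v) { r->value = v; }

/* Write-1-to-set register: the hardware ORs in the written 1 bits. */
static void w1s_write(sim_reg *r, uint32_t bits) { r->value |= bits; }

/* Task A enables IRQ #2, task B enables IRQ #5.
   Interleaving: A reads, B preempts (reads and writes), A writes. */
static uint32_t race_with_rmw(void)
{
    sim_reg reg = { 0u };
    uint32_t a = rmw_read(&reg);      /* A reads 0 */
    uint32_t b = rmw_read(&reg);      /* B preempts and reads 0 */
    rmw_write(&reg, b | (1u << 5));   /* B writes 0x20 */
    rmw_write(&reg, a | (1u << 2));   /* A writes its stale value | 0x04: B's bit is lost */
    return reg.value;
}

/* Same two tasks with a write-1-to-set register: each is one store. */
static uint32_t race_with_w1s(void)
{
    sim_reg reg = { 0u };
    w1s_write(&reg, 1u << 5);  /* B */
    w1s_write(&reg, 1u << 2);  /* A */
    return reg.value;
}
```

With this interleaving the read-modify-write version ends with only bit 2 set (0x04), while the write-1-to-set version keeps both bits (0x24).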


The same problem would arise for the pending state if it were handled through a single read-modify-write register instead of the SETPEND and CLRPEND pair, causing unpredictable situations and unwanted behaviour, such as loss of synchronization between interrupt sequences.

This post is based on notes I have been taking while studying the Cortex-M architecture, especially the Cortex-M0, and it may be revised and improved as my studies progress.



[1] Joseph Yiu, The Definitive Guide to the ARM Cortex-M0.


Chinese Information 中文信息: 参与ARM技术培训的新途径 (A new way to take part in ARM technical training)

Just a short update to highlight an exciting new development. In response to demand, ARM has launched a limited program of public open-enrollment training courses. We are hosting these at our major regional support centres in San Jose, Cambridge and Shanghai. The program, as I say, is limited at present but covers several of our most popular courses, including Cortex-M System Design, TrustZone and ARMv8 Software Development.


You can check out the full schedule here: ARM Training Courses - ARM


If you have any questions, please don't hesitate to contact the ARM training team: Contact Support - ARM



A good article about Cortex-M from AnandTech; you can read it at AnandTech | ARM's Cortex M: Even Smaller and Lower Power CPU Cores

A study recently carried out by Cambridge University found that the global cost of software debugging has risen to the princely sum of $312 billion every year, and that developers spend an average of 50% of their programming time finding and fixing bugs (read the full story here). Divide that massive sum by 7.1 billion people on the planet and it works out at $44 per person. Put another way, it’s enough to buy everyone in the world a Raspberry Pi!

Furthermore, the trend for increasing complexity in SoC design (see graph below) means that this problem will only take up more resources in terms of time and money going forward. It is an issue that has given SoC architects and system developers headaches for years.

ITRS 2007 SoC Consumer Portable Design Complexity Trends

With that said, a well-thought-out debug and trace solution for your SoC can help manage the increased complexity by providing the right hardware visibility and hooks. Software developers can make use of this key functionality to develop optimized software in a timely manner with reduced risk of bugs. Each of the following four key use cases (see picture below) can be addressed for your SoC design with a customized debug and trace solution that allows for:

  • Faster SoC bring-up
  • Easy and quick software debug
  • In-field analysis and postmortem debug
  • System optimization via profiling, event synchronization



The ARM CoreSight SoC product is designed to offer a comprehensive solution that can be tailored to meet specific requirements. The CoreSight SoC-400 allows you to:

  • Design for large systems with multiple cores through use of configurable components
  • Maximize debug visibility using a combination of debug components
  • Use IP-XACT descriptors for all components to automate stitching and testbench generation
  • Support different trace bandwidth requirements for complex SoCs
  • Accelerate design and verification through example subsystems, testbenches, test cases and necessary verification IP components
  • Support multiple hardware-debug models for multiple use cases


When all of this is put together in a wider context, ARM CoreSight IP gives design teams a real advantage through its innovative debug logic that reduces design development and software debug cycles significantly. Furthermore, if we think of debug as solving a murder through the use of backward reasoning, then trace is the video surveillance that pinpoints the culprit. Trace is invaluable as it provides real-time visibility into errors, dramatically cutting down design cycles and iterations.

I recently conducted a webinar on how to build an effective and customized debug and trace solution for a multi-core SoC. Register here for free to access the webinar recording.

There is a corresponding White Paper that goes into a lot more detail on the ARM Debug and Trace IP page.

The White Paper provides the following:

  • High-level steps on building a debug and trace solution
  • Recommended design and verification flow
  • Advantages of using SoC-400 at each stage of your development process
  • Pointers to further information and useful references

Dwight Eisenhower may not have been thinking of semiconductors, but his quote "No battle was ever won according to plan, but no battle was ever won without one" rings true in the context of debug subsystem design. Understanding debug and trace hardware features and capabilities is key to building a solution that meets YOUR specific requirements. The paper discusses some of the key design decisions faced by architects.

Stay tuned for more exciting news about ARM CoreSight IP, or sign up for ARM TechCon 2014 to see it for yourself! TechCon will be the first time that members of the public will be able to demo the new design environment for building debug and trace subsystems. This makes it even easier to configure and integrate ARM CoreSight IP within a large system, and will help users cut down on that $312 billion global debug bill. If you have any questions or comments about ARM CoreSight IP or this blog, please write them below and I will get back to you as soon as possible.

I have followed some tutorials on the internet and found one in particular quite interesting and didactic for those just starting to program bare-metal ARM. The blog is Freedom Embedded | Balau's technical blog on open hardware, free software and security.


Below is a summary of the commands needed to build a Hello World successfully. In the near future I may expand this information into a more detailed tutorial.


Based on the link: Hello world for bare metal ARM using QEMU | Freedom Embedded
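The compile step below expects a `test.c`. As a hypothetical sketch of what it could contain (not a copy of the linked tutorial's file), the program writes each character to the data register of UART0, which QEMU's versatilepb board maps at 0x101f1000; with `-nographic`, that UART appears on the terminal. The function names `print_uart` and `c_entry` are assumptions, and `startup.s` is assumed to set up a stack and branch to `c_entry`.

```c
#include <stdint.h>

/* Assumption: UART0 data register of the versatilepb board. */
#define VERSATILEPB_UART0DR ((volatile unsigned int *)(uintptr_t)0x101f1000u)

/* Write each character of s to the given UART data register. */
void print_uart(volatile unsigned int *dr, const char *s)
{
    while (*s != '\0') {
        *dr = (unsigned int)*s;
        s++;
    }
}

/* Entry point, assumed to be called from startup.s after the stack is set. */
void c_entry(void)
{
    print_uart(VERSATILEPB_UART0DR, "Hello world!\n");
}
```

Passing the register address as a parameter is only a design choice to keep `print_uart` testable away from the board; on the target it is always called with the UART0 address.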


Compile the code with the following commands:

$ arm-none-eabi-as -mcpu=arm926ej-s -g startup.s -o startup.o
$ arm-none-eabi-gcc -c -mcpu=arm926ej-s -g test.c -o test.o
$ arm-none-eabi-ld -T test.ld test.o startup.o -o test.elf
$ arm-none-eabi-objcopy -O binary test.elf test.bin


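And run it with the following command (-s opens a GDB server on TCP port 1234, and -S halts the CPU at startup until GDB tells it to continue):

qemu-system-arm -M versatilepb -m 128M -nographic -s -S -kernel test.bin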


Debug with GDB using the following command:



At the GDB prompt, type:

target remote localhost:1234
file test.elf


When you have finished working with QEMU, if the terminal misbehaves, use the command:


stty sane


to fix it.
