
ARM Processors


An interesting webinar on "Building a Customized Debug and Trace Solution for a Multi-Core SoC" is being organised on July 24, 2014, 2:00 PM EDT, where you can learn how to:


  • Design and verify a configurable debug and trace solution for your complex SoC.
  • Use the ARM DS-5 tools to maximize the utility of advanced debug and trace hardware.
  • Use CoreSight technology to help identify areas/functionality of your design that can be optimized, both in hardware and software, throughout the development process.


You can register for the event at:

Building a Customized Debug and Trace Solution for a Multi-Core SoC.


Hope to see you there!

Having started at ARM about a week ago, I have been very excited to work on Juno, ARM's first 64-bit Development Platform.

Juno is aimed at accelerating the development of 64-bit software for the mobile and enterprise markets. Based on the ARMv8-A big.LITTLE™ architecture, Juno offers software developers an open, vendor neutral, reference platform for 64-bit software development. Interestingly, Linaro has also released an Android Open Source Project (AOSP) port for the 64-bit ARMv8-A architecture. The entire software stack for the platform is now available through Linaro. The software bundle includes ARM Trusted Firmware, Linux file systems and a Linux kernel (3.10) that can support both OpenEmbedded Linux and Android (AOSP) file systems.

Juno is aligned with the ARM® Server Base System Architecture (SBSA), which means developers can use the system for early porting and performance tuning of OS kernel or driver code on the 64-bit architecture, based on the ARM® Cortex®-A57 and ARM Cortex-A53 processors. Juno thus creates a common foundation for software developers in the ARMv8 ecosystem and allows early access to the silicon. For hardware vendors, Juno provides the entire ARMv8 IP portfolio working together in a big.LITTLE™ implementation, delivering high performance and low power consumption.


The timing of the platform release is important, as Google recently announced details of their "Android L" release, which will include a new virtual machine, ART, supporting both 32-bit and 64-bit ARM architectures. The platform definitely aims to accelerate time to market for future 64-bit ARM designs.

Hardware Overview




The Juno SoC was built by ARM with the aim of keeping the implementation functionally correct, so that it can be representative of a potential mobile platform. Therefore, like many development platforms, Juno isn't meant to deliver the highest standards of performance or efficiency. It is meant to enable software developers to port their applications to 64-bit ARMv8-A. Juno includes ARM's own Mali™-T624 graphics processor, the mobile-focused CCI-400, dual DDR3 memory controllers, and an ARM® Cortex®-M3 System Control Processor that handles SoC-level power management and system control. Juno also offers SoC hardware expansion for customers wishing to implement their own custom RTL designs. This is enabled through the LogicTile design and Thin Links technology, which can be used to prototype custom CPU, GPU and peripheral designs. Let's have a quick look at the Juno hardware specs:



ARM® Cortex®-A57 MP2 cluster

  • Speed: 1.1GHz (Overdrive)
  • Caches: L1 48KB I / 32KB D, L2 2MB

ARM® Cortex®-A53 MP4 cluster

  • Speed: 850MHz (Overdrive)
  • Caches: L1 32KB, L2 1MB

GPU: ARM® Mali™-T624

  • Speed: 600MHz
  • Caches: L2 128KB

Memory

  • 8GB 1600MHz DDR
  • 64MB user NOR flash

SoC

  • ARM® CoreSight™ ETM/CTI per core
  • DVFS and power gating
  • 4 energy meters
  • DMC-400 dual-channel DDR3L interface
  • Internal CCI-400, 128-bit, 533MHz

Rest of SoC

  • Internal NIC-400, 64-bit, 400MHz
  • DMAC: PL330, 128-bit
  • Static memory bus interface: PL354
  • HDLCD dual video controllers: 1080p

Expansion support

  • AXI expansion to FPGA daughterboard
  • USB 2.0 with 4-port hub

Debug & Trace

  • ARM® JTAG: 20-way DIL box header
  • ARM® 32/16-bit parallel trace


Software Overview

ARM has been working closely with Linaro to provide a stable software stack that includes low-level firmware for runtime services and high-level file systems such as OpenEmbedded Linux and Android (AOSP).

Linaro ARMv8 ports are based on Linux kernel 3.10 (the Linaro Stable Kernel), are compiled with GCC 4.9, and can run on both Juno and ARMv8 Fast Models. The entire software stack for Juno is available through the Linaro website.




Let's drill-down into the different software components:



MCC Microcontroller Firmware

The MCC firmware takes care of early setup before the SCP or application processors are powered on. The MCC is also responsible for managing firmware upgrades.

System Control Processor (SCP) Firmware

The SCP firmware loads the runtime that provides low-level power management and system control for the Juno platform.

ARM Trusted Firmware

The ARM Trusted Firmware provides an open source framework enabling easy integration of secure OS and run-time services on ARMv8-A platforms. It is hosted as an open-source project on GitHub.

Unified Extensible Firmware Interface (UEFI)

The Juno UEFI implementation provides Linux loader support for the Juno platform. It is based on the open source EFI Development Kit 2 (EDK2) implementation from the Tianocore SourceForge project.

A 64-bit Linux kernel with big.LITTLE™ and Mali support

The Linux kernel supports both Android and OpenEmbedded file systems. It is managed by Linaro and can be downloaded from the Linaro website as part of the entire software stack.

Linux-based file systems

These include the Android Open Source Project (AOSP) and OpenEmbedded Linux. Linaro has been responsible for porting AOSP to 64-bit ARM and will maintain the project. It can be downloaded from the Linaro website along with the other software deliverables.

Clearly, Juno offers software developers and system architects a robust product stack including cutting-edge hardware and software, which would serve as a firm baseline for testing future 64-bit ARM-based designs. It has received a lot of attention in the press. A simple Google News search on "Juno ARM 64-bit" would yield a number of blogs and news articles on the Juno platform, including details about the software stack provided by Linaro. Some of the top links are listed below.


ARM Unveils Juno Platform for 64-Bit Android Development

ARM arms devs for 64-bit Android push with 'Juno' board • The Register

ARM releases Juno dev platform for 64-bit computing - Software - News - HEXUS.net




For support and queries around the Juno platform, contact:  juno-support@arm.com

Getting Started

In order to get started with the Juno board, a detailed getting-started guide can be found here.

To follow updates on Juno, follow Juno ARM Development Platform


Additional Links

Juno ARM Development Platform Technical Reference manual

Juno ARM Development Platform datasheet

Juno ARM Development Platform Technical Overview


Just wanted to highlight this thought-provoking whitepaper from Trustonic on the future of payments and how several pieces of the puzzle (such as FIDO, beacons, HCE and TrustZone-based TEEs) fit nicely together...



What do you think? 

Juno was announced yesterday along with Android (AOSP) support via Linaro. It is available to order from ARM or distributors in limited quantities (SAP code V2M-Juno-0317B). Details on the platform are available here: www.arm.com/juno

Juno was designed to accelerate the porting and optimization of code for ARMv8-A platforms; it is suitable for companies that want early access to advanced hardware.


Normal World Software available now


Less well known is that the Juno board is a focus for our ARM Trusted Firmware OSS project, so that Trusted OS developers can port easily to ARMv8-A on a generic platform. You can find out more about ARM Trusted Firmware here and download the source code:



Kicking off this week for the world of supercomputing is the 2014 International Supercomputing Conference in Leipzig, Germany. One of the major supercomputing conferences, ISC is Europe’s largest supercomputing conference and as one would expect, an important show for companies vested in high performance computing (HPC) and other aspects of supercomputing. We’ll see a few announcements out of ISC this week, and starting things off will be NVIDIA.

NVIDIA will be taking to the ISC show floor to announce that their Tesla products will be adding ARM64 host compatibility, enabling them to be used in ARM64 systems. NVIDIA has been a supporter of the ARM ecosystem for some time through the use of ARM cores in their Tegra SoCs and by enabling CUDA on ARM processors. Adding 64-bit ARMv8 (ARM64) support is a logical extension of this, bringing their hardware and toolkit forward to the new generation of 64-bit ARM processors.

However, while NVIDIA's previous ARM work has been focused on consumer uses, today's Tesla ARM64 announcement is focused on the professional computing side, hence the use of ISC as a backdrop. With today's announcement NVIDIA is expanding their Tesla and HPC efforts into the ARM ecosystem, intending to bootstrap and support the growing use of ARM CPUs as the core processors in HPC setups. ARM CPUs have already made some headway into the micro server space for tasks that require many low-performance threads, but it's not until ARMv8 that ARM processors have gained the ability to address enough memory, and enough performance, to be useful in HPC applications. With the increased capabilities of ARM64 processors, HPC system builders can now design systems around ARM, with NVIDIA taking up their now well-defined position as a GPU supplier, providing their highly parallel processors to complete these systems.


All things considered, NVIDIA is not necessarily introducing new functionality or new performance, but the addition of ARM64 support means that NVIDIA is hedging their bets in the server space. The company already supports Tesla products connected to x86 servers in traditional HPC setups, will offer deeper Tesla support on POWER platforms through their forthcoming NVLink interconnect, and is now covering the other end of the spectrum by offering Tesla support for ARM64 platforms. So far the ARM architecture has yet to prove itself in the HPC market beyond some very specific micro server roles, but with NVIDIA's continued success in the HPC market and the potential for ARM to disrupt the traditional x86 market, it's not surprising to see NVIDIA hedging their bets just in case that disruption occurs. No matter what happens, whether x86 holds, POWER takes off, or ARM disrupts, NVIDIA intends to have the market covered.

To that end, along with today’s announcement of ARM64 compatibility NVIDIA is also announcing the first Tesla ARM64 development platforms. In July, Cirrascale will be shipping their RM1905D 1U development platform, which contains a pair of Applied Micro X-Gene CPUs along with a pair of Tesla K20 accelerator cards. Meanwhile E4 will be shipping their EK003 system, a 3U system with two X-Gene CPUs and two Tesla K20s.


The Tesla cards of course need no introduction, and meanwhile the X-Gene is an in-house design from Applied Micro that has 8 ARMv8 cores clocked at 2.4GHz. We have previously looked at the X-Gene design a couple of years back, and while they didn’t end up being the first shipping ARMv8 design (Apple’s Cyclone beat them), they are the first ARMv8 design shipping with the appropriate PCIe support to be paired up with Tesla cards. At the time Applied Micro was shooting for a fairly aggressive performance level, but as of right now we don’t know how the X-Gene compares to other ARMv8 designs such as Cyclone, Cortex-A57, and NVIDIA’s own Denver.

Finally, being released in conjunction with these platforms will be the CUDA 6.5 toolkit, which will be introducing ARM64 support on the CUDA side. NVIDIA has not announced a release date for CUDA 6.5, and at this point it’s safe to assume it’s a development release alongside these ARM64 development platforms.

The TrustZone-based Trusted Execution Environment is a success story for ARM partners and enables system-wide security that can be used to protect the platform and services from software attack. Historically, Trusted OS code has not been available as open source, so the OP-TEE OSS project is an interesting development. Linaro's Security Working Group has been involved and provided some introductory notes here:


They include background info, an FAQ and links to the source code on GitHub. It is currently designed for ARMv7-A, with a plan from Linaro to port it to ARMv8-A and align it with ARM Trusted Firmware.

Some key points:

  • It’s a GlobalPlatform based Trusted OS
  • Currently ARMv7-A (plans for ARMv8-A)
  • Limited hardware support (will need porting)
  • Limited documentation (will be added)
  • Not a key provisioning system for post provisioned Trusted Apps


Let me know what you think of it.

There has been lots of interest in ARM Trusted Firmware. We are aware that YouTube is unavailable in some parts of the world, so you might like these links, which should work anywhere:

At the session link here: https://lca-14.zerista.com/event/member/102447 you can see links to the presentation slides, the video on YouTube, and a video accessible in China:



For the ARM Trusted Firmware: http://lcu-13.zerista.com/event/member/85121 and



Please enjoy the videos and then download the latest release from GitHub...


ARM-software/arm-trusted-firmware · GitHub

The ARM Trusted Firmware team have just released v0.4 under a permissive BSD license to enable the ARM ecosystem with a high quality reference implementation of:

1. Secure Monitor Calls (SMC) Calling Convention

2. Power State Coordination Interface (PSCI)

3. Trusted Boot
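As a rough illustration of item 1, the SMC Calling Convention packs metadata into a 32-bit function ID (fast-call bit, SMC32/SMC64 bit, owning-entity field, function number). The Python sketch below is a toy decoder, not firmware code; the field layout is paraphrased from the SMCCC and the names are invented for the example:

```python
# Toy decoder for an SMC Calling Convention (SMCCC) function ID.
# Bit layout (per the SMCCC spec): bit 31 = fast call, bit 30 = SMC64,
# bits 29:24 = owning entity number, bits 15:0 = function number.

OWNING_ENTITIES = {
    0: "Arm Architecture Service",
    1: "CPU Service",
    2: "SiP Service",
    3: "OEM Service",
    4: "Standard Secure Service",  # PSCI lives in this range
}

def decode_smc_function_id(fid: int) -> dict:
    return {
        "fast_call": bool((fid >> 31) & 1),
        "smc64": bool((fid >> 30) & 1),
        "owner": OWNING_ENTITIES.get((fid >> 24) & 0x3F, "reserved/unknown"),
        "function": fid & 0xFFFF,
    }

# PSCI 0.2 CPU_ON is a fast SMC64 call in the Standard Secure Service range.
print(decode_smc_function_id(0xC4000003))
```

Decoding `0xC4000003` yields a fast SMC64 call, Standard Secure Service, function number 3, which is how an EL3 runtime picks the right service to invoke.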


The code is secure world low-level firmware that can be adopted by silicon partners and software vendors as a common foundation. It was developed for 64-bit ARMv8-A but will be useful for ARMv7-A implementations too. You can download it from GitHub here:



Some of the new features are:

  • Supports secure interrupts targeting the Secure-EL1 Payload
  • Optionally supports making the BL31 entrypoint
  • Allows platforms with an alternative image loading architecture to re-use BL3
  • A specified and future-proof interface to BL31
  • Isolation of secure memory through TrustZone
  • Initializes the secure world (e.g. exception vectors, control registers, GIC and interrupts) before transitioning into the normal world at EL2
  • Handles Secure Monitor Calls conforming to the SMCCC using the EL3 runtime services framework
  • Handles PSCI SMCs for CPU hotplug and idle
  • A Test Secure-EL1 Payload and Dispatcher demonstrate Secure Monitor functionality such as world switching, EL1 context management and interrupt routing
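The EL3 runtime services framework mentioned above routes each incoming SMC to a handler by its function ID. Here is a minimal Python model of that dispatch; the function IDs are the published PSCI 0.2 values, but the handler bodies and names are stand-ins invented for this sketch:

```python
# Minimal model of an EL3 runtime-service dispatch table: each incoming
# SMC function ID is looked up and routed to a handler. The handler
# bodies here are illustrative stand-ins, not real firmware behaviour.

PSCI_VERSION = 0x84000000   # SMC32, Standard Secure Service range
PSCI_CPU_ON  = 0xC4000003   # SMC64 variant

def psci_version(*args):
    return (0 << 16) | 2        # PSCI v0.2: major 0, minor 2

def psci_cpu_on(target_cpu, entry_point, context_id):
    # A real implementation would power on the core and set its entry point.
    return 0                    # PSCI_E_SUCCESS

handlers = {
    PSCI_VERSION: psci_version,
    PSCI_CPU_ON: psci_cpu_on,
}

def smc_dispatch(function_id, *args):
    handler = handlers.get(function_id)
    if handler is None:
        return -1               # SMC unknown-function error
    return handler(*args)

print(smc_dispatch(PSCI_VERSION))                   # 2
print(smc_dispatch(PSCI_CPU_ON, 1, 0x80000000, 0))  # 0
```

The normal-world kernel issues these same calls via the SMC instruction; here the dispatch table simply plays the role of the EL3 framework's service lookup.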

Please download it and provide feedback to the ARM Trusted Firmware team.


Hello Everyone,


I was very keen to get myself accredited as an ARM AAE (ARM Accredited Engineer).

This had been on my mind for the past one and a half years.

Work was pretty hectic at my end, and I could hardly spare any time for preparation.

One day, I decided that if I really wanted to see myself as an AAE, I needed the right motivation and commitment.

Without thinking about anything else, I decided to book a date for the exam.

I took the date, and I had approximately 45 days for preparation.

I started with the Cortex-A Series Programmer's Guide (DEN0013D_cortex_a_series_PG) and the ARM study guide (AEG0060A_aae_study_guide).

I covered each chapter in accordance with the study guide. It was a very calculated approach.

For doubts I had during my study period, I referred to the ARMv7 architecture document and the ARM community website.

Things like memory barriers and memory types were clearly discussed in the forums.

There was another topic, instruction timing cycles, which I covered from the ARM7TDMI document.


When the last 15 days were left, I looked for the first time at the sample papers on the ARM website.

I attempted all the papers, and my accuracy was around 65 percent.

Just a week before the exam, due to some personal work, I had to postpone it by two weeks.

So I had two extra weeks. In these two weeks I revised the Programmer's Guide, went through others' experiences with the AAE, and consolidated my knowledge by discussing doubts with my colleagues.

On the D-Day, I reached the test centre on time and gave the exam with a fresh mind.

After the exam was over, I was confident that I would get the accreditation.

Then, after submitting my answers, I waited for the system to give me the GOOD NEWS.

Finally, it came up on the screen:

"CONGRATULATIONS! A preliminary analysis of your responses indicates that you have answered enough questions correctly to achieve 'ARM Accredited Engineer' status."


To sum it up

To get the AAE

Consider the "Cortex-A Series Programmer's Guide" your bible, and go through and understand each and every important concept in this book.


As the song goes

This is ten percent luck, twenty percent skill

Fifteen percent concentrated  power of will

Five percent pleasure, fifty percent pain

And a hundred  percent reason to remember the name.


All the best to everyone.

I am now looking forward to the ARM Accredited Cortex-A Engineer (AACAE) exam.

Time to mingle with the thousands of visitors attending day 2 of Computex 2014 in Taipei. Simon Segars took to the stage again today (after presenting at the Cavium ThunderX press announcement yesterday - see yesterday's blog Computex 2014: Day 1 with new ARM CPU design center news, Winwyn server technology, Qualcomm Technologies Inc. Snapdragon for more details) to present as part of the Computex CEO Summit. Simon’s “Enabling Disruptive Innovation – The Choice is Yours” presentation was on the theme of new visions of cloud computing technology & the Internet of Things. Simon talked about how it is now possible to get a product to market much faster than ever before, and more successfully, through making use of the disruptive innovation that is occurring all around us, including the use of crowd funding to market test products and the use of cloud services on a type of 'pay as you use' business model. Other presenters at the event included Acer chairman Stan Shih and MediaTek chairman MK Tsai.




I met with some of the Atmel team to take a look at some of the demonstrations they had at the event.

First, let's take a look at the Atmel QTouch demo, based on the ARM Cortex-M0+ processor.


I also learnt more about Atmel mobile sensors, based on the Cortex-M0+ processor, from Dr John Logan.



Eddy Huang from Holux Technology took some time to describe two wearable products that are available, both based on the ARM Cortex-M0 processor. The first is an activity tracker that can track your activity for up to six months using a standard CR2032 battery, transferring the activity data to either Android or iOS applications. This waterproof device also offers sleep analysis and several sensor implementations. Eddy also showed me the waterproof watch-format lifestyle and training monitor device that can also monitor your heart rate. This is also based on the ARM Cortex-M0 processor, but this time uses a rechargeable battery.



Over in the Freescale demo room I met with Rajeev Kumar, Director Worldwide Marketing & Business Development, Microcontrollers, to find out more about the use of Freescale ARM Cortex-M0+ and Cortex-A series processors in the latest smart health and wearable products on the market.


Computex is the largest ICT trade show in Asia, and the second largest in the world. 2014 is the 34th year that this event has been running, and there will be approximately 130,000 visitors to the show, of which 38,000 will be international visitors. There are 1,700 exhibitors spread out across several exhibition halls in two main areas. In addition to the exhibition halls, many companies also take over areas of the major local hotels for showing demos and setting up the many customer meetings that will take place over the week.


The show officially started on the Tuesday of this week, and ARM held a press conference on the Monday just prior to the event getting started. At the press conference, which was also attended by San-Cheng Chang, Minister of the Ministry of Science and Technology (MOST), and Tyzz-Jiun Duh, Deputy Minister of the Ministry of Economic Affairs (MOEA) in Taiwan, ARM announced the establishment of a new CPU design centre here in Taiwan, the fourth worldwide, to design the next generation of CPU cores in the Cortex-M series and to further enable ARM's success in the wearables, Internet of Things (IoT) and embedded applications markets.


I spoke with Noel Hurley about Wearables and IoT and what the opening of the CPU design centre will mean for ARM.



The full video of the Press Conference presented by Ian Drew and Noel is available here.



As I said earlier, the event got underway in earnest on the Tuesday and I met with Steven Lu, Senior Director Product Management Division at Wiwynn where I found out more details about the SV118 product family based on the ARMv7-A based Marvell Armada XP MV78460 processor.



Over at the W Hotel, I met with Leon Farasati, Staff Product Manager Snapdragon Dev Platforms at Qualcomm Technologies, Inc. Leon gave me a great overview of some of the new products and markets that are being enabled by the latest Snapdragon technology, including a glasses free 4K 3D television. Leon also introduced me to the DragonBoard platform which is a development kit based on the Snapdragon processor, which enables you to ' take advantage of pins, connectors, adapters and expansion headers to tap into its functionality in your own product development efforts' [Source DragonBoard Snapdragon S4 Plus APQ8060A Mobile Development Board ]



Cavium held a press conference on the Tuesday where they announced their ThunderX Data Center & Cloud Processors. Simon Segars, CEO of ARM, opened the presentation with a talk on how changes in the use of cloud technology have enabled lots of new ARMv8-A 64-bit technology to be introduced into the server space. Cavium described four classes of product, each with up to 48 ARMv8-A 64-bit processors with up to 2.5 GHz core frequency, optimized with different accelerators for specific workloads (a modern-day example of the division of labour), covering compute, storage, network and security applications. More details are available at Cavium ThunderX™ ARM Processors.



This post is originally from liwenhaosuper.com; the link is here: TEE and ARM TrustZone | System Research Blog. The report is based on experience gained implementing T6, a trusted kernel based on ARM TrustZone.



In this article, I will give an introduction to TEE (trusted execution environment) and ARM TrustZone, based on my one and a half years of experimentation on several ARM platforms while implementing T6.

What is a TEE?

To begin with, let's first identify the slight difference between the words Trusted and Trustworthy. Trusted means someone or something you rely upon not to compromise your security, while Trustworthy means someone or something that will not compromise your security. In other words, Trusted describes how you use something, while Trustworthy is about whether it is safe to use. So Trusted Execution Environments are what you may choose to rely upon to execute sensitive tasks, and hopefully they are trustworthy too! Generally speaking, there are five security properties that a TEE may want to achieve:

  • Isolated Execution
  • Secure Storage
  • Remote Attestation
  • Secure Provisioning
  • Trusted Path

What kinds of TEEs are now available?

Nowadays, there are several TEE platforms available to both the research community and industry, including:

  • TPM (Trusted Platform Module). The TPM is a dedicated microprocessor designed to secure hardware by integrating cryptographic keys into devices, and it is available in many modern computers. To utilize the secure primitives of the TPM, applications usually combine the TPM (hardware) with TXT (software) to provide strong isolation. One thing that needs to be pointed out is that the TPM is really SLOW; vendors have no motivation to make it faster, they just make sure it works at low cost!
  • Intel's TXT (Trusted Execution Technology) or AMD's SVM (Secure Virtual Machine). Using TXT involves several steps: 1. suspend the OS; 2. execute a small amount of trusted code on the main CPU; 3. restore the OS. While these three steps seem simple, there are actually no commercial applications using this technology, for several reasons: firstly, when TXT is on, only one CPU is allowed to execute even on a multicore machine, while the other cores are suspended; secondly, there are no interrupts or I/O operations in TXT, and to keep the TCB as small as possible, no OS libraries are available, which means you need to make huge efforts to run applications with rich functionality.
  • Hypervisor-based TEE. Virtualization is a straightforward method of implementing a TEE, and a large number of systems use hypervisor-based solutions to provide TEE-like functionality.
  • ARM TrustZone. ARM TrustZone is thought to be the most promising technology for implementing a TEE on mobile devices (or ARM devices).

What is ARM TrustZone?

ARM TrustZone is a technology aimed at establishing trust in ARM-based platforms. In contrast to TPMs, which were designed as fixed-function devices with a predefined feature set, TrustZone represents a much more flexible approach by leveraging the CPU as a freely programmable trusted platform module. To do this, ARM introduced a special CPU mode called "secure mode" in addition to the regular normal mode, thereby establishing the notions of a "secure world" and a "normal world". The distinction between the two worlds is completely orthogonal to the normal ring protection between user-level and kernel-level code, and is hidden from the operating system running in the normal world. Furthermore, it is not limited to the CPU but is propagated over the system bus to peripheral devices and memory controllers. This way, an ARM-based platform effectively becomes a kind of split personality. When secure mode is active, the software running on the CPU has a different view of the whole system than software running in non-secure mode. System functions, in particular security functions and cryptographic credentials, can thus be hidden from the normal world. It goes without saying that this concept is vastly more flexible than TPM chips, because the functionality of the secure world is defined by system software instead of being hard-wired.


TrustZone Mode

With TrustZone, the TEE looks something like this:

TrustZone TEE

Details of ARM TrustZone

The following figure shows the TrustZone hardware architecture, including the SoC and the peripherals connected to it. The SoC includes a core processor, Direct Memory Access (DMA), secure RAM, a secure boot ROM, a Generic Interrupt Controller (GIC), a TrustZone Address Space Controller (TZASC), a TrustZone Protection Controller (TZPC), a Dynamic Memory Controller (DMC) and DRAM; these communicate with each other over the AXI bus. The SoC communicates with peripherals using the AXI-to-APB bridge.


TrustZone Hardware

Split-World-based Isolated Execution. A physical core processor with TrustZone support works safely and efficiently in two worlds: the normal world (or non-secure world) and the secure world. CPU state is banked between the two worlds, and by default the secure world can access all state of the normal world, but not vice versa. Beneath the two worlds there is a higher-privilege mode called TrustZone monitor mode, which is usually used for switching between the two worlds, either by executing the Secure Monitor Call (SMC) instruction or via external interrupts, and which is responsible for banking the CPU state.

Memory and Peripheral Protection. TrustZone supports memory partitioning between the two worlds using the TZASC and TZPC. The TZASC can partition DRAM into several memory regions, each of which can be configured for use by the normal world or the secure world, or given more dynamic and complicated access permission controls. By default, secure world applications can access normal world memory, but not vice versa. However, by enabling security inversion in the TZASC, a memory region can also be configured for normal world access only. The TZPC is mainly used to configure peripherals as secure or non-secure, and the world-sensitive AXI-to-APB bridge will deny illegal access to protected peripherals. Besides the above, on-SoC static memory such as ROM or SRAM also needs to be protected. This is done by an SoC peripheral called the TrustZone Memory Adapter (TZMA), though no direct software configuration registers are provided by the TZMA. Usually the internal ROM is set as secure by hardware design, and the partition of secure and non-secure regions in SRAM is configured by setting the R0SIZE register in the TZPC. Both the TZASC and TZPC can be accessed and configured only in the secure world. A security access violation may cause an external abort and trap to either monitor mode or the current CPU state's exception vector, depending on the configuration of the interrupt behaviour in monitor mode. Besides the physical memory partitioning, a TrustZone-aware MMU provides both worlds with distinct translation tables, with a tag in the TLB identifying the world, so the TLB need not be flushed when switching worlds. For Direct Memory Access (DMA), there is a multi-channel system controller called the Direct Memory Access Controller (DMAC) that moves data around the physical memory system. The DMAC is world-sensitive and supports concurrent secure and non-secure channels. A normal world DMA transfer to or from secure memory will be denied, thus avoiding a security hole.
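As a rough illustration of the TZASC-style region checks described above, here is a toy Python model. The region layout and the API are invented for the example; a real TZASC is configured through hardware registers, not code like this:

```python
# Toy model of TrustZone-style memory partitioning: each region is marked
# secure or non-secure, and an access is checked against the world making it.
# By default the secure world may touch everything; the normal world may
# only touch non-secure regions.

class Region:
    def __init__(self, base, size, secure):
        self.base, self.size, self.secure = base, size, secure

    def contains(self, addr):
        return self.base <= addr < self.base + self.size

def access_allowed(regions, addr, world):
    for r in regions:
        if r.contains(addr):
            if world == "secure":
                return True          # secure world sees both worlds
            return not r.secure      # normal world: non-secure regions only
    return False                     # unmapped address: deny

regions = [
    Region(0x8000_0000, 0x1000_0000, secure=False),  # normal-world DRAM
    Region(0xFF00_0000, 0x0010_0000, secure=True),   # secure RAM
]

print(access_allowed(regions, 0x8000_1000, "normal"))  # True
print(access_allowed(regions, 0xFF00_0000, "normal"))  # False (denied)
print(access_allowed(regions, 0xFF00_0000, "secure"))  # True
```

The "security inversion" mentioned above would correspond to a third region flavour that even the secure world may not touch, which this sketch omits for brevity.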

Interrupt Isolation. There are two kinds of interrupts: IRQ (normal interrupt request) and FIQ (fast interrupt request). A GIC with TrustZone support can configure an interrupt as secure or non-secure. Usually IRQ is configured as a normal world source and FIQ as a secure world source, because IRQ is the most common interrupt source in use in modern operating systems. When executing in the secure world, a secure interrupt will be handled by the secure world interrupt handler; when a non-secure interrupt occurs during secure world execution, the interrupt will be transferred to the monitor mode interrupt handler, and the software handler can decide whether to drop the interrupt or switch to the normal world. The security-related configuration of the GIC can only be set from the secure world, thus preventing illegal modification from the normal world. The secure configuration register (SCR), which is in the control coprocessor CP15 and is accessible in secure privileged mode only, can be programmed to trap external aborts (i.e. memory access permission violations), IRQs or FIQs into monitor mode, or to handle the interrupt locally in the current world.
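The routing rule just described can be condensed into a few lines. The Python sketch below is a simplified illustration (names and return strings are invented; real routing also depends on the SCR configuration discussed above):

```python
# Toy model of TrustZone interrupt routing: secure interrupts are delivered
# to the secure world (typically as FIQ), non-secure ones to the normal
# world (typically as IRQ). An interrupt for the "other" world is first
# trapped to monitor mode, which decides whether to switch worlds.

def route_interrupt(current_world, interrupt_secure):
    target = "secure" if interrupt_secure else "normal"
    if target == current_world:
        return f"handle in {current_world} world"
    return f"trap to monitor, switch to {target} world"

print(route_interrupt("secure", interrupt_secure=True))
print(route_interrupt("secure", interrupt_secure=False))
print(route_interrupt("normal", interrupt_secure=True))
```

The key property the model captures is that the normal world never handles a secure interrupt directly; monitor mode always mediates the world switch.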

Could ARM TrustZone be used to implement or replace virtualization?

As many researchers have proposed, ARM TrustZone can be viewed from two angles: as a virtualization solution, and as a mechanism to implement functionality similar to Trusted Platform Modules (TPMs). When regarded as a virtualization solution, TrustZone is severely lacking: 1. The number of virtual machines is limited to two, one running in the secure world and one running in the non-secure world. 2. A trap-and-execute model for emulating devices is not possible, because the security violation abort is always asynchronous. So to support device emulation, certain device drivers of the non-secure OS must be modified, and thus running OSes like Windows (for which only binaries are available) is not possible. I dare say that perceiving TrustZone as a virtualization mechanism is not a good choice! When looking at TrustZone as an alternative to TPMs, the motivation behind the technology becomes much clearer. In contrast to fixed-function TPMs, TrustZone is a vastly more versatile mechanism with unlimited resources and fast chips.

Could ARM TrustZone be used as TPM directly? Does ARM TrustZone provide secure key storage?

I am afraid not. The problem is the lack of secure storage: the TrustZone specification does not define any mechanism for implementing it. However, the TrustZone feature of assigning a specific peripheral to secure world access only is the key building block; it is up to the SoC vendors or the TEE developers to decide which peripheral is used as the secure storage medium.


If you have any questions regarding ARM TrustZone or our trusted kernel T6, I'd like to hear from you via email (liwenhaosuper AT gmail.com) or through our website: liwenhaosuper.com


With the first Cortex-A53 based SoCs due to ship in the coming months, and Cortex-A57 based designs to follow early next year, ARM gave us a quick update on performance expectations for both cores. Given the timing of both designs we'll see a combination of solutions built on presently available manufacturing processes (e.g. 28nm) as well as next gen offerings (20/16FF). The graph above gives us an updated look at performance expectations (in web browsing workloads) for both ARM 64-bit cores.

If we compare across the same process nodes (28nm in both cases), the Cortex-A53 should give us nearly a 50% increase in performance compared to ARM's Cortex-A7. The Cortex-A57 should offer roughly the same increase in performance compared to Cortex-A15 as well. Although the -A57 will do so at higher power, power efficiency may be better depending on the workload thanks to the added performance. Thankfully we won't see many A57 designs built on 28nm in mobile (AMD's first Cortex A57 design will be aimed at servers and is built on a 28nm process).

If you combine architectural improvements with a new process node, the gains are substantial. Move to 20nm or 16FF for these designs and the improvement over their 32-bit predecessors easily exceeds 50%.


ARM also provided some Geekbench 3 performance data comparing the Cortex-A57 to -A15, both in 32-bit and 64-bit mode. We already know Geekbench 3 is particularly sensitive to the new instructions that come along with AArch64, but even in 32-bit mode there's still a 15 - 30% increase in performance over the Cortex-A15 at the same clocks.


Qualcomm has already announced its Snapdragon 410, 610 and 615 will use ARM's Cortex-A53, while its 808 and 810 will use a combination of Cortex-A53s and -A57s.


Source: Anandtech

Function call basics

Typically when teaching a class about embedded C programming, one of the early questions we ask is "Where does the memory come from for function arguments?"

Take, for example, the following simple C function:

void test_function(int a, int b, int c, int d);


When we invoke the function, where are the function arguments stored?

int main(void)
{
  test_function(1,2,3,4);
  return 0;
}

Unsurprisingly, the most common answer after "I don't know" is "the stack"; and of course if you were compiling for x86 this would be true. This can be seen from the following x86 assembler for main setting up the call to test_function:


  subl $16, %esp

  movl $4, 12(%esp)

  movl $3, 8(%esp)

  movl $2, 4(%esp)

  movl $1, (%esp)

  call _test_function



The stack pointer is decremented by 16 bytes, then the four ints are moved onto the stack prior to the call to test_function.


In addition to the function arguments, the call instruction pushes the return address (i.e. the program counter of the instruction after the call) onto the stack, and the called function's prologue then saves what, in x86 terms, is often referred to as the frame pointer. The frame pointer is used to reference local variables also stored on the stack.


This stack frame format is widely understood and has historically been the target of malicious buffer-overflow attacks that modify the return address.


But, of course, we're not here to discuss x86, it's the ARM architecture we're interested in.



The ARM is a RISC architecture, whereas the x86 is CISC. Since 2003 ARM have published a document detailing how separately compiled and linked code units work together. Over the years it has gone through a couple of name changes, but is now officially referred to as the "Procedure Call Standard for the ARM Architecture" or the AAPCS.


If we recompile main.c for ARM:

> armcc -S main.c


we get the following:


        MOV      r3,#4

        MOV      r2,#3

        MOV      r1,#2

        MOV      r0,#1

        BL       test_function



Here we can see that the four arguments have been placed in registers r0-r3. This is followed by the "Relative branch with link" (BL) instruction. So how much stack has been used for this call? The short answer is none, as the BL instruction moves the return address into the link register (lr/r14) rather than pushing it onto the stack, as per the x86 model.

Note: around a function call there will be other stack operations, but they're not the focus of this post.


The Register Set

I'd imagine most readers are familiar with the ARM register set, but just to review:

  • There are 16 data/core registers r0-r15
  • Of these 16, three are special purpose registers
    • Register r13 acts as the stack pointer (sp)
    • Register r14 acts as the link register (lr)
    • Register r15 acts as the program counter (pc)



Basic Model

So the base function call model is that if there are four or fewer 32-bit parameters, r0 through r3 are used to pass the arguments and the call return address is stored in the link register.

If we add a fifth parameter, as in:

void test_function2(int a, int b, int c, int d, int e);

int main(void)
{
  test_function2(1,2,3,4,5);
  return 0;
}

We get the following:


        MOV      r0,#5

        MOV      r3,#4

        MOV      r2,#3

        STR      r0,[sp,#0]

        MOV      r1,#2

        MOV      r0,#1

        BL       test_function2



Here, the fifth argument (5) is being stored on the stack prior to the call.


Return values

Given the following code:

int test_function(int a, int b, int c, int d);


int val;


int main(void)
{
  val = test_function(1,2,3,4);
  return 0;
}


By analyzing the assembler we can see the return value is placed in r0:


        MOV      r3,#4

        MOV      r2,#3

        MOV      r1,#2

        MOV      r0,#1

        BL       test_function

        LDR      r1,|L0.40|  ; load address of extern val into r1

        STR      r0,[r1,#0]  ; store function return value in val



C99 long long Arguments

The AAPCS defines the size and alignment of the C base types. The C99 long long is 8 bytes in size and alignment. So how does this change our model?


long long test_ll(long long a, long long b);


long long ll_val;

extern long long ll_p1;

extern long long ll_p2;


int main(void)
{
  ll_val = test_ll(ll_p1, ll_p2);
  return 0;
}




We get:


        LDR      r0,|L0.40|

        LDR      r1,|L0.44|

        LDRD     r2,r3,[r0,#0]

        LDRD     r0,r1,[r1,#0]

        BL       test_ll

        LDR      r2,|L0.48|

        STRD     r0,r1,[r2,#0]



        DCD      ll_p2


        DCD      ll_p1



This code demonstrates that a 64-bit long long uses two registers per parameter (r0-r1 for the first parameter and r2-r3 for the second). In addition, the 64-bit return value comes back in r0-r1.




As with the long long, a double (based on the IEEE 754 standard) is also 8 bytes in size and alignment on ARM. However, the code generated depends on the actual core. For example, given the code:

double test_dbl(double a, double b);


double dval;

extern double dbl_p1;

extern double dbl_p2;


int main(void)
{
  dval = test_dbl(dbl_p1, dbl_p2);
  return 0;
}




When compiled for a Cortex-M3 (armcc --cpu=Cortex-M3 --c99 -S main.c) the output is almost identical to the long long example:


        LDR      r0,|L0.28|

        LDR      r1,|L0.32|

        LDRD     r2,r3,[r0,#0]

        LDRD     r0,r1,[r1,#0]

        BL       test_dbl

        LDR      r2,|L0.36|

        STRD     r0,r1,[r2,#0]



        DCD      dbl_p2


        DCD      dbl_p1


However, if we recompile this for a Cortex-A9 (armcc --cpu=Cortex-A9 --c99 -S main.c), we get quite different generated instructions:


        LDR      r0,|L0.40|

        VLDR     d1,[r0,#0]

        LDR      r0,|L0.44|

        VLDR     d0,[r0,#0]

        BL       test_dbl

        LDR      r0,|L0.48|

        VSTR     d0,[r0,#0]



        DCD      dbl_p2


        DCD      dbl_p1


The VLDR and VSTR instructions are generated because the Cortex-A9 has Vector Floating Point (VFP) technology: the doubles are passed and returned in the VFP double-precision registers (d0 and d1) rather than in core registers.



Mixing 32-bit and 64-bit parameters

Assuming we change our function to accept a mixture of 32-bit and 64-bit parameters, e.g.

void test_iil(int a, int b, long long c);


extern long long ll_p1;


int main(void)
{
  test_iil(1, 2, ll_p1);
  return 0;
}



As expected, we get a in r0, b in r1 and ll_p1 in r2-r3.


        LDR      r0,|L0.32|

        MOV      r1,#2

        LDRD     r2,r3,[r0,#0]

        MOV      r0,#1

        BL       test_iil



        DCD      ll_p1


However, if we subtly change the order to:

void test_ili(int a, long long c, int b);

extern long long ll_p1;

int main(void)
{
  test_ili(1, ll_p1, 2);
  return 0;
}

We get a different result: a is in r0, c is in r2-r3, but now b is stored on the stack.


        MOV      r0,#2

        STR      r0,[sp,#0] ; store parameter b on the stack

        LDR      r0,|L0.36|

        LDRD     r2,r3,[r0,#0]

        MOV      r0,#1

        BL       test_ili



        DCD      ll_p1


So why doesn't parameter 'c' use r1-r2? Because the AAPCS states:

"A double-word sized type is passed in two consecutive registers (e.g., r0 and r1, or r2 and r3). The content of the registers is as if the value had been loaded from memory representation with a single LDM instruction"


As the compiler is not allowed to rearrange parameter ordering, parameter 'b' unfortunately has to come after 'c' and therefore cannot backfill the unused register r1; it goes on the stack instead.



For any C++ programmers out there, it is important to realize that for class member functions the implicit 'this' argument is passed as a 32-bit value in r0. So, hopefully you can see the implications, when targeting ARM, of:


class Ex
{
  void mf(long long d, int i);  // this -> r0, d -> r2-r3, i -> stack
};

class Ex
{
  void mf(int i, long long d);  // this -> r0, i -> r1, d -> r2-r3
};

Even though keeping arguments in registers may be seen as a "marginal gain", for large code bases I have seen first-hand significant performance and power improvements simply from rearranging parameter ordering.


It is also useful to know that both the ARM Accredited Engineer (AAE) and the ARM Accredited MCU Engineer (AAME) accreditation exams require AAPCS knowledge.


And finally...

I'll leave you with one more bit of code to puzzle over, given:

typedef struct
{
  int a;

  int b;

  int c;

  int d;

} Example;


void test_struct(Example p);


Example ex = {1,2,3,4};


int main(void)
{
  test_struct(ex);
  return 0;
}

Can you guess how 'ex' is passed?

Before we dive into the ACPI standard, let's go back to the main goal for firmware. Utopia is a universal firmware solution which can boot and support any Operating System (open, proprietary and future) and any version of that Operating System. Firmware solutions are based on common standards which allow for multiple implementations: each firmware implementation is written to conform to a set of open standards. These standards give both manufacturers and consumers an advantage.


  • Manufacturers: can choose an Operating System, depending upon customer requirements or trends, for which the firmware has already been implemented and tested.
  • Consumers: a universal firmware solution future-proofs the hardware by allowing newer Operating System releases to run on older hardware, or conversely older Operating System releases to run on newer hardware.


To achieve this advantage an abstraction layer has to be defined which hides the control-details of a hardware platform.


Note: for companies not requiring universal boot firmware there are other directions which can be taken. There are different technologies to solve different problems and each software technology has advantages and disadvantages. In the boot world there are many choices.


If you agree with the concept of universal firmware, I thought it would be interesting to discuss the ACPI standard in the context of “boot and support everything”. As a background, please take a look at the blog I co-authored with Dong Wei (HP Fellow):

“Why should you care about ACPI definition merging into the UEFI Forum?”


In that blog we discussed the industry announcement of the ACPI governance change and the organizational movement of ACPI underneath the UEFI Forum. This announcement was a big event since it allows ACPI to be adopted and influenced by a broader set of companies throughout the industry. Any member company can put forward changes and improvements to the standard. The broader adoption means ACPI is applicable to a wider set of applications and can move quickly with peer review.

Why ACPI? There are other alternatives.


When we started exploring the software requirements for ARM Servers, we received a common request to standardize firmware, and more precisely to standardize on UEFI & ACPI. We tend to take a neutral stance with software and follow commercial guidance. We received a solid, universal request from both server manufacturers and Operating System companies that UEFI & ACPI were the preferred option.


ACPI Specification Working Group (ASWG)


The UEFI Forum has a number of working groups including groups looking at Test and UEFI Specification; see the following link for more details.



Each group has a charter, for the ASWG the charter is as follows:


The group’s scope is to manage and evolve the definition of the “Advanced Configuration and Power Interface” specification (ACPI Spec). The purpose of the specification is to define flexible and extensible interfaces for system configuration, power management and RAS (Reliability, Availability, and Supportability) features useful for systems across all market segments from embedded and ultra-mobile devices to the enterprise servers. ACPI normally includes static tables for platforms to communicate system information to the OS during early boot and a name space with control methods as primary runtime interfaces between platform firmware and operating systems software.


In summary the ASWG charter is to correct any deficiencies and improve the ACPI Specification moving forward.


What is driving ACPI on ARM?


As mentioned above, ACPI is being driven by the industry because of the intersection of ARM Servers and ARM 64-bit processors. Servers require three fundamental pieces of software: standard firmware, standard power management and a consistent Operating System stack. In this context, ACPI covers the mechanisms for both Device Discovery and Power Management.

Let’s explore ACPI itself. ACPI provides standardized mechanisms for Device Discovery, Operating System Power Management, Thermal Management and RAS (Reliability, Availability and Serviceability) communication, to name but a few. The specification is comprehensive and Operating System agnostic, which is important for the universal firmware goal.

What does this mean?


It means standard enterprise server software can be created to run on complex ARM-based platforms without requiring major re-engineering. For example, given an ARM processor based SoC with standard UEFI boot firmware and ACPI, a manufacturer has the opportunity to choose the Operating System provided that Operating System supports the handling of ACPI. The hardware support is achieved without major re-writes or re-builds of the software stack. This is important for enterprise solutions and is different from the traditional embedded and mobile markets. Embedded and mobile can afford to be unique due to the nature of the problems being solved. In contrast, Server software is all about providing an open application platform that sits on top of standard compliant firmware.

Enterprise Server Operating Systems can take advantage of this separation to boot many platforms using only one image instance. This is achieved because both hardware discovery and hardware control sit behind the ACPI abstraction layer. Operating Systems can reduce their test-scenario permutations, since only one image needs to be tested, released and supported. There are also engineering savings, since the AML code embedded within the ACPI tables removes the need for the kernel image to contain drivers for everything. ACPI as a standard allows for stability: a 5-year-old Operating System can run on new hardware. This extends hardware longevity and in turn improves the return on investment (ROI).



  • The universal firmware goal is simply to “boot and support everything” - open, proprietary and future.
  • There are many firmware solutions available since there are always unique requirements.
  • The Server industry (manufacturers and Operating Systems) has driven the firmware focus to UEFI & ACPI.
  • ACPI is a standard and not an implementation.
  • ACPI standard & specification is now part of the UEFI Forum – an open industry standards body.
  • ACPI standard is Operating System agnostic.
  • ACPI provides standardized mechanisms for Device Discovery, Operating System Power Management, Thermal Management, RAS features, etc.
  • ACPI standard is part of the Server industry firmware goal of “booting and supporting everything”.
