
ARM Processors


In the recent ARM Connected Community event, Interview and Question Time with Joseph Yiu, community member Gopal Amlekar asked the following question:


"How are ARM processors and especially the Cortex-M processors helping in making the IoT more secure, reliable and not prone to hacking?

Is it something to do with the TrustZone?

Even with all these, what care should be taken by developers to make their device more secure in the WWW of things?"


I recently recorded this interview and members should expect to see it very soon! However, I would like to elaborate further on this question and explain in detail how Cortex-M is approaching security.


Security management on existing Cortex-M processors

In a large part of the microcontroller application space, the most likely security issue is with software. For example, there could be vulnerabilities in the application code or in the communication protocol stack.

Typically, some form of security management can be implemented using the privileged and unprivileged execution levels. By executing protocol stack and application code at unprivileged level, and by using the Memory Protection Unit (MPU), we can significantly reduce the risk of any hacking instance or efforts gaining full control of the device. The MPU can ensure that the stack and critical data used by the OS kernel are not corrupted by a rogue application task. It can also make the SRAM region non-executable so that even if malicious code is injected into the SRAM (e.g. if part of the SRAM can be used to store received packets), such code cannot be executed.
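The protection described above boils down to programming a few MPU region registers. As a rough sketch only, the snippet below computes the value of the ARMv7-M MPU Region Attribute and Size Register (RASR) for an execute-never SRAM region; the helper name and the chosen region parameters are illustrative, not taken from any particular SDK, and real firmware would normally use CMSIS or vendor headers.

```python
# Illustrative model: pack an ARMv7-M MPU_RASR value for a region that
# is readable/writable but execute-never (XN). Field positions follow
# the ARMv7-M MPU register layout: ENABLE bit 0, SIZE bits [5:1],
# AP bits [26:24], XN bit 28.

def mpu_rasr(size_bytes, ap, xn, enable=True):
    """Pack an MPU Region Attribute and Size Register value."""
    size_field = size_bytes.bit_length() - 2   # region size = 2**(SIZE+1)
    assert 2 ** (size_field + 1) == size_bytes, "size must be a power of two >= 32"
    return ((xn & 1) << 28 |      # XN: execute-never
            (ap & 7) << 24 |      # AP: access permissions
            size_field << 1 |     # SIZE: encoded region size
            int(enable))          # ENABLE

# A 64KB SRAM region, full access (AP=0b011), marked execute-never:
print(hex(mpu_rasr(64 * 1024, ap=0b011, xn=1)))
```

On real silicon this value would be written to MPU->RASR after setting the region base address, but the point here is only that "make the SRAM non-executable" is a single attribute bit in the region descriptor.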

In the mbedOS, which will be available in Q4 2015, the µVisor also uses the MPU for its security management. On top of that, the mbedOS adds a number of other security features to enable software developers to create applications that need to communicate securely with other devices and servers. For example, Datagram Transport Layer Security (DTLS) can be used to securely handle data communications.



Software components in the mbedOS


Can TrustZone for Cortex-A be used for Cortex-M?

The software execution environments for Cortex-M processors are often quite different from Cortex-A processors. In the Cortex-A processors, the OS environment (e.g. Android, iOS) allows you to download applications from third parties, meaning you have multiple secure domains within the system. The secure contents need to be completely hidden away from these applications, making TrustZone technology the best way to manage security.

For microcontroller type applications based on the Cortex-M processors, however, software components are often compiled and linked together during the software development stage. As the software components are essentially "trusted", there is no need to hide contents from them. Given that the MPU can prevent hackers from injecting code and executing it, the risk is more about how the on-chip software handles secure contents, and whether it is possible for the software to leak secure contents accidentally.


Multi-core approach

In complex SoC designs, Cortex-M processors might be used for various subsystems (e.g. I/O subsystem, power management). In these systems, third-party software components could be downloaded into the SRAM in the Cortex-M subsystems and executed from there. In such cases, additional security arrangements might be needed. For example, a number of SoC designs use multiple Cortex-M processors, with at least one of them always in a secure domain and the others in a non-secure domain. This arrangement can work well with a TrustZone-based (e.g. Cortex-A processor) system.


What next?

We are continuously investigating future technology to see how we can provide better solutions for a wider range of applications.

In addition to the processors, the mbedOS will be an important part of the picture. The mbedOS will make it easier to develop secure IoT applications because the OS is designed with security management from the ground up. A wide range of secure communication technologies will be integrated into the OS so that application developers can deploy these technologies easily, securely and efficiently. The mbedOS will be free to use, and the applications created can be exported into other toolchains for further modifications and optimizations if required.

If you have further questions about Cortex-M security then do not hesitate to comment below and I will get back to you as soon as possible.

I was delighted to see the announcement of the ODROID-C1 by Hardkernel last week!


The ODROID-C1 is the latest addition to a growing number of ARM-based Single-Board Computers (SBCs). At just $35, the ODROID-C1 represents one of the lowest-cost SBCs on the market, but it also comes with a high-performance specification. The board uses the Amlogic S805 SoC with four ARM Cortex-A5 CPUs, each capable of clocking up to 1.5GHz (translating to over 2300 DMIPS per CPU). Alongside the ARM CPUs is a dual-core ARM Mali-450 MP GPU, capable of clocking up to 600 MHz and fully supporting OpenGL ES 1.1/2.0. This certainly makes the ODROID-C1 one of the most cost-effective SBCs, providing maximum compute power per dollar spent. The ODROID-C1 also has 1GB of DDR3, a MicroSD slot supporting an 8GB or 16GB UHS-1 card, and is capable of running the Ubuntu 14.04 or Android KitKat operating systems. It also packs several other features, which can be found on the ODROID-C1's webpage and in the December 2014 issue of ODROID magazine. The block diagram below gives more details of the board's key components.



Source: hardkernel.com

Powering the ODROID-C1 is the Cortex-A5 processor, one of ARM’s most power-efficient and proven ARMv7-A processors. It has shipped in millions of smartphones and other devices since first being introduced to market in 2011. The Cortex-A5 enabled the entry-level smartphone revolution, bringing a high-end mobile experience into low-cost smartphone devices. With the Cortex-A5 now powering the ODROID-C1, it is starting a new trend of powerful, cost-effective single-board computing.


The ARM Mali-450 MP GPU has experienced tremendous success since its launch in 2012. Millions of smartphones, tablets and set-top-boxes are powered by Mali-450 MP, which has been designed for volume markets and optimized with a focus on energy and bandwidth savings. Now, the Mali-450 brings full OpenGL ES 1.1/2.0 support for enabling 2D/3D graphics applications in the ODROID-C1.


There are a number of application use-cases, ranging from professional software engineering through to modern computers built for work or gaming. An interesting application would be a low-cost but performance-packed IoT gateway using this SBC. I am curious to see what innovative uses the DIY community will find for the raw compute power provided by this tiny but powerful board. With affordable SBCs now becoming increasingly powerful and feature-packed, it will be exciting to see some of the ideas developers are able to conceive in the near future. I cannot wait to get an ODROID-C1 for my projects!


How about you?

A good article describing ARM-powered server chips can be found at the link below.

AnandTech | ARM Challenging Intel in the Server Market: An Overview

I was recently asked about the shift and extend operations in the A64 instruction set, and I realised that the ARMv8 ARM doesn't have a simple description of what each operation does. The ARMv8 ARM is very precise if you have the time to read and understand the pseudo-code descriptions for each instruction, but there is no quick reference.


This post tries to be just that: a quick reference for the shift and extend modifiers. I want to describe the various options so that when you see them, you know what they mean. I'm going to restrict this post to the operand modifiers. These modifiers are A64's equivalent of the flexible operand – often called Operand 2 – from ARM and Thumb.


Many of the operations are also available as standalone instructions, and there are several bitfield manipulation operations that are only available as standalone instructions. I won't describe these here, partly because the ARMv8 ARM already describes what they do, but also because it would make this article rather long. I might cover the standalone operations as a follow-up if there is enough demand.


General Form


The general form of these modifiers is quite simple: <operation> {#imm}


  • <operation> is one of the operations described in this article.
  • #imm is usually optional, and defaults to 0.


The modifier affects the register or immediate value that appears immediately before it in the instruction mnemonic. Here are a few examples:


// Subtract a shifted register.
sub x0, x1, x2, LSR #8          // x0 = x1 - ((uint64_t)x2 >> 8)

// Add a shifted immediate.
add x5, x6, #10, LSL #12        // x5 = x6 + (10 << 12)

// Load from an array using a signed index.
ldr w10, [x11, w12, SXTW]      // w10 = *(uint32_t*)(x11 + (int32_t)w12)


Note that not every modifier is available in every context. I won't try to explain what's available where; the ARMv8 ARM's instruction descriptions are quite clear in this regard.


Shift Operations


Shifts take a source register and shift it left or right by the specified number of bits, with optional sign extension. The shift operations are mostly the same as they are in 32-bit ARM and Thumb, so if you're familiar with those, the A64 versions shouldn't be surprising.


The shift amount is encoded in the instruction (and is therefore constant). A significant difference from ARM is that there are no register-shifted-by-register forms. Such operations are possible in A64, but as in Thumb, they are standalone instructions with slightly different syntax.


For all shift modifiers, the size of the result is the same as the size of the source; there is no implicit widening or narrowing[1].


LSL: Logical Shift Left




Shift bits left by the amount specified, and fill the new bits with zeroes.


  • Bits shifted out of the left are discarded.
  • New bits shifted into the right are set to 0.


This is equivalent to multiplication by 2^n, where 'n' is the shift amount. This works for both signed and unsigned inputs.
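As a quick sanity check, the 64-bit behaviour can be modelled in Python (an illustrative model of the modifier, not the hardware):

```python
def lsl64(value, shift):
    """A64 LSL on a 64-bit register: shift left, discard the high bits."""
    return (value << shift) & 0xFFFFFFFFFFFFFFFF

# Shifting left by n multiplies by 2**n (modulo 2**64):
assert lsl64(3, 4) == 3 * 2**4
# Bits shifted out of the left are simply discarded:
assert lsl64(1 << 63, 1) == 0
```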


LSR: Logical Shift Right




Shift bits right by the amount specified, and fill the new bits with zeroes.


  • Bits shifted out of the right are discarded.
  • New bits shifted into the left are set to 0.


This is equivalent to unsigned division by 2^n, where 'n' is the shift amount. The result is rounded towards zero, like the udiv instruction, and like unsigned integer division in most languages (including C).
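The same kind of model shows the division equivalence (illustrative only):

```python
def lsr64(value, shift):
    """A64 LSR on a 64-bit register: shift right, fill with zeroes."""
    return (value & 0xFFFFFFFFFFFFFFFF) >> shift

# LSR #n is unsigned division by 2**n, rounding towards zero:
assert lsr64(100, 3) == 100 // 8
# The top bits are always zero-filled, even for "negative" bit patterns:
assert lsr64(2**64 - 1, 60) == 0xF
```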


ASR: Arithmetic Shift Right




Shift bits right by the amount specified, and sign-extend to fill the new bits.


  • Bits shifted out of the right are discarded.
  • New bits shifted into the left are set to the same value as the source value's leftmost bit. (This is the two's complement sign bit.)
    • If the source value is positive, the leftmost bit is 0, so ASR and LSR are equivalent for positive inputs.


This is similar to signed division by 2^n, where 'n' is the shift amount. However, with signed division, some care is needed to handle rounding when negative values are involved. For example, C's signed integer division almost always rounds towards zero[2], but a naive ASR-based division will round towards minus infinity.


// C-style signed integer division by a power of two (2^n, where n > 0).
// Correct the result (by incrementing it) only if the bits shifted out
// are non-zero and the sign is negative.
tst x0, #((2^n)-1)
ccmp x0, #0, #0, ne
asr x0, x0, #n
cinc x0, x0, lt
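The correction performed by the assembly sequence above can be modelled in Python to make the rounding difference concrete (an illustrative model; the helper names are mine):

```python
def asr64(value, shift):
    """A64 ASR, with the value treated as a signed integer.
    Python's >> on ints is already an arithmetic shift."""
    return value >> shift

def c_style_sdiv_pow2(value, n):
    """Signed division by 2**n with round-towards-zero, mirroring the
    assembly above: bump the ASR result by one when the discarded bits
    are non-zero and the input is negative."""
    result = asr64(value, n)
    if value < 0 and (value & ((1 << n) - 1)) != 0:
        result += 1
    return result

assert asr64(-7, 1) == -4              # naive ASR rounds towards minus infinity
assert c_style_sdiv_pow2(-7, 1) == -3  # C-style division rounds towards zero
assert c_style_sdiv_pow2(8, 2) == 2    # positive inputs need no correction
```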


ROR: Rotate Right




Rotate bits right.


  • Bits shifted out of the right are shifted in at the left.
  • Conceptually, a rotation isn't a simple shift operation, but the ARM architecture includes it in the set of shifted-register modifiers.
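A rotation on a 64-bit register can be modelled as two shifts OR-ed together (illustrative model only):

```python
def ror64(value, shift):
    """A64 ROR on a 64-bit register: bits leaving the right re-enter
    at the left."""
    mask = 0xFFFFFFFFFFFFFFFF
    value &= mask
    return ((value >> shift) | (value << (64 - shift))) & mask

# The low bit rotates around to become the top bit:
assert ror64(0b1011, 1) == (1 << 63) | 0b101
```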


MSL: Masking Shift Left




This is just like LSL, but instead of inserting zeroes into the rightmost bits, it inserts ones.


This somewhat unusual shift form is only supported by one or two NEON instructions for forming immediate arguments, so you probably won't see it often. It is actually used by the 32-bit NEON instructions too, but it isn't given an explicit name and its use is always derived from a single immediate operand.
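The ones-filling behaviour is easy to see in a small model (illustrative; the immediate widths and shift amounts accepted by the real encodings are much more restricted than this helper suggests):

```python
def msl64(value, shift):
    """MSL: like LSL, but the bits shifted in from the right are ones."""
    mask = 0xFFFFFFFFFFFFFFFF
    return ((value << shift) | ((1 << shift) - 1)) & mask

# An 8-bit immediate shifted with MSL #8 keeps a solid run of ones below it:
assert msl64(0xAB, 8) == 0xABFF
```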


Extend Operations


The general principle is to take a sequence of consecutive bits from the source register, then sign- or zero-extend it to make it the required size. The result size is implied by the context.


Most of these extend operations exist in 32-bit ARM and Thumb, but A64 makes them more flexible and in many cases allows them to be used as an operand modifier.


Several contexts also allow extend modes to take an additional immediate left shift (like LSL). This shift has a very limited range (of 0-4 bits), and it applies after the extend operation. For example, SXTB #2 means "sign extend from the 8-bit source, then shift it left by two bits."


Because extend operands can take a shift, UXTX, and in some cases UXTW, are functionally identical to LSL for shifts 0-4. These are actually aliases in a few corner cases where extend modes are available but shift modes are not. For details, refer to the instruction descriptions in the ARMv8 ARM. The meaning of the assembly (and disassembly) does not change.


UXTB, UXTH, UXTW, UXTX: Unsigned extract from 8-bit byte, 16-bit halfword, 32-bit word, or 64-bit doubleword.




Extract the least significant byte, halfword, word or doubleword, zero-extend it to the result size, then (optionally) shift left.


  • These operations are the same as the SXT* operations, except that they do zero extension.
  • UXTX by itself has no effect (if it has no shift).
  • If the destination type is a W register, UXTW also has no effect (if it has no shift).


SXTB, SXTH, SXTW, SXTX: Signed extract from 8-bit byte, 16-bit halfword, 32-bit word, or 64-bit doubleword.




Extract the least significant byte, halfword, word or doubleword, sign-extend it to the result size, then (optionally) shift left.


  • These operations are the same as the UXT* operations, except that they do sign extension.
  • SXTX by itself has no effect (if it has no shift).
  • If the destination type is a W register, SXTW also has no effect (if it has no shift).
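Both families of extend modifiers follow the same extract-extend-shift recipe, which can be modelled in one function (an illustrative model; the parameter names are mine):

```python
def extend(value, width, signed, shift=0, result_bits=64):
    """Model of the A64 extend modifiers: take the low `width` bits of
    `value`, sign- or zero-extend them to `result_bits`, then apply the
    optional left shift."""
    low = value & ((1 << width) - 1)
    if signed and (low >> (width - 1)):
        low -= 1 << width              # reinterpret as a negative value
    mask = (1 << result_bits) - 1
    return (low << shift) & mask

# UXTB: zero-extend the least significant byte.
assert extend(0x1234FF, 8, signed=False) == 0xFF
# SXTB #2: sign-extend the byte, then shift left by two bits.
assert extend(0x80, 8, signed=True, shift=2) == (-0x80 << 2) & (2**64 - 1)
```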



[1] Some NEON instructions explicitly do a shift-and-widen or shift-and-narrow operation, but the instruction descriptions explain the details so I won't cover them here.


[2] The situation in C is fairly complicated. Very roughly:

  • In C89, the rounding behaviour of signed divisions involving negative values – even `-a/-b` – is implementation-defined.
  • In C++98, round-towards-zero was recommended, but not required.
  • C99 and C++11 simplify matters by requiring round-towards-zero for all integer divisions.

Both ARM Compiler 5 and GCC 4.9.2 follow this convention for all C variants. ARM Compiler 6 is based on Clang, and it defers to the Clang documentation for such matters. The Clang documentation doesn't seem to cover this language detail, but given that Clang tries to be compatible with other compilers (including GCC), and it has to support round-towards-zero for C99 and C++11 anyway, I would be surprised to see any behaviour other than round-towards-zero.

Eagerly waiting for the new embedded OS developed by ARM. I am pretty sure it will boost IoT development. Full details can be found at The Internet of Things Gets a New OS - IEEE Spectrum


In the current NAS market, it is downright impossible to talk of ARM and high performance together. The most powerful ARM-based NAS units have been based on Marvell's ARMADA processors. They usually come with dual gigabit network links and typically target the SOHO and low-end SMB market. Intel's offerings have had a virtual monopoly in the other tiers of the market. Synology is set to change all that with their latest offering - the DS2015xs with native 10G capabilities.

Platform Details

The SoC at the heart of the DS2015xs is the AL-514 from Annapurna Labs, an Israeli startup that is still in stealth mode. The company has declined to speak to the media as of now. However, tracing some coverage of Israeli VC firms reveals that Annapurna Labs was founded in 2011 with the intent of bringing ARM-based communication processors to the market. Datasheets of SoCs from Annapurna Labs are not currently available to the public, but Synology was kind enough to divulge the following details (which, I suspect, can be gleaned via SSH access to the DS2015xs):

  • The AL-514 has four ARM Cortex-A15 cores running at 1.7 GHz
  • The Cortex-A15 cores are configured with LPAE (large physical address extension) that allows addressing of more than 4 GB of RAM (the DS2015xs supports up to 8 GB)
  • The SoC has two 10G Ethernet MAC IPs integrated

More information can be found at the link below:

AnandTech | Synology's DS2015xs brings ARM to High-Performance NAS Units

Ben Walshe

What is AMBA?

Posted by Ben Walshe Dec 8, 2014

My background

I had just joined Duolog as a business graduate, through a graduate development programme that links graduates with small-to-medium Irish enterprises. My role there was to expand their marketing reach in a competitive marketplace, but with no technical background I had to get my head around a great deal of terminology, acronyms and engineering talk to fully understand how the industry and the company operated. One name that crops up everywhere is 'AMBA', and when I looked up the term and then asked the engineering team about it, I received the answer: 'Oh, it is just a standard that engineers use. ARM created it back in the 90s. I don't think you will have to worry about it too much.' OK, so AMBA is a standard and I don't have to worry about it in my role at Duolog.

My old company, Duolog, is now part of ARM Socrates IP Tooling. To learn more about Socrates, see Eoin McCann's interview with Dave Murray on 'How to solve the IP integration problem'.


After a bit of a wild ride as an employee, here I am, post-acquisition, working for ARM. My role entails ensuring that the marketing channels such as the website are up to date and optimized for the user. One thing I have found difficult is getting to grips with all the products, technologies and jargon in the System IP space for which I am responsible. This is where I was reunited with the standard I was told I wouldn’t have to worry about – AMBA. I felt this small blog might hold some value for those not familiar with AMBA, and I hope it explains it somewhat.


So what is AMBA?

“The ARM AMBA (Advanced Microcontroller Bus Architecture) protocol is an open standard, on-chip interconnect specification for the connection and management of functional blocks in a System-on-Chip (SoC). It facilitates right-first-time development of multi-processor designs with large numbers of controllers and peripherals. AMBA promotes design re-use by defining common interface standards for SoC modules.” (ARM.com)


Or if you prefer:

“Advanced Microcontroller Bus Architecture (AMBA) is an architecture that is widely used in system-on-chip designs, which are found on chip buses. The AMBA specification standard is used for designing high-level embedded microcontrollers. AMBA’s major objective is to provide technology independence and to encourage modular system design. Furthermore, it strongly encourages the development of reusable peripheral devices while minimizing silicon infrastructure.” (techopedia.com)

Or put a lot simpler:

“It’s the interface(s) everyone uses to bolt blocks together in their chip.” (me)



There are also a number of other acronyms associated with AMBA such as AHB or AXI. Below I have listed the seven main interfaces along with my interpretation of their purpose.


Name | Acronym | My Interpretation
Advanced System Bus | ASB | Now obsolete, so don't worry about this one!
Advanced Peripheral Bus | APB | Simple, easy, for your peripherals
Advanced High-Performance Bus | AHB | Now used a lot in Cortex-M designs
Advanced eXtensible Interface | AXI | The most widespread, now up to AXI4
Advanced Trace Bus | ATB | For moving trace data around the chip, see CoreSight
AXI Coherency Extensions | ACE | Used in big.LITTLE systems for smartphones, tablets, etc.
Coherent Hub Interface | CHI | The highest performance, used in networks and servers




Registered Trademark for ARM AMBA Interconnect Standards



Since it is a standard, I thought to myself, 'how did that come about?' The story of AMBA goes all the way back to 1995, when ARM was much smaller and received some EU funding. With this EU support, ARM developed the Advanced Microcontroller Bus Architecture (note: not the ARM Bus Architecture) in house, introducing it as an open architecture in 1996. It facilitates the development of multiprocessor designs with large numbers of controllers and peripherals. Since its inception, the scope of AMBA has, despite its name, gone far beyond microcontroller devices. Today, AMBA is widely used in a range of ASIC and SoC parts, including the applications processors typically found in modern portable devices such as smartphones.


AMBA soon became a registered trademark of ARM. An important aspect of an SoC is not only which components or blocks it houses, but also how they interconnect. AMBA served as a solution for how the blocks interface with each other. It soon became the de facto standard interface for anyone wishing to bring a controller or peripheral IP block to market.

The first version of AMBA included two buses, the Advanced System Bus (ASB) and Advanced Peripheral Bus (APB).


In its second version, AMBA 2, ARM added the AMBA High-performance Bus (AHB), a single clock-edge protocol. AMBA 2 was widely used in ARM7- and ARM9-based designs, and still is today in ARM Cortex-M based designs.

In 2003, ARM introduced the third generation, AMBA 3, including the Advanced eXtensible Interface (AXI) for even higher-performance interconnect, and the Advanced Trace Bus (ATB) as part of the CoreSight on-chip debug and trace solution.

In 2010 the AMBA 4 specifications were introduced, starting with AMBA 4 AXI4, before extending system-wide coherency with AMBA 4 ACE in 2011. This system coherency allows different processor clusters to share memory and enables technology such as ARM's big.LITTLE processing; these specifications are widely used with ARM's Cortex-A9 and Cortex-A15 processors.

In 2013 the AMBA 5 CHI (Coherent Hub Interface) specification was introduced, with a re-designed high-speed transport layer and features designed to reduce congestion. It has been architected for scalability, to maintain performance as the number of components and the quantity of traffic rise. This includes placing additional requirements on masters to respond to coherent snoop transactions, which means that forward progress for particular masters can be more easily guaranteed in a congested system. The separation of the identification mechanism into master identifiers and transaction identifiers allows the interconnect to be constructed more efficiently. The AMBA 5 architecture defines the interfaces for the connection of fully coherent processors, such as the Cortex-A57 and Cortex-A53.


I recently worked on the product release of the new CoreLink Cache Coherent Network family members. The new series of CoreLink Cache Coherent Networks is supported by the AMBA 5 CHI protocol. William Orme recently covered 5 things you may not know about AMBA 5 CHI, which is an enormously valuable resource for anyone looking to understand the newer AMBA specification.


The New CoreLink Cache Coherent Networks

In Conclusion

It is very difficult to summarize nearly 20 years of development of the AMBA architecture, and I hope this short blog provides a simple overview of its function as well as a brief summary of the architecture's history.


To summarize, AMBA is a de facto standard for on-chip communication, and its benefits include:

  1. Enabling IP re-use.
  2. Flexibility of requirements.
  3. Multilayer architecture.
  4. Compatibility between design teams and vendors.
  5. Industry wide support.


For more information please visit the AMBA specification web-page on ARM.com

Upcoming training courses



Country | Course dates | Contact details

AAE training course

December 16th - 19th, 2014

Contact Embedded Systems Solutions for registration details - click here.

AAME training courses

December 17th-19th, 2014

February 25th-27th, 2015

April 15th -17th, 2015


AAE training course

February 11th -13th, 2015

Contact Ac6 for more information to book a training session.

AAME training courses

December 17th-18th, 2014

February 26th-27th, 2015


AAE training course

January 27th-28th, 2015

Courses take place at Yokogawa Digital Computer Corporation headquarters in Tokyo.


The agenda and registration details can be seen at: http://www2.yokogawa-digital.com/product/arm/aatp.html


AAME training course

February 23rd-27th, 2015

AAE training courses

January 19th-23rd, 2015

March 23rd-27th, 2015

Contact Elvira Systems for details - click here.  

AAME training courses

January 12th-15th, 2015

April 27th-30th, 2015

March 09th-12th, 2015

Email Anacom Eletronica at treinamento@anacom.com.br

Get ready for a new challenge.

When I got my first job as an embedded developer in 2001, the target was an MCU based on the ARM7TDMI. Since that day I have never stopped working with ARM processors: writing thousands of lines of assembly code, debugging caches, TLBs and multi-core systems, and recently experiencing secure boot on the 64-bit Cortex-A53. For the people working with me, I know ARM cores, period.


So why did I take the time to go to the exam center and pass those tests? Because I love new challenges! Seriously, every major actor in the digital world delivers certifications: Microsoft, Oracle, VMware... It's a win-win deal: for ARM, the community grows; for me, I get a valuable endorsement of my skills.


D-day.

The exam is organized by a company called Prometric, a testing specialist. I appreciated that they have a test center in the French Alps, near my office!


Each test is a simple quiz of 70 questions, most of them with only one correct answer among the four proposed. When you hesitate, just mark the question and come back to it later. Before ending the test you can review all questions, or only the marked ones. The interface is simple and efficient, and does not generate stress. Well done, Prometric.


Congratulations!

The best thing about these exams is that you get the result immediately: passed or failed. On the other hand, do not expect to see where you went wrong; there is no review of answers after the test. Of course, I am talking about the case where you succeed (like me!), but I guess it is the same when you unfortunately fail.


To successfully pass the test you really have to know ARM products: the ARMv7 architecture (it's better if you also know a bit of v6 and even v5), the Cortex-A/R/M cores, and the toolchain.


The ARM website has a lot of useful information on how to prepare for the test. I would add that you really must be prepared to understand a small sequence of assembly code, to explain or properly use memory barriers, to answer a question about VFP/NEON registers, and to handle other non-obvious details.


The scoring is fair: 70% correct answers are required to pass, which means that you can fail the hardest questions... but not too many!


Next steps.

I understand that ARM will release a portfolio of certifications, and that AAE and AAME are only the entry-level ones that every engineer seriously working with ARM products should pass.


On my side, it is now time to go back to Cortex-focused tasks, with newly gained recognition of my work by ARM.

ARMv8-A, the ARMv8 A-profile version of the ARM architecture, was first publicly previewed in October 2011. Over the past two years, there have been a growing number of ARMv8-A announcements from ARM, such as its Cortex-A53 and Cortex-A57 products, plus additional cores and end-user devices from licensees and OEMs. Many of these products are in, or entering, volume production today. As reported in the Q3-2014 financial results, ARM has signed 57 ARMv8-A processor and architecture licenses, meaning there are many more ARMv8-A based processors and products under development that will appear over the next 1-2 years.


Architecture evolves with constant requests for additions and refinements. To allow the ARM ecosystem to manage the next stage of its evolution, ARM is introducing a set of small-scale enhancements that are fully backwards compatible with the initial v8.0 architecture, and will be collectively known as ARMv8.1-A. These have been developed in conjunction with the ARM partnership and will start to appear in public specifications, software development tools, models and software support throughout 2015, with early adopter silicon expected in the latter part of 2015. More details will emerge from ARM and its partners as products are introduced. It is important to recognize that introduction of these enhancements into new cores will take several years, and other design choices can have a much greater impact on system performance. Some markets and use cases, such as mobile, are expected to see little benefit from these changes. This means that v8.0 will continue to be the architecture of choice for many new designs and most software development over the medium term, and that v8.1 will have a gradual effect across different market segments, starting with very large systems. Many of the changes will be transparent to the user, with operating systems such as Linux using runtime library selection or kernel patches to adapt where necessary.


For a summary of the ARMv8-A architecture, see the section on ARMv8 architectural concepts in Chapter A1 of the ARMv8-A Architecture Reference Manual. This document, ARM DDI 0487, can be downloaded from infocenter.arm.com by following the links from the top level => ARM architecture => reference manuals section.


ARMv8.1 overview


The enhancements introduced with ARMv8.1 fall into two categories:

  • Changes to the instruction set.
  • Changes to the exception model and memory translation.

Instruction set enhancements

ARMv8.1 includes the following additions to the A64 instruction set:

  • A set of AArch64 atomic read-write instructions
  • Additions to the Advanced SIMD instruction set for both AArch32 and AArch64 to enable opportunities for some library optimizations:
    • Signed Saturating Rounding Doubling Multiply Accumulate, Returning High Half
    • Signed Saturating Rounding Doubling Multiply Subtract, Returning High Half
    • The instructions are added in vector and scalar forms.
  • A set of AArch64 load and store instructions that can provide memory access order that is limited to configurable address regions.


As well as the additions, the optional CRC instructions in v8.0 become a requirement in ARMv8.1.


The atomic instructions can be used as an alternative to Load-Exclusive/Store-Exclusive instructions, for example to ease the implementation of atomic memory updates in very large systems. This could be in a closely coupled cache, sometimes referred to as near atomics, or further out in the memory system as far atomics. The instructions provide atomic update of register content with memory for a range of conditions:

  • Compare and swap of 8-, 16-, 32-, 64- or a pair of 32- or 64-bit registers as a conditional update of a value in memory.
  • ADD, bit clear, exclusive-OR, bit set, and signed and unsigned MAXimum or MINimum data processing operations on 8-, 16-, 32- or 64-bit values in memory. These can occur with or without copying the original value in memory to a register.
  • Swap of an 8-, 16-, 32- or 64-bit value between a register and value in memory.
  • The instructions also include controls associated with influencing the order properties, based on acquire and release semantics.
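The semantics these instructions provide in a single operation can be sketched as ordinary read-modify-write functions (an illustrative model only; on real hardware the whole sequence is performed atomically, which is precisely what the new instructions guarantee):

```python
def cas(mem, addr, expected, new):
    """Compare-and-swap, as in the ARMv8.1 CAS instruction: update
    memory only if it currently holds the expected value; always
    return the old value so the caller can tell whether it succeeded."""
    old = mem[addr]
    if old == expected:
        mem[addr] = new
    return old

def ldadd(mem, addr, operand):
    """LDADD-style atomic add: add to the value in memory and return
    the original value, wrapping at 64 bits."""
    old = mem[addr]
    mem[addr] = (old + operand) & 0xFFFFFFFFFFFFFFFF
    return old

mem = {0x1000: 5}           # a one-location stand-in for memory
assert cas(mem, 0x1000, expected=5, new=9) == 5 and mem[0x1000] == 9
assert ldadd(mem, 0x1000, 1) == 9 and mem[0x1000] == 10
```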

The limited order (LO) support is in two parts:

  • System registers configure one or more memory LORegions with a minimum resolution of 64Kbytes.
  • LoadLOAcquire and StoreLORelease instructions for 8-, 16-, 32- and 64-bit values are added, and can be used instead of the global ARMv8 LoadAcquire and StoreRelease instructions.



Exception Model and Translation System enhancements


Additions associated with the exception and memory model are:

  • A new Privileged Access Never (PAN) state bit. This bit provides control that prevents privileged access to user data unless explicitly enabled; an additional security mechanism against possible software attacks.
  • An increased VMID range for virtualization; supports a larger number of virtual machines.
  • Optional support for hardware update of the page table access flag, and the standardization of an optional, hardware updated, dirty bit mechanism.
  • The Virtualization Host Extensions (VHE). These enhancements improve the performance of Type 2 hypervisors by reducing the software overhead associated with transitioning between the Host and Guest operating systems. The extensions allow the Host OS to execute at EL2, as opposed to EL1, without substantial modification.
  • A mechanism to free up some translation table bits for operating system use, where the hardware support is not needed by the OS.

Finally, some new events are added to the Performance Monitor Unit (PMU) to better support profiling tools such as the perf utility in Linux.




The ARM architecture, in line with other processor architectures, is evolving with time. ARMv8.1 is the first set of changes that ARM is introducing to the latest version of its ARMv8 A-profile architecture, grouped to help the ecosystem manage tools and software support alongside the large numbers of ARMv8-A based processors and products in development or production today. These changes provide incremental benefits over v8.0, and as such will appear as a gradual migration in cores and related products over several years. It should be noted that other design choices by silicon partners can have a much greater impact than the choice between v8.0 and v8.1, and consequently we expect both to co-exist in the market for many years to come.

Public specifications will be supplied to support initial product introductions in mid-2015, with some early visibility through tools and software starting now. Partners can currently obtain more details under a confidentiality agreement through their sales and support channels.



David Brash is Architecture Program Director in the Architecture and Technology Group, one of several groups within ARM’s engineering community.

Darren Cepulis

ARM @ SC14 News Recap

Posted by Darren Cepulis Dec 1, 2014

Here's a quick follow up post to my original Blog on SC14 in New Orleans, November 16th - 20th. SC14 is the largest annual Server/HPC/Datacenter convention in the world, filling up 1/3 of the massive New Orleans convention center.  It was my first time attending and the scope of the show really made an impression on me.


Overall, we (ARM) had a great show, winning a couple of HPCwire awards, supporting our SoC and hardware partners, and networking with potential HPC ecosystem partners. There is no bigger event for the ARM server marketing and business development segment, and I look forward to even bigger plans and ARM participation for SC15 in Austin, TX.


Related announcements and press that went live around SC14 are listed below.


ARM announces collaboration with Pathscale on high-performance C/C++/Fortran compiler and math libraries work in support of ARMv8:

ARM HPC Ecosystem Continues to Build Momentum with Introduction of PathScale EKOPath Compiler - Yahoo Finance

Cray announces collaboration with ARM in support of DOE’s FF2 project:

Cavium announces Cray interest in its ThunderX SOC:

Cavium Enters Supercomputer Arena With Cray Collaboration CAVM - Investors.com


Allinea announces DDT 64-bit ARM support:

Allinea brings parallel debugging to 64-bit ARM platforms | Allinea

NVIDIA mentions ARM for heterogeneous compute:


AppliedMicro offers HPC evaluation clusters to LLNL with Cirrascale and RedHat:

Applied Micro Collaborates With Red Hat and Cirrascale to Deliver an Evaluation Platform for HPC Clusters to Lawrence Li…


E4 w/ Applied Micro present ARKA platforms:


HPCwire Awards:

Applied Micro/Cirrascale/LLNL:

IT News Online > - Applied Micro, Red Hat, Cirrascale Deliver Evaluation Platform for HPC Clusters to Lawrence Livermore…


Max-density HPC, gaming, cluster architecture with RapidIO and Nvidia SOC:

IDT, Orange Silicon Valley, NVIDIA Accelerate Computing Breakthrough With RapidIO-based Clusters Ideal for Gaming, Analy…

A whitepaper, 'Overcoming the Size and Power Trade off in Wearable Designs', has been posted on AnandTech; it outlines how Freescale and ARM are working together to deliver market-leading semiconductor solutions for wearable devices.

You can read the whitepaper at the link below.


ARM's partners are getting great use from Juno as a development platform, helping them get ready for 64-bit capable ARMv8-A based platforms. You may have seen that ARM has supported this platform with ARM Trusted Firmware (a thin layer of secure 64-bit firmware running at Secure EL3). ARM TF has proven very popular with partners who want to implement trusted boot and integrate a Trusted OS to create a Trusted Execution Environment. So it's great to see that Juno is now supported by OP-TEE, an open source TEE that Linaro has been working on. You can find out more details by searching for "Github OP-TEE", where you can find the relevant git commits.

OP-TEE/optee_os · GitHub

I recently had the opportunity to sit down with David Murray and talk about the current state of affairs for IP integration in the context of building systems. For those of you who do not know David, he is an incredibly enthusiastic technologist who previously held the role of CTO at Duolog before gaining the impressive-sounding title of IP Tooling Architect in ARM. An energetic and articulate man, he is always interesting to listen to and I hope you enjoy the interview below. Feel free to ask questions in the comment space below and David will answer them ASAP.

This blog post follows up on the interview I conducted with Norman Walsh a couple of weeks ago. Norman spoke about the history of IP integration and how it has evolved to the point we are at currently. You can read it here: Interview: A brief history of IP integration

Hi David, what’s going on in the IP integration space?

Well - IP integration continues to be a key challenge in SoC development. We've seen consistent increases in IP reuse, IP configurability and system complexity, within tightly bound schedules, compound the problem of IP integration. The number of IP blocks in a system continues to grow, the complexity and configurability of that IP itself is growing, and the overall integration scope is growing as it affects more and more teams from front end to back end, e.g. software, RTL design, verification, physical implementation, etc.


This is a problem area that we are very familiar with, and we have been architecting integration solutions for it over the last number of years. One of the fundamental pillars of improving the IP integration process is the standardization of the data in the process. Norman Walsh mentioned this in his interview: we need to standardize our IP data (particularly the interfaces) through the use of metadata. If an IP can communicate its interfaces in a standard way, then the whole IP and SoC integration process becomes a lot easier. If we can have a formal definition (in some metadata format) of all the interfaces of an IP, then we can apply more automated intelligence to how it should be hooked up and enable other crucial flows, for example being able to identify AMBA interfaces, clocks, resets, interrupts, DMA, debug and trace interfaces, etc. Also, it's not just the hardware interfaces I'm talking about; it's equally important to have a good view of the hardware/software interfaces like the registers and memory maps within the IP.

So how is this interface information standardized?

Well, for me, the obvious first thing is to make the IP use industry-standard protocols as much as possible, such as AMBA (ACE, AXI, AHB, APB, etc.). These interfaces are quite configurable, so it's important to be able to define their content and configuration in a metadata format. The main standard the industry uses is the IP-XACT format, originally developed under the SPIRIT Consortium and now maintained by Accellera. This essentially specifies a machine-readable (XML) format that can describe an IP's interfaces and memory maps, as well as its contents and connectivity. We are currently working within ARM to increase the standardization of ARM IP so it will be easier to integrate.

As long as a design flow creates this IP-XACT, we can work from there and run queries on it. Because we know what the tool reads and interprets, we can work together with partners to help them define the necessary IP-XACT specs.
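To give a feel for the kind of query a tool can run against machine-readable IP metadata, here is a minimal sketch. The XML fragment is a hypothetical, heavily simplified subset loosely modelled on the IP-XACT schema (the element names and component are illustrative, not schema-complete):

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified IP-XACT-style fragment describing the bus
# interfaces of an imaginary component (not a complete IP-XACT document).
IPXACT = """
<spirit:component xmlns:spirit="http://www.spiritconsortium.org/XMLSchema/SPIRIT/1685-2009">
  <spirit:name>example_peripheral</spirit:name>
  <spirit:busInterfaces>
    <spirit:busInterface><spirit:name>S_APB</spirit:name></spirit:busInterface>
    <spirit:busInterface><spirit:name>M_AXI</spirit:name></spirit:busInterface>
  </spirit:busInterfaces>
</spirit:component>
"""

NS = {"spirit": "http://www.spiritconsortium.org/XMLSchema/SPIRIT/1685-2009"}
root = ET.fromstring(IPXACT)

# Query the metadata: which bus interfaces does this IP expose?
interfaces = [bi.findtext("spirit:name", namespaces=NS)
              for bi in root.findall(".//spirit:busInterface", NS)]
print(interfaces)  # -> ['S_APB', 'M_AXI']
```

Because the description is standard XML, any tool in the flow can run the same kind of query to discover interfaces, check protocol compatibility, or drive automated hook-up, which is exactly the point of standardizing the metadata.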


Fast IP integration requires standardized IP







That sounds great but how does it work with lots of different 3rd party IP or a partner's internal IP?

ARM also produces IP-XACT standard bus definitions that can be downloaded from the ARM website for anybody to use. If other IP providers use these standard definitions, it will provide a much easier mechanism for connecting the IP in a sub-system or at the top level. Also, don't forget that this is not just about enabling more efficient integration – our partners will also benefit from the provision of better EDA solutions that can leverage this metadata.




So there’s a lot work being done in ARM at the moment?

This is something we’ve been working towards for the past 6 or 7 years, even within Duolog, because there is huge potential for reducing bugs and streamlining design and verification processes.  At the moment we’re very focused on increasing the level of standardization within ARM IP and even have an internal IP-XACT modelling definition group. We’re creating new bus definitions, new extensions and guidelines on IP-XACT usage and of course we’re leveraging ARM Socrates to create better IP-XACT flows. Also, from working in the Systems and Software Group at ARM we have a sub-system and SoC-level perspective so we’ve become avid consumers of IP-XACT which gives us a good feel for what our partners are experiencing. The main challenges that we face are probably fairly common in the industry. We’re trying to standardize all of the interfaces that we need in metadata format but firstly we need to understand all of the different stakeholders.


This is great because once we have standardized IP interfaces, it makes the integration and verification process significantly faster. However, while you can standardize most interfaces with a relative minimum of fuss, we're seeing a lot of IP blocks these days that can be tweaked in many different ways. In some ways we see IP configurability as the biggest integration challenge.


The level of IP configuration that is available these days poses a problem for integration, because you can take the same IP block and configure it in different ways, and it will look and act totally differently as a result. That makes it more difficult to integrate successfully into a system.


I liken this to a mixing desk that you would have in a recording studio, with hundreds of switches that can be turned one way or another to affect performance. The options enable designers to optimize their IP, but the amount of choice can also be confusing. What the user really wants is to be presented with the best configuration options for that particular IP block, given the system constraints.





Multiple configuration options can often leave designers confused

So how are you tackling this configuration problem?

When we talk about IP configuration in general, there are three different types of configuration levels that an IP block can have. First off you have what is called ‘static IP’ which cannot be configured at all. This was what you would call ‘off the shelf’ IP that was more common in the past, where you would purchase it for a ‘plug and play’ type functionality. Nowadays even off-the-shelf IP requires a bit of user configuration according to each individual design.


The second type of configurable IP is a simplified version that has a fixed set of parameters that can be set. Having said that, creating configurable IP can be a challenge: if you have, say, 10 or even 20 parameters, the number of possible combinations makes it difficult to guarantee that your IP will work for every single configuration. Validation teams and modelling will ensure that the IP works fine for the most probable scenarios, but it's hard to test for everything. You only have a finite amount of verification resources to ensure that it is all tested rigorously.


The third type of configurable IP is heavily dependent on the system for its configuration; examples would be system interconnects, debug and trace subsystems, power, clock & reset, interrupts, I/O, and memory systems. They are super configurable, and the number of permutations means you need a different type of strategy to handle them properly. Ideally you would have some form of highly intelligent solution that can interpret the system requirements and interfaces so that users can easily configure these types of IP. These are the challenges that we have been working through, steering the Socrates design environment into providing solutions in this area.


ARM already provides a lot of IP in this area, including bus interconnect IP such as the ARM CoreLink NIC-400, the ARM CoreLink CCI-400 Cache Coherent Interconnect and the ARM CoreLink CCN-512 Cache Coherent Network, as well as the ARM CoreLink GIC-500 and the CoreSight Debug and Trace IP. These IP consume vast amounts of system connectivity – e.g. a cascaded interconnect infrastructure plus CoreSight Debug and Trace could account for upwards of 50% of a system's connectivity – so in some ways highly configurable IP is one of the pillars of solving the integration problem.

The new key ingredient that we are bringing to the table is to help manage the configuration of these IP so that it is aligned with its system context. If we can understand the contents of a system and its different interface requirements we can help to guide the configuration of IP.

How do we do this? – By having all system components in a metadata format, of course, and to have intelligent flows that can extract this information and perform this guided configuration – really it’s intelligent IP Configuration.




So this is how the IP integration problem can be solved?

Yes - the vision that we have been working towards with the ARM Socrates IP Tooling for the last number of years has been to create a 'System in a Day' through an intelligent IP configuration capability. Back when we first released Socrates, over 6 years ago, the integration task was taking people many months to get an initial RTL netlist, and several more months thereafter to get a viable system up and running. With Socrates we began making significant reductions to that schedule, bringing it down to several weeks. We saw, however, that each piece of IP was designed, built and integrated independently of the others. So, for example, the interconnect was built from a specification, and then people attempted to integrate it into the system from the same (probably outdated) specification. The bottleneck of the 'System in a Day' was the creation and integration of these system-dependent IP. The solution we centred on was to seek an intelligent way of configuring these IP within the context of their system. We are arriving at a solution to the IP integration problem through intelligent configuration of the IP itself. I believe that configuring every aspect of the system correctly is a highly effective way of getting its overall connectivity right.


What we’re trying to do here is use the metadata to give a fast, correct configuration in a system context. What I mean by system context is that you can see how different system requirements have a knock-on effect on the configuration of each IP and the system as a whole. What that allows us to do is reduce the time that’s spent on actually integrating the parts into a system because 90% of that work will have been done through intelligent configuration. In order to realize our ‘System in a Day’ vision for IP integration we need to do it through intelligent configuration. You need to have a solution for these complex IP blocks so that they can reconfigure themselves as the system is being defined.


We’ve seen partners say that even just understanding the perspective of some of the more complex IP blocks within the system normally takes them several weeks to compile. In the past they have had to go through the TRMs and specs to understand what is required for the system.  We want to be able to provide this information instantly from the metadata of the IP in the system.


The IP integration problem will be solved through intelligent configuration


Verification is such a massive part of SoC design these days, how does Socrates fit into that story?

Going back to the IP-XACT metadata that I mentioned earlier, by working in this format we're able to get a clear picture of the system very early on, and in an easily readable format (XML). We can then hand off this rich information about the system, its interfaces, the register views and memory maps to our EDA partners and other ecosystem stakeholders. Because the information is presented in a format that is standard and machine-readable, they can work on verifying it immediately. For example, Cadence Design Systems can take the metadata into its Interconnect Workbench and automatically create a verification environment with out-of-the-box performance scenarios and analysis. This feeds into our overarching goal of helping partners design and implement systems in a much shorter timespan.




This sounds very exciting for the future, but is any of that available today?

ARM Socrates is already a proven solution for IP standardization and integration, and we're now beginning to leverage an intelligent IP configuration methodology. Now that we're part of ARM, we feel that there is much greater value we can bring to our partners, as we're working directly with the IP and can steer its standardization and streamline its integration. This is very exciting, and you should see new solutions appear in the near future.




Final question for you here. IP Tooling Architect is an interesting title, how has your role changed from being the CTO at Duolog?

Well, when you are in a small company like Duolog, it's more 'roles' than role. I had to keep constantly tuned into our customers' design flows and look for any hotspots in their development process. Once the main problems were identified, we had to work on envisioning and architecting a solution, and eventually, with a committed team, realizing a high-value product such as Socrates. That's how we got into IP integration – we didn't chase it, it came to us through our customers. It was difficult, however, for a small company to carve out its niche, so there was also a lot of evangelizing, writing white papers and blogs, presenting at conferences, and plenty of customer meetings. Another thing I did within Duolog was to align ourselves with standards groups and try to progress them toward real solution areas. For a small company, Duolog invested quite a lot of time and effort in driving the IP-XACT standard, and this is definitely something I will continue to do within ARM, helping to progress both internal and industry standards for everyone's benefit.


Now that I think about it, overall I have a pretty similar role in ARM.  The problem space is still IP integration – we’re still chasing the same dream of ‘System-in-a-day’ and I can boldly speculate that this WILL become  a reality – the big change is that now that we’re ARM,  we've got ARM IP in the equation and wow – that brings incredible potential.  Before, as Duolog, we had to partner with ARM to get limited access to the IP - Now we can work directly with the IP designers from a much earlier stage of development and facilitate intelligent IP integration from the bottom-up with standardised IP blocks – this will be a game-changer. 




OK, great. Thanks for your time, David.

No problem, my pleasure. (Have I really been talking for 10 minutes?) I hope your readers find this interesting, and I'd ask them to leave a comment if they have any questions for me!





Samsung Electronics, a world leader in advanced semiconductor solutions, has released the latest chip in its flagship Exynos series. The Exynos 7 Octa is an octa-core SoC designed for use in mobile applications such as smartphones and tablets. One of the leading SoCs based on the ARMv8-A architecture, the Exynos 7 Octa pairs a cluster of four powerful Cortex-A57 cores with a cluster of four efficient Cortex-A53 cores, reaping the benefits of big.LITTLE™ processing, a power-optimization technology that delivers higher peak-performance capacity at significantly lower average power. It also combines big.LITTLE processing with Samsung's HMP (Heterogeneous Multi-Processing) solution, which allows the scheduler to use any mix of the big and LITTLE cores simultaneously, so that whatever the multitasking load or the application being run, tasks are matched to the right core type, avoiding lag without drastic power consumption.

One of the key components enabling big.LITTLE processing is the ARM CCI-400 Cache Coherent Interconnect, which provides full cache coherency between the two clusters of multi-core CPUs. The CCI-400 enables faster performance across the chip through system-wide hardware coherency and virtual memory management. This combination, along with the improved feature sets of the Cortex-A57 and Cortex-A53 cores and the ARMv8 instruction set, has contributed to a 57% performance uplift over the previous-generation Exynos processor, as the Exynos 7 Octa brings advanced features to everyday mobile computing.


ARM CoreSight debug & trace technology was important to the Exynos 7 Octa’s successful release, as its real-time on-chip visibility was used to identify and eliminate bugs quickly. Using CoreSight minimized the risk of costly bugs, allowing more attention to be focused on maximizing performance on the SoC.


The graphics performance of the Exynos 7 Octa has been enhanced by the ARM Mali-T760 GPU, delivering stunning quality at even greater energy efficiency. This performance increase has paved the way for high-resolution games, face/eye recognition and image/video processing to become a reality in Samsung's next-generation devices. Users of a device containing the Exynos 7 Octa can simultaneously record high-resolution video or pictures using both the front and rear cameras, and even output UHD-quality video to their TVs. The first device containing the Exynos 7 Octa to reach the market is the Samsung Galaxy Note 4. Released in October of this year, it is already a very popular smartphone and is poised to enjoy even more success in the final quarter of 2014, a period with typically high sales volumes of smartphone devices.


Current smartphone users demand a device that keeps them constantly connected with powerful performance, while expecting its battery to last the entire day. In summary, the Exynos 7 Octa takes advantage of new power management features in its ARM IP to improve power consumption while simultaneously providing superior performance. These updates allow Samsung to stay at the forefront of the market as a total integrated-solution provider for SoC designs.


For more information: Samsung Exynos 7 Octa
