ARM Processors

The main problem

When you assemble your first programs with AS and link them against external libraries like this:

armv7a-hardfloat-linux-gnueabi-as -o test.o test.S

armv7a-hardfloat-linux-gnueabi-ld.gold --hash-style=sysv -o test test.o -lc

 

You might run into this error when executing your program:

$ ./test

-bash: ./test: No such file or directory

Using strace to pinpoint the problem gives something like this:

execve("./test", ["./test"], [/* 28 vars */]) = -1 ENOENT (No such file or directory)

write(2, "strace: exec: No such file or di"..., 40strace: exec: No such file or directory

) = 40

exit_group(1)                           = ?

+++ exited with 1 +++

In the execve manual page, ENOENT is described like this:

ENOENT The file filename or a script or ELF interpreter does not exist, or a shared library needed for the file or interpreter cannot be found.

 

The executable is not a script, and the file clearly exists. But what about the ELF interpreter?

 

Running readelf -l elf_executable shows which interpreter the executable tries to use. In this case, the output is:

 

Elf file type is EXEC (Executable file)

Entry point 0x81b8

There are 5 program headers, starting at offset 52

 

Program Headers:

  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align

  PHDR           0x000034 0x00008034 0x00008034 0x000a0 0x000a0 R   0x4

  INTERP         0x0000d4 0x000080d4 0x000080d4 0x00013 0x00013 R   0x1

      [Requesting program interpreter: /usr/lib/libc.so.1]

  LOAD           0x000000 0x00008000 0x00008000 0x00208 0x00208 R E 0x1000

  LOAD           0x000208 0x00009208 0x00009208 0x000cc 0x000cc RW  0x1000

  DYNAMIC        0x000208 0x00009208 0x00009208 0x000a0 0x000a0 RW  0x4

 

Section to Segment mapping:

  Segment Sections...

   00    

   01     .interp

   02     .interp .dynsym .dynstr .hash .gnu.version .gnu.version_r .rel.plt .plt .text

   03     .dynamic .data .got

   04     .dynamic

The Requesting program interpreter line shows which ELF interpreter the system will try to use. On this system /usr/lib/libc.so.1 does not exist, which is why executing the program fails with "No such file or directory".

On a Linux system, for example, the standard ELF interpreter is not libc but the dynamic linker, ld.so.
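
The same check that readelf -l performs can be sketched in a few lines of Python. This is only a minimal illustration, assuming a 32-bit little-endian ELF file like the ARM executable above; the function name read_interp is made up for this example.

```python
import struct

PT_INTERP = 3  # program header type that names the ELF interpreter


def read_interp(data):
    """Return the interpreter path requested by a 32-bit little-endian ELF image, or None."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 1 or data[5] != 1:
        raise ValueError("this sketch only handles 32-bit little-endian ELF")
    # ELF32 header: e_phoff at offset 28, e_phentsize at 42, e_phnum at 44.
    (e_phoff,) = struct.unpack_from("<I", data, 28)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 42)
    for i in range(e_phnum):
        # ELF32 program header starts with p_type, p_offset, p_vaddr, p_paddr, p_filesz.
        p_type, p_offset, _vaddr, _paddr, p_filesz = struct.unpack_from(
            "<IIIII", data, e_phoff + i * e_phentsize
        )
        if p_type == PT_INTERP:
            return data[p_offset:p_offset + p_filesz].rstrip(b"\0").decode()
    return None
```

Calling read_interp(open("test", "rb").read()) and then checking the returned path with os.path.exists would show directly whether the requested interpreter is missing on the target system.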

 

The solution

Set the system dynamic linker as the ELF interpreter by passing the --dynamic-linker option when linking the program.

 

When linking programs for Linux ARM systems:

armv7a-hardfloat-linux-gnueabi-ld.gold --hash-style=sysv --dynamic-linker=/lib/ld-linux-armhf.so.3 -o test test.o -lc

When linking programs for Android ARM systems:

armv7a-hardfloat-linux-gnueabi-ld.gold --hash-style=sysv --dynamic-linker=/system/bin/linker -o test test.o -lc

If someone offered to sell your business server-compute cycles for one-tenth the going price, you might think there was a catch. But in this case you’d be wrong.

Nathan Goulding is the senior vice president of engineering of bare metal server startup, Packet, and that’s precisely what he and his colleagues are offering thanks to ARMv8-A powered technology.

“You can now provision hourly on-demand ARMv8 servers powered by 2 x 48 Cavium ThunderX SoCs (Cavium) on Packet,” Goulding (pictured far right) wrote in a recent Packet blog post. “We’re starting with our EWR1 home (New York metro), as well as Sunnyvale and Amsterdam. We’ll add in Tokyo in early December when the facility opens for production customers.”

The price? $0.50 per hour per server, or $0.005 per core per hour. (Yes, you’re reading that correctly).

Packet’s offerings are single-tenant bare metal servers rather than virtualized slices. They have started with Ubuntu 16.04 and CentOS 7, but have CoreOS and FreeBSD in the works, along with IoT-focused OS flavors. Four different ARM server configurations are also coming.

“Making a powerful, low-cost ARM compute node available to developers has been a dream of mine for years,” said Zachary Smith, Packet CEO (pictured center front, dark sport coat).

The ARMv8-A architecture powering Packet’s bare metal Type 2A is designed for high-density data center workloads for tasks such as container-based applications, data processing, threaded application workloads and network heavy functions including load balancing.

 

What do users want from bare metal?

Packet sees a number of use cases emerging from beta users. These include Docker containers and network applications such as load balancing, and it has thrown open its virtual doors for other suggestions from the community, Goulding notes.

Coincidentally, Packet received $9.4 million in Series A funding from SoftBank Group earlier this year. The company’s founders were impressed with the famously long-term vision often offered by SoftBank CEO and founder Masayoshi Son.

“SoftBank certainly isn’t your normal corporate venture arm, and we aren’t your normal venture investment, or so it seemed,” wrote Smith in a blog. “I guess you could say that the 300-year plan of SoftBank includes lots of computers doing lots of things, and they’re not all virtualized!”

Goulding writes that he and his team have been thinking about this type of design for some time.

“We’ve been asked several times for our thoughts related to ARM chips in the data center.  Well, we think it matters. Big time,” he said. “So much so that we’ve been hard at work on an ARMv8 server solution for nearly a year — well before Mr. Son flew to Turkey to make his first offer to acquire ARM Holdings!”

 



 

IoT refers to thousands of diverse market segments that are now using connected intelligent devices: from IoT nodes with rich user interfaces, to tiny sensors that are powered only by energy that is harvested from the environment. These small IoT nodes are where ARM®’s Cortex®-M processors come in.

 

At ARM TechCon 2016, ARM introduced two new members of the Cortex-M family: the Cortex-M23 and the Cortex-M33. This blog focuses on the Cortex-M33, but you can find more information about the Cortex-M23 in this blog by Thomas Ensergueix: Cortex-M23 and Cortex-M33 – a security foundation for billions of devices, and in this one: Five key features of the ARM Cortex-M23 Processor.

 

Accelerator blocks vs. a dedicated co-processor interface

One of the key features of the Cortex-M33 is the dedicated co-processor interface. This is the first Cortex-M profile processor to offer such an interface that extends the processing capability of the CPU. Why now, when chip designers have been adding accelerator blocks to Cortex-M based designs for the past ten years?

 

A co-processor, or accelerator block, addresses the need for the optimal balance between throughput and power consumption. An example to illustrate this point is a cryptography accelerator. Encryption functions can be done in software, of course, but at some point it is more energy efficient to do the function in hardware. A complete packet is sent to the accelerator for encryption before transmission.

 

This is a packet-based operation that happens occasionally. In this scenario, the cryptography accelerator performs its task properly when connected to the system bus. There is no need for the special situation of tight coupling with the processor.

 

The sensor-rich nature of IoT is now driving the need for a tightly coupled co-processor, rather than a bus based accelerator block. There is a group of applications that require frequent intensive compute operations; they benefit from being tightly coupled with the processor, as opposed to remaining as a block on the system bus.

 


 

This new Cortex-M33 co-processor interface is designed to:

  • Make it easy for the designer to integrate tightly-coupled co-processors
  • Remove the need to revalidate the processor design when extensions are added
  • Maintain full ecosystem and toolchain compatibility
  • Avoid ecosystem fragmentation

 

The co-processor interface enables accelerator capacity to suit all applications

The interface offers 32-bit and 64-bit data read and write operations. Its advantage over the traditional AHB system interface is much faster accelerator access, because no instructions are needed to set up an address. By connecting the frequently used accelerators to the dedicated interface, the bandwidth loading on the system bus is dramatically reduced. With the ability to connect up to eight co-processors to the interface, designers can add a wide set of accelerators to suit all applications.

 

One example is a smart sensor which would use specialised filtering to process the sensor data. These can be processed in software within the CPU, but for frequent, complex operations, custom acceleration integrated via the co-processor interface can be faster and more efficient.

 

Flexibility without compromising security

A blog about the Cortex-M33 is never complete without a mention of security which is essential for the growth of deployments across the diversity of IoT markets. The Cortex-M23  and Cortex-M33 are the first processors based on the ARMv8-M architecture; they bring ARM TrustZone® security to even the smallest of embedded devices. Both processors offer enhanced memory protection and debug, with dedicated protection for both Secure and Non-secure areas. The following is a diagram of the key features of the Cortex-M33. For more details please see this other blog.

 

[Figure: key features of the Cortex-M33]

Summary

For certain applications, tightly coupled special-purpose compute accelerators can make a dramatic difference for power and performance.

 

It is essential that this is done in a way that maintains all of the benefits of the world’s #1 ecosystem with the widest choice of development tools, compilers, debuggers, operating systems, and middleware. The ARM ecosystem saves developers time and cost, and increases productivity.

 

The Cortex-M33’s new highly-efficient co-processor interface enables custom acceleration to be tightly coupled with the processor thus extending the capabilities of the processor for these specific functions. And crucially, it does this without fragmenting the ecosystem.

 

Stay tuned for the next blog in the ARMv8-M series where Thomas Lorenser will discuss DSP operations for Cortex-M33.

Concurrent with ARM's announcement of its new Cortex®-M23 and Cortex-M33 processors, Synopsys announced broad tool support for building secure, efficient IoT designs based on these new CPUs. The Cortex-M23 is the smallest and most energy-efficient ARM processor and the Cortex-M33 is optimized for deterministic real-time microcontroller class processor operations; they are the first in the new ARMv8-M architecture, which introduces ARM TrustZone® security to its M-class cores.

 

[Figure: Cortex-M23 and Cortex-M33 chip diagrams]

 

Here are the highlights from our announcement:

  • Synopsys' Galaxy Design Platform tools, including IC Compiler II place-and-route, Design Compiler Graphical synthesis and Custom Compiler custom design solution, enable power- and cost-efficient implementation of IoT designs
  • Verification Continuum Platform technologies accelerate architecture optimization, software development, hardware/software integration and bring-up, system validation, design verification and debug
  • Software Integrity Platform, including Coverity, Defensics and Protecode tools, helps build in software security and quality

See you in Salt Lake City…

Heading into the year’s biggest computing event, folks will have noticed that 2016 has been a year of big news for ARM in HPC.  Looking back on 2016, we saw the official launch of our HPC developer web-site: arm.com/hpc. This site showcases the latest developments and supported packages for the ARM HPC software ecosystem.  It’s a key starting point for ARM partners, HPC developers, and end-users. 

 

At ISC16 in Frankfurt in June, Fujitsu and RIKEN publicly unveiled plans for their future Exascale computer to be based on the ARMv8-A architecture. This took many by surprise and it was great to see such highly respected technology leaders and scientists commit to ARM and its burgeoning ecosystem.   Although the first Exascale supercomputer in Japan isn’t set to launch until at least 2021, collaboration with ARM is already underway.  This collaboration was highlighted in August during the HotChips event, where ARM announced its new SVE technology for HPC. An optional extension to the ARMv8-A architecture, SVE breaks barriers when it comes to SIMD vector width, flexibly allowing implementations of up to 2048-bit vectors.  

 

Now, looking ahead to SC16, we expect to share much more on our HPC software ecosystem, as well as many more details on support for SVE.  With both a commercial and an open-source view of our HPC software ecosystem, we will be showing off our new commercial compilers and HPC tools as well as touting open-source advances over the past year.  We’ll also be discussing the launch of our initial OpenHPC release for the ARMv8-A architecture.  This will be the first non-x86 build supported and it will align with the planned OpenHPC v1.2 release that is due out in the SC16 timeframe.

ARM Booth #4033 at SC16

At the SC16 exhibition, we’ve got a great location this year in the Salt Palace Convention Center, and we have some great demos to match, highlighting the very latest ARM ecosystem and hardware platform advances.

Demos:

  • The ARM Mancunians will be in the city and united as they show off several new software products and updates for HPC.   This includes new compilers for HPC, as well as a new SVE Emulator and updates to the ARM Performance Libraries.
  • A small startup with a Big Data vision, Kaleao, will be showing off their new KMAX server with built in FPGA support for datacenter, HPDA, and HPC workloads.
  • Long-time ARM partner Softiron will be on-hand to demo their new high-density HyperDrive software defined storage appliance.

Ecosystem Meetings

Besides hosting a booth and numerous private partner and end-user meetings, we’ll also be hosting a few public events relating to the ARM HPC software ecosystem.

ARM HPC User Group Meeting – November 14th, 2016
ARM will be hosting its annual HPC User Group Meeting at the Sheraton Salt Lake City Hotel on Monday afternoon of SC week, from 1pm – 5pm (plus an hour of free drinks, snacks, and discussion afterward).  Please click here to see the latest HPC User Group meeting agenda and to register for this free event: https://developer.arm.com/hpc/latest-news/events

OpenHPC TSC Meeting – November 16th, 2016

ARM will be hosting an OpenHPC Technical Steering Committee meeting at SC16. The meeting will be held in the Wasatch room at the Sheraton Salt Lake City Hotel, about 2 blocks from the Salt Palace Convention Center, on Wednesday from 9am – 10am.

OpenHPC BoF Session – November 16th, 2016

For those interested in details on OpenHPC efforts, we will also be attending the OpenHPC Community BoF session at SC’16:  http://openhpc.community/attend-the-openhpc-community-bof-at-sc16/

OpenSHMEM BoF Session – November 15th, 2016

For those interested in the latest on OpenSHMEM, ARM will be hosting a session from 2 – 4pm on Tuesday of SC16 week at the Sheraton Salt Lake City Hotel, Wasatch room. 
Here’s more info: http://openshmem.org/site/sc16bof

 

ARM Partners Exhibiting @ SC16

 

Exhibitor: booth number

Advanced Micro Devices, Inc.: 1431
Allinea Software: 1508
AMD: 1431
ARM: 4033
Cavium, Inc.: 4057
Cray Inc.: 1731
E4 Computer Engineering: 4630
Fujitsu Limited: 831
GIGABYTE: 344
Hewlett Packard Enterprise: 1531
Lenovo: 2643
MathWorks: 2517
Mellanox Technologies: 2631
Numerical Algorithms Group (NAG): 1522
NVIDIA: 2217, 2231
OpenHPC: 643
Penguin Computing: 817
RapidIO.org: 4245
Red Hat: 621
Rogue Wave Software: 2425
SUSE: 4427
Xilinx: 3640

ARM® TrustZone® technology is a system-wide approach to security for system-on-chip (SoC) designs. It is hardware-based security built into the heart of CPUs and systems and used by semiconductor chip designers who want to provide security to devices, such as root of trust. TrustZone technology is available on any ARM Cortex®-A based system, and now with the Cortex-M23 and Cortex-M33 processors, it is also available on the latest Cortex-M based systems (see Thomas Ensergueix's blog on Cortex-M23 and Cortex-M33 - Security foundation for billions of devices). It is now possible to design in security, from the smallest microcontrollers, with TrustZone for Cortex-M processors, to high performance applications processors, with TrustZone technology for Cortex-A processors.

 

[Figure: TrustZone security enables separation of shared resources between trusted and non-trusted]

 

At the heart of the TrustZone approach is the concept of secure and non-secure worlds that are hardware-separated from each other. Within the processor, software either resides in the secure world or the non-secure world; a switch between these two worlds is accomplished via software in Cortex-A processors (referred to as the secure monitor) and by hardware (core logic) in Cortex-M processors. This concept of secure (trusted) and non-secure (non-trusted) worlds extends beyond the CPU. It also covers memories, on-chip bus systems, interrupts, peripheral interfaces and software within an SoC.

 

TrustZone technology for ARMv8-M processors (Cortex-M)

The ARMv8-M architecture extends TrustZone technology to Cortex-M class systems, enabling robust levels of protection at all cost points. TrustZone for Cortex-M is used to protect firmware and peripherals, as well as providing isolation for secure boot, trusted update and root-of-trust implementations. The architecture maintains the deterministic real-time response expected of embedded solutions. Context-switching between secure and non-secure worlds is done in hardware for faster transitions and greater power efficiency. There is no need for any secure monitor software since the processor itself performs the switching, which reduces the memory footprint and dynamic power for code execution.

 

Before we move to programming, there are a few concepts that we need to cover first:

  1. Security defined by address
  2. Additional states
  3. Cross-domain calls

Concept number 1: Security defined by address

The first key concept to grasp is that each address is associated with a specific security state. The processor uses a newly-introduced security attribution unit to check the security state of the address. A system-level interface may override that attribution based on the overall SoC design. After the state is selected, the address also passes through the appropriate memory protection unit, if present in the system.

 

[Figure: Security defined by address]

 

Concept number 2: Additional states

The second key concept is the presence of additional execution states. The ARMv7-M and ARMv6-M architectures define two execution modes: handler mode and thread mode. Handler mode is privileged and may access all resources of the SoC, while thread mode can be either privileged or unprivileged. With the TrustZone security extension, the processor modes are mirrored to form a secure state and a non-secure state, each with a handler mode and a thread mode. The security states and processor modes are orthogonal, resulting in four combinations of states and modes. When running software in secure memories, the processor is automatically set to secure state. Likewise, when the processor is executing software in non-secure memories, the processor is automatically set to non-secure state. This design removes the need for any secure monitor software to manage the state switch, which reduces the memory footprint and power consumption as detailed above.

 

[Figure: Additional orthogonal states]

 

Concept number 3: Cross-domain calls

ARMv8-M was designed for the Cortex-M profile with deterministic real time operations. Any function in any state may call any other function in the other state directly, as long as certain rules are respected based on the predefined security state entry points. Also, as expected, each state has a distinct set of stacks and stack pointers to those stacks in order to protect assets on the secure side. The function call overhead is dramatically reduced since there is no need for an API layer to manage the calls. Given predefined entry points, the call proceeds directly to the called function.

 

[Figure: Cross-domain calls]

Simplified use case

The diagram below shows one simple use case where the user application and the I/O driver are in the non-secure state, whereas the system startup code and a communication stack are in the secure state. The user application calls into the communication stack to transmit and receive data whereas that stack will use the I/O driver in the non-secure state to transmit and receive over the interface.

[Figure: Example software configuration taking advantage of the TrustZone Secure state]

As in all such systems:

  • Non-secure applications do not have access to secure resources unless going through properly-defined secure service function entry points
  • Secure firmware can access both secure and non-secure memories
  • Secure and non-secure code may implement independent time scheduling using different timers
  • Each interrupt line can be programmed to be secure or non-secure. The vector tables for secure and non-secure software are also separated.

 

 

Although the processor hardware provides essential protection for the secure software, secure software needs to be written carefully to ensure that the whole system is secure. Below are three key items that software developers should remember when creating secure software:

  1. Utilize new ARM C Language Extension (ACLE) features
  2. Validate non-trusted pointers
  3. Design for asynchronous non-secure memory modifications

 

Tip number 1: Utilize new ARM C Language Extension features

TrustZone for ARMv8-M introduced a few new instructions to support the security state switching. Instead of creating assembly wrappers to generate those instructions, software developers should utilize new compiler features defined in ARM C Language Extensions (ACLE) to allow software tools to understand the security usage of the functions and generate the best code required. The ACLE features are implemented by multiple compiler vendors, and hence, the code is portable.

 

For example, when creating a secure API that is callable from the non-secure state, a new function attribute called “cmse_nonsecure_entry” should be used in declaring the function. At the end of the function call in the secure state, the registers inside the processor may still contain secret information. With the correct function attribute, the compiler can automatically insert code to clear R0-R3, R12 and the Application Program Status Register (APSR), which might contain secret information, unless a register is used to return the result to non-secure software. Registers R4 to R11 are handled differently, as their contents should be unchanged at function boundaries. If their values are changed in the middle of the function, they should be restored to their original values before the function returns to the non-secure caller.

 

Tip number 2: Validate non-trusted pointers

There will be cases where non-secure code deliberately provides incorrect pointers to try to gain access to secure memory. To counter that possibility, a new instruction has been introduced in ARMv8-M, the Test Target (TT) instruction. The TT instruction returns the security attributes of an address, so secure software can determine if a pointer is pointing to a secure or non-secure address.

 

To make the checking of pointers more efficient, there is an associated region number for each memory region defined by the security configuration. This region number can be utilized by software to determine if a contiguous range of memory has similar security attributes.

The TT instruction returns the security attributes and region number (as well as MPU region number) from an address value. By using a TT instruction on the start and end addresses of the memory range, and identifying that both reside in the same region number, software can quickly determine that the memory range (e.g. data array or data structure) is located entirely in non-secure space.
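
The start-and-end check described above can be modelled in plain Python. This is only a sketch of the idea, not the real instruction: the memory map, the Region type and the function names tt and range_is_nonsecure are all hypothetical and exist just for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    number: int   # region number, as a TT-style lookup would report it
    start: int    # first address in the region
    end: int      # last address in the region (inclusive)
    secure: bool  # security attribute of the region


# Hypothetical memory map, for illustration only.
REGIONS = [
    Region(0, 0x0000_0000, 0x0FFF_FFFF, secure=True),
    Region(1, 0x2000_0000, 0x2000_FFFF, secure=False),
    Region(2, 0x2001_0000, 0x2001_FFFF, secure=True),
]


def tt(addr):
    """Model of a TT-style query: return the Region containing addr, or None."""
    for region in REGIONS:
        if region.start <= addr <= region.end:
            return region
    return None


def range_is_nonsecure(start, size):
    """True only if [start, start + size) lies inside a single non-secure region."""
    if size <= 0:
        return False
    first_hit = tt(start)
    last_hit = tt(start + size - 1)
    if first_hit is None or last_hit is None:
        return False
    # Same region number at both ends means the whole range shares one attribute.
    return first_hit.number == last_hit.number and not first_hit.secure
```

With this map, a buffer at 0x2000_0000 of 0x100 bytes passes the check, while one that starts at 0x2000_FF00 and runs 0x200 bytes is rejected because its last byte falls in a different (secure) region.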

 

[Figure: Checking pointers for valid region boundaries]

 

 

Using this mechanism, secure code that services API calls into the secure side can determine if the memory referenced by a pointer from non-secure software has the appropriate security attribute for the API. This prevents non-secure software from using APIs in secure software to read out or corrupt secure information.

 

Tip number 3: Design for asynchronous non-secure memory modifications

Non-secure interrupt service routines could change non-secure data that is being processed by secure software. Thus, input data that has already been validated by the secure API can be changed by a non-secure ISR after the validation step. One way to avoid that situation is to make a local copy of that input data in secure memory and use the secure copy for processing (including validation of the input data), avoiding any further reads of non-secure memory. In cases where such copying is not desired (e.g. when dealing with a large amount of data in a specific memory region), the alternative is to program the security attribution unit to make that memory region secure.
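
The copy-then-validate pattern can be illustrated with a small Python model. It is a sketch under stated assumptions: a dict stands in for a request structure in non-secure memory, the interrupt callback stands in for a non-secure ISR, and MAX_LEN and both function names are hypothetical.

```python
MAX_LEN = 16  # hypothetical limit enforced by the secure service


def process_in_place(request, interrupt=None):
    """Unsafe pattern: validate, then re-read the shared (non-secure) buffer."""
    if request["length"] > MAX_LEN:
        raise ValueError("length too large")
    if interrupt:
        interrupt()              # a non-secure ISR may run here
    return request["length"]     # the value used may differ from the one validated


def process_with_copy(request, interrupt=None):
    """Safer pattern: snapshot first, then validate and use only the snapshot."""
    local = dict(request)        # stands in for a copy into secure memory
    if local["length"] > MAX_LEN:
        raise ValueError("length too large")
    if interrupt:
        interrupt()              # later changes no longer affect `local`
    return local["length"]
```

If the callback rewrites the length after validation, process_in_place ends up using the new, unchecked value, while process_with_copy keeps working on the value it actually validated.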

 

Summary

It is up to developers of software for the secure side to make sure that the whole system is secure and that no secure data leaks to the non-secure side. To that end, this blog outlined three key TrustZone concepts along with three key items that secure software developers can use to help create secure systems: the ACLE features that protect data in the registers of called functions, the TT instruction for validating pointers, and the need to keep in mind that the non-secure side may change data by interrupting the secure side. To explore further, a selection of documents is available to help in developing secure firmware on ARMv8-M processors.


 

The ARM®v8-A architecture continues to evolve, with the additions developed through 2016 collectively known as ARMv8.3-A. Grouping enhancements in this manner helps the ecosystem manage tools and software support alongside the large numbers of ARMv8-A based processors and products in development or production today. These changes add to the gradual migration in cores and related products over several years.

 

Developed in collaboration with our architecture licensees and other key partners, ARMv8.3-A adds:

  • A mechanism for enhanced security associated with pointer authentication
  • Additional controls and adjustment to the exception model for nested virtualization
  • A range of small-scale enhancements to the instruction set and System register support in a variety of areas

 

All these changes are incremental to previous sets of enhancements, with the ARMv8-A System register ID mechanism used to identify features in any given implementation.

 

Please note: ARM recently announced support for a new vector processing architecture, the Scalable Vector Extension (SVE). This extension is independent of the changes introduced with ARMv8.3-A. See Technology Update: The Scalable Vector Extension (SVE) for the ARMv8-A architecture for more details.

 

ARMv8.3-A overview

The enhancements introduced with ARMv8.3 fall into the following categories:

  • Pointer authentication (AArch64 only)
  • Nested virtualization (AArch64 only)
  • Advanced SIMD complex number support (AArch64 and AArch32)
  • Improved JavaScript data type conversion support (AArch64 and AArch32)
  • A change to the memory consistency model (AArch64 only)
  • ID mechanism support for larger system-visible caches (AArch64 and AArch32)

 

Note: AArch64 indicates the 64-bit Execution state and AArch32 the 32-bit Execution state in the ARM architecture.

 

Pointer authentication

Computer attacks are becoming more sophisticated. Examples of this are exploit mechanisms such as the use of gadgets in Return-Oriented Programming (ROP) and Jump-Oriented Programming (JOP). To mitigate such exploits, ARMv8.3-A introduces a feature that authenticates the contents of a register before it is used as the address for an indirect branch or data reference. For address authentication, the functionality uses the upper bits in a 64-bit address value that are normally associated with sign extension of the address space. This allows the introduction of a Pointer Authentication Code (PAC) as a new field within the upper bits of the value.

 

The functionality is summarized as follows:

  • Instructions are added for:
    • PAC value creation, which writes the value to the uppermost bits in a destination register alongside an address pointer value
    • Authentication, which validates a PAC and updates the destination register with a correct or corrupt address pointer. If the authentication fails, an indirect branch or load that uses the authenticated, and corrupt, address will cause an exception.
    • Removing a PAC value from the specified register
  • An implementation can create a PAC using a standard and/or proprietary algorithm
  • The standardized form uses a recently published block cipher known as QARMA
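
The create, authenticate and strip operations listed above can be mimicked with a toy Python model. It rests on several assumptions that are not from the architecture: a 48-bit address with the PAC in the 16 bits above it, a truncated HMAC-SHA256 standing in for QARMA, and a failed check signalled by returning None rather than by producing a corrupt pointer that faults on use. All function names are hypothetical.

```python
import hashlib
import hmac

VA_BITS = 48                      # assumed virtual-address width
ADDR_MASK = (1 << VA_BITS) - 1
PAC_BITS = 64 - VA_BITS           # upper bits available for the PAC in this model


def _pac(addr, key):
    """Stand-in MAC (truncated HMAC-SHA256); real hardware would use QARMA or similar."""
    digest = hmac.new(key, addr.to_bytes(8, "little"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "little") >> (64 - PAC_BITS)


def sign(ptr, key):
    """Place a PAC in the upper bits alongside the address."""
    addr = ptr & ADDR_MASK
    return (_pac(addr, key) << VA_BITS) | addr


def strip(ptr):
    """Remove the PAC field, leaving only the address bits."""
    return ptr & ADDR_MASK


def authenticate(ptr, key):
    """Return the plain address if the PAC matches, else None."""
    addr = ptr & ADDR_MASK
    if (ptr >> VA_BITS) == _pac(addr, key):
        return addr
    return None
```

A pointer signed with one key authenticates back to the original address, while flipping any bit of the PAC field makes authenticate return None, which is the property an attacker who overwrites a return address would run into.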

 

Nested virtualization

There is growing interest in cloud computing, and, in particular, in an increasingly common use case where a user rents a virtual machine from an Infrastructure as a Service (IaaS) provider. Nested virtualization is an attractive proposition where the workload to run on this virtual machine includes the use of a hypervisor.  In this blog, the hypervisor that is run natively on the hardware is described as the host hypervisor, while the nested hypervisor that is run under the control of the host hypervisor is described as the guest hypervisor.

 

The ARMv8.3-A nested virtualization support enables a guest hypervisor to run transparently in non-secure EL1 mode, unaware that it is not executing at EL2. Running a guest hypervisor at EL1 removes the exception trap overhead, performance, and latency costs of running this software as a non-secure user-level process. This feature is only supported in AArch64, and requires implementation of EL2.

 

Advanced SIMD floating-point complex number support

New instructions are added to AArch32 and AArch64 to aid floating-point multiplication and addition of complex numbers, where the complex numbers are packed in a vector register as a pair of elements. The Imaginary part of the number is placed in the more significant element, and the Real part of the number is placed in the less significant element.

 

The instructions include:

  • An optional rotation (when considered in polar representation) of one of the arguments by 0, 90, 180, or 270 degrees
  • Single-precision and double-precision data types, the latter only with AArch64 execution
  • Half-precision data type support that is only implemented if the half-precision floating-point instructions defined in ARMv8.2-A are implemented; otherwise, the half-precision encodings are UNDEFINED

 

The floating-point functionality supported is:

  • Complex number signed multiply and accumulate
  • Complex number signed addition
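
To make the rotation idea more concrete, here is a scalar Python model of a complex multiply-accumulate split into rotated steps. The per-rotation operand selection shown is my own reading of how such an instruction is usually decomposed and should be checked against the architecture reference; the function names are hypothetical and values are plain (real, imaginary) tuples rather than vector lanes.

```python
def complex_mla(acc, a, b, rot):
    """One rotate-and-accumulate step on (real, imag) pairs; rot is 0, 90, 180 or 270."""
    acc_re, acc_im = acc
    a_re, a_im = a
    b_re, b_im = b
    if rot == 0:
        return (acc_re + a_re * b_re, acc_im + a_re * b_im)
    if rot == 90:
        return (acc_re - a_im * b_im, acc_im + a_im * b_re)
    if rot == 180:
        return (acc_re - a_re * b_re, acc_im - a_re * b_im)
    if rot == 270:
        return (acc_re + a_im * b_im, acc_im - a_im * b_re)
    raise ValueError("rot must be 0, 90, 180 or 270")


def complex_multiply_accumulate(acc, a, b):
    """Full acc + a*b, built from the 0-degree and 90-degree steps."""
    return complex_mla(complex_mla(acc, a, b, 0), a, b, 90)
```

In this model, chaining the 0 and 90 degree steps gives acc + a*b, and swapping the second step for 270 degrees gives acc + conj(a)*b, which is why a rotation argument is enough to cover the common complex products.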

 

Improved JavaScript data type conversion

JavaScript uses the double-precision floating-point format for all numbers. However, it needs to convert this common number format to 32-bit integers in order to perform bit-wise operations. Conversions from double-precision float to integer, as well as the need to check if the number converted really was an integer, are therefore relatively common occurrences.

 

ARMv8.3-A adds instructions that convert a double-precision floating-point number to a signed 32-bit integer with round towards zero. Where the integer result is outside the range of a signed 32-bit integer (DP float supports integer precision up to 53 bits), the value stored as the result is the integer conversion modulo 2^32, taking the same sign as the input float.

 

The Z-flag is used to determine if the original number was an integer; the other flags (N, C, and V) are always cleared. The Z-flag is set to one to indicate an integer within range, meaning it is cleared when the input number is:

  • An infinity
  • A NaN
  • Too large for a 32-bit signed integer
  • -0
  • not an integer value, and rounded accordingly

 

This approach allows a B.NE conditional branch to be used immediately after this instruction to test if the input double-precision number is a true representation of a 32-bit signed integer.
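
The behaviour described above can be approximated in software. The Python sketch below is a minimal model, not the instruction's definition: the function name double_to_int32_with_flag is hypothetical, and for out-of-range inputs it assumes the ECMAScript ToInt32 rule (two's-complement wrap-around modulo 2^32), which may differ in detail from what the hardware returns. It also assumes a result of 0 for NaN and infinities.

```python
import math


def double_to_int32_with_flag(x):
    """Return (int32_result, z_flag) for a Python float x.

    z_flag is True only when x is exactly an in-range 32-bit signed integer
    (and is not -0); this mirrors the Z-flag rule in the text.
    """
    if math.isnan(x) or math.isinf(x):
        return 0, False  # assumption: ToInt32 maps NaN and infinities to 0

    truncated = math.trunc(x)           # round towards zero
    wrapped = truncated & 0xFFFFFFFF    # reduce modulo 2**32
    if wrapped >= 0x80000000:           # reinterpret as signed 32-bit
        wrapped -= 1 << 32

    in_range = -(1 << 31) <= truncated <= (1 << 31) - 1
    is_negative_zero = x == 0.0 and math.copysign(1.0, x) < 0
    z_flag = in_range and truncated == x and not is_negative_zero
    return wrapped, z_flag
```

A caller would take the fast integer path only when the flag is true, which corresponds to not taking the B.NE branch described above.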

 

Memory consistency model

The ARMv8.0 support for release consistency is based around the “RCsc” (Release Consistency sequentially consistent) model described by Adve & Gharachorloo in [1], where the Acquire/Release instructions follow a sequentially consistent order with respect to each other. This is well aligned to the requirements of the C++11/C11 memory_order_seq_cst, which is the default ordering of atomics in C++11/C11.

 

Instructions are added as part of ARMv8.3-A to support the weaker RCpc (Release Consistent processor consistent) model where it is permissible that a Store-Release followed by a Load-Acquire to a different address can be re-ordered. This model is supported by the use of memory_order_release/memory_order_acquire/memory_order_acq_rel in C++11/C11.

 

[1] Adve & Gharachorloo, Shared Memory Consistency Models: A Tutorial

 

Support for larger architected caches

The Current Cache Size ID Register (CCSIDR) defines the number of sets of a cache level by using a 15-bit field, and the associativity (the number of ways) in a 10-bit field. To avoid one or both of these becoming limiting factors in an implementation, a second 32-bit register, CCSIDR2, is added and a new format adopted across the 64 bits provided by the existing and new registers.
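
To make the field widths concrete, the Python sketch below decodes a value in the existing 32-bit CCSIDR format. The bit positions (line size in bits [2:0], associativity in bits [12:3], number of sets in bits [27:13]) and the "stored value plus one" encoding are stated from memory of the 32-bit format and should be checked against the Architecture Reference Manual; the function name decode_ccsidr is hypothetical, and the new 64-bit format is not modelled.

```python
def decode_ccsidr(value):
    """Decode a 32-bit CCSIDR value (assumed layout, see lead-in)."""
    line_size_field = value & 0x7          # bits [2:0]: log2(line bytes) - 4
    assoc_field = (value >> 3) & 0x3FF     # bits [12:3]: ways - 1 (10 bits)
    sets_field = (value >> 13) & 0x7FFF    # bits [27:13]: sets - 1 (15 bits)

    line_bytes = 1 << (line_size_field + 4)
    ways = assoc_field + 1
    sets = sets_field + 1
    return {
        "line_bytes": line_bytes,
        "ways": ways,
        "sets": sets,
        "size_bytes": line_bytes * ways * sets,
    }


# Example: a 32KB, 4-way cache with 64-byte lines has 128 sets.
example = decode_ccsidr((127 << 13) | (3 << 3) | 2)
```

With a 15-bit sets field and a 10-bit ways field, this format tops out at 32,768 sets and 1,024 ways, which is the limit the wider format is meant to lift.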

 

To conclude

For a summary of the ARMv8-A architecture, see the section on ARMv8 architectural concepts in Chapter A1 of the ARMv8-A Architecture Reference Manual. This document, ARM DDI 0487, can be downloaded from developer.arm.com.

 

ARMv8.1-A details are currently available as a supplement. Their consolidation alongside the ARMv8.2-A details will be published in early 2017.

 

It is expected that the ARMv8.3-A details will be consolidated into the ARMv8-A Architecture Reference Manual and published in mid-2017.

 

David Brash is Architecture Program Director in the Architecture and Technology Group, one of several groups within ARM’s engineering community.

A great deal of cyber ink has been spilt on the subject of Internet of Things - on the promise, the hype, the frustration that it’s not quite there yet. The entire concept of IoT combines sensory, connected embedded intelligence with an enhanced learning in the cloud to deliver increasingly intelligent services in increasingly diverse fields.

 

This connected embedded intelligence is already an integral part of people’s lives at various levels: at a personal level, where devices or widgets can do substantially more insightful analysis; at an industrial level, with smarter control and automation; and finally, at a societal level, where it’s not just smarter cities and buildings, but a capture of anonymous - yet relevant - data that can help point to trends and efficiencies that make all our lives better.

 


15 billion ARM-based chips shipped from sensors to servers in 2015

 

IoT today runs on ARM®. Over 15 billion ARM-based chips shipped last year, from all points in that spectrum, from the sensor to the server. For the IoT experience to deliver its next transformative shift, the industry needs better efficiency, greater security and the ability to deliver scale for every portion of the supply chain - from device to software and service.

 

ARM has introduced the industry’s most comprehensive offering of scalable, secure, efficient technology for the next phase of the IoT, enabling fast and secure deployment from chip to cloud. ARM and its ecosystem are ready to deliver the breadth of technology that will change billions of lives for the better.

 

Introducing a faster route to a secure IoT from chip to cloud

 

ARM’s new IoT technologies (announced here) work seamlessly together for rapid deployment of IoT solutions and services. This is particularly true of constrained nodes, where deployment often outpaces that of the smartphones they often connect to.

 

This consists of a number of firsts:

 

The new Cortex-M processors – Cortex-M33 and Cortex-M23

 

Licensed by the majority of the top 10 global MCU suppliers and backed by the world’s #1 embedded ecosystem, Cortex-M23 and Cortex-M33 are set to become the processors of choice for microcontrollers. These are the first processors to be based on the ARMv8-M architecture, and bring TrustZone security to even the smallest of embedded devices, as Thomas Ensergueix shares in Cortex-M23 and Cortex-M33 - Security foundation for billions of devices. TrustZone for ARMv8-M provides a standard on which secure software and secure debug solutions can be more easily developed, further enhancing the range of security software and tools available within the Cortex-M ecosystem. Learn more about TrustZone technology.

 

Cortex-M33: Efficiency, security and flexibility


The general-purpose 32-bit MCU processor of choice for secure embedded applications

 

The Cortex-M33 core will be the general-purpose 32-bit MCU processor of choice for secure embedded applications. It is 80% smaller than the ARM Cortex-A5, the smallest TrustZone-capable processor before now. It is also highly versatile – a single processor, spanning wide capabilities. It provides configurable support for TrustZone, DSP, and floating point computation to support advanced audio and connectivity stacks. Its new co-processor interface enables tightly-coupled custom processing to be added, while retaining all of the benefits of a vibrant tools ecosystem. Diya Soubra explains more here: Five key features of the ARM Cortex-M33 Processor.

 

Cortex-M23: Security in the smallest footprint


Built for small, energy-sipping IoT and embedded products

 

Cortex-M23 is even smaller - in fact, 75% smaller than the new Cortex-M33. It is also 50% more efficient than Cortex-M33, meaning that it can run on even less energy and for even longer. It is built for small, energy-sipping IoT and embedded products. It extends the capability of the smallest, lowest-power devices, providing security, enhanced efficiency, performance and scalability for deployment even in the most constrained contexts. Tim Menasveta explains more here: Five key features of the ARM Cortex-M23 Processor.

 

CoreLink SIE-200: System IP for embedded


Providing manufacturers with a single solution that can be used across multiple markets

 

Security requires not just a CPU, but a system solution. CoreLink SIE-200 system IP for embedded provides IP blocks, built on top of the AMBA® 5 AHB5 interface, that extend TrustZone security to the system. The configurable interconnect and TrustZone controllers provide hardware-enforced isolation between secure and non-secure applications and can support multiple system architectures, enabling designers to tailor each design to suit a specific application.

 

TrustZone CryptoCell-312: Enabling platform security


For fast boot times and a smooth, responsive end-user experience

 

CryptoCell-312 enhances the offering to make it a comprehensive security solution, serving a variety of use cases and allowing the supply chain to establish trust in a broad spectrum of power and area-constrained devices.

 

It augments TrustZone and is an order of magnitude faster and more secure than software-only solutions. That speed is essential both for fast boot times and for consuming less energy in these energy-sipping devices. Together with TrustZone, CryptoCell-312 can enable platform security, with capabilities such as true random number generation, key management, secure boot and various roots of trust. You can learn more about TrustZone CryptoCell technology in this webinar.

 

 

Cordio radio: Flexible, portable and design-ready


Part of the ARM family of standards-based, low-power wireless IP solutions

 

ARM Cordio radio IP extends support for the new Bluetooth 5 standard and 802.15.4, on which the fast-growing Zigbee and Thread devices are based. It’s a comprehensive solution - providing a low-power RF to software stack solution supported on multiple foundries and nodes. It’s a new, flexible, configurable architecture that enhances its signature energy-efficient sub-one-volt radio. Philippe Bressy shares more in his blog ARM Cordio radio IP: Flexible Bluetooth 5 and 802.15.4 connectivity architecture for IoT edge devices.

 

 

Artisan IoT POP IP: Optimized implementations


 

Simplify SoC design and implementation; reduce time from silicon to device enablement

 

ARM IoT POP IP accelerates the implementation of IoT SoCs via physical IP and reference designs with the know-how to develop a design that has optimal performance with minimum area. Design teams will be able to balance the twin requirements of low leakage and dynamic power within the processor domain, always-on subsystem and the rest of the system. The IoT POP IP has been designed for use with Cortex-M33 and CoreLink SSE-200 at TSMC 40ULP, with an easy-to-use reference design that outlines the necessary physical IP, as well as layout suggestions for optimal power profiles.

 

 

CoreLink SSE-200 subsystem: Pre-integrated hardware and software


 

Everything a system developer needs to make configuration and implementation easier

 

Bringing it all together, the CoreLink SSE-200 subsystem for embedded is the foundation on which SoC designers will build a new generation of secure IoT products. It comes pre-integrated and tested with the Cortex-M33 processor, CoreLink SIE-200, Cordio Bluetooth radio, TrustZone CryptoCell-312, Artisan IoT POP IP, as well as the mbed OS and Cordio radio software stacks.

 

To ensure fast integration of the subsystem in a SoC, a set of scripts and manuals make configuration and implementation easier. Since CoreLink SSE-200 is a fully verified IP, there is no need to spend expensive verification time on checking its internal behaviour; designers can focus on the design tasks that add real value to their application.

 

 

Scaling connected deployment: The device management conundrum

 

As IoT projects scale further, IoT players – 86% of them – believe that device management, i.e. the detection, connection, provisioning and operation of devices at different times in their product lifecycle, is a key obstacle to success. ARM’s familiarity with end devices and the embedded ecosystem puts us in a unique position to solve this problem. Built on the virally successful ARM mbed IoT Device Platform, mbed Cloud is a device-side cloud, independent of analytics, offering the capability to securely manage any device with any data cloud.

 

mbed Cloud consists of packages that allow developers to simplify the challenges in securely connecting, provisioning and updating devices from end-to-end.

 


Over 1 million compiles per month and a developer community of 200,000 globally

 

mbed Cloud allows enterprises to:

  • Connect the devices together irrespective of a particular IP connectivity technology. Being standards-based, it implements CoAP, LWM2M+ (plus) and provides additional optimizations for efficient caching of devices in the networks
  • Identify and trust devices across different stages of their lifecycle
  • Orchestrate how different trusted parties and devices can access sensor data
  • Simplify how devices can be updated across mesh or star networks with firmware
  • Ensure that such updates are done in a fail-safe and energy-efficient manner

 

Device-side capabilities are enhanced with mbed OS 5, the new Platform OS built for IoT that brings a 10x increase in developer productivity. It boasts over 1 million compiles per month and a developer community of 200,000 globally. Discover more about the mbed Cloud here.

 

Get the latest version of mbed OS, v5.2, at www.mbed.com.

 

IoT Unbound

 

ARM’s IoT offering gives start-ups, OEMs, service providers and even experienced silicon vendors an optimized, secure starting point that mitigates risk and accelerates time to market. The mbed Cloud offering delivers a similarly critical service platform on which many value-added IoT services can be delivered. Just as TrustZone added security with a standard API, simplifying and broadening the use of security, TrustZone for ARMv8-M and mbed Cloud deliver a platform for scaling solutions and services to deliver on the promise of IoT.

 

Find out more!

 

I’ve mentioned each of these new technologies briefly, but watch this space over the coming weeks and months as we go into more detail on how each one is well-suited to delivering a secure IoT.

System-on-chip (SoC) solutions based on ARM® Cortex® processors address diverse embedded market segments, including: Internet of Things, motor control, healthcare, automotive, home automation, and many more, as you can see in this blog by Thomas Ensergueix. The various processors provide a standard architecture to address the broad performance spectrum and cost range required by these diverse product markets. The Cortex family is based on three distinct profiles: the A profile, for sophisticated, high-end applications running mainly complex operating systems; the R profile, for high performance hard real-time systems; and the M profile, optimised for low-power, deterministic, cost-sensitive microcontroller applications.

 

The first two processors implemented using the ARMv8-M architecture are the Cortex-M23 and the Cortex-M33. For details on the Cortex-M23, please refer to this blog by Tim Menasveta. The Cortex-M33 is the first full-feature implementation of ARMv8-M with TrustZone® security technology and digital signal processing capability. The processor supports a large number of flexible configuration options to facilitate deployment in a wide range of applications, and offers a dedicated co-processor interface for accelerating frequently used, compute intensive operations. The Cortex-M33 delivers an optimal balance between performance, power, security and productivity.

 


 

 

The Cortex-M33 processor has an in-order 3-stage pipeline, which dramatically reduces system power consumption. Most instructions complete in two stages, while more complex instructions require three. Some 16-bit instructions are dual-issued to boost performance. The core has two AMBA® 5 AHB5 interfaces: C-AHB and S-AHB, which are symmetric in nature and offer identical performance of instruction and data fetches.

 

The Cortex-M33 processor is highly configurable and is easily adapted to system requirements

Designers can quickly create powerful systems by including the most suitable combination of these optional MPU, DSP, FPU, TrustZone, ETM, MTB, ITM, BPU, DWT and co-processor interface features. In minimal control systems, the NVIC can be configured to have just one external interrupt, while in peripheral rich systems, the NVIC can be configured to support up to 480 external interrupts with up to 256 levels of priorities. In systems demanding more reliable operations of many active processes and threads, the MPU can be included to enforce process separation using privileged and unprivileged access control. For the next level of code, data and resource protection, TrustZone would be used.

 

Increasing complexity of applications makes on-chip debug and trace invaluable to delivering products on schedule. The integrated debug capabilities of the Cortex-M33 processor allow for faster software verification. The system can be viewed through either a JTAG port or a 2-pin Serial Wire Debug port. The optional ETM and MTB provide excellent instruction trace capabilities, while the BPU and DWT provide the capability to use breakpoints and hardware watchpoints for debug.

 

  • MPU Memory Protection Unit
  • DSP Digital Signal Processing
  • FPU Floating Point Unit
  • SP Single Precision
  • ETM Embedded Trace Macrocell
  • MTB Micro Trace Buffer
  • BPU Breakpoint Unit
  • DWT Data Watch and Trace Unit
  • ITM Instrumentation Trace Macrocell
  • NVIC Nested Vectored Interrupt Controller
  • WIC Wake-up Interrupt Controller
  • AHB Advanced High Performance Bus
  • AMBA Advanced Microcontroller Bus Architecture

 

Now, on to the five key features of the Cortex-M33:

 

1 - TrustZone for ARMv8-M; a foundation for system wide security

The Cortex-M33 processor with TrustZone has two security states and a number of associated features:


Two new orthogonal states

 

  • Secure state
  • Non-secure state
  • Four stacks and four stack pointer registers
  • Hardware stack-limit checking
  • Support for programmable MPU-like Security Attribution Unit (SAU)
  • Interface for system security indication
  • Visibility of secure code from non-secure (NS) domain restricted to predefined entry points
  • Exception hardware automatically saves and clears secure register state when switching to non-secure
  • Extensive banking of interrupt or exception control, SysTick
  • Memory protection unit for each of the secure and non-secure sides

 

The presence of two full states opens the door for many new opportunities and applications. High value proprietary firmware used by the system may be delivered in the secure state. Supervisor code placed in the secure state can be used to recover a system after an attack or unreliable operation, while the non-secure side remains available as before to the millions of developers currently developing software for Cortex-M.

 

2 - Co-processor interface for extensibility

For certain applications, special-purpose compute can make a difference. It is essential that this is done in a way that maintains all of the benefits of the world's #1 ecosystem – the widest choice of development tools, compilers, debuggers, operating systems, and middleware. The ARM ecosystem saves developers time and cost, and increases productivity.

 

The Cortex-M33 processor includes an optional dedicated bus-like interface for the integration of tightly-coupled accelerator hardware. For frequently used compute intensive operations, this interface gives a mechanism to augment the general purpose compute capability with custom defined processing hardware. Crucially, it does this without fragmenting the ecosystem. The interface includes control and data channels for up to eight co-processors, with signals to provide information about the privilege and security state of the processor along with the instruction type, associated register and operation fields. The co-processor operations are typically expected either to complete in a reasonably small number of cycles or to run in the background and raise an interrupt on completion. The operation details and data can be transferred via the interface at the same time with a single instruction, and wait states can be inserted if needed.

 

3 - Memory protection unit (MPU) for task isolation

The optional MPU is programmable and provides up to 16 regions for each of the secure and non-secure states.  In multi-tasking environments, the OS can reprogram the MPU during task context switching to define the memory access permissions for each task.  For example, a task of an application may be granted access to only some application data and specific peripherals. In this way, the MPU protects all other memories and peripherals from corruption or unauthorised access to dramatically improve system reliability.

 

Easier to set up memory regions

 

The Cortex-M33 memory protection architecture is based on the protected memory system architecture PMSAv8. This version adopts base and limit style comparators for regions as opposed to the previous power-of-two size, size-aligned scheme. Each region has a base starting address, ending address, and settings for access permission and memory attribute. The result is that one can produce MPU regions without having to consider joining a number of regions together. This enhancement simplifies software development, encourages usage and reduces programming steps, which reduces context switch times.
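
To see why base and limit comparators reduce the number of regions, the Python sketch below compares the two schemes for an arbitrary address range. It is a simplified model under stated assumptions: a minimum region size of 32 bytes, a greedy split into naturally aligned power-of-two blocks for the older scheme, and no modelling of features such as sub-region disable bits. The function names pow2_regions and base_limit_region are hypothetical.

```python
def pow2_regions(base, size, min_size=32):
    """Split [base, base + size) into naturally aligned power-of-two blocks.

    Models the older scheme, where each region's size is a power of two and
    its base is aligned to that size. Returns a list of (base, size) pairs.
    """
    if size <= 0 or base % min_size or size % min_size:
        raise ValueError("base and size must be positive multiples of min_size")

    regions = []
    addr, end = base, base + size
    while addr < end:
        # Largest block the alignment of addr permits (any size if addr is 0).
        block = addr & -addr if addr else 1 << end.bit_length()
        # Shrink until the block fits in what is left to cover.
        while block > end - addr:
            block >>= 1
        regions.append((addr, block))
        addr += block
    return regions


def base_limit_region(base, size):
    """Models the base and limit scheme: one (base, inclusive limit) pair."""
    return (base, base + size - 1)


# A 12KB buffer needs two power-of-two regions but a single base/limit region.
old_style = pow2_regions(0x20000000, 0x3000)
new_style = base_limit_region(0x20000000, 0x3000)
```

Fewer regions per task also means fewer register writes when the OS reprograms the MPU on a context switch, which is the saving the paragraph above refers to.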

 

4 - DSP Extension

The optional integer DSP extension adds 85 instructions. In most cases, the DSP instructions would increase performance by an average of three times, giving a boost to all applications that are centred around digital signal control.

To accelerate software development, ARM also delivers a free DSP library in the CMSIS project. The library contains a range of filter, transformation and maths functions (e.g. matrix), and supports a range of data types. The CMSIS project is now open source and its development is published on GitHub.

 

5 - Single precision floating point unit

The optional single precision floating point extension based on FPv5 includes an additional 16-entry 64-bit register file. The option adds 45 IEEE754-2008 compatible single-precision floating-point instructions. Using floating-point instructions usually yields an average of ten times increase in performance over the equivalent software libraries.

The FPU is contained in a separate power domain allowing the unit to be powered-down when not enabled or in use.

 

The Cortex-M33 delivers an optimal balance between performance, power and security

The complexity of Embedded solutions is rising dramatically and so is their value. Designers are faced with the task of finding the right balance between opposing design factors. The amount of software included in an SoC is also rising dramatically, while project schedules are shrinking. In order to deliver the right product, at the right time, with the right performance and cost, we need to start with the correct seed.

 

The Cortex-M33 processor was designed to be the seed of such designs, leveraging previous experience and the existing Cortex-M ecosystem to reduce development cost. System power is reduced, due to a new design with multiple low-power technologies. TrustZone sets the foundation to protect user applications and valuable IP for building secure solutions. The enhanced MPU and TrustZone combine to form the base for reliable and protected systems. Finally, we get to the endless pursuit of better productivity. TrustZone is designed such that all existing users may continue to develop in the non-secure zone, just as before. Debug and trace are enhanced in Cortex-M33 to simplify working with complex code. All programming may be done in C language, as is the case for all Cortex-M, including all exception handlers. In total, these features and functionality increase developer productivity and allow them to deliver more complex solutions in a shorter period.

 

Many silicon partners joined ARM in defining and developing these new processors, and are actively designing chips taking advantage of the TrustZone security technology. The ARM ecosystem is also focused on porting tools and software to the Cortex-M33.  While the Cortex-M33 delivers an optimal balance between performance, power, security and productivity, it is more important to state that the ARM partnership is working hard to deliver great ingredients to the developers and makers whose creativity and vision will fuel the fast transition to a more connected, more intelligent, and more protected world.

 

For further details, please check out this white paper.

October 19, 2004 was a date like any other, and will probably not mean much to most people. However, if you are part of the Embedded community, that precise date was transformational for the microcontroller (MCU) industry. It was the day that ARM® announced the first Cortex®-M processor, bringing the advantages of a common architecture to the microcontroller market.

 

Embedded developers quickly embraced the intuitive programmers’ model, as well as the outstanding performance and excellent energy efficiency of the Cortex-M family. Combining ease-of-use and far-reaching ecosystem support, it accelerated innovation in the embedded industry and multiplied the microcontrollers' use-cases. These tiny Cortex-M based MCUs are now everywhere, bringing invisible intelligence and enhanced functionality to many of the devices we use today. A decade later, most MCU and embedded players have used this industry standard to ship a combined total of more than 22 billion units of Cortex-M based devices.

 


Embedded intelligence runs on Cortex-M based devices

 

Innovation requires security and standard platforms

 

Innovation never stops, and new challenges arise. Providing security for the increasing number of connected objects is now essential - protecting their data confidentiality, their functionality and integrity, as well as their connection from the infrastructure to the cloud. Not only must this security meet high standards using proven best practices; it must also be easy to use and program, minimizing the risk of being used incorrectly. The last and key ingredient of deployment success is that this innovation needs to build on industry-standard platforms, ensuring wide ecosystem endorsement and enabling a large community of developers to create the huge diversity of devices that will accelerate proliferation in the various Internet of Things vertical segments.

 

Securing connected devices is a well-known challenge - and opportunity - at ARM. There are more than 10 billion units of Cortex-A based chips deployed in mobile devices that use ARM TrustZone® technology to protect the root of trust from potentially distrustful software. ARM tasked some of its most talented engineers to optimize and transfer this security foundation into the very heart of a new version of the M-profile architecture. They have achieved this and ensured it fits within the tight embedded constraints:

 

  • Real-time, with fast transitions between security states
  • Deterministic
  • Still highly energy efficient.

 

The outcome, the ARMv8-M architecture, was unveiled last year at ARM TechCon 2015, promising to bring advanced software isolation into the smallest of processors and devices using ARM TrustZone for ARMv8-M technology. If you are looking for more information on this new architecture, Joseph Yiu's great blog is the best place to get started!


TrustZone for ARMv8-M brings security to the smallest devices

 

Introducing Cortex-M23 and Cortex-M33

 

Today I am pleased to announce two new ARM Cortex-M processors built on TrustZone technology: the Cortex-M23, for the most area and energy constrained applications, based on the ARMv8-M Baseline profile; and the Cortex-M33, for the more capable systems, based on the ARMv8-M Mainline. Both profiles offer ARM TrustZone technology as their security foundation and provide an easier-to-use MPU programmers' model, with the capability to restrict debug visibility, thus protecting the secure software confidentiality. The security concept is holistic: it goes beyond processor boundaries and encompasses the complete system (bus/interconnect, memories and peripherals), exporting the processor security state across the system using the AMBA® AHB5 standard.

ARM Cortex-M33 and Cortex-M23 have TrustZone security built in at the foundation

 

 

Connected devices built on Cortex-M23 or Cortex-M33 based chips will benefit from the protection offered by the trusted world to execute security-critical functionality, such as secure boot, cryptography, identity and key management, provisioning and update of the devices. In the processor's normal world, guest applications and non-secure services will run similar to previous Cortex-M based devices. TrustZone will allow these applications and services to access the secure functionality of the trusted world, while safeguarding the secure resources from being misused, corrupted or inspected by guests. It is worth noting that due to the forward compatibility of the programmers' model, applications written for existing Cortex-M processors will run in Cortex-M33’s and Cortex-M23’s non-secure worlds, without noticing that they are running on an ARMv8-M based processor. Experienced Cortex-M developers will feel at home and will be able to quickly transfer existing applications to the next wave of microcontrollers.

 

You can find more details of the new processors in the very instructive blogs from Tim Menasveta and Diya Soubra, respectively on Cortex-M23 and Cortex-M33.

 

Accelerating the pace of development

 

As well as TrustZone and the related security features, both processors bring the additional capabilities of the respective ARMv8-M Baseline and Mainline profiles. They offer more aligned interfaces and features to chip designers and software developers, such as: debug, memory-sharing and execute-only memory support, and increased maximal number of interrupts. Altogether, these make system design and software development more scalable and efficient and accelerate the pace of development - essential for the proliferation of billions of IoT nodes.

 

Many partners joined us in defining and developing these new processors, and are actively designing chips that take advantage of the same standard TrustZone security technology. We are thrilled that seven of them are joining us for the launch at ARM TechCon 2016.


The majority of the world's top 10 MCU suppliers have licensed the new ARM technology

 

The ARM ecosystem is already focused on porting tools, RTOS and firmware to be ready for when first chips arrive. In addition, ARM’s software development tools are available and fully support the Cortex-M23 and Cortex-M33 processors. Many ecosystem partners will be showcasing at ARM TechCon how their product makes the most of the ARMv8-M architecture and how the Cortex-M23 and Cortex-M33 processors unlock new capabilities.

 


The world's #1 ecosystem is already migrating to ARMv8-M

 

Will the 25th of October 2016 mark another breakthrough in embedded? It will probably take several more years and billions of shipped devices before one can say for sure. However, the key focus for now is to work in close collaboration with the ARM partnership to deliver all the great ingredients to the developers and makers whose creativity and vision will fuel the fast transition to a more connected, more intelligent and more protected world.


 

 

ARM® Cortex®-M23 is the smallest and most energy efficient processor with TrustZone® technology. Based on the ARMv8-M baseline architecture, Cortex-M23 is the ideal processor for constrained embedded applications where efficient security is a key requirement.

 

My colleague Thomas Ensergueix introduces the newest members of the Cortex-M family in his blog Cortex-M23 and Cortex-M33 - Security foundation for billions of devices, and here I will take you through some of the most interesting parts of the new Cortex-M23 processor:

 

  • The most important feature of the Cortex-M23 is the addition of TrustZone, a foundation for security
  • Ultra-compact architecture and pipeline
  • Enhanced debug and trace capabilities (critical for improving developers’ productivity)
  • Improved memory protection unit (it defines access permissions for software components, and the new design makes programming and defining memory regions more efficient)
  • Several performance-enhancing instructions

 

1. TrustZone for ARMv8-M: foundation for security

TrustZone for ARMv8-M brings hardware-enforced separation between the trusted and non-trusted software on each Cortex-M23 based device. As such, TrustZone provides a foundation for building embedded applications that, in the past, might have required two separate physical processors to create physical separation between the trusted and the non-trusted sides. A single Cortex-M23 processor can provide a robust solution for security requirements such as device identification management, high-value firmware protection, software certification, and secure boot, just to name a few.

 

The Cortex-M23 processor with TrustZone has two security states:

  • Secure state - can access both Secure and Non-Secure resources (memories, peripherals, etc)
  • Non-Secure state - can only access Non-Secure resources
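The access rule in the two bullets above can be written down as a small truth function. This is only a minimal sketch of the data-access rule as stated here; the type and function names are hypothetical and do not come from any ARM header, and real hardware applies further checks (for example on instruction fetches and state transitions) that are not modelled.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
typedef enum { STATE_NON_SECURE = 0, STATE_SECURE = 1 } security_state_t;
typedef enum { RESOURCE_NON_SECURE = 0, RESOURCE_SECURE = 1 } resource_attr_t;

/* Returns true if software running in `state` may access a resource
 * (memory, peripheral) that is marked with attribute `attr`. */
bool access_allowed(security_state_t state, resource_attr_t attr)
{
    if (state == STATE_SECURE) {
        return true;                       /* Secure state: both kinds of resource */
    }
    return attr == RESOURCE_NON_SECURE;    /* Non-Secure state: Non-Secure only */
}
```

Calling `access_allowed(STATE_NON_SECURE, RESOURCE_SECURE)` is the one combination that is refused in this model.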

 

Code execution transitions and data accesses in the two security states are policed by hardware, minimizing switching overhead and guaranteeing determinism – a hallmark of all Cortex-M processors.  More details on features of TrustZone for ARMv8-M can be found in Joseph Yiu’s ARMv8-M architecture overview.

 

2. Compact two-stage pipelined processor

Cortex-M23 is a simple two-stage pipelined Von Neumann processor, yet it supports the full ARMv8-M baseline instruction set.  Users familiar with the Cortex-M0+ will quickly recognize many similar features in the Cortex-M23 that bring extreme energy efficiency to these processors: WFI (Wait for Interrupts)/WFE (Wait for Event) and sleep/deep-sleep modes, sleep-on-exit, SysTick timer and optional single cycle IO.

 

The instruction set comprises around 80 Thumb instructions, most of which are 16 bits wide to maximize code compactness; it also includes a few 32-bit instructions where efficiency gains can be made. All ARMv6-M instructions are supported to ensure ease of code migration from the Cortex-M0 and Cortex-M0+ processors. Several new instructions have been included in the ARMv8-M baseline instruction set to improve performance efficiency for conditional operations, mutually exclusive accesses, hardware divide operations, and immediate moves.

 

3. Enhanced debug and trace

An efficient and secure 32-bit processor alone does not make for successful field deployment.  Software development costs often far surpass fabrication and hardware IP costs.  The Cortex-M23 makes it easier to develop and debug software by introducing more configurable hardware breakpoints and data watchpoints compared to ARMv6-M processors.  An optional Embedded Trace Macrocell (ETM) has also been added, alongside the optional Micro Trace Buffer (MTB) that is also available in the Cortex-M0+ processor. These options give designers the choice between a more full-featured instruction trace functionality and a more cost-effective, trimmed-down instruction trace capability.

 

4. Memory Protection Unit for task isolation

A new programmer-friendly Memory Protection Unit (MPU) based on the latest PMSAv8 architecture has been added to the Cortex-M23 processor as an option.  It can “protect” up to 16 regions for each of the Secure and Non-Secure states. Each region has a base address, ending address, access permission and memory attribute settings. In multi-tasking environments, the OS can reprogram the MPU during task context switching to define the memory permissions for each task. For example, application tasks may be granted access to all or some application data and specific peripherals. The MPU dramatically improves system reliability by protecting all other data from corruption and other peripherals from unauthorized accesses.
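To make the idea of per-task regions concrete, here is a simplified software model of a region table with base address, ending address and a write permission. It is a sketch under stated assumptions: the struct layout and function name are hypothetical, it does not use the real MPU register interface, and it ignores privilege levels, memory attributes and the default memory map.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MPU_MODEL_MAX_REGIONS 16   /* up to 16 regions per security state */

/* Hypothetical software model of one region, not the hardware register layout. */
typedef struct {
    uint32_t base;      /* first byte of the region */
    uint32_t limit;     /* last byte of the region (inclusive) */
    bool     writable;  /* false means read-only in this model */
    bool     enabled;
} mpu_region_t;

/* Returns true if an access to `addr` is permitted by the given region table.
 * An address that matches no enabled region is refused in this model. */
bool mpu_model_allows(const mpu_region_t *regions, size_t count,
                      uint32_t addr, bool is_write)
{
    for (size_t i = 0; i < count && i < MPU_MODEL_MAX_REGIONS; i++) {
        const mpu_region_t *r = &regions[i];
        if (!r->enabled) {
            continue;
        }
        if (addr >= r->base && addr <= r->limit) {
            return is_write ? r->writable : true;
        }
    }
    return false;
}
```

In this model, an OS would keep one such table per task and switch tables during a context switch, so a task that is only granted its own data region and a read-only code region cannot write to other tasks' data or to unlisted peripherals.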


Easier to set up memory regions

 

Cortex-M23’s memory protection architecture adopts a base-and-limit style comparator for defining memory regions, as opposed to the previous power-of-two-sized, size-aligned scheme. This improvement simplifies software development and, in some cases, reduces memory wastage when region sizes do not fit a perfect power-of-two size.
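The difference in wastage can be shown with a little arithmetic. The sketch below assumes a 32-byte granule for base/limit regions and a 32-byte minimum for the power-of-two scheme; the function names are hypothetical, and sub-region tricks available in some older MPUs are ignored.

```c
#include <stdint.h>

/* Smallest power-of-two region that can hold `size` bytes.
 * Assumes 1 <= size <= 2^31 so the shift cannot overflow. */
uint32_t pow2_region_size(uint32_t size)
{
    uint32_t p = 32u;            /* assumed minimum region size in this sketch */
    while (p < size) {
        p <<= 1;
    }
    return p;
}

/* Region size with a base/limit scheme: round up to the 32-byte granule. */
uint32_t base_limit_region_size(uint32_t size)
{
    return (size + 31u) & ~31u;
}

/* Bytes covered by the region but not needed by the buffer. */
uint32_t wasted_bytes(uint32_t region_size, uint32_t size)
{
    return region_size - size;
}
```

For a 40 KB buffer (40960 bytes), the power-of-two scheme needs a 64 KB region and leaves 24576 bytes covered but unused, while the base/limit scheme covers exactly 40960 bytes.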

 

5. New ARMv8-M baseline instructions

New instructions have been added to enhance the Cortex-M23's capability compared to ARMv6-M implementations, but without compromising the ultra-high energy efficiency of this Cortex-M processor class. Most of these 'new' instructions (except for the security extension ones) are inherited from the ARMv7-M architecture instruction set in order to extend Cortex-M23's capability compared to the Cortex-M0+.

 

5.1 Security extension

TrustZone for ARMv8-M brings additional instructions to the baseline instruction set. This includes the secure gateway (SG), non-secure branch (BXNS, BLXNS), and test target (TT) instructions. More information can be obtained from Joseph Yiu’s ARMv8-M architecture overview.

 

5.2 Execute-only code generation

Support for execute-only memory regions has been improved by the addition of immediate-move instructions (MOVW/MOVT inherited from ARMv7-M), which facilitate immediate-data generation in execute-only code. These instructions provide the ability to produce 32-bit values via two instructions without the need to perform a literal load.
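As an illustration of how two immediate moves build a 32-bit value, the following sketch models the effect in plain C: a 16-bit immediate move (written MOVW in ARM assembly) sets the low half and clears the top half, then MOVT writes the top half and keeps the low half. The function names are hypothetical and this is a model of the register effect, not compiler output.

```c
#include <stdint.h>

/* The 16-bit immediate that the first move (MOVW) would carry. */
uint16_t movw_imm(uint32_t value)
{
    return (uint16_t)(value & 0xFFFFu);
}

/* The 16-bit immediate that MOVT would carry. */
uint16_t movt_imm(uint32_t value)
{
    return (uint16_t)(value >> 16);
}

/* Models the register contents after MOVW followed by MOVT. */
uint32_t apply_movw_movt(uint16_t low, uint16_t high)
{
    uint32_t reg = low;                                  /* MOVW: low half set, top half zero */
    reg = (reg & 0x0000FFFFu) | ((uint32_t)high << 16);  /* MOVT: top half set, low half kept */
    return reg;
}
```

For the address 0x20001234, the two immediates are 0x1234 and 0x2000, and no data word has to be read from the (execute-only) code region.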

 

5.3 Code optimization

Conditional Compare and Branch instructions (CBNZ/CBZ inherited from ARMv7-M) improve performance for many conditional control code sequences.  Immediate branch with long offset (B.W inherited from ARMv7-M) allows for a direct branch to a far target address.  Hardware integer divide instructions (SDIV/UDIV inherited from ARMv7-M) reduce processing cycles for divide operations.
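To give a feel for what a hardware divide replaces: on cores without a divide instruction, toolchains typically call a runtime helper (for example `__aeabi_uidiv` in the ARM EABI) that performs the division in software. The routine below is a generic shift-and-subtract sketch of such a software divide, not the code of any particular library; the function name and the divide-by-zero behaviour are choices made for this example.

```c
#include <stdint.h>

/* Unsigned 32-bit division by shift-and-subtract (restoring division).
 * Returns the quotient and, if `rem` is not NULL, stores the remainder.
 * Division by zero returns 0 with remainder `num` in this sketch. */
uint32_t soft_udiv(uint32_t num, uint32_t den, uint32_t *rem)
{
    uint32_t quot = 0;
    uint32_t r = 0;

    if (den == 0) {
        if (rem) {
            *rem = num;
        }
        return 0;
    }
    /* One iteration per bit, from the most significant bit down. */
    for (int bit = 31; bit >= 0; bit--) {
        r = (r << 1) | ((num >> bit) & 1u);   /* bring down the next bit */
        if (r >= den) {
            r -= den;
            quot |= (1u << bit);
        }
    }
    if (rem) {
        *rem = r;
    }
    return quot;
}
```

The loop always runs 32 iterations, which is the kind of cost a single UDIV instruction avoids.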

 

5.4 Exclusive access

Load and store exclusive instructions from ARMv7-M have been added to improve the architectural consistency of the Cortex-M23 processor in multicore systems, where semaphores between processors can be handled with the same mechanism. In addition, to provide atomic support for C11/C++11, the load-acquire and store-release instructions are included from ARMv8-A (Thumb 32 version), including the exclusive-access variants of those instructions.
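From the software side, this is the kind of C11 code that benefits. The sketch below is a minimal try-lock built on `<stdatomic.h>` with acquire and release ordering; the `spinlock_t` type and function names are hypothetical, and how a given compiler lowers these operations to exclusive or acquire/release instructions on a particular target is an assumption, not something shown here.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for illustration. */
typedef struct {
    atomic_flag locked;
} spinlock_t;

/* Put the lock into the unlocked state. */
void spinlock_init(spinlock_t *lock)
{
    atomic_flag_clear_explicit(&lock->locked, memory_order_relaxed);
}

/* Try to take the lock once; returns true on success.
 * Acquire ordering: accesses after a successful lock cannot move before it. */
bool spinlock_try_acquire(spinlock_t *lock)
{
    return !atomic_flag_test_and_set_explicit(&lock->locked, memory_order_acquire);
}

/* Release the lock.
 * Release ordering: accesses before the unlock cannot move after it. */
void spinlock_release(spinlock_t *lock)
{
    atomic_flag_clear_explicit(&lock->locked, memory_order_release);
}
```

A caller would typically loop on `spinlock_try_acquire()` (or yield between attempts) and call `spinlock_release()` once the shared data has been updated.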

 

For more information about the Cortex-M23 and the Cortex-M33, see the introduction white paper on both processors here.


NE10 fft_float32 result wrong

Posted by ufo 20-Oct-2016

When I use an ARM Cortex-A9 CPU with NEON to test the NE10 library, I get a wrong FFT result.

My CPU is an NXP i.MX6Q running at 1 GHz; my program is compiled with gcc-4.6.2.

The NE10 library is compiled with arm-linux-gnueabihf-gcc 4.9; the output library files are a shared library and a static library.

My test code:

#include <stdio.h>

#include <stdlib.h>

#include <math.h>

#include <string.h>

 

#include "NE10_dsp.h"

#include "NE10_macros.h"

#include "seatest.h"

#include "unit_test_common.h"

#include "ne_alloc.h"

/* ----------------------------------------------------------------------

** Global defines

** ------------------------------------------------------------------- */

#define TEST_FREQ (50)

#define TEST_LENGTH_SAMPLES (1024)

 

/* ----------------------------------------------------------------------

** Test input data for F32

** Generated by the MATLAB rand() function

** ------------------------------------------------------------------- */

 

static ne10_float32_t testInput_f32[TEST_LENGTH_SAMPLES * 2];

static ne10_float32_t out_amp_f32[TEST_LENGTH_SAMPLES * 2];

 

static ne10_float32_t y_out[TEST_LENGTH_SAMPLES];

/* ----------------------------------------------------------------------

** Defines each of the tests performed

** ------------------------------------------------------------------- */

 

//input and output

static ne10_float32_t * in_c = NULL;

static ne10_float32_t * in_neon = NULL;

 

static ne10_float32_t * out_c = NULL;

static ne10_float32_t * out_neon = NULL;

 

static ne10_fft_cfg_float32_t cfg_c;

static ne10_fft_cfg_float32_t cfg_neon;

 

void genarate_signal(float *complex_float_list,int freq,int total_num)

{

       int ii;

       for(ii = 0;ii<total_num;ii++)

       {

            complex_float_list[2*ii] = 100*(float)cos(2*ii*PI*freq/total_num);

            complex_float_list[2*ii+1] = 0;

       }

}

 

void test_fft_c2c_1d_float32_performance()

{

    ne10_int32_t fftSize = TEST_LENGTH_SAMPLES;

    ne10_int32_t flag_result = NE10_OK;

    fprintf (stdout, "----------%30s start\n", __FUNCTION__);

  /* FFT test */

  memcpy (in_c, testInput_f32, 2 * fftSize * sizeof (ne10_float32_t));

  memcpy (in_neon, testInput_f32, 2 * fftSize * sizeof (ne10_float32_t));

    cfg_c = ne10_fft_alloc_c2c_float32_c (fftSize);

    if (cfg_c == NULL)

    {

        fprintf (stdout, "======ERROR, FFT alloc fails\n");

    }

    cfg_neon = ne10_fft_alloc_c2c_float32_neon (fftSize);

    if (cfg_neon == NULL)

    {

        NE10_FREE (cfg_c);

        fprintf (stdout, "======ERROR, FFT alloc fails\n");

    }

  ne10_fft_c2c_1d_float32_neon ( (ne10_fft_cpx_float32_t*) out_neon, (ne10_fft_cpx_float32_t*) in_neon, cfg_neon, 0);

  ne10_vmul_vec2f_neon(out_amp_f32, (ne10_vec2f_t *)out_neon, (ne10_vec2f_t *)out_neon, fftSize);

  NE10_FREE (cfg_c);

  NE10_FREE (cfg_neon);

}

 

static void my_test_setup (void)

{

    ne10_log_buffer_ptr = ne10_log_buffer;

    /* init input memory */

    in_c  = (ne10_float32_t*) NE10_MALLOC ( (TEST_LENGTH_SAMPLES * 2 ) * sizeof (ne10_float32_t));

    in_neon = (ne10_float32_t*) NE10_MALLOC ( (TEST_LENGTH_SAMPLES ) * sizeof (ne10_float32_t));

    /* init dst memory */

    out_c = (ne10_float32_t*) NE10_MALLOC ( (TEST_LENGTH_SAMPLES * 2) * sizeof (ne10_float32_t));

    out_neon = (ne10_float32_t*) NE10_MALLOC ( (TEST_LENGTH_SAMPLES * 2 ) * sizeof (ne10_float32_t));

    genarate_signal(testInput_f32,TEST_FREQ,TEST_LENGTH_SAMPLES);

}

 

 

void Test_float_1024()

{

  uint32_t index = 0;

  uint32_t i = 0;

  float *p =out_amp_f32;

  my_test_setup();

   test_fft_c2c_1d_float32_performance();

     /* calculate peak value*/

  for(i=0;i<TEST_LENGTH_SAMPLES;i++){

       y_out[i] = sqrtf( out_amp_f32[2*i]+out_amp_f32[2*i+1] ) *2/TEST_LENGTH_SAMPLES;

  }

  /****/

  p =y_out;

  for(i=0;i<TEST_LENGTH_SAMPLES/16;i++){

  fprintf (stdout, "%4d--%f   %f   %f   %f   %f   %f   %f   %f\n",\

       i,

       *(p+0),

       *(p+1),

       *(p+2),

       *(p+3),

       *(p+4),

       *(p+5),

       *(p+6),

       *(p+7)     );

  p+=8;

  }

  index = search_MaxIdx(y_out,TEST_LENGTH_SAMPLES);

  fprintf (stdout, "max point num is %d  = %f\n",index,y_out[index]);

}

 

 

int main (ne10_int32_t argc, char** argv)

{

  ne10_result_t stat;

  ne10_result_t math_stat;

  ne10_result_t dsp_stat;

  stat = ne10_init();

  if(stat == NE10_OK)

  printf("ne10_init OK!\n");

  math_stat = ne10_init_math (stat);

  if(stat == NE10_OK)

  printf("ne10_init_math OK!\n");

  dsp_stat = ne10_init_dsp (stat);

  if(stat == NE10_OK)

  printf("ne10_init_dsp OK!\n");

 

  stat = ne10_HasNEON();

  if(stat == NE10_OK)

  printf("cpu with neon!\n");

 

  Test_float_1024();

    return 0;

}

 

The test output from the terminal:

[root@EmbedSky /mnt]# ./A9_test

ne10_init OK!

ne10_init_math OK!

ne10_init_dsp OK!

cpu with neon!

----------test_fft_c2c_1d_float32_performance start

   0--0.000000   0.000003   2.958737   0.000017         0.000001   0.000005   12.166712   0.000026

   1--0.000002   0.000002   3.526581   0.000015         0.000004   0.000012   17.811594   0.000039

   2--0.000001   0.000006   13.803596   0.000021                0.000002   0.000009   22.367292   0.000042

   3--0.000003   0.000004   6.788651   0.000026         0.000005   0.000028   70.067863   0.000124

   4--0.000003   0.000008   33.308075   0.000018                0.000006   0.000013   38.638416   0.000077

   5--0.000013   0.000014   11.727892   0.000059                0.000022   0.000083   72.452492   0.000262

   6--0.000005   0.000029   67.650848   0.000221                0.000011   0.000082   212.321381   0.000394

   7--0.000030   0.000015   74.159584   0.000180                0.000054   0.000121   357.053802   0.000500

   8--0.000001   0.000003   15.615634   0.000014                0.000002   0.000001   4.653375   0.000025

   9--0.000005   0.000007   7.290600   0.000033         0.000014   0.000050   33.152935   0.000153

  10--0.000003   0.000016   31.989527   0.000065                0.000004   0.000024   68.505203   0.000097

  11--0.000010   0.000015   52.914970   0.000014                0.000018   0.000069   268.941742   0.000185

  12--0.000009   0.000036   55.972534   0.000130                0.000009   0.000040   101.129539   0.000211

  13--0.000016   0.000015   40.044838   0.000113                0.000028   0.000085   240.823975   0.000331

  14--0.000017   0.000058   260.015289   0.000385               0.000046   0.000208   580.652283   0.000964

  15--0.000101   0.000039   234.310699   0.000321               0.000168   0.000232   845.263184   0.000540

  16--0.000002   0.000009   43.014130   0.000061                0.000005   0.000025   65.147423   0.000105

  17--0.000008   0.000002   25.115501   0.000043                0.000015   0.000030   81.902115   0.000142

  18--0.000003   0.000020   48.622169   0.000054                0.000008   0.000025   74.208450   0.000105

  19--0.000019   0.000021   48.614277   0.000050                0.000033   0.000123   247.859283   0.000346

  20--0.000007   0.000040   15.095722   0.000139                0.000001   0.000032   86.764946   0.000143

  21--0.000006   0.000018   68.452156   0.000052                0.000015   0.000070   336.700592   0.000229

  22--0.000012   0.000054   202.304031   0.000192               0.000042   0.000135   390.701294   0.000626

  23--0.000085   0.000059   164.578522   0.000215               0.000136   0.000302   543.785278   0.000656

  24--0.000003   0.000012   81.045288   0.000058                0.000014   0.000039   121.416832   0.000143

  25--0.000034   0.000044   105.488533   0.000096               0.000064   0.000259   550.350098   0.000758

  26--0.000014   0.000036   14.541925   0.000290                0.000015   0.000054   116.387253   0.000232

  27--0.000055   0.000138   299.356995   0.000409               0.000109   0.000758   1779.936035   0.002434

  28--0.000026   0.000074   258.019379   0.000446               0.000049   0.000043   49.417068   0.000408

  29--0.000076   0.000143   239.313828   0.000484               0.000109   0.000691   1362.538452   0.002145

  30--0.000041   0.000058   516.449036   0.000623               0.000059   0.000216   619.675476   0.000774

  31--0.000088   0.000172   520.821411   0.000269               0.000125   0.000645   2337.425049   0.001845

  32--0.000000   0.000003   3.589420   0.000019         0.000000   0.000006   13.843405   0.000028

  33--0.000001   0.000003   10.298284   0.000018                0.000002   0.000019   66.461250   0.000077

  34--0.000000   0.000006   19.241838   0.000023                0.000004   0.000008   21.834381   0.000055

  35--0.000010   0.000015   20.904856   0.000062                0.000017   0.000093   146.907822   0.000297

  36--0.000001   0.000011   40.437374   0.000021                0.000007   0.000013   43.941502   0.000064

  37--0.000015   0.000018   29.845486   0.000058                0.000025   0.000095   120.085327   0.000260

  38--0.000003   0.000036   37.524120   0.000216                0.000013   0.000072   181.527771   0.000368

  39--0.000032   0.000024   39.786701   0.000189                0.000055   0.000155   215.084381   0.000531

  40--0.000001   0.000004   10.670328   0.000016                0.000001   0.000000   5.063481   0.000018

  41--0.000004   0.000006   8.243680   0.000024         0.000011   0.000038   35.816643   0.000103

  42--0.000003   0.000013   24.408234   0.000055                0.000003   0.000020   52.543468   0.000083

  43--0.000006   0.000008   35.756958   0.000019                0.000011   0.000034   181.120407   0.000102

  44--0.000007   0.000025   48.517586   0.000088                0.000008   0.000031   79.276314   0.000164

  45--0.000014   0.000012   23.394804   0.000085                0.000024   0.000065   140.189728   0.000241

  46--0.000012   0.000047   186.442764   0.000310               0.000033   0.000159   435.293610   0.000740

  47--0.000074   0.000024   165.113083   0.000267               0.000125   0.000183   584.355713   0.000478

  48--0.000001   0.000008   28.567337   0.000046                0.000003   0.000018   47.600296   0.000078

  49--0.000007   0.000000   16.499947   0.000032                0.000012   0.000025   52.054745   0.000093

  50--0.000002   0.000015   34.439335   0.000044                0.000006   0.000019   53.052429   0.000079

  51--0.000012   0.000011   28.535830   0.000026                0.000022   0.000066   130.810287   0.000164

  52--0.000005   0.000026   18.895275   0.000086                0.000001   0.000023   61.850578   0.000112

  53--0.000004   0.000007   38.872627   0.000039                0.000009   0.000021   183.325409   0.000079

  54--0.000008   0.000037   131.532867   0.000158               0.000027   0.000096   275.315094   0.000453

  55--0.000056   0.000033   104.326767   0.000165               0.000089   0.000184   326.340057   0.000415

  56--0.000002   0.000010   50.334133   0.000046                0.000008   0.000028   81.778809   0.000112

  57--0.000020   0.000020   54.890594   0.000048                0.000037   0.000126   259.453979   0.000344

  58--0.000007   0.000025   24.627508   0.000158                0.000004   0.000034   79.085579   0.000139

  59--0.000021   0.000059   140.300293   0.000177               0.000043   0.000328   820.769775   0.001047

  60--0.000014   0.000037   122.276161   0.000216               0.000023   0.000031   54.020184   0.000236

  61--0.000034   0.000062   100.214119   0.000234               0.000046   0.000304   602.250610   0.000944

  62--0.000021   0.000007   280.720703   0.000308               0.000033   0.000144   411.743835   0.000576

  63--0.000058   0.000071   271.125824   0.000097               0.000090   0.000225   1149.802368   0.000572

max point num is 254  = 2337.425049

 

Using the function genarate_signal(testInput_f32, TEST_FREQ, TEST_LENGTH_SAMPLES), the correct result should be 20 Hz with a peak of 100.

Can anyone tell me how I can get the correct FFT float32 result? Thank you very much!
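One detail in the listing that may be worth double-checking is the buffer sizing: in my_test_setup(), in_neon is allocated with TEST_LENGTH_SAMPLES floats, while the memcpy in the FFT test copies 2 * fftSize floats into it. A complex-to-complex FFT of N points works on N interleaved (real, imaginary) pairs, so each buffer needs 2 * N floats. The following is a small sketch of that size arithmetic; the helper names are hypothetical, and it assumes ne10_float32_t is a 32-bit float.

```c
#include <stdbool.h>
#include <stddef.h>

/* Bytes needed for n_points interleaved complex float samples (re, im). */
size_t complex_f32_bytes(size_t n_points)
{
    return 2u * n_points * sizeof(float);
}

/* True if a buffer of alloc_floats floats can hold n_points complex samples. */
bool buffer_holds_complex(size_t alloc_floats, size_t n_points)
{
    return alloc_floats * sizeof(float) >= complex_f32_bytes(n_points);
}
```

With these helpers, `buffer_holds_complex(TEST_LENGTH_SAMPLES * 2, TEST_LENGTH_SAMPLES)` holds, while `buffer_holds_complex(TEST_LENGTH_SAMPLES, TEST_LENGTH_SAMPLES)` does not. Whether this accounts for the output shown above has not been verified.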

Prototyping an ARMv8-based design is similar to prototyping any other design.  FPGA prototyping for these types of applications is generally used to validate the hardware quickly, so that software development can start sooner and proceed faster.

 

Whether you need scalability for your current design as you move through the design and verification process, or you need your FPGA platform to be reusable and able to scale for future designs that may be larger than your current one, it all starts with identifying and selecting the ideal building blocks. The foundational prototyping board you choose must have the flexibility to expand, so a custom platform is usually out of the question, as a custom board requires even greater customization to grow. When crafting your platform, there are three initial FPGA building blocks to evaluate: Single FPGA boards, Dual FPGA boards, and Quad FPGA boards.

 

Selecting either a single, dual, or quad board depends on your design’s size, memory requirements, and the number of inter-FPGA connections and external I/Os that will best fit your needs. The chart below provides an example of the differences in these board types based on S2C’s solutions for its Virtex UltraScale Logic Modules.

 

These comparisons don’t tell the whole story though. You must take a closer look at the architecture for each of these solutions.  Besides the number of physical interconnections between FPGAs, the type (e.g. DDR3, DDR4) and capacity (e.g. 4GB, 8GB) of on-board memory is equally important to your design. Of additional interest should be the number of high-speed gigabit transceivers and their performance level. The diagrams below provide in-depth comparisons of each of the architectures for single, dual, and quad FPGA prototyping boards.

 


 

Diagram of a single FPGA module architecture

 


 

Diagram of a dual FPGA module architecture

 

 

 


 

Diagram of a quad FPGA module architecture

 

 

 

The type of I/O connectors used in the FPGA module may have a big impact on your design mapping and performance. First, they must be optimized for FPGA I/O banks, and even for the FPGA die in cases where an FPGA has multiple internal dies. In addition, having I/Os from different dies will decrease performance. All traces from the FPGA to the same I/O connector should have the same trace length to increase bus performance. Connector performance itself may also play an important role, especially if the connectors are optimized for running high-performance LVDS (e.g. over 1 GHz).

 

It's All About Flexibility

The foundational prototyping board is the first step in building scalability. Each solution, whether a single, dual, or quad system, must allow you to grow: you must have the flexibility to grow your single system into a dual, quad or beyond.  Likewise, your dual system should allow you to stitch together other systems of the same FPGA type and architecture to create a quad system.

 

Even with this flexibility, there are some implications for the number of interconnects and I/Os when stitching together these systems, so careful consideration must be given to which system you initially choose. You will notice in the following diagrams that building these multi-FPGA systems requires the boards to be connected via cables or interconnection modules.  These systems will also need some sort of external module to manage global clocking and reset mechanisms.

 


 

Connection of two single FPGA prototyping modules

 

 


 

Connection of 4 single FPGA prototyping modules

 

 

 

Going Beyond 4 FPGAs

What happens if your design needs require going beyond the use of either 4 single FPGAs, 2 dual FPGAs, or a quad FPGA system?  This increase in complexity triggers a whole new set of scalability questions.  These questions can be broken down into several categories.

 

Space

How big of a desk or lab area do you need to work with a large number of FPGAs?

Although you can continue to stitch together multiple prototyping boards to expand beyond a quad system, your physical lab space may be limited making the connections of these boards much more complicated. Not only will you be dealing with space issues, but also the cabling of these systems will become very unwieldy.

 

Scalability & Flexibility

What if you require more logic and memory capacity or the system interfaces or memory types change?

Can you configure the large number of FPGA resources for multiple designs?

Because of the investment in large multiple-board systems, these reusability questions become important. It is much easier to invest in single-board systems if the expectation is that the board will have limited use beyond the initial design, but when the initial design requires the use of a larger prototyping system, your investment must consider possible changes in the prototyping environments and future project uses.

 

Global System Control

How do you provide low-skew clocks and resets to a large number of FPGAs that you are using for the same design?

Is there a way to easily download to FPGAs remotely and how fast is it?

Lower-end software can provide some sort of support for these questions but may miss some basic requirements. Furthermore, the larger the overall hardware system, the more difficult it is to control such things as clocks and resets. Downloading for larger systems can be a cabling nightmare. Higher-end systems that offer complete runtime support and chassis with minimal cabling help reduce the pain dramatically.

 

Power Supply

How do you provide power to a large number of FPGAs?

Can each FPGA be individually controlled (On/Off/Recycle)? Is there a power-monitoring feature that you can employ?

Providing power individually to each board can impose even more physical lab space issues, not to mention complicating the power management of each board.

 

Reliability?

How do you verify that all your clocks and interconnections are correct?

Is there an easy way to monitor the system as well as the individual FPGA statuses?

Making sure a complex prototyping system with as many as 32 FPGAs works correctly is extremely difficult without automation. If a design isn’t running correctly, a great deal of time can be wasted trying to manually determine if the error is due to the design itself or the FPGA system. Software that provides automated self-test capabilities as well as automated voltage, current, and temperature monitoring with shutdown will provide much-needed peace of mind.

 

Seeing FPGA Prototyping for Juno In Action

S2C will be demonstrating their latest Prodigy Juno ARM Interface Module for FPGA prototyping at the upcoming ARM TechCon 2016 so that you can get a close up view of how FPGA prototyping is done for a Juno-based design. 

 

S2C provides a complete, easy-to-set-up reference design as part of the Prodigy Juno ARM Interface Module package. It connects S2C Prodigy Virtex UltraScale and Kintex UltraScale Logic Modules with the Juno ARM Development Platform. The reference design shows:

 

  1. Comprehensive self-testing between the two environments
  2. Expanded FPGA capacity
  3. Early porting of OS kernel or driver code for ARMv8-A processors
  4. High-speed DDR4 memory access between the Logic Module(s) and Juno ARM Development Platform

ARM TechCon 2016 is nearly upon us, and there’s so much high-quality technical content in the conference programme and on the exhibition floor that the only difficulty you’ll have is choosing what to attend at the Santa Clara convention centre! Registrations are still open, and if you need any more convincing then check out this useful guide on how to convince your boss to send you!

 

For those of you interested in the automotive industry and how ARM-based technology fits in, I’ve pulled together a list of what I think are the conference highlights in this space. Let me know what you’re most looking forward to in the comments section below!

 

 

ADAS.png

 

 

Demos

 

At the ARM booth, #402, you will find a large demo that will show some of the technology and applications we can expect to see in the coming 5-10 years, ARM's idea of how we might be spending our time on the road. Built by ARM's specialist demo team, it addresses the driver experience and what that means for in-vehicle infotainment and driver safety. Come on down and talk to one of the staff nearby, and they will be able to show you applications such as:

 

  • Autonomous driving mode on the dashboard: Steering wheel retracts and the full dashboard area can be used for apps such as Office, watching movies, music playback
  • Proximity warning: The dash display glows red to indicate the presence of people near the front of the unit
  • Sign recognition: Cameras at the front see and perceive a sign, displaying it on the dashboard UI

 

 

Green Hills Software solution demonstrations at ARM TechCon in booth 313, October 26-27, will highlight several of the company’s products and services across several embedded industries using 32-bit and 64-bit platforms based on ARM Cortex-A, Cortex-R and Cortex-M. One of the demonstrations, Safe & Secure eCockpit Consolidation, shows Green Hills’ unique run-time separation architecture that safely and securely executes guest operating systems such as Linux and Android on the same processor as ASIL-certified safety-critical tasks, running concurrently on the same core or across multiple cores, while securely sharing resources such as the GPU.

 

 

 

Technical presentations

 

Tuesday

 

Chris Turner (ARM) Developing safe and secure SoCs for automotive, robotics and healthcare Tuesday Oct 25th 10.30am – 11.20am Ballroom E

Cars, robots, medical and other devices rely on ARM technology for continuous safe operation according to guidance given by standards such as IEC 61508 and ISO 26262. Security is equally important for these applications.

 

This presentation describes how ARM approaches development of processors for such safety-related applications and the hardware and software features for fault detection and control that may be employed by device designers and application engineers. Chris Turner will discuss various processors, as a heterogeneous multi-processing system is often required to meet all the performance, efficiency and functional safety requirements for applications such as highly-automated driving.

 

James Scobie (ARM) Addressing the Challenges of Complex Control in Functionally Safe Applications Tuesday Oct 25th 11.30am – 12.20pm Ballroom H

The ARMv8-R architecture is designed to improve safety, security and reliability in embedded control systems. This presentation describes the features and configurations offered by ARM Cortex-R processors that enable designers to deliver the ultimate in functional-safety capabilities for automotive and industrial applications.

 

A microarchitecture is discussed that provides high performance combined with the deterministic execution and responsiveness required for hard real-time applications ranging from industrial controllers and powertrain systems through to safety islands and sensor fusion in vision systems. Bare metal virtualization isolates safety and security events, making for lower cost and improved robustness in complex software deployments. Find out more in James Scobie's blog New ARM Cortex-R52 enables autonomous systems with the highest functional safety standards

 

 

Bernhard Rill (OpenSynergy) The upcoming ARMv8-R architecture perfectly matches the current automotive trends Tuesday Oct 25th 2.30pm – 3.20pm, Ballroom E

The increasing number of functions in vehicles challenges the automotive industry to find solutions that allow the merging of several software systems on one ECU. OpenSynergy has answered this with a software architecture based on virtualization technology, ARM architecture and AUTOSAR.

This innovative approach provides:

  • software update,
  • supplementary features,
  • security,
  • mixed ASIL components,
  • reduced BOM costs

 

The ARMv8-R based architecture even provides hardware support for the real-time multi-AUTOSAR software architectures. It is therefore perfectly positioned to serve the current automotive trend to add functionality and enable overall integration.

 

Wednesday

 

Jon Taylor (ARM), Felix Baum (Mentor Graphics Corporation) Hard Real-time Virtualization: how hard can it be? Wednesday Oct 26th 8.30am – 9.20am Ballroom E

The ARMv8-R architecture offers effective virtualization while maintaining the hard real-time response needed to control applications in the industrial, automotive, medical, and military markets. Virtualization enables safety, security, and reliability, and it can be the key to successful, cost-effective development and deployment of complex software applications. This session brings together engineers from ARM and Mentor Graphics to describe how these processors can be applied in next-generation, highly-assisted automotive driving systems. These safety-related applications are kept free from interference by the underlying isolation present in the new ARMv8-R processor architecture.

 

 

Rob Bates (Mentor Embedded) Device Software Where Safety Meets Security Wednesday Oct 26th 10.30am – 11.20am Ballroom F

Safety has been codified in several industry standards, such as ISO 26262 for automotive and IEC 61508 for industrial applications, where software has become a vital part of both the device and ensuring its safety. Security has now become critically important for device manufacturers and their suppliers, including those that supply COTS software.

 

Existing standards define the lifecycle leading to the creation of safety critical software, but do not say anything directly about security. Cybersecurity, however, is now an important consideration for manufacturers, governmental agencies, and the public at large. Fortunately, there is significant overlap between safety and security software development, and the practices underlying safe software development can be extended to security.

 

This session discusses the overlap between the two practices, and what to consider when fulfilling governmental and industry recommendations for cybersecurity over and above what is required for safety.

 

 

Jay Abraham (MathWorks) Reference workflow for meeting functional safety requirements in automotive systems Wednesday Oct 26th 11.30am – 12.20pm Ballroom H

The functionality and robustness of software are essential for automotive electronics that control powertrain, braking, steering, and driver assistance systems. The development of these systems utilizes Model-Based Design and requires compliance with ISO 26262 (the standard for vehicle functional safety). Model-Based Design enables continuous verification of requirements, software design, and code.

 

This technical session will explain reference workflows for automotive applications to meet functional safety standards. We will explore various verification activities, such as back-to-back equivalence testing to confirm that code compiled for target ARM processors matches the software design from a numerical perspective while satisfying execution performance requirements.

 

 

Jay Thomas (LDRA) Save Time and Money with ISO 26262 Compliance for Automotive Software Wednesday Oct 26th 2.30pm – 3.20pm Ballroom E

Learn how to demonstrate compliance with the ISO 26262 functional safety standard to provide confidence to OEMs and suppliers. We will also show how the use of standards can help lower costs and development time by identifying and addressing defects during development rather than trying to correct them after deployment. Key to this approach is the use of automated test capabilities for comprehensive software quality assurance.

 

The presentation will also address the increasing demands for security in automotive software, using automated processes to develop and test high-quality code and to identify potential security vulnerabilities so that they can be addressed early in the development process. The methodology provides a compliance roadmap to help manage the software planning, development, verification, and regulatory activities of ISO 26262 Part 6, Product Development: Software Level (ISO 26262-6).

 

 

Greg Davis (Green Hills Software) Designing Reliable Code using MISRA C/C++ Wednesday Oct 26th 3.30pm – 4.20pm Ballroom H

C and C++ are powerful yet compact programming languages, but they permit programming practices that are not well suited to high-reliability systems. MISRA C/C++ is a collection of rules that define a subset of the languages that is less error-prone and more suitable for critical systems, such as those in avionics, medical systems, and defense.

 

This session will provide an introduction to MISRA C/C++, when it should be used, and when it should not. It will also provide an introduction to the most important rules of MISRA and how they help ensure a reliable system.

 

Thursday

 

Shaun Purvis (Hardent) Keeping your software simple on today's complex SoCs Thursday Oct 27th 2.30pm - 5.30pm Great America J

For today's systems-on-chip (SoCs), having a single, multi-core, high-performance embedded processor isn't enough. We now see SoCs combining multiple types of processors, such as an ARM Cortex-A/R combination. These heterogeneous SoCs provide robust computing power, but the increased hardware complexity also complicates software. SoCs built with the latest ARM technology, however, provide additional features that help abstract these complexities from software.

This session will discuss some of these features in detail, and how to take advantage of them to simplify software.

 

The ARMv8 architecture introduced 64-bit capability and is already in use in state-of-the-art electronic devices, including smartphones and tablets, as well as in network infrastructure and servers. To help foster the growth of new ideas around ARMv8 within academia, the DS-5 Development Studio Community Edition suite of tools has been enhanced with an ARMv8 FastModel. The enhancement provides a look inside ARMv8, particularly with regard to register operations, memory map, interrupt handling and programming model.

 

In these videos, we demonstrate how to run a software application on an ARMv8 model using the DS-5 tool. In particular, you will learn how to create a project based on the ARMv8 FastModel and run a pre-built application on that model. During our demonstration, you will get an inside view of how the ARMv8 architecture operates at a low level.

 

Part 1 - Introduction to ARMv8 Architecture and DS-5

Part 2 - Install and Setup DS-5

Part 3 - Create a Project for ARMv8 Model

Part 4 - Run an Application on ARMv8 Model

 

 

 

The ARMv8 Architecture

 

The ARM architecture forms the basis for every ARM processor that goes into the digital devices around us, whether they are smartphones, sensors, wearables or servers. Over time, the ARM architecture has evolved to meet growing demands for extra functionality, integrated security features and high performance, as well as the needs of new and emerging markets. ARMv8 is the latest version of the ARM architecture and is the largest architectural change in ARM's history.

 

The ARM DS-5 Development Studio

 

The ARM DS-5 Development Studio is a suite of tools for embedded C/C++ software development on any ARM-based SoC, featuring an editor, compilers, a debugger and system profilers. ARM DS-5 gives you a core set of tools to make sure the most critical software on your system works efficiently and reliably.

 

The ARM University Program enables worldwide educational use of ARM technology, benefiting university courses and labs, student projects and academic research by supporting academics (educators) with our flagship Education Kits.

 
