
ARM Processors

207 posts

Processors and microcontrollers built on the ARM architecture are identified by the architecture version they implement, the profile, and its variants.

So far, seven versions of the ARM architecture have been defined, of which only four are currently in use. They are identified by the prefix ARMv: ARMv4, ARMv5, ARMv6 and ARMv7.

The most recent, ARMv7, defines three usage profiles: ARMv7-A, ARMv7-R and ARMv7-M. They are intended, respectively, for general application processors, for processors and microcontrollers in critical applications that require real-time response, and for general-purpose microcontrollers.
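To make the naming scheme concrete, here is a minimal sketch in C that splits a name such as "ARMv7-M" into its version number and profile letter. The functions `parse_arm_arch` and `profile_use` are hypothetical helpers written for this note; they are not part of any ARM library or toolchain.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of any ARM library: splits a name such as
 * "ARMv7-M" into the architecture version (7) and the profile letter ('M').
 * Names without a profile suffix, such as "ARMv6", yield profile '\0'.
 * Returns 0 on success and -1 if the name does not start with "ARMv<digit>". */
static int parse_arm_arch(const char *name, int *version, char *profile)
{
    assert(name != NULL && version != NULL && profile != NULL);

    if (strncmp(name, "ARMv", 4) != 0 || name[4] < '0' || name[4] > '9')
        return -1;

    *version = name[4] - '0';   /* single-digit versions only (ARMv4..ARMv7) */
    *profile = '\0';

    /* A profile, when present, follows a hyphen: -A, -R or -M. */
    const char *dash = strchr(name + 5, '-');
    if (dash != NULL && (dash[1] == 'A' || dash[1] == 'R' || dash[1] == 'M'))
        *profile = dash[1];
    return 0;
}

/* Maps a profile letter to the intended use described in the text. */
static const char *profile_use(char profile)
{
    switch (profile) {
    case 'A': return "application processors";
    case 'R': return "real-time and critical applications";
    case 'M': return "general-purpose microcontrollers";
    default:  return "no profile (pre-ARMv7 naming)";
    }
}
```

Calling `parse_arm_arch("ARMv7-M", &v, &p)` fills `v` with 7 and `p` with 'M', while a variant name such as "ARMv5TEJ" parses with an empty profile, since profiles only exist from ARMv7 onward.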

 

The variants are identified by letters appended to the version. At present the following exist:

  • ARMv4,
    a variant that includes only the standard ARM instruction set.
  • ARMv4T,
    this variant adds the Thumb instruction set.
  • ARMv5T,
    improvements to interworking and to the ARM instructions. Adds "Count Leading Zeros" (CLZ) and the "Software Breakpoint" instruction (BKPT).
  • ARMv5TE,
    improved arithmetic support for digital signal processing (DSP) algorithms. Adds "Preload Data" (PLD), "Load Register Dual" (LDRD), "Store Register Dual" (STRD), and 64-bit transfers to and from coprocessor registers (MCRR, MRRC).
  • ARMv5TEJ,
    adds the BXJ instruction and other support for the Jazelle® architecture extension.
  • ARMv6,
    adds new instructions to the standard ARM instruction set, and formalizes and revises the memory model and the Debug architecture.
  • ARMv6K,
    adds multiprocessing support instructions to the standard instruction set, plus some extra memory model features.
  • ARMv6T2,
    introduces Thumb-2 technology, a major development of the Thumb instruction set that provides a level of functionality similar to the standard ARM instruction set.

There are also optional extensions, which may be included at the manufacturer's discretion. The extensions are divided into groups; some of them are listed below:

  • Instruction set extensions
    • Jazelle is the extension that turns the ARMv5TE architecture variant into ARMv5TEJ.
    • Virtualization Extensions.
    • ThumbEE is an extension that provides an instruction set extended from the standard Thumb set and designed for dynamically generated code. In the ARMv7 architecture version it is required in the ARMv7-A profile and optional in the ARMv7-R profile.
    • Floating-point Extension, an extension for a floating-point coprocessor. It has historically been called the VFP Extension.
    • Advanced SIMD is an instruction set extension that adds "Single Instruction Multiple Data" (SIMD) instructions for vector operations on integer and single-precision floating-point data types, held in doubleword and quadword registers.
  • Architecture extensions
    • Security Extensions.
    • Multiprocessing Extensions.
    • Large Physical Address Extension.
    • Virtualization Extensions.

I proposed this summary for Wikipedia at the link: Arquitetura ARM – Wikipédia, a enciclopédia livre

To enable or disable an interrupt on a Cortex-M0 there are two registers. This method is the best way to avoid race conditions, whether in a multitasking environment or not, and it also reduces the number of assembly instructions. The same procedure is used to generate an interrupt from software.

When you use multitasking on a microcontroller, which is not very common on 8-bit microcontrollers, you need to follow certain procedures to avoid problems.

In a multitasking environment, two or more tasks (or even a single interrupt handler running alongside the main process) may touch the same register, changing its bits to enable or disable an interrupt, or to simulate an external interrupt from code. To avoid race conditions, that is, contention for the register, in as few steps as possible, Cortex-M microcontrollers use two registers for each function: one pair to enable and disable an interrupt, and one pair to put an interrupt in the pending state or remove it from that state.

Putting an interrupt in the pending state is like triggering that interrupt, simulating the external event at its source. You can also withdraw the event by clearing this state before it is processed.

 

There are two registers to enable and disable an interrupt, called SETENA and CLRENA ("Set Enable Interrupt" and "Clear Enable Interrupt", respectively). They belong to the set of registers in the NVIC (Nested Vectored Interrupt Controller), a block outside the processor core that manages interrupts and exceptions. The figure below, taken from Joseph Yiu's book [1], shows the memory map through which the NVIC registers are accessed and configured. These registers lie in the region from 0xE0000000 to 0xFFFFFFFF, more precisely in the System Control Space (SCS), the range 0xE000E000 to 0xE000EFFF, which in turn sits inside the Private Peripheral Bus (PPB).

[Figure: Cortex-M0 memory map with the NVIC registers, from [1] (Captura de tela 2014-08-21 00.18.34.png)]

The CMSIS package offers extensive support, through functions and macros, for managing these registers, but we will focus on coding in C and assembly to understand the architectural benefits that ARM gives us.

The SETENA register mentioned above is accessed at address 0xE000E100. This address can be read and written, and after a reset its value is 0x00000000. Each bit represents the state of one interrupt: bit 0 is interrupt number 0 (#0), that is, exception number 16 (#16); bit 2 is interrupt number 2 (#2), that is, exception #18; and so on.
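The bit-to-interrupt mapping just described can be written down as two small C helpers. This is only a sketch: the names `irq_bit_mask`, `irq_to_exception` and `NVIC_SETENA_ADDR` are made up for this note, and it assumes a device with at most 32 external interrupts, as on the Cortex-M0.

```c
#include <assert.h>
#include <stdint.h>

#define NVIC_SETENA_ADDR 0xE000E100u  /* address given in the text */

/* Bit mask selecting external interrupt `irq` (0..31) in SETENA. */
static uint32_t irq_bit_mask(unsigned irq)
{
    assert(irq < 32u);
    return (uint32_t)1u << irq;
}

/* Exception number of external interrupt `irq`: the first 16 exception
 * numbers are reserved for system exceptions, so IRQ #0 is exception #16. */
static unsigned irq_to_exception(unsigned irq)
{
    return irq + 16u;
}
```

For interrupt #2, `irq_bit_mask(2)` gives 0x4 and `irq_to_exception(2)` gives 18, matching the values used in this post.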

 

The second register, which pairs with it and is used to clear the states it sets, is CLRENA, accessed at address 0xE000E180.

These two registers are therefore used to enable and disable interrupts. As mentioned, there are other registers that represent the occurrence of an interrupt and can be used to simulate that occurrence from software: SETPEND, accessed at address 0xE000E200, which marks an interrupt as pending, and CLRPEND, accessed at memory address 0xE000E280. We will look at them in more detail later.

As already stated, the SETENA register enables interrupts: just write 1 to the bit corresponding to the interrupt you want to enable. Nothing happens when you write 0 to a bit, that is, clear it, because this register only serves to enable an interrupt and to check whether it is enabled.

To disable an interrupt you must use the register paired with SETENA, named CLRENA. Interestingly, this register is not the inverse of SETENA; it only has the opposite function. To find out whether an interrupt is disabled, check whether its bit reads 0 in SETENA. To disable an interrupt, write 1 to its bit in CLRENA; writing 0 to this register has no effect.

The other two registers, SETPEND and CLRPEND, work like SETENA and CLRENA, but their job is to report interrupts that are pending. Reading SETPEND tells you, from the bits that are set, which interrupts are waiting to be handled; the bit order is the same as in SETENA. As already mentioned, you can also simulate the occurrence of an interrupt by writing 1 to its bit: as soon as you do, the interrupt is raised and can be handled by its handler/vector. Now suppose you are inside another interrupt handler and, while you manipulate some peripheral, that peripheral might accidentally raise an interrupt whose pending state you want to discard. Just write 1 to that interrupt's bit in CLRPEND and it is no longer pending. As with the previous pair, writing 0 to either register has no effect.
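The write-1-to-set and write-1-to-clear behavior of the four registers can be sketched as a small software model that runs on a host machine. Everything here (`nvic_model`, `write_setena` and the other function names) is hypothetical and only mirrors the behavior described above; on real hardware you would write to the memory-mapped addresses or use the CMSIS functions `NVIC_EnableIRQ`, `NVIC_DisableIRQ`, `NVIC_SetPendingIRQ` and `NVIC_ClearPendingIRQ`. Priority handling is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Software model of the NVIC state, for illustration on a host machine.
 * On real hardware these are memory-mapped registers, not a struct. */
typedef struct {
    uint32_t enabled;  /* value read back from SETENA or CLRENA   */
    uint32_t pending;  /* value read back from SETPEND or CLRPEND */
} nvic_model;

/* SETENA: bits written as 1 become enabled, bits written as 0 are ignored. */
static void write_setena(nvic_model *n, uint32_t value)  { n->enabled |= value; }

/* CLRENA: bits written as 1 become disabled, bits written as 0 are ignored. */
static void write_clrena(nvic_model *n, uint32_t value)  { n->enabled &= ~value; }

/* SETPEND: bits written as 1 become pending, bits written as 0 are ignored. */
static void write_setpend(nvic_model *n, uint32_t value) { n->pending |= value; }

/* CLRPEND: bits written as 1 stop being pending, 0 bits are ignored. */
static void write_clrpend(nvic_model *n, uint32_t value) { n->pending &= ~value; }

/* An interrupt can be taken only if it is both pending and enabled
 * (priorities are left out of this simplified model). */
static uint32_t deliverable(const nvic_model *n)
{
    return n->enabled & n->pending;
}
```

In this model, writing 0x0 to any of the four registers leaves the state untouched, which is the property the rest of this post relies on.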

 

Contention for the registers (Race Condition)

Contention for registers is very common in multitasking systems. The ARM architecture adopts this practice of two registers with inverse functions precisely to avoid having to read the register before changing its state, so there are no contention problems and no lost state.

Besides this contention problem, where two processes can modify the same register and one loses the change made by the other, there is also the number of steps needed to make the change. With this approach we do not need to read the current state of the register in order to rewrite it: we just set the desired bit and no other state is lost, because writing 0 is ignored, that is, you cannot change the opposite state through the same register.

See the C code below, which writes the value 0x4 (binary 00000100) to the SETENA register.

*((volatile unsigned long *) (0xE000E100)) = 0x4; // Enable interrupt #2






 

This write only affects the bits written as 1; bits written as 0 are ignored. This avoids having to read the register to merge the bits and set the desired one. See below how the same code looks in assembly.

 

LDR    R0, =0xE000E100    ; load the address of the SETENA register into R0
MOVS   R1, #0x04          ; move the value 0x4 into R1, binary 00000100;
                          ; bit 2 is interrupt #2 (exception #18)
STR    R1, [R0]           ; write the contents of R1 to the address held in R0


Note that only three assembly instructions are needed to enable a given interrupt without disturbing the state of the others.

We used the following instructions: LDR, MOVS and STR,
and the registers R0 and R1.

As you can see, there is no need to read the register before changing it, since it only acts on bits written as 1 and ignores bits written as 0. You therefore cannot accidentally wipe out changes made by other processes.

For teaching purposes, let us look at the conventional approach, that is, a single register to enable and disable an interrupt. We use the same address here, but this does not reflect the real hardware.

 

*((volatile unsigned long *) (0xE000E100)) = *((volatile unsigned long *) (0xE000E100)) | 0x4; // Enable interrupt #2 (read-modify-write)




 

As you can see, in the traditional procedure you first read the register, modify the value you read, and write it back to the same register. But what would happen if a second process took over execution at that instant and also changed the register? You would be holding a stale value, and your write would then discard the change made by the second process. See the same thing in assembly below, and note how more instructions are needed, which widens the window for contention.

 

MOVS     R2, #0x04        ; bit mask, only bit 2 is set
LDR      R0, =0xE000E100  ; load the address of SETENA into R0
LDR      R1, [R0]         ; read the current state of the register
ORRS     R1, R1, R2       ; merge bit 2 into the value that was read
STR      R1, [R0]         ; write the value back to the register


As you can see, you spend two more instructions to enable an interrupt. Moreover, between the execution of instruction 3 (LDR) and instruction 5 (STR) the value of the SETENA register may change, which makes the value held in R1 stale.

In this example we used the instructions MOVS, LDR, ORRS and STR, and the registers R0, R1 and R2.
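The lost update can be reproduced deterministically on a host machine with a small simulation of the two approaches compared above. This is a sketch under stated assumptions: both function names are hypothetical, the register is a plain variable rather than hardware, and the "preemption" is simply placed between the read and the write of the task.

```c
#include <assert.h>
#include <stdint.h>

/* Conventional single enable register: the task reads it, is preempted,
 * and then writes back a stale value. Returns the final register value. */
static uint32_t rmw_with_preemption(uint32_t reg, uint32_t task_bit,
                                    uint32_t preempt_clear_bits)
{
    uint32_t stale = reg;           /* task:       LDR  R1, [R0]           */
    reg &= ~preempt_clear_bits;     /* preemption: another context         */
                                    /*             disables some IRQs      */
    stale |= task_bit;              /* task:       ORRS R1, R1, R2         */
    reg = stale;                    /* task:       STR  R1, [R0]           */
    return reg;
}

/* Set/clear register pair: each context performs a single store, and bits
 * written as 0 are ignored, so neither store can undo the other. */
static uint32_t set_clear_with_preemption(uint32_t reg, uint32_t task_bit,
                                          uint32_t preempt_clear_bits)
{
    reg &= ~preempt_clear_bits;     /* preemption: one store to CLRENA */
    reg |= task_bit;                /* task:       one store to SETENA */
    return reg;
}
```

Starting with interrupt #0 enabled (0x1), a task that enables interrupt #2 while another context disables interrupt #0 ends with 0x5 in the read-modify-write version, so interrupt #0 is silently re-enabled. The set/clear version ends with 0x4, whichever of the two stores happens first.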

 

The same situation can occur with the SETPEND and CLRPEND register pair, leading to unpredictable situations and unwanted behavior, such as loss of synchronization between sequences of interrupts.

This post is based on notes from my studies of the Cortex-M architecture, in particular the Cortex-M0, and may be changed and improved as my studies progress.

Sources:

[1] - The Definitive Guide to the ARM Cortex-M0, Joseph Yiu

 

Chinese Information  中文信息:参与ARM技术培训的新途径

Just a short update to highlight an exciting new development. In response to demand, ARM has launched a limited program of public open-enrollment training courses. We are hosting these at our major regional support centres in San Jose, Cambridge and Shanghai. The program, as I say, is limited at present but touches several of our most popular courses, including Cortex-M System Design, TrustZone and ARMv8 Software Development.

 

You can check out the full schedule here: ARM Training Courses - ARM

 

If you have any questions, please don't hesitate to contact the ARM training team: Contact Support - ARM

 

Chris

A good article about Cortex-M from AnandTech. You can read it at this link: AnandTech | ARM's Cortex M: Even Smaller and Lower Power CPU Cores

A study recently carried out by Cambridge University found that the global cost of software debugging has risen to the princely sum of $312 billion every year, and that developers spend an average of 50% of their programming time finding and fixing bugs (read the full story here). Divide that massive sum by 7.1 billion people on the planet and it works out at $44 per person. Put another way, it’s enough to buy everyone in the world a Raspberry Pi!


Furthermore, the trend for increasing complexity in SoC design (see graph below) means that this problem will only take up more resources in terms of time and money going forward. It is an issue that has given SoC architects and system developers headaches for years.

ITRS 2007 SoC Consumer Portable Design Complexity Trends


With that said, a well-thought-out debug and trace solution for your SoC can help manage the increased complexity by providing the right hardware visibility and hooks. Software developers can make use of this key functionality to develop optimized software in a timely manner with reduced risk of bugs. Each of the following four key use cases (see picture below) can be addressed for your SoC design with a customized debug and trace solution that allows for:

  • Faster SoC bring-up
  • Easy and quick software debug
  • In-field analysis and postmortem debug
  • System optimization via profiling, event synchronization

 

CoreSight-IP-Diagram.png

The ARM CoreSight SoC product is designed to offer a comprehensive solution that can be tailored to meet specific requirements. The CoreSight SoC-400 allows you to:

  • Design for large systems with multiple cores through use of configurable components
  • Maximize debug visibility using a combination of debug components
  • Use IP-XACT descriptors for all components to automate stitching and for testbench generation
  • Support different trace bandwidth requirements for complex SoCs
  • Accelerate design and verification through example subsystems, testbenches, test cases and necessary verification IP components
  • Support multiple hardware-debug models for multiple use cases

 

When all of this is put together in a wider context, ARM CoreSight IP gives design teams a real advantage through its innovative debug logic that reduces design development and software debug cycles significantly. Furthermore, if we think of debug as solving a murder through the use of backward reasoning, then trace is the video surveillance that pinpoints the culprit. Trace is invaluable as it provides real-time visibility into errors, dramatically cutting down design cycles and iterations.


I recently conducted a webinar on how to build an effective and customized debug and trace solution for a multi-core SoC. Register here for free to access the webinar recording.


There is a corresponding White Paper that goes into a lot more detail on the ARM Debug and Trace IP page.


The White Paper provides the following:

  • High-level steps on building a debug and trace solution
  • Recommended design and verification flow
  • Advantages of using SoC-400 at each stage of your development process
  • Pointers to further information and useful references


Dwight Eisenhower may not have lived until the age of semiconductors, but his quote of “No battle was ever won according to plan, but no battle was ever won without one” rings true in the context of debug subsystem design. Understanding debug and trace hardware features and capabilities is key to building a solution to meet YOUR specific requirements. The paper discusses some of the key design decisions faced by architects.


Stay tuned for more upcoming exciting news about ARM CoreSight IP or sign up for ARM TechCon 2014 to see it for yourself! TechCon will be the first time that members of the public will be able to demo the new design environment for building debug and trace subsystems. This makes it even easier to configure and integrate ARM CoreSight IP within a large system, and will help users cut down on that $312 billion global debug bill. If you have any questions or comments about ARM CoreSight IP or this blog, please write them below and I will get back to you as soon as possible.

I have followed some tutorials on the internet and found one in particular quite interesting and didactic for those just starting to program ARM bare metal. The blog is Freedom Embedded | Balau's technical blog on open hardware, free software and security.

 

Below is a summary of the commands needed to build a Hello World. I note here that in the near future I may supplement this information and turn it into a more detailed tutorial.

 

Based on the link: Hello world for bare metal ARM using QEMU | Freedom Embedded

 

Compile the code with the following commands:

$ arm-none-eabi-as -mcpu=arm926ej-s -g startup.s -o startup.o
$ arm-none-eabi-gcc -c -mcpu=arm926ej-s -g test.c -o test.o
$ arm-none-eabi-ld -T test.ld test.o startup.o -o test.elf
$ arm-none-eabi-objcopy -O binary test.elf test.bin
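The commands above assume that startup.s, test.c and test.ld already exist; the real files are in the linked tutorial. As a rough idea of what test.c might contain, here is a minimal sketch. The UART0 data register address (0x101F1000 on QEMU's versatilepb board) and the entry point name `c_entry` are assumptions for illustration, so check them against the tutorial before relying on them.

```c
#include <assert.h>   /* used only when exercising print_uart on a host */
#include <stdint.h>

/* Assumption: address of the UART0 data register on QEMU's versatilepb
 * board. The real test.c lives in the linked tutorial; this is a sketch. */
#define UART0_DR_ADDR 0x101F1000u

/* Writes each character of `s` to the UART data register at `dr`. */
static void print_uart(volatile unsigned int *dr, const char *s)
{
    while (*s != '\0') {
        *dr = (unsigned int)(unsigned char)*s;  /* one character per store */
        s++;
    }
}

/* Entry point called from startup.s once the stack is set up
 * (the name c_entry is an assumption). */
void c_entry(void)
{
    print_uart((volatile unsigned int *)(uintptr_t)UART0_DR_ADDR,
               "Hello world!\n");
}
```

Passing the register address as a parameter keeps `print_uart` independent of the board, so the same function can be pointed at an ordinary variable when trying it out on a PC.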


 

And execute with the following command:

qemu-system-arm -M versatilepb -m 128M -s -nographic -S -kernel test.bin


 

Debug with GDB, using the following command:

arm-none-eabi-gdb


 

When you get the GDB prompt, type:

target remote localhost:1234
file test.elf


 

When you have finished working with QEMU, if you have problems with the terminal, use the command:

 

stty sane

 

to fix it.

It has been a full seven months since AMD released detailed information about its Opteron A1100 server CPU, and twenty two months since announcement. Today, at the Hot Chips conference in Cupertino, CA, AMD revealed the final pieces about its ARM powered server strategy headlining the A1100.

You can find more information from AnandTech Portal | AMD’s Big Bet on ARM Powered Servers: Opteron A1100 Revealed

seattle%20SOC.jpg

I am interested in the following questions:

1. The Opteron A1100 uses eight Cortex-A57 cores arranged as four A57 clusters. Since one A57 cluster can contain up to four A57 cores, why doesn't the A1100 use two clusters of four cores each?

2. The Cortex-A57 supports TrustZone, but the A1100 still uses a Cortex-A5 to implement TrustZone. I guess the purpose of this design is to be compatible with other AMD SoCs, which use an x86 core as the main CPU and a Cortex-A5 as the TrustZone CPU.

 

What do you think about Opteron A1100?

Hello and I welcome you to my ARM programming tutorial series. I would like to give a big thank you to Abhishek Agrawal, a Final Year Undergraduate Student at IIT Kharagpur for his help to complete this blog.


Many students wonder where to start reading about ARM microcontrollers. Although there are a lot of tutorials and books available on the internet, many of them are not aimed at beginners in ARM assembly programming. Here we have started a blog and YouTube video tutorial series for those beginners.


Let’s start with the basics. ARM is also known as Advanced RISC Machine, and RISC machines have become very powerful these days. ARM processors are completely based on the RISC architecture. This approach reduces hardware cost and produces less heat than traditional x86 architectures, which makes it power efficient. ARM also has highly optimized instruction sets.


RISC architecture is also known as load-store architecture, which means the CPU cannot operate directly on memory. To work on a value in memory, the processor first loads the contents of the desired memory location into a register, performs the operation on registers, and then stores the result back to memory from a general-purpose register.
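The load-store idea can be sketched with a toy machine in C. The `toy_cpu` type and its `load`, `store` and `add` functions are hypothetical and model no real ARM core; they only show that adding two values held in memory takes two loads, one register operation and one store.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a load-store machine: arithmetic only ever touches
 * registers, and memory is reached only through load and store. */
typedef struct {
    uint32_t r[4];     /* general-purpose registers R0..R3 */
    uint32_t mem[8];   /* a few words of data memory       */
} toy_cpu;

static void load(toy_cpu *c, unsigned rd, unsigned addr)
{
    assert(rd < 4u && addr < 8u);
    c->r[rd] = c->mem[addr];
}

static void store(toy_cpu *c, unsigned rs, unsigned addr)
{
    assert(rs < 4u && addr < 8u);
    c->mem[addr] = c->r[rs];
}

static void add(toy_cpu *c, unsigned rd, unsigned rn, unsigned rm)
{
    assert(rd < 4u && rn < 4u && rm < 4u);
    c->r[rd] = c->r[rn] + c->r[rm];
}

/* mem[dst] = mem[a] + mem[b] takes four steps on a load-store machine. */
static void add_in_memory(toy_cpu *c, unsigned dst, unsigned a, unsigned b)
{
    load(c, 0, a);      /* R0 <- mem[a]   */
    load(c, 1, b);      /* R1 <- mem[b]   */
    add(c, 0, 0, 1);    /* R0 <- R0 + R1  */
    store(c, 0, dst);   /* mem[dst] <- R0 */
}
```

With `mem[0] = 40` and `mem[1] = 2`, calling `add_in_memory(&c, 2, 0, 1)` leaves 42 in `mem[2]`, and the intermediate values remain visible in R0 and R1.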


                                                           


ARM microcontrollers are the most widely used microcontrollers in the world. A study found that in 2005 about 98% of all mobile phones sold used at least one ARM processor.


ARM Holdings' cores originally used fixed-length 32-bit instructions, but later versions of the architecture also support a variable-length instruction set that provides both 32-bit and 16-bit instructions for improved code density. Currently, the ARM processors in most mobile phones and embedded hardware have 32-bit architectures.
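To illustrate how a variable-length instruction stream can be decoded, here is a minimal sketch of the Thumb-2 length rule: a first halfword whose top five bits are 0b11101, 0b11110 or 0b11111 begins a 32-bit instruction, and any other halfword is a complete 16-bit instruction. The function name `thumb_instruction_length` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the length in bytes (2 or 4) of the Thumb instruction whose first
 * halfword is `hw1`. In the Thumb-2 encoding, a first halfword whose top five
 * bits are 0b11101, 0b11110 or 0b11111 starts a 32-bit instruction. */
static unsigned thumb_instruction_length(uint16_t hw1)
{
    unsigned top5 = (unsigned)hw1 >> 11;
    return (top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu) ? 4u : 2u;
}
```

For example, 0x4770 (BX LR) is a 16-bit instruction, while a halfword such as 0xF000 is the first half of a 32-bit instruction.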


More recently, the ARMv8-A architecture announced in October 2011, adds support for a 64-bit address space and 64-bit arithmetic. It is more power efficient and has greater performance ranges.

 

ARM has three series of processors, namely the ARM Cortex-A, ARM Cortex-R and ARM Cortex-M series. Cortex-A processors are intended for application systems such as smartphones. Cortex-R processors are intended for real-time systems, such as space and missile applications. The last series, Cortex-M, is mostly used in general-purpose applications such as motor control and LED or LCD interfaces.

                       ARM Architecture.png

These ARM cortex M series microcontrollers have five different sub series of microcontrollers and they are:

  1. Cortex-M0
  2. Cortex-M0+
  3. Cortex-M1
  4. Cortex-M3
  5. Cortex-M4


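The interesting thing is that all of these microcontrollers are based on a 32-bit processor architecture. Some of them (Cortex-M0, M0+ and M1) use mostly the 16-bit Thumb instruction set, while the rest (Cortex-M3 and M4) use the full Thumb-2 set, which mixes 16-bit and 32-bit instructions. None of them executes the classic 32-bit ARM instruction set.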


The ARM Cortex-M0 is mostly preferred where the requirement is low power and lowest cost. It has almost all the general features of a microcontroller, including a Nested Vectored Interrupt Controller (NVIC). The NVIC is tightly coupled to the processor core, which facilitates low-latency exception processing. Across the Cortex-M family, the main NVIC features include the following (on the Cortex-M0 itself the NVIC is limited to 32 external interrupts and two priority bits, and has no priority grouping):

  1. A configurable number of external interrupts, from 1 to 240; the actual number of interrupts in hardware depends on the chip manufacturer
  2. A configurable number of priority bits, from three to eight
  3. Support for level and pulse interrupts
  4. Support for dynamic reprioritization of interrupts
  5. Priority grouping


Another important feature is the Wakeup Interrupt Controller (WIC) interface.

The Wakeup Interrupt Controller (WIC) can detect an interrupt and wake the processor even from deep sleep mode, where the processor rests at minimum power consumption. Wireless sensor networks use this feature to achieve the lowest possible power consumption.

Another important feature is data watchpoints and breakpoints, provided by the on-chip debug unit. In debug mode we can monitor the state of the processor in each and every clock cycle.

                                                          

In terms of instruction set, the ARM Cortex-M0+ is a superset of the Cortex-M0: the Cortex-M0 instruction set is 100% compatible with Cortex-M0+ processors. The Cortex-M0+ low-latency I/O interface provides “Harvard-like” access to peripherals, which improves overall cycle efficiency for I/O access.


               m4F instruction.jpg


The ARM Cortex-M3 adds more features on top of the Cortex-M0+ sub-series. Its main features are a 1-cycle 32-bit hardware multiply, a 2-12 cycle 32-bit hardware divide, and saturated math support.


The Cortex-M4 is a Cortex-M3 plus DSP instructions and an optional on-chip floating-point unit (FPU). A core that contains an FPU is known as a Cortex-M4F; otherwise it is simply a Cortex-M4.
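To show what "saturated math" means in practice, here is a portable C model of a signed 32-bit saturating add. The function name `saturating_add32` is hypothetical; on a Cortex-M4 the DSP instruction QADD performs this kind of operation in hardware, while this sketch only reproduces the arithmetic so it can be run on any machine.

```c
#include <assert.h>
#include <stdint.h>

/* Portable model of a signed 32-bit saturating add: instead of wrapping
 * around on overflow, the result clamps to INT32_MAX or INT32_MIN. */
static int32_t saturating_add32(int32_t a, int32_t b)
{
    int64_t wide = (int64_t)a + (int64_t)b;   /* cannot overflow in 64 bits */
    if (wide > INT32_MAX) return INT32_MAX;
    if (wide < INT32_MIN) return INT32_MIN;
    return (int32_t)wide;
}
```

Saturation matters in signal processing because a clipped sample is usually a much smaller error than a sample that wraps around to the opposite sign.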


If you are interested in the ARM Accredited MCU Engineer (AAME) qualification, I'm sure you'll be delighted to know that it now has its own Study Guide to help prepare for the test. This goes alongside the existing AAE Study Guide. You can find it, along with all other public ARM documentation on our documentation portal at infocenter.arm.com

 

Here is a direct link to the document: ARM Information Center

 

Happy studying!

 

Chris

Chinese Version 中文版:引发下一次移动计算革命-ARMv8 SoC处理器

I recently had the opportunity to reflect on the mobile computing revolution of the last five years. I use the term 'mobile computing' deliberately - the compute tasks we handle on mobile phones today directly rival those that were only possible on laptops and desktops several years ago. With uninterrupted direct supply from the wall, laptop and desktop PCs needed fan-assisted cooling, and their architecture is designed around that capacity. Today, mobile devices run similarly demanding workloads for a full day (or more) on a single charge and serve as communications hub, entertainment center, game console, and mobile workstation. The architecture of ARM®-based mobile devices is and has always been designed around the mobile footprint. Continuing to improve the user experience in that footprint requires commitment to deliver the most out of each milliwatt and every millimeter of silicon.


The success of smartphones and tablets and the software app economy (worth $27 bn and growing) is largely based on SoCs (System-on-Chips) from ARM Partners. Mobile SoCs balance ever-increasing performance with form factor, battery life and price point across an incredibly diverse range of consumers.
Most of them to date have been based on the ARMv7-A architecture, accounting for 95% share of the growing smartphone market. The growing app ecosystem ( with over 40bn downloads ) has been largely designed and coded specifically for the ARM architecture resulting in a vast application base. We are now at the transition point to ARMv8-A, the next generation in efficient computing.

 

2014 will see the arrival of numerous devices featuring the latest ARMv8-A architecture, opening the door for developers, while retaining 100% compatibility with the vast app ecosystem based on 32-bit ARMv7.  It is great to finally be at a point where the first ARMv8 mobile SoCs are coming to the market, and it is particularly positive that some of the upcoming SoCs employ ARM big.LITTLE®  technology,  which combines the high-performance CPUs and high-efficiency CPUs in one processing sub-systems, capable of both 32-bit and 64-bit operation while dynamically moving  workloads to the right size processor and saving upwards of 50% of the energy.


Qualcomm® recently announced their Snapdragon® 810 processor which uses four Cortex®-A57 cores and four Cortex-A53 cores in a big.LITTLE configuration, and the Snapdragon 808 processor which uses two Cortex®-A57 cores and four Cortex-A53 cores, again in a big.LITTLE configuration. These processors are expected to be available in commercial devices by the first half of 2015 and will feature 64-bit ARMv8 support for Android. We have been working together with teams from Qualcomm Technologies and other ecosystem partners for several to ensure that OEMs and OS providers are able to take full advantage of the ARMv8-A Architecture, ensuring that they can rely on the same design philosophy that has made ARMv7-A based Snapdragon processors so successful in the multiple segments of the mobile market.

 

My colleague  James Bruce and I recently collaborated with our counterparts at Qualcomm in writing a paper that delves further into ARMv8-A and explains the journey of bringing an ARMv8 SoC to market - I recommend it for anyone seeking to better understand the SoC design process and mobile processor market space.

 

The white paper (which you will find below) dispels a few myths about ARMv8-A (it's more than just 64 bit, it doesn't double code size, etc.) and outlines the approach one ARM partner takes in combining ARM IP with in-house IP to build a product line ranging from premium smartphone and tablets down to low-cost smartphone tiers for emerging markets.

 

The first half of the paper offers some useful insights into the mobile market, how ARM competes in the market, how Android is delivered on ARM platforms, and the benefits of the latest ARM Cortex-A processors and ARMv8 instruction set architecture.  The second half of the paper dives a bit deeper into Qualcomm's approach to delivering a complete SoC, combining in-house designed components with ARM IP, then optimizing the whole platform. It discusses Qualcomm's use of Cortex-A57 and Cortex-A53 along with big.LITTLE technology in the announced Snapdragon 808 and 810 SoCs, as well as their use of custom-designed CPUs, GPUs, and other components in the Snapdragon product line.

 

The ready availability of ARM IP and the flexibility of the ARM business model provide the freedom to mix and match and the opportunity to rapidly innovate which have been a big factor in enabling ARM partners like Qualcomm to be so successful in the smartphone and mobile computing revolution.

An interesting webinar on "Building a Customized Debug and Trace Solution for a Multi-Core SoC" is being organised on July 24, 2014, 2:00 PM EDT, where you can learn how to:

 

  • Design and verify configurable debug and trace solution for your complex SoC.
  • Use ARM DS-5 tool to maximize the utility of advanced debug and trace hardware.
  • Use CoreSight technology to help identify areas/functionality of your design that can be optimized both in hardware and software throughout the development process.

 

                                                           You can register for the event at:

                                   Building a Customized Debug and Trace Solution for a Multi-Core SoC.

 

Hope to see you there!

Hi,

Just wanted to highlight this thought-provoking whitepaper from Trustonic on the future of payments and how several pieces of the puzzle (such as FIDO, beacons, HCE and TrustZone-based TEE) fit nicely together...

 

https://www.trustonic.com/support/whitepapers/


What do you think? 

Tesla_ARM64_678x452.jpg

Kicking off this week for the world of supercomputing is the 2014 International Supercomputing Conference in Leipzig, Germany. One of the major supercomputing conferences, ISC is Europe’s largest supercomputing conference and as one would expect, an important show for companies vested in high performance computing (HPC) and other aspects of supercomputing. We’ll see a few announcements out of ISC this week, and starting things off will be NVIDIA.

NVIDIA will be taking to the ISC show floor to announce that their Tesla products will be adding ARM64 host compatibility, enabling them to be used in ARM64 systems. NVIDIA has been a supporter of the ARM ecosystem for some time through the use of ARM cores in their Tegra SoCs and by enabling CUDA on ARM processors. Adding 64-bit ARMv8 (ARM64) support then is a logical extension of this by bringing their hardware and toolkit forward to the new generation of 64-bit ARM processors.

However, while NVIDIA’s previous ARM work has focused on consumer uses, today’s Tesla ARM64 announcement is focused on the professional computing side, hence the use of ISC as a backdrop for this announcement. With today’s announcement NVIDIA is expanding their Tesla and HPC efforts into the ARM ecosystem, intending to bootstrap and support the growing use of ARM CPUs as the core processors in HPC setups. ARM CPUs have already made some headway into the micro server space for tasks that require many low performance threads; however, it’s not until ARMv8 that ARM processors have gained the ability to address enough memory and have gained enough in performance to be useful in HPC applications. With the increased capabilities of ARM64 processors, HPC system builders can now design systems around ARM, with NVIDIA taking up their now well-defined position as a GPU supplier to provide their highly parallel processors to complete these systems.

All things considered, NVIDIA is not necessarily introducing new functionality or new performance, but the addition of ARM64 support means that NVIDIA is hedging their bets in the server space. The company already supports Tesla products connected to x86 servers in traditional HPC setups, will offer deeper Tesla support on POWER platforms through their forthcoming NVLink interconnect, and is now covering the other end of the spectrum by offering Tesla support for ARM64 platforms. So far the ARM architecture has yet to prove itself in the HPC market beyond some very specific micro server roles, but with NVIDIA’s continued success in the HPC market and the potential for ARM to disrupt the traditional x86 market, it’s not surprising to see NVIDIA hedging their bets just in case that disruption occurs. No matter what happens – x86 holds, POWER takes off, or ARM disrupts – NVIDIA intends to have the market covered.

To that end, along with today’s announcement of ARM64 compatibility, NVIDIA is also announcing the first Tesla ARM64 development platforms. In July, Cirrascale will be shipping their RM1905D 1U development platform, which contains a pair of Applied Micro X-Gene CPUs along with a pair of Tesla K20 accelerator cards. Meanwhile, E4 will be shipping their EK003 system, a 3U system with two X-Gene CPUs and two Tesla K20s.

The Tesla cards of course need no introduction, while the X-Gene is an in-house design from Applied Micro that has 8 ARMv8 cores clocked at 2.4GHz. We looked at the X-Gene design a couple of years back, and while it didn’t end up being the first shipping ARMv8 design (Apple’s Cyclone beat it), it is the first ARMv8 design shipping with the appropriate PCIe support to be paired up with Tesla cards. At the time Applied Micro was shooting for a fairly aggressive performance level, but as of right now we don’t know how the X-Gene compares to other ARMv8 designs such as Cyclone, Cortex-A57, and NVIDIA’s own Denver.

Finally, being released in conjunction with these platforms will be the CUDA 6.5 toolkit, which will be introducing ARM64 support on the CUDA side. NVIDIA has not announced a release date for CUDA 6.5, and at this point it’s safe to assume it’s a development release alongside these ARM64 development platforms.

The TrustZone-based Trusted Execution Environment is a success story for ARM partners and enables system-wide security that can be used to protect the platform and services from software attack. Historically, Trusted OS code has not been available as open source, so the OP-TEE OSS project is an interesting development. Linaro's Security Working Group has been involved and has provided some introductory notes here:

https://wiki.linaro.org/WorkingGroups/Security/OP-TEE

They include background info, an FAQ, and links to the source code on GitHub. It is currently designed for ARMv7-A, with a plan from Linaro to port it to ARMv8-A and align it with ARM Trusted Firmware.

Some key points:

  • It’s a GlobalPlatform based Trusted OS
  • Currently ARMv7-A (plans for ARMv8-A)
  • Limited hardware support (will need porting)
  • Limited documentation (will be added)
  • Not a key provisioning system for post-provisioned Trusted Apps

Let me know what you think of it.

There has been a lot of interest in ARM Trusted Firmware. We are aware that YouTube is unavailable in some parts of the world, so you might like these links, which should work anywhere:

At the session link https://lca-14.zerista.com/event/member/102447 you can find links to the presentation slides, the video on YouTube, and a video accessible in China:

http://people.linaro.org/linaro-connect/lca14/videos/03-03-Monday/LCA14-102-%20Adopting%20ARM%20Trusted%20Firmware.mp4

For the introduction to ARM Trusted Firmware, see http://lcu-13.zerista.com/event/member/85121 and

http://people.linaro.org/linaro-connect/lcu13/videos/10-28-Monday/LCU13%20An%20Introduction%20to%20ARM%20Trusted%20Firmware.mp4

Please enjoy the videos and then download the latest release from GitHub:

ARM-software/arm-trusted-firmware · GitHub
