Elba - How do we know it works?

John Goodacre
September 11, 2013

In part 1 of this blog, I outlined the thought process behind the Elba program. Here I'll look at the implementation decisions for the project.

Within ARM, a new processor development passes through various stages of maturity, and reaching silicon implementation in various fabrication processes is one of them, so it made sense to us that Elba must also be a full silicon implementation. In fact, just in case this does work, and what we think may happen does happen, we would implement the Cortex-A9 processor in such a way that ARM could commercialize and promote these "G" implementations as a new product. But what type of silicon? In ARM we often build silicon devices, but as devices these are typically no more functional than something that can execute a little code from on-chip memories. Great, so our multi-GHz Cortex-A9 will be able to run Dhrystone. We need more than that . . . How much more? As it ended up, quite a bit more. ARM also develops the Mali 3D graphics processors, so the device should include the latest 3D graphics capability, tick. ARM also develops products that span the memory backplane, so it should also include our latest CoreLink memory controllers, DMA and interconnect technology, tick. Elba was starting to look like a rather interesting SoC. So now we can run Dhrystone and, yep, we need an LCD output or two, HD of course, tick.

For the final part of the initial SoC design we needed to partner, since ARM does not develop IP for high-speed interfaces. It was clear to the team that a chip containing multi-GHz multicore processors and 1080p 3D graphics would need rather a lot of I/O bandwidth to feed it. So on went 8 lanes of PCI Express Gen2. Is that it? It would be if this were going to be a product, but this was a development chip from which various groups in ARM wanted to glean information about how data moves across such a device when operating at such high throughput. So another small team started to place monitors and meters all over the design to extract the type of information into which no simulator or emulator can give any meaningful insight. Last, but by no means least, the SoC design was plastered with more power management technology than was probably good for it, but at least it would allow us to investigate various new aspects of power management.
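To put a rough number on "rather a lot of IO bandwidth", here is a back-of-envelope sketch of what 8 lanes of PCI Express Gen2 can carry. It uses the standard Gen2 figures (5 GT/s per lane, 8b/10b line encoding) and ignores packet and protocol overhead, so the result is an upper bound and not a measurement from Elba. The function name is purely illustrative.

```python
def pcie_gen2_bandwidth_mb_s(lanes: int) -> float:
    """Peak payload bandwidth per direction, in MB/s, for a PCIe Gen2 link.

    Assumes the standard Gen2 line rate and 8b/10b encoding; packet headers
    and flow-control overhead are not modelled.
    """
    raw_bits_per_s_per_lane = 5_000_000_000  # 5 GT/s, one bit per transfer
    # 8b/10b encoding: only 8 of every 10 transmitted bits carry data.
    data_bits_per_s_per_lane = raw_bits_per_s_per_lane * 8 // 10
    total_bits_per_s = data_bits_per_s_per_lane * lanes
    return total_bits_per_s / 8 / 1e6  # bits -> bytes -> MB


print(pcie_gen2_bandwidth_mb_s(1))  # 500.0
print(pcie_gen2_bandwidth_mb_s(8))  # 4000.0
```

So an x8 Gen2 link tops out at about 4 GB/s in each direction before protocol overhead.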


Osprey Cortex-A9 Hardmacro

The second aspect of the program was the specification and implementation of the Cortex-A9 hard macro. Initial thoughts were that we would build only a single macro, but to maximize the opportunity to learn, we quickly concluded there would need to be two macros. One would target maximum speed, and the other would be optimized for low power, especially with respect to leakage.

By this time, a number of Cortex-A9 implementations had been started, and it was quickly becoming clear that a dual-core implementation was being favoured, one that included the NEON media processing engine and 32KB Level 1 instruction and data caches. All designs included the Level 2 cache controller, but the size and speed of the L2 RAMs varied. So the macro specification became dual-core with NEON, 32KB I$ and D$, including the L2 cache controller but not its RAMs. We also included various DFT and DFM capabilities and encapsulated all clocks, resets and synchronizers within the macro to make it much simpler to implement in an SoC.
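As a compact summary, the macro specification described above can be written down as a small configuration record. This is only an illustrative sketch: the class, field and function names are hypothetical and do not correspond to any ARM deliverable or tool. The values are the ones stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HardMacroSpec:
    """Hypothetical record of the Cortex-A9 hard macro options named in the post."""
    target: str                      # "speed" or "low_power" (the two macro variants)
    cores: int = 2                   # dual-core
    neon: bool = True                # NEON media processing engine included
    l1_icache_kb: int = 32           # Level 1 instruction cache
    l1_dcache_kb: int = 32           # Level 1 data cache
    l2_controller: bool = True       # L2 cache controller is inside the macro
    l2_rams_included: bool = False   # L2 RAMs are left outside the macro


def soc_must_supply(spec: HardMacroSpec) -> list[str]:
    """List what the surrounding SoC still has to provide for this macro."""
    needs = []
    if spec.l2_controller and not spec.l2_rams_included:
        needs.append("L2 cache RAMs")
    return needs


speed_macro = HardMacroSpec(target="speed")
low_power_macro = HardMacroSpec(target="low_power")
print(soc_must_supply(speed_macro))  # ['L2 cache RAMs']
```

Both variants share the same architectural configuration; only the implementation target differs, which is why `target` is the one field without a default.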

With the implementation chosen, we went on to look at the processor power management, a subject I'll cover in part three.

UPDATE:
Part 1: Wouldn't it be interesting if we... - Giving Birth to "Elba"
Part 3: Elba Processor Power Management
Part 4: Elba - Bringing it all together
