In part 1 of this blog, I outlined the thought process behind the Elba program. Here I'll look at the implementation decisions for the project.
At ARM, a new processor design passes through various stages of maturity; reaching silicon implementation in various fabrication processes is one of them, so it made sense to us that Elba must also be a full silicon implementation. In fact, just in case this does work, and what we think may happen does, we'll implement the Cortex-A9 processor in a way that would let ARM commercialize and promote these "G" implementations as a new product. But what type of silicon? At ARM we often build silicon devices, but these are typically no more functional than something that can execute a little code from on-chip memories. Great, so our goal to build a multi-GHz Cortex-A9 would be able to run Dhrystone - we need more than that . . . How much more? As it turned out, quite a bit more. ARM also develops the Mali 3D graphics processors, so the device should include the latest 3D graphics capability, tick. ARM also develops products that span the memory backplane, so it should include our latest CoreLink memory controllers, DMA and interconnect technology, tick. Elba was starting to look like a rather interesting SoC. So now we can run Dhrystone and, yep, need an LCD output or two, HD of course, tick.
For the final part of the initial SoC design, since ARM does not develop IP for high-speed interfaces, we needed to partner. It was clear to the team that a chip containing a multi-GHz multicore processor and 1080p 3D graphics would need rather a lot of IO bandwidth to feed it, so on went eight lanes of Gen2 PCI Express. Is that it? It would be if this were going to be a product, but this was a development chip from which various groups in ARM wanted to glean information about how data moves across such a device at such high throughputs. So another small team started to place monitors and meters all over the design, to extract the kind of information that no simulator or emulator can give any meaningful insight into. Last, but by no means least, the SoC design was plastered with more power management technology than was probably good for it, but at least it would allow us to investigate various new aspects of power management.
The second aspect of the program was the specification and implementation of the Cortex-A9 hard macro. Initial thoughts were that we would build only a single macro, but to maximize the opportunity to learn, we quickly concluded there would need to be two macros. One would target maximum speed, and the other would be optimized for low power, especially with respect to leakage.
By this time, a number of Cortex-A9 implementations had been started, and it was quickly becoming clear that a dual-core implementation was being favoured, one that included the NEON media processing engine and 32K Level 1 instruction and data caches. All designs included the Level 2 cache controller, but the size and speed of the L2 RAMs varied. So the macro specification became dual-core with NEON, 32K I$ and D$, and the L2 cache controller, but not its RAMs. We also included various DFT and DFM capabilities and encapsulated all clocks, resets and synchronizers within the macro to make it much simpler to implement in an SoC.
With the implementation chosen, we went on to look at processor power management, a subject I'll cover in part three.
UPDATE:
Part 1: Wouldn't it be interesting if we... - Giving Birth to "Elba"
Part 3: Elba Processor Power Management
Part 4: Elba - Bringing it all together