Big gives the impression of being large, as in size, height, width, or amount: a big house; a big quantity. We've heard of Big Brother, the Big Bang, or the rapper The Notorious B.I.G., aka Biggie Smalls (after a character in the 1975 film Let's Do It Again).
Me, I am none of those things, but I am a big brother, and I will create a big bang as the highest-performing Cortex processor from ARM, under my new name ARM® Cortex™-A57 rather than my internal name, Atlas. Big bang for your buck is appropriate given my performance and efficiency: three times the performance of 2012 superphones running the same applications. I guess I am big on efficiency too, as on my target 20nm process I consume similar power to today's 40nm ARM-based processors. Equally, I provide up to 5 times the power efficiency of tomorrow's tablets and notebooks.
Back to my family tree: I evolved from Cortex-A15 and have a similar pipeline and microarchitecture, with the same characteristics that let me fit easily into a spectrum of mobile applications. Everything from smartphones to multiprocessor-based tablets is my target, in dual- and quad-core configurations.
My sibling, the Cortex-A53, is classed as little (aka LITTLE) but packs a big punch. Working as a team in a big.LITTLE configuration for mobile products, we're already demonstrating big.LITTLE savings of 50% to 70% energy and delivering significantly more performance at lower energy than today's best, the Cortex-A9. This is no big story, as my family members Cortex-A15 and Cortex-A7 have proven the point in hardware. In big.LITTLE configurations I like to take it easy when I'm not needed, and in mobile applications I can be found napping most of the time. But when my teammate needs a hand, I can be called upon immediately to complete the big performance threads in record time, and when the job is done I go back to sleep. My little brother and I work perfectly together, transferring tasks without any intervention in user space.
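To picture how my little brother and I split the work, here is a toy Python sketch. It is not ARM's actual big.LITTLE software: the `Task` type, the `choose_core` function and the 0.8 threshold are all hypothetical, invented purely to illustrate the idea that light tasks stay on the LITTLE core while I only wake up for the demanding ones.

```python
from dataclasses import dataclass

# Hypothetical threshold: the fraction of LITTLE-core capacity above which
# a task is handed to the big core. This is an illustrative value only,
# not a figure published by ARM.
UP_MIGRATION_THRESHOLD = 0.8


@dataclass
class Task:
    name: str
    demand: float  # required performance, as a fraction of LITTLE-core capacity


def choose_core(task, threshold=UP_MIGRATION_THRESHOLD):
    """Return 'big' for demanding tasks and 'LITTLE' for everything else."""
    return "big" if task.demand > threshold else "LITTLE"


def schedule(tasks):
    """Map each task name to the core type chosen for it."""
    return {t.name: choose_core(t) for t in tasks}


def big_core_asleep_fraction(tasks):
    """Fraction of tasks for which the big core could stay asleep."""
    if not tasks:
        return 1.0
    little = sum(1 for t in tasks if choose_core(t) == "LITTLE")
    return little / len(tasks)


# A made-up mobile workload: mostly light tasks, one heavy burst.
workload = [
    Task("email_sync", 0.1),
    Task("audio_playback", 0.2),
    Task("web_page_render", 1.5),
    Task("ui_scroll", 0.4),
]
placement = schedule(workload)
```

In this made-up workload only `web_page_render` exceeds the threshold, so it is the only task placed on the big core; the rest stay on the LITTLE core, which is the "napping most of the time" behaviour described above.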
Big is relative. Compared to the other families used in laptops and efficient servers, my family looks like we're on a diet: we are so much smaller, yet capable of delivering comparable performance, especially when we are grouped together in a quad-core configuration and paired in a big.LITTLE environment. I guess this is in my ARM family genes. Yes, you can also push my frequency for more peak performance, but I still maintain my efficiency, unlike the other families out there.
So not only can I be used in heterogeneous environments, but also in homogeneous ones, where my clones and I form a cluster of up to four Cortex-A57s grouped in a cache-coherent manner. Using my cousin, the CoreLink™ CCN-504 Cache Coherent Network, there can be up to 16 of me in any given system. This, along with my new enhanced floating-point and RAS capabilities, has opened up big horizons in terms of new applications: servers, storage and networking solutions are being worked on using my family. I have many new configurable features, including SECDED ECC on my software-writable RAMs in L1 and L2, as well as improved error reporting and logging support. So we provide scalability and flexibility for new markets, but our roots are in mobile efficiency, which we include in everything we do.
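The cluster arithmetic can be sketched in a few lines of Python. The two limits (four cores per cluster, 16 cores per system with the CCN-504) come from the text; the function name `clusters_needed` is hypothetical and only illustrates the counting, not any real configuration tool.

```python
MAX_CORES_PER_CLUSTER = 4   # from the text: up to four Cortex-A57s per cluster
MAX_CORES_PER_SYSTEM = 16   # from the text: up to 16 cores with the CCN-504


def clusters_needed(total_cores):
    """Return the minimum number of clusters needed for total_cores cores."""
    if not 1 <= total_cores <= MAX_CORES_PER_SYSTEM:
        raise ValueError(
            f"total_cores must be between 1 and {MAX_CORES_PER_SYSTEM}"
        )
    # Ceiling division without importing math.
    return -(-total_cores // MAX_CORES_PER_CLUSTER)
```

For example, a full 16-core system works out to four clusters of four, while a six-core design would still need two clusters.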
Solutions that combine family members, Cortex-A57 and Cortex-A53, in clusters give the best of both worlds, but my family has a low-power pedigree of its own. Consider my clock tree, for example: I'm extensively clock gated, like all ARM processors since the ARM9, but I use a hierarchical clock-gating approach that takes power reduction even further. I have power domains around each of my CPU cores and around other key structures, to allow programmers to shut down unused blocks. And I have a retention mode that lets me go to sleep while keeping my caches fresh with data at much lower power; entering this mode requires no software intervention. Extensive clock gating and hooks for leakage mitigation are also included in my extended family member, the CoreLink CCN-504, which has granular DVFS and CPU shutdown support as well as partial or full level-3 cache shutdown and various retention modes. Programmable power modes are available in the dynamic memory controller, DMC-520.
My brother and I (Cortex-A53 and Cortex-A57) are both capable 64-bit processors that can optimally run 32-bit code. In fact, there is a performance boost running 32-bit code compared with my previous generations. We did not gain weight by adopting 64-bit data types. While I'm marginally (and I mean only marginally) bigger on the same process, in the real world, when I show up in 20nm, I have a much smaller area and lower power than Cortex-A15, which will be shipping shortly. The ARMv8 architecture uses a fixed 32-bit instruction size, and has eliminated banked registers to support 31 general-purpose registers while retaining the same total number of registers as ARMv7. Oh, I also shut down unused portions of these registers when they are not in use, for example when I'm in the AArch32 execution state.
I will be in the 2014 and 2015 markets across the whole application gamut, from mobile to enterprise. You will hear of my brother and me providing longer battery life, improved thermal attributes and lower power consumption than my predecessors, while delivering comparable performance to the other families out there in tablets, mobile devices, notebooks and enterprise solutions.
So big is relative, but how relative is your big?
Followers of ARM Processors would also be interested in this article - thanks to Ian for giving some insight into the world (and emotions) of the Cortex-A57!