Why are there 4 cores per cluster in the ARMv8 architecture?

RadarSong over 9 years ago

Hi experts,

I want to know why there are 4 cores per cluster in the ARM big.LITTLE architecture.

Is it possible to have more cores per cluster? If not, what is the limitation?

Ash Wilding over 9 years ago

Hi,
The ARMv8 architecture does not mandate 4 cores per cluster.
Technically the architecture does not even define the concept of cores and clusters, instead defining the concept of "PE affinity" (where "PE" is short for processing element, i.e. anything with a program counter - this could be a single-threaded core, or each thread in a hardware-multi-threaded core, etc).
The affinity of a PE is split into 3-4 fields that can be used in whichever way you like. The only limitation that the architecture places on the use of these fields is that:
"The assigned value of the MPIDR.{Aff2, Aff1, Aff0} or MPIDR_EL1.{Aff3, Aff2, Aff1, Aff0} set
of fields of each PE must be unique within the system as a whole."
- ARMv8-A Architecture Reference Manual (ARM DDI 0487A.j) section D7.2.67 MPIDR_EL1
A typical use case is to treat Aff1 as the cluster ID and Aff0 as the core ID.
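To make the affinity fields concrete, here is a minimal Python sketch that unpacks a raw MPIDR_EL1 value into Aff3..Aff0. The bit positions (Aff0 in bits [7:0], Aff1 in [15:8], Aff2 in [23:16], Aff3 in [39:32]) follow the register layout in the Architecture Reference Manual. The names `Affinity` and `decode_mpidr` and the example value are made up for illustration, and reading Aff1 as the cluster and Aff0 as the core is only the typical convention mentioned above, not something the architecture requires.

```python
from typing import NamedTuple


class Affinity(NamedTuple):
    """The four 8-bit affinity fields of MPIDR_EL1 (hypothetical helper type)."""
    aff3: int
    aff2: int
    aff1: int
    aff0: int


def decode_mpidr(mpidr: int) -> Affinity:
    """Split a raw MPIDR_EL1 value into its affinity fields."""
    return Affinity(
        aff3=(mpidr >> 32) & 0xFF,  # bits [39:32]
        aff2=(mpidr >> 16) & 0xFF,  # bits [23:16]
        aff1=(mpidr >> 8) & 0xFF,   # bits [15:8]
        aff0=mpidr & 0xFF,          # bits [7:0]
    )


# Illustrative value only: bit 31 is RES1, Aff1 = 1, Aff0 = 2.
example = 0x8000_0102
aff = decode_mpidr(example)

# Under the "Aff1 = cluster, Aff0 = core" convention this is core 2 of cluster 1.
print(f"cluster (Aff1) = {aff.aff1}, core (Aff0) = {aff.aff0}")
```

Nothing in the decoding caps Aff0 at 3; each field is 8 bits wide, so the register format itself leaves room for far more than 4 PEs at the lowest affinity level.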
You may have noticed that ARM's own implementations such as the ARM Cortex-A53 MPCore can be configured to have between 1 and 4 cores; this is simply a design decision of that particular processor implementation, not a limitation mandated by the architecture. Your own processor implementation can allow as many cores as you design it to have.
Hope that helps.

Peter Harris over 9 years ago in reply to Ash Wilding

It's also worth noting that you can have more than two clusters of cores; nothing in the AXI specification limits you to 2, and similarly there are no restrictions on what those clusters are. If you have a design where you want an 8-core Cortex-A53 you could implement multiple clusters of Cortex-A53s to achieve that, for example.
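As a rough sketch of the 8-core example, the snippet below lays out two clusters of four cores and checks the one rule the architecture does impose: every PE needs a unique affinity value. The function name `affinity_map` and the constants are hypothetical, and it again assumes the common convention of Aff1 as the cluster ID and Aff0 as the core ID.

```python
from __future__ import annotations

from itertools import product

CLUSTERS = 2           # assumed number of clusters in the design
CORES_PER_CLUSTER = 4  # assumed cores per cluster


def affinity_map(clusters: int, cores_per_cluster: int) -> list[tuple[int, int]]:
    """Return one (Aff1, Aff0) pair per core: Aff1 = cluster ID, Aff0 = core ID."""
    return list(product(range(clusters), range(cores_per_cluster)))


pes = affinity_map(CLUSTERS, CORES_PER_CLUSTER)

# The architectural requirement: no two PEs share the same affinity value.
assert len(set(pes)) == len(pes)

for cluster, core in pes:
    print(f"PE Aff1={cluster} Aff0={core}")
```

Changing the two constants to 1 and 8 produces an equally valid set of affinity values, which is the sense in which the 4-per-cluster figure is an implementation choice and not an architectural one.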
Cheers,
Pete

42Bastian over 9 years ago in reply to Ash Wilding

Currently we see only implementations of up to 4 CPUs (or PEs) per cluster.
So is there a limit in the cache IP for this or is it just a "practical" problem to have e.g. 8 cache-coherent PEs in a cluster?

Peter Harris over 9 years ago in reply to 42Bastian

I suspect it's just a question of "4 is enough" for most use cases. Why complicate the design with more cores than most use cases actually require?

42Bastian over 9 years ago in reply to Peter Harris

Peter,
but there is, for example, the LS1088A, which comes with 8(!) A72 cores, though in two clusters of 4. We'll never know why NXP/Freescale did it that way.
I have never seen more than 4 cores in a cluster, so there must be some limiting factor.

RadarSong over 9 years ago in reply to Ash Wilding

Hi Ash & Peter,
So, are there no technical limitations on having more than 4 cores in a cluster? For example, cache coherency, the GIC, or anything else?
Is the only reason that "4 is enough" for performance?
In the future, could we perhaps design an SoC with 8 cores in a cluster, or with more clusters?
We can already see the Helio X20 SoC, which provides three clusters and 10 cores (4+4+2).

Peter Harris over 9 years ago in reply to RadarSong

As Ash has already stated in his first answer, for the ARM Cortex-A family there is a limit of 4. There isn't any specific technical reason - it's just a design choice based on what we see our partners needing for their designs.
It's like asking "Why do cars have 4 wheels?" - a company could build a car with three wheels (Reliant Robin), or with six wheels (Tyrrell P34), but in most cases the answer is four. Why? It's enough to do what the car needs to do, so why add more?
The ARM architecture doesn't have a limit, so this could change in future, or a partner with an architecture license could build their own CPU today with as many cores in a cluster as they wanted.
HTH,
Pete

42Bastian over 9 years ago in reply to Peter Harris

So an IP customer cannot just "plug" n Cortex-A53 cores together, but instead buys a single-, dual- or quad-core Cortex-A53 IP?
See, we would just like to understand why certain companies (such as the aforementioned NXP) build chips with 8 cores but in 2 clusters.
BTW: Cars mostly have four wheels for physical/technical reasons. So there is surely a technical reason for the choice of a maximum of 4 cores per cluster. But it is OK if ARM does not want to share this with everyone ;-)

Matt Sealey over 9 years ago in reply to 42Bastian

42bis,
The technical reason is that allowing more configurations at RTL synthesis time means more combinations to validate, which is a lot of work. There was a design decision, made a long time ago, to limit those products to 4. You could imagine that the logic that connects the cores together only has 4 ports in the design. If you configure 2 cores, some of that logic is optimized away. But there are only 4 combinations to work out - 1, 2, 3 and 4 cores in a cluster. Imagine if we supported 32 cores in a cluster - that'd be 8x the work to validate it. To cover that we might decide that you can only have 1, 2, 3, 4, 8, 16 and 32 (which is not even twice the work). Aside from that, you have to cover power consumption and area concerns with larger systems. That's the limitation. We only wish it was as simple as "copy & paste".
That's an extremely simple view of things, but you get the idea, right?
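The arithmetic behind "8x" and "not even twice" can be checked with a few lines of Python. This is only a sketch under the same simplification as above, namely that validation effort grows linearly with the number of supported core counts; the function name `relative_validation_work` is made up for illustration, and real validation cost would not scale this neatly.

```python
def relative_validation_work(configs, baseline):
    """Ratio of supported configurations, used as a crude proxy for effort."""
    return len(configs) / len(baseline)


baseline = [1, 2, 3, 4]               # core counts supported today
every_count = list(range(1, 33))      # every core count from 1 to 32
restricted = [1, 2, 3, 4, 8, 16, 32]  # the reduced set suggested above

print(relative_validation_work(every_count, baseline))  # 8.0
print(relative_validation_work(restricted, baseline))   # 1.75
```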
Ta,
Matt

42Bastian over 9 years ago in reply to Matt Sealey

Thanks Matt for the insights. I get the idea.