Hi experts,
I want to know why there are 4 cores per cluster in the ARM big.LITTLE architecture.
Is it possible to have more cores per cluster? If not, what is the limitation?
Hi,
The ARMv8 architecture does not mandate 4 cores per cluster.
Technically the architecture does not even define the concept of cores and clusters, instead defining the concept of "PE affinity" (where "PE" is short for processing element, i.e. anything with a program counter - this could be a single-threaded core, or each thread in a hardware-multi-threaded core, etc).
The affinity of a PE is split into 3-4 fields that can be used in whichever way you like. The only limitation that the architecture places on the use of these fields is that:
"The assigned value of the MPIDR.{Aff2, Aff1, Aff0} or MPIDR_EL1.{Aff3, Aff2, Aff1, Aff0} set of fields of each PE must be unique within the system as a whole."
- ARMv8-A Architecture Reference Manual (ARM DDI 0487A.j) section D7.2.67 MPIDR_EL1
A typical use case is to treat Aff1 as the cluster ID and Aff0 as the core ID.
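To make the field layout concrete, here is a minimal sketch in C that splits a raw MPIDR_EL1 value into its affinity fields. The struct and function names are made up for illustration; the bit positions (Aff0 in bits [7:0], Aff1 in [15:8], Aff2 in [23:16], Aff3 in [39:32]) are the ones the register description gives. Reading the register itself needs an `mrs` instruction at EL1 or higher, which is left out here so that the snippet builds on any host.

```c
#include <stdint.h>

/* Affinity fields of one PE, as laid out in the 64-bit MPIDR_EL1 register.
 * The struct itself is hypothetical, for illustration only. */
struct pe_affinity {
    uint8_t aff3;
    uint8_t aff2;
    uint8_t aff1; /* typically used as the cluster ID */
    uint8_t aff0; /* typically used as the core ID within the cluster */
};

/* Split a raw MPIDR_EL1 value into its affinity fields.
 * Aff0 = bits [7:0], Aff1 = [15:8], Aff2 = [23:16], Aff3 = [39:32]. */
static struct pe_affinity decode_mpidr(uint64_t mpidr)
{
    struct pe_affinity a;
    a.aff0 = (uint8_t)(mpidr & 0xff);
    a.aff1 = (uint8_t)((mpidr >> 8) & 0xff);
    a.aff2 = (uint8_t)((mpidr >> 16) & 0xff);
    a.aff3 = (uint8_t)((mpidr >> 32) & 0xff);
    return a;
}
```

With the typical convention, a raw value of 0x80000103 would decode to cluster 1 (Aff1), core 3 (Aff0). Note that each field is 8 bits wide, so the register format itself leaves room for far more than 4 values in Aff0.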
You may have noticed that ARM's own implementations such as the ARM Cortex-A53 MPCore can be configured to have between 1 and 4 cores; this is simply a design decision of that particular processor implementation, not a limitation mandated by the architecture. Your own processor implementation can allow as many cores as you design it to have.
Hope that helps.
It's also worth noting that you can have more than two clusters of cores; nothing in the AXI specification limits you to 2, and similarly there are no restrictions on what those clusters are. If you have a design where you want an 8-core Cortex-A53 you could implement multiple clusters of Cortex-A53s to achieve that, for example.
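As a rough illustration of the "8 cores as two clusters of 4" idea, the sketch below assigns affinity values using the common convention (Aff1 = cluster ID, Aff0 = core ID) and checks the architecture's only requirement, that every PE ends up with a unique value. All function names are hypothetical, and the higher affinity fields are simply assumed to be zero.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Pack a (cluster, core) pair into the Aff1/Aff0 bit positions.
 * Hypothetical helper; Aff2 and Aff3 are assumed to be zero. */
static uint32_t make_affinity(uint8_t cluster, uint8_t core)
{
    return ((uint32_t)cluster << 8) | core;
}

/* Write one affinity value per PE into `out`, which must have room for
 * clusters * cores_per_cluster entries. Returns the number of PEs. */
static size_t enumerate_pes(uint8_t clusters, uint8_t cores_per_cluster,
                            uint32_t *out)
{
    size_t n = 0;
    for (uint8_t cl = 0; cl < clusters; cl++)
        for (uint8_t co = 0; co < cores_per_cluster; co++)
            out[n++] = make_affinity(cl, co);
    return n;
}

/* Check that no two PEs share an affinity value (simple O(n^2) scan). */
static bool affinities_unique(const uint32_t *ids, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (ids[i] == ids[j])
                return false;
    return true;
}
```

Calling `enumerate_pes(2, 4, ids)` yields the values 0x000 to 0x003 for the first cluster and 0x100 to 0x103 for the second. A single cluster of 8 (`enumerate_pes(1, 8, ids)`) passes the same uniqueness check, which is the point: the numbering scheme does not care how the cores are grouped.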
Cheers, Pete
Currently we see only implementations of up to 4 CPUs (or PEs) per cluster.
So is there a limit in the cache IP for this or is it just a "practical" problem to have e.g. 8 cache-coherent PEs in a cluster?
I suspect it's just a question of "4 is enough" for most use cases. Why complicate the design with more cores than most use cases actually require?
Peter,
But there is, for example, the LS1088A, which comes with 8(!) A72 cores, but in two clusters of 4. We'll never know why NXP/Freescale did it that way.
I have never seen more than 4 cores in a cluster, so there must be some limiting factor.
Hi Ash & Peter,
So, are there no technical limitations on having more than 4 cores in a cluster? For example, cache coherence, the GIC, or anything else?
Is the only reason that "4 is enough" for performance?
In the future, could we design an SoC with 8 cores in a cluster, or with more clusters?
Today we can already see the Helio X20 SoC, which provides three clusters and 10 cores (4+4+2).
As Ash has already stated in his first answer, for the ARM Cortex-A family there is a limit of 4. There isn't any specific technical reason - it's just a design choice based on what we see our partners needing for their designs.
It's like asking "Why do cars have 4 wheels?" - a company could build a car with three wheels (Reliant Robin), or with six wheels (Tyrrell P34), but in most cases the answer is four. Why? It's enough to do what the car needs to do, so why add more?
The ARM architecture doesn't have a limit, so this could change in future, or a partner with an architecture license could build their own CPU today with as many cores in a cluster as they wanted.
HTH, Pete
So an IP customer cannot just "plug" n Cortex-A53s together, but buys a single, dual or quad CA53 IP?
See, we would just like to understand why certain companies (such as the aforementioned NXP) build chips with 8 cores but in 2 clusters.
BTW: Cars mostly have four wheels for physical/technical reasons. So there is surely a technical reason for the choice of max. 4 cores per cluster. But it is OK if ARM does not want to share this with everyone ;-)
42bis,
The technical reason is that allowing more configurations at RTL synthesis time means more combinations to validate, which is a lot of work. There was a design decision to limit it to 4, a long time ago, for those products. You could imagine that the logic that connects the cores together only has 4 ports in the design. If you configure 2 cores, some of that logic is optimized away. But there are only 4 combinations to work out - 1, 2, 3 and 4 cores in a cluster. Imagine if we supported 32 cores in a cluster - that'd be 8x the work to validate it. To cover that we might decide that you can only have 1, 2, 3, 4, 8, 16 and 32 (which is not even twice the work). Aside from that, you have to cover power consumption and area concerns with larger systems. That's the limitation. We only wish it was as simple as "copy & paste".
That's an extremely simple view of things, but you get the idea, right?
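The arithmetic in that argument can be sketched in a few lines of C. This only counts the cluster sizes that would need validating, under the same simplifying assumption as above that effort grows linearly with the number of allowed configurations; the function names and the "dense up to a limit, then powers of two" rule are illustrative, not a description of any real validation plan.

```c
#include <stddef.h>

/* Every cluster size from 1 to max_cores is a supported configuration. */
static size_t configs_all_sizes(size_t max_cores)
{
    return max_cores;
}

/* Every size from 1 to dense_limit is supported, plus each power of two
 * above dense_limit up to max_cores (e.g. 1, 2, 3, 4, 8, 16, 32). */
static size_t configs_restricted(size_t dense_limit, size_t max_cores)
{
    size_t count = dense_limit < max_cores ? dense_limit : max_cores;
    for (size_t n = 1; n <= max_cores; n *= 2)
        if (n > dense_limit)
            count++;
    return count;
}
```

Here `configs_all_sizes(32) / configs_all_sizes(4)` gives the 8x figure, while `configs_restricted(4, 32)` returns 7, which is 1.75 times the 4 configurations of a 1-to-4 core cluster, hence "not even twice the work".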
Ta,
Matt
Thanks Matt for the insights. I get the idea.