I wonder: is AFF1 in ICC_SGI1R_EL1 also a bit-mask, or does it address the cluster directly?
That is, does AFF1 == 3 address cluster 3, or clusters 0 and 1?
42Bastian Schick said:How would you "re-organize" the core/cluster setup? I'd say this is fixed in the HW.
True. I did indeed mean a hardware redesign/reorganization. If the hardware design fixes a particular cluster organization, that design must have taken into account the typical load the system is expected to handle. Running a generic load on it, or a load that is a worst case rather than the average case, leads to worse performance.
42Bastian Schick said:Actually, I wonder why NXP make the LX2160A with 8 clusters à 2 cores instead of 4 by 4. But maybe it is a yield thing, as it might be easier to disable a single core in a cluster to get the LX2080A ;-)
:-)
Nevertheless, looking at the dts files included in Linux for the two devices, one can see that each cluster (in either of the two LX chips) is made up of 2 cores. A certain amount of dedicated L2 cache is assigned to each cluster. I think this cluster setup has to do with supporting parallel processing without too much sharing of, or contention on, the L2 cache, for instance. That is, each cluster is effectively a 'single' 2-threaded core. If an Arm core comparable to the Cortex-A72 but with 2 threads per core were available, I guess NXP would choose it and build a 2160' with 8 such cores.
For LX2160A, wouldn't a 4x4 cluster, with 2MB L2 per cluster, introduce higher contention/traffic on the L2 controller than their current design?
42Bastian Schick said:Sure, but if you want to wake up, for example, cores 2, 6 and 7 in the LX2160A, you need to make 2 IPIs, as core 2 is in cluster 1, and cores 6 and 7 are in cluster 3.
True. However, the LX devices are meant to run network-processing loads. Their OS/driver needs to honour the hardware design and schedule the workload such that cross-cluster IPIs are kept to the minimum required.
I do not know how the load is distributed on such devices. But, assuming that each cluster is given a set of connections to process, and that these sets are kept disjoint, the processing of each set won't need IPIs outside its home cluster, also taking into account factors such as the per-CPU data areas maintained by the OS/driver.
The load distribution is one factor that determines the ratio of (total) cross-cluster IPIs to (total) home-cluster IPIs. If the OS reserves a single cluster for its own use and dedicates others to processing the connections, would that ratio get close to 1?
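To make the ratio concrete, here is a small sketch that counts cross-cluster and home-cluster IPIs from a list of (sender, receiver) core pairs. The trace is made up for illustration, and the mapping "cluster = core number // 2" is an assumption matching the 2-cores-per-cluster layout discussed above; a real measurement would have to come from the OS.

```python
def count_ipis(ipis, cores_per_cluster=2):
    """Return (cross_cluster, home_cluster) counts for (sender, receiver) pairs.

    Assumes linear core numbering, i.e. cluster = core // cores_per_cluster.
    """
    cross = sum(
        1
        for sender, receiver in ipis
        if sender // cores_per_cluster != receiver // cores_per_cluster
    )
    return cross, len(ipis) - cross


# Hypothetical trace: mostly partitioned work, plus one IPI from core 0 to core 6.
trace = [(0, 1), (2, 3), (3, 2), (6, 7), (0, 6)]
cross, home = count_ipis(trace)
print(f"cross-cluster: {cross}, home-cluster: {home}")
```

With disjoint per-cluster connection sets, only IPIs such as the last entry (core 0 to core 6) would count as cross-cluster.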
I guess that for Arm to change sgi.aff1 into a bitmask, a justification at least as strong as the one required when aff0 was made a bitmask would be necessary.
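For reference, a minimal sketch of how the current encoding plays out for the "wake cores 2, 6 and 7" example. In ICC_SGI1R_EL1, Aff1 (bits [23:16]) is a plain value selecting one cluster, while TargetList (bits [15:0]) is a bit-mask over Aff0 within that cluster, so each distinct cluster needs its own register write. The helper names are hypothetical, and the mapping of a linear core number to Aff1 = core // 2, Aff0 = core % 2 is an assumption based on the 2-cores-per-cluster layout mentioned above. IRM and RS are left at 0.

```python
from collections import defaultdict

CORES_PER_CLUSTER = 2  # assumption: LX2160A-style layout, 8 clusters of 2 cores


def sgi1r_value(aff1, target_list, intid, aff2=0, aff3=0):
    """Compose an ICC_SGI1R_EL1 value targeting cores in a single cluster."""
    assert 0 <= target_list <= 0xFFFF  # bit-mask over Aff0 values 0..15
    assert 0 <= aff1 <= 0xFF           # plain cluster number, not a mask
    assert 0 <= intid <= 0xF           # SGI INTIDs are 0..15
    return (aff3 << 48) | (aff2 << 32) | (intid << 24) | (aff1 << 16) | target_list


def sgi_writes_for(cores, intid):
    """Group target cores by cluster and return one register value per cluster."""
    masks = defaultdict(int)
    for core in cores:
        aff1, aff0 = divmod(core, CORES_PER_CLUSTER)
        masks[aff1] |= 1 << aff0
    return [sgi1r_value(aff1, mask, intid) for aff1, mask in sorted(masks.items())]


writes = sgi_writes_for([2, 6, 7], intid=1)
for value in writes:
    print(hex(value))
```

Under these assumptions the list contains two values, one with Aff1 = 1 and TargetList = 0b01 (core 2), and one with Aff1 = 3 and TargetList = 0b11 (cores 6 and 7). Aff1 == 3 therefore addresses cluster 3 only; it never means "clusters 0 and 1".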