I wonder, is AFF1 in ICC_SGI1R_EL1 also a bit-mask or does it address directly the cluster?
So does AFF1 == 3 address cluster 3 or cluster 0 and cluster 1.
a.surati said:If such IPIs are a regular part of the interrupt traffic, redesigning the cluster-setup may be of benefit so that the related cores can be grouped together into a single cluster.
How would you "re-organize" the core/cluster setup. I'd say this is fixed in the HW.
Actually, I wonder, why NXP make the LX2160A with 8 clusters à 2 core instead of 4 by 4. But maybe it is a yield thing as it might be easier do disable a single core in a cluster to get the LX2080A ;-)
a.surati said:Scheduling IPIs targeting some cores belonging to different clusters is a job that software can do relatively easily.
Sure, but if you want to wake up for example in the LX2160a core 2,6,7 you need to make 2 IPIs, as core 2 is in cluster 1, core 6 and 7 in cluster 3.
42Bastian Schick said:How would you "re-organize" the core/cluster setup. I'd say this is fixed in the HW.
True. I did indeed mean a hardware redesign/reorganization. If the hardware design determined a particular cluster organization, that design must have taken into account the typical load the system is expected to handle. Running on it a generic load, or a load which constitutes a worst-case scenario (but not an average-case scenario) does lead to a worse performance.
42Bastian Schick said:Actually, I wonder, why NXP make the LX2160A with 8 clusters à 2 core instead of 4 by 4. But maybe it is a yield thing as it might be easier do disable a single core in a cluster to get the LX2080A ;-)
:-)
Nevertheless, looking at the dts included in Linux for the two devices, it can be seen that each cluster (in any of the two LX devices/chips) is made up of 2 cores. Certain amount of dedicated L2 cache is assigned to each cluster. I think this cluster-setup has to do with supporting parallel-processing without too much sharing/contention (of/on L2 cache, for instance). That is, each cluster is effectively a 'single' 2-threaded core. If an Arm chip, comparable to a72 and with 2 threads per core, were available, I guess NXP would choose that chip and would build a 2160' with 8 such cores.
For LX2160A, wouldn't a 4x4 cluster, with 2MB L2 per cluster, introduce higher contention/traffic on the L2 controller than their current design?
42Bastian Schick said:Sure, but if you want to wake up for example in the LX2160a core 2,6,7 you need to make 2 IPIs, as core 2 is in cluster 1, core 6 and 7 in cluster 3.
True. However, the LX devices are meant to run network-processing load. Its OS/driver needs to honour the hardware design and schedule the workload such that the cross-cluster IPIs are kept at a minimum required.
I do not know how the load is distributed on such devices. But, assuming that each cluster is given a set of connections to process, and these sets are kept disjoint, the processing of each set won't need IPIs outside its home-cluster, taking also into account factors such as per-cpu data areas, etc. maintained by the OS/driver.
The load distribution is one factor that determines the ratio of (total) cross-cluster IPIs to (total) home-cluster IPIs. If the OS reserves a single cluster for its own use and dedicates others to processing the connections, would that ratio get close to 1?
I guess that for Arm to change sgi.aff1 into a bitmask, a justification, which is at least as strong as the one required when aff0 was made a bitmask, becomes necessary.