AArch64/GICv3:ICC_SGI1R_EL1: AFF1

Is AFF1 in ICC_SGI1R_EL1 also a bitmask, or does it address the cluster directly?

In other words, does AFF1 == 3 address cluster 3, or clusters 0 and 1?

Parents
  • If such IPIs are a regular part of the interrupt traffic, redesigning the cluster-setup may be of benefit so that the related cores can be grouped together into a single cluster.

    How would you "re-organize" the core/cluster setup? I'd say this is fixed in the HW.

    Actually, I wonder why NXP made the LX2160A with 8 clusters of 2 cores each instead of 4 clusters of 4. But maybe it is a yield thing, as it might be easier to disable a single core in each cluster to get the LX2080A ;-)

    Scheduling IPIs targeting some cores belonging to different clusters is a job that software can do relatively easily.

    Sure, but if you want to wake up, for example, cores 2, 6 and 7 on the LX2160A, you need to send 2 IPIs, as core 2 is in cluster 1 while cores 6 and 7 are in cluster 3.
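The 2-IPI count above can be sketched in C as follows. This is only an illustration: the mapping "core n lives in cluster n / 2 with Aff0 = n % 2" is an assumption inferred from the example in this thread, and `group_by_cluster`, `CORES_PER_CLUSTER` and `NUM_CLUSTERS` are hypothetical names. Real code should take each core's affinity from its MPIDR_EL1 rather than from the core number.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption taken from this thread: 8 clusters of 2 cores each,
 * core n -> cluster (Aff1) n / 2, position in cluster (Aff0) n % 2. */
#define CORES_PER_CLUSTER 2u
#define NUM_CLUSTERS      8u

/* Split a bitmask of linear core numbers into one Aff0 bitmask per cluster.
 * target_list[c] receives the bits for cluster c (0 if no core is targeted).
 * Returns how many clusters are targeted, i.e. how many SGI writes are needed,
 * because each write can name only one Aff1 value. */
static unsigned group_by_cluster(uint32_t core_mask,
                                 uint16_t target_list[NUM_CLUSTERS])
{
    const uint32_t per_cluster_mask = (1u << CORES_PER_CLUSTER) - 1u;
    unsigned writes = 0;

    /* Precondition: no core number beyond the last cluster. */
    assert((core_mask >> (NUM_CLUSTERS * CORES_PER_CLUSTER)) == 0);

    for (unsigned c = 0; c < NUM_CLUSTERS; c++) {
        target_list[c] =
            (uint16_t)((core_mask >> (c * CORES_PER_CLUSTER)) & per_cluster_mask);
        if (target_list[c] != 0)
            writes++;
    }
    return writes;
}
```

Under these assumptions, passing the mask for cores 2, 6 and 7 gives a non-empty list for clusters 1 and 3 only, so two register writes are needed.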

Children
  • How would you "re-organize" the core/cluster setup? I'd say this is fixed in the HW.

    True. I did indeed mean a hardware redesign/reorganization. If the hardware design settled on a particular cluster organization, that design must have taken into account the typical load the system is expected to handle. Running a generic load on it, or a load that represents a worst case rather than the average case, does lead to worse performance.

    Actually, I wonder why NXP made the LX2160A with 8 clusters of 2 cores each instead of 4 clusters of 4. But maybe it is a yield thing, as it might be easier to disable a single core in each cluster to get the LX2080A ;-)

    :-)

    Nevertheless, looking at the dts files included in Linux for the two devices, it can be seen that each cluster (in either of the two LX chips) is made up of 2 cores. A certain amount of dedicated L2 cache is assigned to each cluster. I think this cluster setup has to do with supporting parallel processing without too much sharing of, or contention on, the L2 cache, for instance. That is, each cluster is effectively a 'single' 2-threaded core. If an Arm core comparable to the Cortex-A72 but with 2 threads per core were available, I guess NXP would choose it and build a 2160' with 8 such cores.

    For LX2160A, wouldn't a 4x4 cluster, with 2MB L2 per cluster, introduce higher contention/traffic on the L2 controller than their current design?

    Sure, but if you want to wake up, for example, cores 2, 6 and 7 on the LX2160A, you need to send 2 IPIs, as core 2 is in cluster 1 while cores 6 and 7 are in cluster 3.

    True. However, the LX devices are meant to run network-processing loads. Their OS/driver needs to honour the hardware design and schedule the workload such that cross-cluster IPIs are kept to the required minimum.

    I do not know how the load is distributed on such devices. But, assuming that each cluster is given a set of connections to process, and these sets are kept disjoint, the processing of each set won't need IPIs outside its home-cluster, taking also into account factors such as per-cpu data areas, etc. maintained by the OS/driver.

    The load distribution is one factor that determines the ratio of (total) cross-cluster IPIs to (total) home-cluster IPIs. If the OS reserves a single cluster for its own use and dedicates others to processing the connections, would that ratio get close to 1?

    I guess that for Arm to change sgi.aff1 into a bitmask, it would need a justification at least as strong as the one that was required when aff0 was made a bitmask.
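To make the difference between the two fields concrete, here is a minimal sketch of how an ICC_SGI1R_EL1 value is put together, using the field positions from the GICv3 architecture specification. Aff1 (like Aff2 and Aff3) is a plain 8-bit value naming one cluster, while TargetList is a 16-bit bitmask with one bit per Aff0 value. So Aff1 == 3 addresses cluster 3 only. The helper name `sgi1r_value` is hypothetical, the IRM and RS fields are left at zero for simplicity, and the actual system register write (an `msr` at EL1 or higher) is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* ICC_SGI1R_EL1 field positions (GICv3). IRM and RS are left at 0 here. */
#define SGI1R_AFF1_SHIFT   16  /* bits [23:16]: cluster number, a plain value  */
#define SGI1R_INTID_SHIFT  24  /* bits [27:24]: SGI number 0..15               */
#define SGI1R_AFF2_SHIFT   32  /* bits [39:32]: plain value                    */
#define SGI1R_AFF3_SHIFT   48  /* bits [55:48]: plain value                    */
                               /* bits [15:0] : TargetList, bitmask over Aff0  */

/* Hypothetical helper: build the value to write for one SGI aimed at the
 * cores selected by target_list inside the single cluster aff3.aff2.aff1. */
static uint64_t sgi1r_value(uint8_t aff3, uint8_t aff2, uint8_t aff1,
                            uint16_t target_list, unsigned intid)
{
    assert(intid < 16);  /* SGIs are INTIDs 0..15 */

    return ((uint64_t)aff3  << SGI1R_AFF3_SHIFT)  |
           ((uint64_t)aff2  << SGI1R_AFF2_SHIFT)  |
           ((uint64_t)intid << SGI1R_INTID_SHIFT) |
           ((uint64_t)aff1  << SGI1R_AFF1_SHIFT)  |
           (uint64_t)target_list;
}
```

For example, `sgi1r_value(0, 0, 3, 0x3, 1)` describes SGI 1 sent to the cores with Aff0 values 0 and 1 in cluster 3. Reaching cores in a second cluster needs a second value with a different Aff1.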