This discussion has been locked.

You can no longer post new replies to this discussion. If you have a question you can start a new discussion

AArch64/GICv3:ICC_SGI1R_EL1: AFF1

42Bastian Schick over 4 years ago

I wonder, is AFF1 in ICC_SGI1R_EL1 also a bit-mask or does it address directly the cluster?

So does AFF1 == 3 address cluster 3 or cluster 0 and cluster 1.

Top replies

a.surati over 4 years ago +1 verified

I think that, from the perspective of the system software, it doesn't matter. It needs to paste into SGI register's aff{1,2,3} the values copied from target's mpidr_el1. It is only the mpidr_el1.aff0...
a.surati over 4 years ago in reply to 42Bastian Schick +1

42Bastian Schick said: How would you "re-organize" the core/cluster setup. I'd say this is fixed in the HW. True. I did indeed mean a hardware redesign/reorganization. If the hardware design determined...

Parents

0 a.surati over 4 years ago in reply to 42Bastian Schick

42Bastian Schick said:
How would you "re-organize" the core/cluster setup. I'd say this is fixed in the HW.

True. I did indeed mean a hardware redesign/reorganization. If the hardware design determined a particular cluster organization, that design must have taken into account the typical load the system is expected to handle. Running on it a generic load, or a load which constitutes a worst-case scenario (but not an average-case scenario) does lead to a worse performance.

42Bastian Schick said:
Actually, I wonder, why NXP make the LX2160A with 8 clusters à 2 core instead of 4 by 4. But maybe it is a yield thing as it might be easier do disable a single core in a cluster to get the LX2080A ;-)

:-)

Nevertheless, looking at the dts included in Linux for the two devices, it can be seen that each cluster (in any of the two LX devices/chips) is made up of 2 cores. Certain amount of dedicated L2 cache is assigned to each cluster. I think this cluster-setup has to do with supporting parallel-processing without too much sharing/contention (of/on L2 cache, for instance). That is, each cluster is effectively a 'single' 2-threaded core. If an Arm chip, comparable to a72 and with 2 threads per core, were available, I guess NXP would choose that chip and would build a 2160' with 8 such cores.

For LX2160A, wouldn't a 4x4 cluster, with 2MB L2 per cluster, introduce higher contention/traffic on the L2 controller than their current design?

42Bastian Schick said:
Sure, but if you want to wake up for example in the LX2160a core 2,6,7 you need to make 2 IPIs, as core 2 is in cluster 1, core 6 and 7 in cluster 3.

True. However, the LX devices are meant to run network-processing load. Its OS/driver needs to honour the hardware design and schedule the workload such that the cross-cluster IPIs are kept at a minimum required.

I do not know how the load is distributed on such devices. But, assuming that each cluster is given a set of connections to process, and these sets are kept disjoint, the processing of each set won't need IPIs outside its home-cluster, taking also into account factors such as per-cpu data areas, etc. maintained by the OS/driver.

The load distribution is one factor that determines the ratio of (total) cross-cluster IPIs to (total) home-cluster IPIs. If the OS reserves a single cluster for its own use and dedicates others to processing the connections, would that ratio get close to 1?

I guess that for Arm to change sgi.aff1 into a bitmask, a justification, which is at least as strong as the one required when aff0 was made a bitmask, becomes necessary.
Cancel
Up +1 Down

Cancel

Reply

0 a.surati over 4 years ago in reply to 42Bastian Schick

42Bastian Schick said:
How would you "re-organize" the core/cluster setup. I'd say this is fixed in the HW.

True. I did indeed mean a hardware redesign/reorganization. If the hardware design determined a particular cluster organization, that design must have taken into account the typical load the system is expected to handle. Running on it a generic load, or a load which constitutes a worst-case scenario (but not an average-case scenario) does lead to a worse performance.

42Bastian Schick said:
Actually, I wonder, why NXP make the LX2160A with 8 clusters à 2 core instead of 4 by 4. But maybe it is a yield thing as it might be easier do disable a single core in a cluster to get the LX2080A ;-)

:-)

Nevertheless, looking at the dts included in Linux for the two devices, it can be seen that each cluster (in any of the two LX devices/chips) is made up of 2 cores. Certain amount of dedicated L2 cache is assigned to each cluster. I think this cluster-setup has to do with supporting parallel-processing without too much sharing/contention (of/on L2 cache, for instance). That is, each cluster is effectively a 'single' 2-threaded core. If an Arm chip, comparable to a72 and with 2 threads per core, were available, I guess NXP would choose that chip and would build a 2160' with 8 such cores.

For LX2160A, wouldn't a 4x4 cluster, with 2MB L2 per cluster, introduce higher contention/traffic on the L2 controller than their current design?

42Bastian Schick said:
Sure, but if you want to wake up for example in the LX2160a core 2,6,7 you need to make 2 IPIs, as core 2 is in cluster 1, core 6 and 7 in cluster 3.

True. However, the LX devices are meant to run network-processing load. Its OS/driver needs to honour the hardware design and schedule the workload such that the cross-cluster IPIs are kept at a minimum required.

I do not know how the load is distributed on such devices. But, assuming that each cluster is given a set of connections to process, and these sets are kept disjoint, the processing of each set won't need IPIs outside its home-cluster, taking also into account factors such as per-cpu data areas, etc. maintained by the OS/driver.

The load distribution is one factor that determines the ratio of (total) cross-cluster IPIs to (total) home-cluster IPIs. If the OS reserves a single cluster for its own use and dedicates others to processing the connections, would that ratio get close to 1?

I guess that for Arm to change sgi.aff1 into a bitmask, a justification, which is at least as strong as the one required when aff0 was made a bitmask, becomes necessary.
Cancel
Up +1 Down

Cancel

Children

No data