I have an SoC made of two DynamIQ Shared Unit (DSU) clusters: Cluster1 with 4xA76 + 4xA55, and Cluster2 with 4xA55. Each cluster has its own L3 and connects to a NoC. The typical use case for this SoC is to run two OSes, one per cluster, e.g. Android on Cluster1 and Linux on Cluster2. The two sides communicate through a shared-memory and mailbox mechanism to accommodate specific applications such as IVI or service robots.
I want to know if it is possible to run both clusters under the same Linux OS, so that more generic application scenarios become possible without the complexity of two OSes or wasting the compute of the second 'little' cluster.
NUMA looks like something close to this, but I'm not sure whether it is feasible in this scenario.
Yes, the SoC is designed to run two OSes.
1. The L1 and L2 caches are the same; the L3 size differs between the two clusters (L3 is part of the DSU, not per core). The 'small' A55 cluster does not have TrustZone; I'm not aware of other differences.
2. The two clusters (or OSes) are designed to communicate over shared memory and a mailbox (FIFO). The shared memory is about 1MB; the mailbox carries only a few bytes. By default the two OSes communicate over RPMsg (see the sketch after this list).
3. Each cluster has its own GIC, assigned the same PPI/SPI interrupt numbers; all peripherals can be routed to either cluster.
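For context, this is roughly how userspace on one side can talk over RPMsg today through the Linux rpmsg character device. A minimal sketch, assuming a generic remoteproc/rpmsg setup: the control device path, the /dev/rpmsg0 path, and the "rpmsg-chrdev" channel name are placeholders that depend on the vendor BSP.

```c
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/rpmsg.h>

int main(void)
{
    /* Create an endpoint via the rpmsg control device (path assumed). */
    int ctrl = open("/dev/rpmsg_ctrl0", O_RDWR);
    if (ctrl < 0) { perror("open ctrl"); return 1; }

    struct rpmsg_endpoint_info info = {
        .name = "rpmsg-chrdev",   /* must match the remote side's channel */
        .src  = 0,
        .dst  = 0xFFFFFFFF,       /* RPMSG_ADDR_ANY */
    };
    if (ioctl(ctrl, RPMSG_CREATE_EPT_IOCTL, &info) < 0) {
        perror("create endpoint");
        return 1;
    }

    /* The kernel creates /dev/rpmsgN for the endpoint; real code would
     * discover N via sysfs, it is hard-coded here. */
    int ep = open("/dev/rpmsg0", O_RDWR);
    if (ep < 0) { perror("open endpoint"); return 1; }

    const char msg[] = "ping";           /* a few bytes, mailbox-sized */
    if (write(ep, msg, sizeof(msg)) < 0)
        perror("write");

    char buf[256];
    ssize_t n = read(ep, buf, sizeof(buf));  /* blocks for the reply */
    if (n > 0)
        printf("got %zd bytes back\n", n);

    close(ep);
    close(ctrl);
    return 0;
}
```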
I'm not a kernel developer, but given what you've said I suspect it will be an uphill battle.
Taking the GIC first, an OS typically expects one GIC shared by all the cores running the same instance of the OS. That's the assumption most standard drivers I've seen operate on. I think you could work around it, but it could get quite complicated. One example of why: in the GIC, an interrupt can only be Acknowledged by the core it is targeted at, but it can be Deactivated from any core. Things like threaded interrupt handlers make use of that property, and that breaks if the Ack'ing core and the Deactivating core are connected to different GICs. Software would have to know that was a possibility, test for it, and then deal with it in an SoC-specific way.
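To make that concrete, here's a minimal sketch of the threaded-IRQ pattern I mean, using the generic request_threaded_irq API; the interrupt number and device name are made up. The primary handler runs in hard-IRQ context on the core the GIC delivered (and Acknowledged) the interrupt to, while the threaded handler runs in a kernel thread that the scheduler can place on any core, which is only safe if that core talks to the same GIC.

```c
#include <linux/module.h>
#include <linux/interrupt.h>
#include <linux/smp.h>

#define DEMO_IRQ 42  /* placeholder interrupt number */

static irqreturn_t demo_hardirq(int irq, void *dev)
{
    /* Runs on the core that Ack'ed the interrupt. */
    pr_info("hardirq on CPU%d\n", raw_smp_processor_id());
    return IRQ_WAKE_THREAD;  /* defer the heavy lifting to the thread */
}

static irqreturn_t demo_thread_fn(int irq, void *dev)
{
    /* May run on a different core; with one GIC per cluster, that core
     * might have no path to complete this interrupt. */
    pr_info("threaded handler on CPU%d\n", raw_smp_processor_id());
    return IRQ_HANDLED;
}

static int __init demo_init(void)
{
    return request_threaded_irq(DEMO_IRQ, demo_hardirq, demo_thread_fn,
                                IRQF_ONESHOT, "gic-split-demo", NULL);
}

static void __exit demo_exit(void)
{
    free_irq(DEMO_IRQ, NULL);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");
```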
The memory system also sounds like it's going to be challenging. If I've understood you correctly, most of the memory each cluster can see is not visible to the other cluster; only a small window is visible to both. You could limit the allocator to just the shared memory, but at 1MB that's not very useful. Using the non-shared memory is going to be a real challenge, since the kernel would need to know where a process was running to decide which memory to allocate from. Something like that happens already, but usually for performance reasons, not functional ones.
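For comparison, this is roughly what the NUMA route would look like from userspace, assuming the two clusters could be exposed as NUMA nodes 0 and 1 (which is exactly the open question); the node numbers are pure assumptions, and the program links with -lnuma.

```c
#include <stdio.h>
#include <string.h>
#include <numa.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "kernel has no NUMA support\n");
        return 1;
    }

    /* Restrict this process to the CPUs of node 1 (hypothetically the
     * A55-only cluster)... */
    numa_run_on_node(1);

    /* ...and allocate from that node's memory, so the process only
     * touches RAM its own cluster is declared to own. */
    size_t len = 1 << 20;
    void *buf = numa_alloc_onnode(len, 1);
    if (!buf) { perror("numa_alloc_onnode"); return 1; }

    memset(buf, 0, len);  /* touch the pages so they are actually placed */
    numa_free(buf, len);
    return 0;
}
```

The catch is that NUMA models memory every CPU can reach, just at different cost; it has no concept of memory a CPU cannot reach at all. That's the gap between the performance case Linux already handles and the functional case you'd have here.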