I have an idea for RTX and other RTOSes: adding a message bus feature to them!
This bus would work like a CAN bus, but between tasks instead of between devices.
Often a task has a message to publish and many tasks are consumers of that message. Each message would carry an 8-bit identifier at its head (no CRC).
Each task has an adjustable filter/mask so that it receives only the messages it needs.
This is better than sending the same message to several tasks separately.
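A minimal single-threaded sketch of the proposed bus, assuming CAN-style acceptance filtering: a task accepts a message when `(id & mask) == (filter & mask)`. All names (`msg_bus_t`, `bus_subscribe`, `bus_publish`, `bus_receive`) and sizes are hypothetical and not part of RTX. Each task's mailbox is a plain ring buffer here; in a real RTOS it would be the task's message queue (in CMSIS-RTOS2, for example, an `osMessageQueueId_t` filled with `osMessageQueuePut`), and `bus_publish` would need a mutex or critical section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes chosen only for this sketch. */
#define BUS_MAX_SUBSCRIBERS 8
#define BUS_PAYLOAD_MAX     8   /* same payload size as a classic CAN frame */
#define BUS_QUEUE_DEPTH     4

/* A message: 8-bit identifier at the head, then the payload (no CRC). */
typedef struct {
    uint8_t id;
    uint8_t len;
    uint8_t data[BUS_PAYLOAD_MAX];
} bus_msg_t;

/* Per-task mailbox with its acceptance filter; a small ring buffer. */
typedef struct {
    uint8_t filter;    /* identifier bits the task wants */
    uint8_t mask;      /* which identifier bits are compared (1 = must match) */
    bus_msg_t queue[BUS_QUEUE_DEPTH];
    unsigned head, count;
    unsigned dropped;  /* messages lost because this mailbox was full */
} bus_subscriber_t;

typedef struct {
    bus_subscriber_t *subs[BUS_MAX_SUBSCRIBERS];
    unsigned nsubs;
} msg_bus_t;

/* CAN-style acceptance test: compare only the bits selected by the mask. */
static bool bus_accepts(const bus_subscriber_t *s, uint8_t id) {
    return (uint8_t)(id & s->mask) == (uint8_t)(s->filter & s->mask);
}

/* Register a task's mailbox with its filter/mask. Not thread-safe. */
void bus_subscribe(msg_bus_t *bus, bus_subscriber_t *s, uint8_t filter, uint8_t mask) {
    assert(bus->nsubs < BUS_MAX_SUBSCRIBERS);
    memset(s, 0, sizeof *s);
    s->filter = filter;
    s->mask = mask;
    bus->subs[bus->nsubs++] = s;
}

/* Publish once; the bus copies the message into every mailbox whose filter
   accepts the identifier. Returns how many subscribers received it. */
unsigned bus_publish(msg_bus_t *bus, uint8_t id, const void *data, uint8_t len) {
    assert(len <= BUS_PAYLOAD_MAX);
    unsigned delivered = 0;
    for (unsigned i = 0; i < bus->nsubs; i++) {
        bus_subscriber_t *s = bus->subs[i];
        if (!bus_accepts(s, id)) continue;
        if (s->count == BUS_QUEUE_DEPTH) { s->dropped++; continue; }
        bus_msg_t *m = &s->queue[(s->head + s->count) % BUS_QUEUE_DEPTH];
        m->id = id;
        m->len = len;
        if (len > 0) memcpy(m->data, data, len);
        s->count++;
        delivered++;
    }
    return delivered;
}

/* Take the oldest message from a mailbox; returns false if it is empty. */
bool bus_receive(bus_subscriber_t *s, bus_msg_t *out) {
    if (s->count == 0) return false;
    *out = s->queue[s->head];
    s->head = (s->head + 1) % BUS_QUEUE_DEPTH;
    s->count--;
    return true;
}
```

For example, a task interested only in identifiers 0x10 to 0x1F would subscribe with filter 0x10 and mask 0xF0, a task that needs exactly 0x21 would use mask 0xFF, and a logging task with mask 0x00 would receive everything. The publisher calls `bus_publish` once regardless of how many tasks are listening.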
Adding cores is the general way to improve performance. It then even becomes possible to lock threads to individual cores and significantly reduce latencies, as long as the code isn't stupid enough to get stuck through bad use of mutexes or other locking primitives.
So spending the transistors on more cores is definitely a better route than trying to build super-complex message-passing hardware. Cores add options. Dedicated hardware normally does the reverse: it forces the user to adapt the code to the hard-coded limitations of the hardware.