I have an idea for RTX and other RTOSes: adding a message bus feature to them!
This bus would operate like a CAN bus, but between tasks instead of devices.
Often one task has a message to publish and many tasks use that message. Each message must have an 8-bit identifier that is published at the head of the message (no CRC).
Each task has an adjustable filter/mask that selects the messages it needs to receive.
This method is better than sending one message to several tasks separately.
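To make the filter/mask idea concrete, here is a minimal sketch in C. It assumes CAN-style acceptance filtering, where a task accepts a message when the ID bits selected by its mask match its filter. The names bus_task_t, bus_accepts and bus_publish are made up for illustration and are not part of RTX or any other RTOS.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical per-task subscription; not an RTX structure. */
typedef struct {
    uint8_t  filter;    /* ID bits the task wants to see */
    uint8_t  mask;      /* 1 = bit must match the filter, 0 = don't care */
    unsigned received;  /* accepted messages (stand-in for a real queue) */
    uint8_t  last_id;   /* identifier of the last accepted message */
} bus_task_t;

/* Acceptance test on the 8-bit identifier at the head of the message. */
static int bus_accepts(const bus_task_t *t, uint8_t msg_id)
{
    return ((msg_id ^ t->filter) & t->mask) == 0;
}

/* Offer one message ID to every task; returns how many tasks accepted it. */
static unsigned bus_publish(bus_task_t *tasks, size_t ntasks, uint8_t msg_id)
{
    unsigned accepted = 0;
    for (size_t i = 0; i < ntasks; i++) {
        if (bus_accepts(&tasks[i], msg_id)) {
            tasks[i].received++;
            tasks[i].last_id = msg_id;
            accepted++;
        }
    }
    return accepted;
}
```

With filter 0x10 and mask 0xF0 a task would see every ID from 0x10 to 0x1F, while filter 0x12 with mask 0xFF would select exactly one ID.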
But "new hardware" doesn't know the format or size of the messages. And "new hardware" doesn't even know how the messages are allocated.
Doing:
for (i = 0; i < num_subscribers; i++) { subscribers[i]->deliver(msgid, msgdata, datasize); }  /* "register" is a reserved word in C, so the callback is named deliver here */
allows each thread to have dedicated code that allocates a copy of the message, with individual queue sizes, individual prioritizing of the messages, and an individual choice to overwrite or drop when full.
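A rough sketch of that loop in C, assuming each subscriber carries a function pointer to its own delivery code. The subscriber_t layout, the two policies and the 8-byte body limit are only assumptions for the example, not anything an RTOS prescribes.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subscriber record; each thread supplies its own deliver(). */
typedef struct subscriber {
    void (*deliver)(struct subscriber *self, uint8_t msgid,
                    const void *msgdata, size_t datasize);
    uint8_t  latest[8];    /* private copy of the newest message body */
    size_t   latest_size;
    unsigned stored;       /* messages currently held */
    unsigned dropped;      /* messages rejected by this subscriber's policy */
    unsigned capacity;     /* how many messages this subscriber may hold */
} subscriber_t;

/* Policy A: hold only the newest message, overwriting the previous one. */
static void deliver_overwrite(subscriber_t *self, uint8_t msgid,
                              const void *msgdata, size_t datasize)
{
    (void)msgid;
    if (datasize > sizeof self->latest) { self->dropped++; return; }
    if (datasize > 0)
        memcpy(self->latest, msgdata, datasize);
    self->latest_size = datasize;
    self->stored = 1;
}

/* Policy B: accept messages up to capacity, then drop new arrivals.
   Only the bookkeeping is shown; a real handler would also copy the body. */
static void deliver_drop_when_full(subscriber_t *self, uint8_t msgid,
                                   const void *msgdata, size_t datasize)
{
    (void)msgid; (void)msgdata; (void)datasize;
    if (self->stored >= self->capacity) { self->dropped++; return; }
    self->stored++;
}

/* The publish loop: every subscriber decides for itself what to do. */
static void publish(subscriber_t **subscribers, size_t count, uint8_t msgid,
                    const void *msgdata, size_t datasize)
{
    for (size_t i = 0; i < count; i++)
        subscribers[i]->deliver(subscribers[i], msgid, msgdata, datasize);
}
```

The publisher knows nothing about formats, queue depths or overflow handling; all of that lives in the per-thread code.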
So maybe a round-robin byte array for the message data. Or maybe a round-robin array of fixed-size message bodies.
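The second variant, a round-robin array of fixed-size message bodies, could look something like this. It is a sketch only: the sizes and the names msg_t, msg_ring_t, ring_put and ring_get are assumptions, and a real RTOS version would need a critical section or similar protection around the index updates.

```c
#include <stdint.h>

#define MSG_BODY_SIZE 8   /* assumed fixed body size */
#define QUEUE_SLOTS   4   /* assumed depth of this particular queue */

typedef struct {
    uint8_t id;
    uint8_t body[MSG_BODY_SIZE];
} msg_t;

typedef struct {
    msg_t    slots[QUEUE_SLOTS];
    unsigned head;                 /* index of the oldest message */
    unsigned count;                /* number of messages stored */
    int      overwrite_when_full;  /* 1 = discard oldest, 0 = drop newest */
} msg_ring_t;

/* Store a copy of *m. Returns 1 if stored, 0 if the message was dropped. */
static int ring_put(msg_ring_t *q, const msg_t *m)
{
    if (q->count == QUEUE_SLOTS) {
        if (!q->overwrite_when_full)
            return 0;
        q->head = (q->head + 1) % QUEUE_SLOTS;  /* discard the oldest entry */
        q->count--;
    }
    q->slots[(q->head + q->count) % QUEUE_SLOTS] = *m;
    q->count++;
    return 1;
}

/* Fetch the oldest message. Returns 1 on success, 0 if the queue is empty. */
static int ring_get(msg_ring_t *q, msg_t *out)
{
    if (q->count == 0)
        return 0;
    *out = q->slots[q->head];
    q->head = (q->head + 1) % QUEUE_SLOTS;
    q->count--;
    return 1;
}
```

Because the slots are a plain array, the RAM cost of each queue is known at compile time.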
And the software engineer - the one guy/gal with the best knowledge about the project requirements - gets to decide exactly how much RAM to allocate for each individual message queue, based on thread priority and importance.
Even better? The software engineer may use multiple message queues of different priority, allowing a high-priority message to overtake a low-priority message. How would a hardware implementation manage this?
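In software the overtaking is almost trivial. Here is a minimal sketch, assuming two priority levels and a small FIFO per level; mailbox_t, mailbox_send and mailbox_receive are hypothetical names, not an RTOS API.

```c
#include <stdint.h>

#define PRIO_LEVELS 2   /* assumed: 0 = high priority, 1 = low priority */
#define SLOTS       4   /* assumed depth per level */

typedef struct {
    uint8_t  ids[SLOTS];
    unsigned head;
    unsigned count;
} fifo_t;

typedef struct {
    fifo_t level[PRIO_LEVELS];
} mailbox_t;

/* Queue a message ID at the given priority. Returns 0 if rejected. */
static int mailbox_send(mailbox_t *mb, unsigned prio, uint8_t id)
{
    fifo_t *f;
    if (prio >= PRIO_LEVELS)
        return 0;
    f = &mb->level[prio];
    if (f->count == SLOTS)
        return 0;
    f->ids[(f->head + f->count) % SLOTS] = id;
    f->count++;
    return 1;
}

/* Always serve the highest-priority non-empty FIFO first. */
static int mailbox_receive(mailbox_t *mb, uint8_t *id)
{
    for (unsigned p = 0; p < PRIO_LEVELS; p++) {
        fifo_t *f = &mb->level[p];
        if (f->count > 0) {
            *id = f->ids[f->head];
            f->head = (f->head + 1) % SLOTS;
            f->count--;
            return 1;
        }
    }
    return 0;
}
```

A high-priority message sent after two low-priority ones is still received first, while the order within each level stays first-in, first-out.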
And remember that lots of message passing really is just one-to-one - so a UART driver can send a message with zero message body. The "pointer" can represent the received character, or a parity/overflow/underflow error.
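As a sketch of that zero-body case: the pointer-sized value itself can carry either the character or the error flags. The bit layout and the uart_msg_* helper names below are assumptions made up for this example.

```c
#include <stdint.h>

/* Assumed encoding: bits 0..7 = received character, bits 8..10 = error flags. */
#define UART_ERR_PARITY    (1u << 8)
#define UART_ERR_OVERFLOW  (1u << 9)
#define UART_ERR_UNDERFLOW (1u << 10)
#define UART_ERR_MASK      (UART_ERR_PARITY | UART_ERR_OVERFLOW | UART_ERR_UNDERFLOW)

/* Build the pointer-sized message value for a received character. */
static uintptr_t uart_msg_char(uint8_t ch)
{
    return (uintptr_t)ch;
}

/* Build the pointer-sized message value for an error report. */
static uintptr_t uart_msg_error(unsigned flags)
{
    return (uintptr_t)(flags & UART_ERR_MASK);
}

/* Receiver side: tell errors and characters apart. */
static int uart_msg_is_error(uintptr_t msg)
{
    return (msg & UART_ERR_MASK) != 0;
}

static uint8_t uart_msg_get_char(uintptr_t msg)
{
    return (uint8_t)(msg & 0xFFu);
}
```

The driver would cast the value to whatever pointer type the mailbox call expects, and the receiver would cast it back; nothing is allocated and nothing is copied.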
Look at your CAN. It is normally not implemented to support message queues. A typical CAN controller normally has multiple receive buffers, arranged not serially but in parallel. So the software gets the most prioritized message and not the first message. And if the chip has 3 receive buffers and a fourth message arrives, then a message gets dropped. And you have one processor dedicated to every listening node, which is completely different from a threaded program running within a single processor.
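A deliberately simplified model of that behaviour, not modelled on any particular chip: three parallel receive buffers, a read that returns the highest-priority pending message (assuming the usual CAN rule that the lowest identifier has the highest priority), and a drop when all buffers are occupied. All names are invented for the sketch.

```c
#include <stdint.h>

#define RX_BUFFERS 3   /* matches the 3-buffer example above */

typedef struct {
    uint16_t id[RX_BUFFERS];
    int      used[RX_BUFFERS];
    unsigned dropped;          /* messages lost because no buffer was free */
} can_rx_model_t;

/* A message arrives: take any free buffer, otherwise the message is lost. */
static int can_model_arrive(can_rx_model_t *c, uint16_t id)
{
    for (int i = 0; i < RX_BUFFERS; i++) {
        if (!c->used[i]) {
            c->id[i] = id;
            c->used[i] = 1;
            return 1;
        }
    }
    c->dropped++;
    return 0;
}

/* Software read: returns the pending message with the lowest ID
   (highest priority), regardless of arrival order. */
static int can_model_read(can_rx_model_t *c, uint16_t *id)
{
    int best = -1;
    for (int i = 0; i < RX_BUFFERS; i++) {
        if (c->used[i] && (best < 0 || c->id[i] < c->id[best]))
            best = i;
    }
    if (best < 0)
        return 0;
    *id = c->id[best];
    c->used[best] = 0;
    return 1;
}
```

If 0x300, 0x100 and 0x200 arrive in that order, the software reads 0x100 first, and a fourth message arriving before any read is lost even if it has the highest priority of all.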
It would be a nightmare to try to engineer your hardware "bus" and make it general enough that it would actually be able to solve the variety of problems a real embedded program needs solved. You would basically need a dedicated processor just to figure out exactly how to handle all priorities for a single task and between the individual tasks. But you can't use a dedicated slave processor for this, because that would add lots of latency. And very much varying latency too.
If I know that a message should have two listeners, and I know that one listener is more important, then I can decide to message the higher-priority thread first. And maybe then message the lower-priority task. Or have the high-priority task forward the message to the low-prio task (which means I don't need to duplicate the message).
Buffer use and latency are just so extremely important in a realtime-critical application.