I have an idea for RTX and other RTOSes: adding a message-bus feature to them!
This bus would operate like a CAN bus, but between tasks instead of devices.
Often one task produces a message that many other tasks consume. Each message would carry an 8-bit identifier published at the head of the message (no CRC).
Each task would have an adjustable filter/mask that selects which messages it receives.
This would be better than sending the same message to several tasks separately.
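As a rough sketch, the per-task acceptance check could work just like CAN's filter/mask registers. The type and function names below are made up for illustration, not an existing RTOS API:

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical per-task acceptance filter, modeled on CAN:
 * a message is accepted when the identifier bits selected by
 * `mask` match the task's `filter`. */
typedef struct {
    uint8_t filter; /* expected identifier bits */
    uint8_t mask;   /* which bits to compare (1 = care) */
} rx_filter_t;

static bool filter_accepts(const rx_filter_t *f, uint8_t msg_id)
{
    return (msg_id & f->mask) == (f->filter & f->mask);
}
```

A task with mask 0x00 would accept every message; a mask of 0xFF would accept exactly one identifier.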
Remember that devices connected to a CAN bus are parallel devices - they all operate concurrently, and each one is responsible for being fast enough to handle received CAN messages.
Threads in an RTOS environment are not concurrent. They run one at a time and have different priorities. That means they all must have their own receive queue, so they can buffer received messages until they get access to the CPU and can start consuming the incoming data. This is why the "subscribe" model is so popular - any thread interested in a specific type of message subscribes to that data and registers a message queue or a callback function.
The only other option besides having individual queues would be for the message creator to donate its CPU time for a round-robin pass through all interested threads, letting each of them process the new message before the publisher thread finally gets the CPU back and can figure out when to generate a new message. But this does not work in an RTOS environment, because locking up the processor for a long time breaks the real-time requirements.
So in the end, it's hard to do better than to send the new message to every interested consumer thread. Then each consumer is responsible for being able to consume received messages fast enough before the buffer space overflows.
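A minimal single-threaded model of that subscribe pattern, with each consumer owning its own queue, could look like the following. All names are illustrative; a real RTX port would use the RTOS message-queue API and proper locking instead of these toy ring buffers:

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

#define QUEUE_DEPTH 4
#define MAX_SUBS    8

/* Each consumer owns a ring buffer; publish() copies the message
 * identifier into every registered queue, so consumers drain their
 * own copy whenever they next get the CPU. */
typedef struct {
    uint8_t slots[QUEUE_DEPTH];
    size_t  head, tail, count;
} msg_queue_t;

static msg_queue_t *subs[MAX_SUBS];
static size_t       num_subs;

static void subscribe(msg_queue_t *q) { subs[num_subs++] = q; }

static bool queue_put(msg_queue_t *q, uint8_t id)
{
    if (q->count == QUEUE_DEPTH)
        return false;                    /* drop when full */
    q->slots[q->head] = id;
    q->head = (q->head + 1) % QUEUE_DEPTH;
    q->count++;
    return true;
}

static bool queue_get(msg_queue_t *q, uint8_t *id)
{
    if (q->count == 0)
        return false;
    *id = q->slots[q->tail];
    q->tail = (q->tail + 1) % QUEUE_DEPTH;
    q->count--;
    return true;
}

static void publish(uint8_t id)
{
    for (size_t i = 0; i < num_subs; i++)
        queue_put(subs[i], id);          /* each consumer gets its own copy */
}
```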
There is one way to do this, but it needs hardware help.
In the microcontroller we could add a peripheral block that does this job in hardware. Each task would have a hardware Tx/Rx message box, and when a task publishes a message, all tasks whose filters accept it would receive it in real time - like a CAN system, but on-chip as a peripheral. Just as we added a SysTick timer to microcontrollers for better RTOS implementation, we could add a message-bus peripheral for better, more advanced RTOS inter-task communication.
What do you think of this idea?
It all seems rather contrived, and it deals poorly with priority and with the order in which things get processed. It is much simpler to give each thread/task its own message pipeline, which it can process in its own time, rather than use one message for everyone and manage when everyone has cleared/processed it.
Why does everything need to see and filter every message? Build a more effective dispatch mechanism that identifies a message's consumers and dispatches it to the appropriate thread/task.
In the classic method there must be a pipeline between each pair of tasks (like a wire in classic point-to-point hardware wiring). It would be better to have a single pipeline shared by all tasks: a task publishes a message (using a new hardware peripheral dedicated to this), and other tasks with a configurable Rx filter/mask can receive it (the message has to pass the filter to reach a task).
This is like CAN bus messaging hardware: with CAN, traditional parallel wiring was replaced by a bus, so there was no need to add new wires for new data and new devices.
This idea would be excellent for a next-generation RTOS, but it needs a new "message-bus peripheral" in the microcontroller to implement it.
But "new hardware" doesn't know the format or size of the messages. And "new hardware" doesn't even know how the messages are allocated.
Doing:
for (i = 0; i < num_subscribers; i++) { subscribers[i]->deliver(msgid, msgdata, datasize); }
allows each thread to have dedicated code that allocates a copy of the message. And individual queue sizes. And individual prioritizing of the messages. And individual option to overwrite or drop when full.
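The per-queue overwrite-or-drop choice might look like this. The types and policy names are hypothetical, just to show that each subscriber queue can carry its own behaviour for when buffer space runs out:

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

#define DEPTH 2

typedef enum { ON_FULL_DROP, ON_FULL_OVERWRITE } full_policy_t;

/* Each subscriber queue carries its own policy, so the engineer
 * decides per thread what happens on overflow. */
typedef struct {
    uint8_t       slots[DEPTH];
    size_t        head, tail, count;
    full_policy_t policy;
} sub_queue_t;

static bool sub_put(sub_queue_t *q, uint8_t id)
{
    if (q->count == DEPTH) {
        if (q->policy == ON_FULL_DROP)
            return false;                 /* drop the new message */
        q->tail = (q->tail + 1) % DEPTH;  /* overwrite the oldest */
        q->count--;
    }
    q->slots[q->head] = id;
    q->head = (q->head + 1) % DEPTH;
    q->count++;
    return true;
}

static bool sub_get(sub_queue_t *q, uint8_t *id)
{
    if (q->count == 0)
        return false;
    *id = q->slots[q->tail];
    q->tail = (q->tail + 1) % DEPTH;
    q->count--;
    return true;
}
```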
So maybe a circular byte buffer for the message data. Or maybe a circular array of fixed-size message bodies.
And with the ability of the software engineer - the one guy/gal with the best knowledge about the project requirements - to decide exactly how much RAM to allocate for each individual message queue based on thread priority and importance.
Even better? The software engineer may use multiple message queues of different priority, allowing a high-priority message to overtake a low-priority message. How would a hardware implementation manage this?
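One way to sketch that multi-queue receive in software, assuming two queues per consumer and purely illustrative names:

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

#define DEPTH 4

typedef struct {
    uint8_t slots[DEPTH];
    size_t  head, tail, count;
} q_t;

static bool q_put(q_t *q, uint8_t id)
{
    if (q->count == DEPTH)
        return false;
    q->slots[q->head] = id;
    q->head = (q->head + 1) % DEPTH;
    q->count++;
    return true;
}

static bool q_get(q_t *q, uint8_t *id)
{
    if (q->count == 0)
        return false;
    *id = q->slots[q->tail];
    q->tail = (q->tail + 1) % DEPTH;
    q->count--;
    return true;
}

/* Two queues per consumer: receive drains the high-priority queue
 * first, so an urgent message overtakes earlier low-priority traffic. */
typedef struct { q_t high, low; } prio_inbox_t;

static bool inbox_recv(prio_inbox_t *in, uint8_t *id)
{
    return q_get(&in->high, id) || q_get(&in->low, id);
}
```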
And remember that lots of message passing really is just one-to-one - so a UART driver can send a message with zero message body. The "pointer" can represent the received character, or a parity/overflow/underflow error.
Look at your CAN. It is normally not implemented to support message queues. A typical CAN controller normally has multiple receive buffers - but they work in parallel, not serially. So the software gets the most prioritized message, not the first message. And if the chip has 3 receive buffers and a fourth message arrives, then a message gets dropped. And you have one processor dedicated to every listening node, which is completely different from a threaded program running within a single processor.
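That receive model can be sketched like this - a simplified, hypothetical model of a 3-buffer CAN controller (not any specific chip), where lower identifier means higher priority on the wire:

```c
#include <stdint.h>
#include <stdbool.h>

#define NUM_RX_BUFS 3

/* A small set of parallel receive buffers, read by lowest identifier
 * (highest CAN priority) rather than arrival order; an arriving frame
 * is dropped when every buffer is already occupied. */
typedef struct { bool full; uint8_t id; } rx_buf_t;
typedef struct { rx_buf_t buf[NUM_RX_BUFS]; } can_rx_t;

static bool can_frame_arrives(can_rx_t *c, uint8_t id)
{
    for (int i = 0; i < NUM_RX_BUFS; i++) {
        if (!c->buf[i].full) {
            c->buf[i].full = true;
            c->buf[i].id   = id;
            return true;
        }
    }
    return false; /* all buffers occupied: frame lost */
}

static bool can_read_highest_prio(can_rx_t *c, uint8_t *id)
{
    int best = -1;
    for (int i = 0; i < NUM_RX_BUFS; i++)
        if (c->buf[i].full && (best < 0 || c->buf[i].id < c->buf[best].id))
            best = i;
    if (best < 0)
        return false;
    *id = c->buf[best].id;
    c->buf[best].full = false;
    return true;
}
```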
It would be a nightmare to try to engineer your hardware "bus" and make it general enough that it could actually solve the variety of problems a real embedded program needs solved. You would basically need a dedicated processor just to figure out how to handle all the priorities within a single task and between the individual tasks. But you can't use a dedicated slave processor for this, because that would add lots of latency - and very much varying latency too.
If I know that a message should have two listeners, and I know that one listener is more important, then I can decide to message the higher-priority thread first. And maybe then message the lower-priority task. Or have the high-priority task forward the message to the low-prio task (which means I don't need to duplicate the message).
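The forwarding variant might look roughly like this, using hypothetical single-slot inboxes. A real design would need ownership and lifetime rules for the shared message, but the point is that one allocation serves both consumers:

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { uint8_t id; uint8_t payload[8]; } msg_t;

/* Toy single-slot inbox, just to show the hand-off. */
typedef struct { const msg_t *slot; bool has; } inbox_t;

static void inbox_put(inbox_t *in, const msg_t *m)
{
    in->slot = m;
    in->has  = true;
}

static const msg_t *inbox_take(inbox_t *in)
{
    if (!in->has)
        return NULL;
    in->has = false;
    return in->slot;
}

static inbox_t high_inbox, low_inbox;

/* Publisher hands the message to the high-priority consumer only. */
static void publish_to_high(const msg_t *m) { inbox_put(&high_inbox, m); }

/* High-priority consumer: process, then forward the SAME pointer on
 * to the low-priority consumer - no duplicate copy of the message. */
static const msg_t *high_consumer_step(void)
{
    const msg_t *m = inbox_take(&high_inbox);
    if (m != NULL)
        inbox_put(&low_inbox, m);
    return m;
}
```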
Buffer use, and latency, is just so extremely important in a realtime-critical application.
Wouldn't that just add a lot of gates? Why not add a core for each task, so they can execute in parallel and you don't have to serialize or switch context?
Adding cores is the general way of improving things. Then it's even possible to lock threads to individual cores and significantly minimize the latencies - as long as the code isn't stupid enough to get stuck because of bad use of mutexes or other locking primitives.
So using the transistors for more cores is definitely a better route than trying to build super-complex message-passing hardware. Cores add options. Dedicated hardware normally does the reverse - it forces the user to adapt the code to the hard-coded limitations of the hardware.