I have a question about how to make the Ethos-U NPU work on an Arm Cortex-A + Cortex-M processor. First, I found ethos-u-linux-driver-stack and ethos-u-core-software on https://git.mlplatform.org/.
1. I know ethos-u-linux-driver-stack is the Ethos-U kernel driver. Should it be integrated into the Linux OS running on the Cortex-A, or on the Cortex-M? I am not clear about which core it needs to run on.
2. For ethos-u-core-software, how do I run it? I didn't find detailed steps to run it. Does it run on the NPU or on one of the cores?
3. Besides the above two repos, are there any other repos necessary to make the Ethos-U NPU work on an Arm Cortex-A + Cortex-M processor?
Thanks for your suggestion in advance.
Hi Kristofer, I am still confused about the communication between Cortex-A and Cortex-M. As you mentioned, the current communication uses the Linux kernel mailbox APIs, and virtio/rpmsg are not used. I want to know whether the current code is sufficient to accomplish the communication between Cortex-A and Cortex-M. Do virtio and rpmsg need to be used?
Hi
The code published on MLPlatform is sufficient for Linux to dispatch inferences to the Arm Cortex-M.
The reason for moving towards virtio/rpmsg (OpenAMP on the Arm Cortex-M side) is that those are Linux native APIs. They provide standard, well designed and well tested communication channels. With that said, what we have published so far is fully functional.
For reference you could follow the call chain from ethosu_inference_create() to see how the kernel driver creates an inference and dispatches it to the Arm Ethos-U subsystem. On the Core side the message is received and handled in MessageProcess::handleMessage().
Best regards,
Kristofer
Kristofer, thanks for your reply, I got it. Another question: what is your suggestion about running an OS or going bare metal on the Cortex-M? How about running FreeRTOS on the Cortex-M?
I would like to divide this answer into two parts.
The Arm Ethos-U NPU driver is OS agnostic and does not use any OS specific primitives (like mutexes, queues, etc). It can consequently be paired with any RTOS: you spawn a thread and drive the TFLu runtime from within that thread. I would even go one step further and say that using an RTOS is recommended. So yes, you can use FreeRTOS or any other RTOS you prefer.
What is not (yet) supported are user facing OS APIs. These APIs would be used by applications to schedule inferences, and implemented by drivers to run inferences on the physical device. They would provide hardware abstraction (you do not know which hardware accelerates your network) and scheduling (multiple applications can share multiple NPUs).
Kristofer, according to my understanding of the current code, TensorFlow Lite APIs (for example, interpreter.Invoke) are used by the application and call the Ethos-U driver in micro/. Are the TensorFlow Lite APIs one kind of the user facing OS APIs you mentioned? I am not clear about the user facing OS APIs which are not (yet) supported. Please correct and guide me.
User facing APIs would be part of the OS and should be generic enough to support multiple frameworks (TFLu, TVM, etc). They should allow multiple applications to share NPU resources and ideally provide hardware abstraction (the application is unaware of which hardware accelerates the network). An application would make an OS call to run an inference, instead of directly calling interpreter.Invoke().
These APIs don't exist today and we do not yet have a clear picture of what they would look like, or if this even is the right way to go. Hardware abstraction might also be difficult to achieve, because networks might have been optimized for a specific hardware.
Kristofer, I have a question about message handling on Cortex-M. As we have hardware Message Unit IP on our silicon, and the driver supports mbox_send_message API in Linux kernel, so the message handling on Cortex-A is Ok. But I didn't find the detailed mailbox code in core_software/applications/message_process/src/message_process.cc. How should the mailbox work in message_process.cc? Could you give some guide?
We have tried to mimic what Linux does and abstract the Mailbox driver behind a Mailbox API. The APIs can be found in <ethos-u-core-software>/drivers/mailbox/include/mailbox.hpp and should allow developers to use any mailbox IP they prefer.
We have a driver implementation for the Arm MHU v2, but it has not yet reached upstream.
For a bare metal application, instantiation could for example look like this (the custom driver class is defined before it is instantiated):

// You need to implement this class for your custom mailbox driver
class YourCustomMailboxDriver : public Mailbox {
public:
    YourCustomMailboxDriver();
    virtual ~YourCustomMailboxDriver();

    // Trigger an IRQ on the remote CPU
    virtual bool sendMessage() final;

    // This function should be called from the IRQ routine. It should clear
    // the IRQ and call notify() to inform registered clients about the
    // received message
    virtual void handleMessage() final;
};

// Use section attributes to allow the linker script to place the queues at
// given addresses in memory
__attribute__((section("ethosu_core_in_queue"))) MessageProcess::Queue<1000> inQueue;
__attribute__((section("ethosu_core_out_queue"))) MessageProcess::Queue<1000> outQueue;

YourCustomMailboxDriver mailbox;
InferenceProcess::InferenceProcess inferenceProcess;
MessageProcess::MessageProcess messageProcess(*inQueue.toQueue(), *outQueue.toQueue(), mailbox, inferenceProcess);