Computer vision in ACL

Hello, I am trying to perform inference on ARM devices using the YOLOv8 model with the Arm Compute Library (ACL). Currently, I am able to provide a single image and obtain correct detections. My goal, however, is to achieve real-time inference, but I am unsure how to feed frames to the implementation. Based on the ACL documentation and examples I have reviewed, it seems the library only accepts files in npy, jpg, and ppm formats. I am new to this and would greatly appreciate any guidance on how to proceed.

Thank you in advance!

  • Hi,

    It's normal to have to sculpt the data to fit a particular model. For CV, that often means moving it from an image format into a buffer that will be the input(s) to the neural network being run.
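As a concrete example of that reshaping step, here is a minimal sketch that converts an interleaved 8-bit frame (height × width × 3, the layout most cameras and decoders deliver) into a planar CHW float buffer. It assumes the usual Ultralytics YOLOv8 export conventions: a 640×640 RGB input, values scaled to [0, 1], and letterboxing with grey (114) padding. Those are assumptions, so check what your network and ACL setup actually expect, in particular NCHW vs NHWC layout. `letterbox_to_chw` is a hypothetical helper; it uses nearest-neighbour resizing for simplicity and is written in plain Python so the index arithmetic is easy to follow. The same loops carry over directly to C++ code writing into your input tensor's buffer.

```python
from array import array


def letterbox_to_chw(frame, src_w, src_h, size=640, pad_value=114, bgr=False):
    """Letterbox an interleaved 8-bit frame (HWC) into a planar CHW float32 buffer.

    frame: bytes-like of length src_h * src_w * 3, row-major, 3 channels per pixel.
    bgr:   set True if the frame is BGR (as OpenCV delivers); output planes are R, G, B.
    Returns (buf, scale, pad_x, pad_y); keep scale/pad to map boxes back to the frame.
    Assumes YOLOv8-style input (size x size, [0, 1] values, grey padding).
    """
    if len(frame) != src_w * src_h * 3:
        raise ValueError("frame length must equal src_w * src_h * 3")
    plane = size * size
    # Start with every value set to the padding colour, scaled to [0, 1].
    buf = array("f", [pad_value / 255.0]) * (3 * plane)

    # Uniform scale that fits the whole frame inside size x size (aspect ratio kept).
    scale = min(size / src_w, size / src_h)
    new_w = max(1, min(size, round(src_w * scale)))
    new_h = max(1, min(size, round(src_h * scale)))
    pad_x = (size - new_w) // 2
    pad_y = (size - new_h) // 2

    for y in range(new_h):
        sy = min(src_h - 1, int(y / scale))  # nearest-neighbour source row
        for x in range(new_w):
            sx = min(src_w - 1, int(x / scale))  # nearest-neighbour source column
            src = (sy * src_w + sx) * 3
            dst = (y + pad_y) * size + (x + pad_x)
            for c in range(3):
                ch = 2 - c if bgr else c  # read R, G, B in that order
                buf[c * plane + dst] = frame[src + ch] / 255.0
    return buf, scale, pad_x, pad_y
```

For a 1920×1080 frame this gives scale = 1/3, a 640×360 image and pad_y = 140, so a detection at model coordinates (x, y) maps back to the frame as ((x - pad_x) / scale, (y - pad_y) / scale).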

    ACL is just the compute part, so it is more easily used as part of a fuller ML framework, like ArmNN. Through KleidiAI, it also provides kernels to other frameworks, where they are used behind the scenes (e.g. in XNNPACK for TFLite/ExecuTorch) without any effort on the user's part.

    That said, you've got it working directly with YOLO for one image, and that's the important part as far as ACL is concerned. Beyond that, it's a matter of working with the buffers to feed it an image each frame, and the most efficient way to handle those buffers is largely an OS-dependent implementation detail. How you get frames from a video or camera feed into a buffer depends heavily on where they come from, and is separate from the ACL implementation. Sorry I can't give an easier answer.
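Since frames usually arrive faster than inference can consume them, one common generic pattern (not something ACL itself provides) is to decouple capture from inference with a single-slot "latest frame" mailbox: the capture side overwrites the slot, and the inference loop always takes the newest frame, dropping stale ones so latency doesn't build up. In this sketch `LatestFrame`, `inference_loop`, `preprocess` and `run_inference` are hypothetical names; `run_inference` stands in for whatever call runs your ACL network. If you built on ACL's graph examples, the input tensor is filled by an input accessor (an `ITensorAccessor` implementation) that is invoked on each `graph.run()`, so a custom accessor that copies the latest preprocessed frame is one plausible hook, but check that against your ACL version.

```python
import threading


class LatestFrame:
    """Single-slot mailbox: the producer overwrites, the consumer always sees the newest frame."""

    def __init__(self):
        self._cond = threading.Condition()
        self._frame = None
        self._seq = 0  # increments on every put()
        self._closed = False

    def put(self, frame):
        with self._cond:
            self._frame = frame  # any unread older frame is simply dropped
            self._seq += 1
            self._cond.notify()

    def close(self):
        with self._cond:
            self._closed = True
            self._cond.notify_all()

    def get(self, last_seq):
        """Wait for a frame newer than last_seq; return (seq, frame), or None once closed and drained."""
        with self._cond:
            while self._seq == last_seq and not self._closed:
                self._cond.wait()
            if self._seq == last_seq:
                return None
            return self._seq, self._frame


def inference_loop(mailbox, preprocess, run_inference):
    """Run inference on the newest available frame until the mailbox is closed."""
    seq = 0
    processed = []  # sequence numbers of the frames that were actually run
    while True:
        item = mailbox.get(seq)
        if item is None:
            break
        seq, frame = item
        run_inference(preprocess(frame))
        processed.append(seq)
    return processed
```

A camera thread would call `put()` for each captured frame and `close()` at shutdown, while the main thread runs `inference_loop`. Dropping frames this way keeps latency bounded by roughly one inference, at the cost of not processing every frame.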

    Cheers,

    Ben
