Graphics, Gaming, and VR forum ARM_import_memory API is very slow

ARM_import_memory API is very slow

Shouwen over 6 years ago

I want to use clImportMemoryARM API to achieve zero copy between CPU and GPU.

However, the performance is not what I expected. For an FHD image, the import takes 4.4 ms, almost identical to an explicit upload.

Is this slow performance expected? I am using a Mali G72 GPU.

Thanks,

-Shouwen

+1 Kévin Petit over 6 years ago in reply to Shouwen

Hi Shouwen,

Sorry for the delay (and the disappointing performance).

You're doing nothing wrong and this is definitely not the kind of performance we're expecting.

You can get the DDK version using the device info queries (clGetDeviceInfo with CL_DEVICE_VERSION). This would be really useful information for us.

To better understand your use-case and how we can help, could you answer the following questions please?

- What is your primary driver for wanting to use user memory imports? Maybe we can help you find an alternative.

- Are you writing a third-party Android application that you're testing on a Mate10?

- Do you have access to more detailed information about the phone's internals?

- Do you have access to the Mali driver source? If yes, I would encourage you to raise a support case with ARM.

If there's anything more we can do to help, please let us know.

Regards,

Kévin

0 Vijay K over 6 years ago in reply to Kévin Petit

Hi Kevin,

I have raised a support case for the above issue, with more details in answer to your questions.

Regards,

Vijay

+1 Kévin Petit over 6 years ago in reply to Vijay K

Hi Vijay,

Thanks for the details.

Looking again at the code you've shared, I think I understand why you're finding the import call slow.

Linux over-commits memory, which means that when you call malloc, the Linux kernel (via the C library) just allocates a range of virtual addresses that aren't yet backed by physical memory pages. The kernel allocates each physical page lazily, the first time an address within that page is accessed.
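To see the lazy allocation for yourself, here is a minimal sketch, assuming Linux with glibc. It maps anonymous memory with mmap (a large malloc typically ends up doing the same, but mmap gives a page-aligned address, which mincore needs) and counts resident pages before and after writing one byte per page. The names count_resident_pages and lazy_alloc_demo are made up for illustration.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes mincore() and MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count how many pages of [addr, addr + len) are resident in RAM.
 * addr must be page-aligned. Returns -1 on error. */
static long count_resident_pages(void *addr, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (len + page - 1) / page;
    unsigned char *vec = malloc(npages);
    long resident = -1;

    if (vec == NULL)
        return -1;
    if (mincore(addr, len, vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < npages; i++)
            resident += vec[i] & 1; /* bit 0 set means the page is resident */
    }
    free(vec);
    return resident;
}

/* Map len bytes, then record residency before and after writing one byte
 * per page. Returns 0 on success, -1 on error. */
static int lazy_alloc_demo(size_t len, long *before, long *after)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (buf == MAP_FAILED)
        return -1;
    *before = count_resident_pages(buf, len);
    for (size_t off = 0; off < len; off += page)
        buf[off] = 1; /* first write to a page makes the kernel back it */
    *after = count_resident_pages(buf, len);
    munmap(buf, len);
    return (*before < 0 || *after < 0) ? -1 : 0;
}
```

On a typical Linux system I would expect `before` to be zero or close to it and `after` to cover the whole mapping, but the exact numbers depend on the kernel configuration.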

clImportMemoryARM requires that all the backing pages have been allocated for the import to complete (so that there is no need to interrupt GPU work to allocate pages later on).

Since you import the memory straight after the allocation, it means clImportMemoryARM will have to allocate and initialise (i.e. zero for security reasons) physical pages for the entirety of the allocation, which is where most of the time is spent.

If you initialise the memory before the import (writing a single byte in each page, i.e. every 4kB, should be enough), you'll find that the import call takes a lot less time.
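A minimal sketch of that pre-touch step, assuming a POSIX system and a plain malloc allocation. The helper names touch_pages and alloc_prefaulted are hypothetical, the page size is queried with sysconf rather than hard-coded to 4kB, and the clImportMemoryARM call itself is left out.

```c
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Write one zero byte at every page-size stride of buf, plus the last byte,
 * so that every page the buffer spans gets backed by physical memory.
 * This overwrites those bytes, so call it before filling the buffer.
 * Returns the number of stride writes performed. */
static size_t touch_pages(void *buf, size_t size)
{
    long ps = sysconf(_SC_PAGESIZE);
    size_t page = ps > 0 ? (size_t)ps : 4096; /* fall back to 4kB */
    volatile unsigned char *p = buf;          /* volatile: keep the stores */
    size_t touched = 0;

    for (size_t off = 0; off < size; off += page) {
        p[off] = 0;
        touched++;
    }
    /* malloc memory is usually not page-aligned, so the final page may be
     * missed by the stride loop; touch the last byte as well. */
    if (size > 0)
        p[size - 1] = 0;
    return touched;
}

/* Hypothetical helper: allocate a host buffer and fault in all of its pages
 * up front, so a later import does not have to. Returns NULL on failure. */
static void *alloc_prefaulted(size_t size)
{
    void *buf = malloc(size);

    if (buf != NULL)
        touch_pages(buf, size);
    return buf;
}
```

For an FHD frame this could be used as `void *img = alloc_prefaulted((size_t)1920 * 1080 * 4);`, where 4 bytes per pixel is an assumption about the image format. The page-allocation cost does not disappear: it moves from the import call to the allocation step, which helps most when the buffer is allocated once and reused.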

Regards,

Kévin