Graphics, Gaming, and VR forum ARM_import_memory API is very slow

ARM_import_memory API is very slow

Shouwen over 6 years ago

I want to use the clImportMemoryARM API to achieve zero-copy sharing between the CPU and GPU.

However, the performance is not what I expected. For a Full HD (FHD) image, the import takes 4.4 ms, which is almost identical to the time taken by an explicit upload.

Is this slow performance expected? I am using a Mali-G72 GPU.

Thanks,

-Shouwen

Vijay K over 6 years ago in reply to Kévin Petit

Hi Kevin,

I have raised a support case for the above issue, with more details in answer to your question.

Regards,

Vijay

Kévin Petit over 6 years ago in reply to Vijay K

Hi Vijay,

Thanks for the details.

Looking again at the code you've shared, I think I understand why you're finding the import call slow.

Linux over-commits memory, which means that when you call malloc, the Linux kernel (via the C library) only reserves a range of virtual addresses that aren't yet backed by physical memory pages. Each physical page is allocated lazily by the kernel the first time a virtual address within that page is accessed.
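The lazy allocation described above can be observed without any OpenCL code. The following is a minimal sketch, not taken from the original code in this thread: `touch_pass_ms` and `measure_first_touch` are hypothetical helper names, and it assumes a Linux/POSIX system where a large malloc returns freshly mapped, untouched memory. It times two passes that write one byte per page; the first pass pays for the page faults, the second does not.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: milliseconds taken to write one byte per page of buf. */
double touch_pass_ms(volatile unsigned char *buf, size_t size, size_t page_size)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t off = 0; off < size; off += page_size)
        buf[off] = 1; /* the first write to a page triggers a page fault */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (double)(t1.tv_sec - t0.tv_sec) * 1e3 +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e6;
}

/* Hypothetical helper: time the first and second pass over a fresh buffer.
 * Returns 0 on success, -1 if the allocation or page-size query fails. */
int measure_first_touch(size_t size, double *first_ms, double *second_ms)
{
    assert(first_ms != NULL && second_ms != NULL);

    long ps = sysconf(_SC_PAGESIZE);
    unsigned char *buf = malloc(size);
    if (buf == NULL || ps <= 0) {
        free(buf);
        return -1;
    }

    *first_ms = touch_pass_ms(buf, size, (size_t)ps);  /* pages get allocated here */
    *second_ms = touch_pass_ms(buf, size, (size_t)ps); /* pages already present */

    free(buf);
    return 0;
}
```

Calling `measure_first_touch((size_t)1920 * 1080 * 4, &first, &second)` uses an FHD-sized buffer at an assumed 4 bytes per pixel (the pixel format is not stated in the thread). On a typical Linux system the first pass would be expected to take noticeably longer than the second, but the exact figures depend on the kernel and allocator.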

clImportMemoryARM requires that all the backing pages have been allocated for the import to complete (so that there is no need to interrupt GPU work to allocate pages later on).

Since you import the memory straight after the allocation, it means clImportMemoryARM will have to allocate and initialise (i.e. zero for security reasons) physical pages for the entirety of the allocation, which is where most of the time is spent.

If you initialise the memory before the import (writing a single byte in each page, i.e. every 4 kB, should be enough), you'll find that the import call takes a lot less time.
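A minimal sketch of that page-touching step, assuming a POSIX system: `prefault_pages` is a hypothetical helper name, not part of OpenCL or the Arm extension, and it queries the page size at run time instead of hard-coding 4 kB. Because a malloc'd buffer is not necessarily page-aligned, it also writes the last byte so that the final page is covered.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write one byte into every page spanned by
 * [buf, buf + size) so the kernel backs the range with physical pages.
 * It overwrites those bytes with 0, so call it before filling the buffer.
 * Returns the number of bytes written (0 for an empty or NULL buffer). */
size_t prefault_pages(void *buf, size_t size)
{
    if (buf == NULL || size == 0)
        return 0;

    long ps = sysconf(_SC_PAGESIZE);
    assert(ps > 0); /* page size should always be available on Linux */
    size_t page_size = (size_t)ps;

    volatile unsigned char *p = buf; /* volatile: keep the compiler from dropping the writes */
    size_t writes = 0;

    for (size_t off = 0; off < size; off += page_size) {
        p[off] = 0;
        writes++;
    }

    /* An unaligned buffer can spill into one more page than the loop visits. */
    p[size - 1] = 0;
    return writes + 1;
}
```

The intended use is to call `prefault_pages(ptr, size)` between the malloc and the clImportMemoryARM call, so that the page allocation cost is paid up front rather than inside the import. Any alignment requirements the import itself places on the host pointer are outside the scope of this sketch.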

Regards,

Kévin