I want to use the clImportMemoryARM API to achieve zero-copy between the CPU and GPU.
However, the performance is not what I expected. For an FHD image, the import takes 4.4 ms, almost identical to an explicit upload.
Is this slow performance expected? I am using a Mali-G72 GPU.
Thanks,
-Shouwen
Hi Kevin,
I have raised a support case for the above issue, with more details in response to your question.
Regards,
Vijay
Hi Vijay,
Thanks for the details.
Looking again at the code you've shared, I think I understand why you're finding the import call slow.
Linux over-commits memory, which means that when you call malloc, the Linux kernel (via the C library) just reserves a range of virtual addresses that aren't yet backed by physical memory pages. Physical pages are allocated lazily by the kernel, the first time an address in the corresponding page is accessed.
clImportMemoryARM requires that all the backing pages have been allocated for the import to complete (so that there is no need to interrupt GPU work to allocate pages later on).
Since you import the memory straight after allocating it, clImportMemoryARM has to allocate and initialise (i.e. zero, for security reasons) physical pages for the entire allocation, which is where most of the time is spent.
If you initialise the memory before the import (writing a single byte in each page, i.e. every 4 kB, should be enough), you'll find that the import call takes a lot less time.
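As a rough illustration of the page-touching suggestion above, here is a minimal sketch in C. The helper name touch_pages is made up for this example, and it assumes a POSIX system where sysconf(_SC_PAGESIZE) is available (falling back to 4 kB otherwise). Because malloc does not guarantee page alignment, it also writes the last byte so the final page is not missed. Note that it overwrites the bytes it touches, so it is only meant for a freshly allocated buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper (not part of any Arm or OpenCL API).
 * Writes one byte per page across [buf, buf + size) so the kernel backs
 * the whole range with physical pages before the buffer is imported.
 * Returns the number of page-stride writes performed. */
static size_t touch_pages(void *buf, size_t size)
{
    assert(buf != NULL || size == 0);

    long ps = sysconf(_SC_PAGESIZE);
    size_t page = ps > 0 ? (size_t)ps : 4096; /* assumed 4 kB fallback */

    volatile unsigned char *p = buf;
    size_t touched = 0;

    for (size_t off = 0; off < size; off += page) {
        p[off] = 0;   /* first access makes the kernel allocate this page */
        ++touched;
    }

    /* buf may start mid-page, so the last byte can sit in a page
     * that the strided loop did not reach. */
    if (size > 0)
        p[size - 1] = 0;

    return touched;
}
```

The idea would be to call touch_pages(buf, size) right after malloc and before passing buf to clImportMemoryARM, ideally outside the timed or latency-critical path. The page allocation cost does not disappear, it is only moved out of the import call.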
Kévin