Zero-copy buffers using the cl_arm_import_memory extension in OpenCL 1.2 on Arm Mali Midgard GPUs

Hi,

I wish to allocate a vector and use its data pointer to create a zero-copy buffer on the GPU. The cl_arm_import_memory extension can be used to do this, but I am not sure whether it is supported by all Mali Midgard OpenCL drivers.

I was going through this link and am puzzled by the following lines:

      If the extension string cl_arm_import_memory_host is exposed then importing
      from normal userspace allocations (such as those created via malloc) is
      supported.

What exactly do these lines mean? I am working specifically on Rockchip RK3399 boards. Kindly help.
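
One way to read the quoted lines: the driver reports its supported extensions as a space-separated string, and host imports (for example from malloc) are only available if cl_arm_import_memory_host appears in it as its own entry. Below is a minimal sketch of that check in C. The helper name has_extension is hypothetical and not part of OpenCL; the extension string itself would normally come from clGetDeviceInfo() with CL_DEVICE_EXTENSIONS on the target device, which is not shown here. The match must be on whole tokens, because cl_arm_import_memory is a prefix of cl_arm_import_memory_host.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not an OpenCL API): returns true if `name` appears
 * as a whole space-separated token in `ext_list`.
 * In a real program, `ext_list` would be the string returned by
 * clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, ...). */
bool has_extension(const char *ext_list, const char *name)
{
    size_t name_len = strlen(name);
    if (name_len == 0)
        return false;

    const char *p = ext_list;
    while (*p != '\0') {
        /* Skip separators between extension names. */
        while (*p == ' ')
            p++;

        /* Find the end of the current token. */
        const char *end = p;
        while (*end != '\0' && *end != ' ')
            end++;

        /* Compare the whole token, so that "cl_arm_import_memory"
         * does not match "cl_arm_import_memory_host". */
        if ((size_t)(end - p) == name_len && strncmp(p, name, name_len) == 0)
            return true;

        p = end;
    }
    return false;
}
```

On a given board, calling has_extension(extensions, "cl_arm_import_memory_host") on the device's extension string would tell you whether importing a vector's data pointer is expected to work with that driver.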

0 abhi.verma over 5 years ago in reply to abhi.verma

Hi, I am seeing a performance difference between allocating a cl_mem using arm_import_memory and allocating it using CL_MEM_ALLOC_HOST_PTR. The kernel execution time decreases by 10% when the buffer is allocated by passing the CL_MEM_ALLOC_HOST_PTR flag to clCreateBuffer(). Is this the expected behaviour? Is there any workaround for it?

0 Kévin Petit over 5 years ago in reply to abhi.verma

Hi,

This is expected behaviour. What you are likely measuring (I can confirm if you tell me exactly how you're measuring this) is the cost of maintaining data consistency between the CPU and GPU.

Conceptually, running a kernel on imported host memory has roughly the same cost as unmapping a buffer, running the kernel and mapping the buffer on the CPU again.

You can reduce that cost to a minimum by batching kernels into as few flush groups as possible. Later drivers are better at this.

Regards,

Kévin

0 willhua over 4 years ago in reply to Kévin Petit

> Conceptually, running a kernel on imported host memory has roughly the same cost as unmapping a buffer, running the kernel and mapping the buffer on the CPU again.

Hi, Kévin. Can you explain this in detail? For example, why are the unmap and map needed?